SQL is one of the most essential skill for any Data scientist . Basic Data Mining , Uncovering the hidden parameters and much more daily routine task for every data scientist are easily possible with basic SQL knowledge . If you are a beginner in Data science and looking for the best way to learn SQL for Data science then it is going to be the best place for you . So enjoy the party here , I mean go through the complete article . I promise it will not take much time .
We have designed the complete learning into various learning steps
We will try to end up only fist and second step in this article . After it complex queries and result projection part will be done in next article .Before I go forward I will recommend you to go through the article Know why and how using SQL is Important in Data science
Make an assumption, you are doing any Data science project in which You are using relational database .Please keep in the mind that there could be two situations-
When you don’ t have the existing database , You need to create new one .
CREATE DATABASE testDB ; //Here testDB is database name
CREATE TABLE test_table (
Col1 Datatype Constraint ,
Col2 Datatype Constraint ,
Col3 Datatype Constraint ,
Col4 Datatype Constraint
);
Here Data types are varchar, int etc .
There are few other important DDL Commands like-
1.ALTER
2. DROP
3.TRUNCATE
When you create a table , There will be an option for you to add an constraint or not . Its completely situation oriented to add a constraint into DDL script . NOT NULL , UNIQUE , PRIMARY KEY , FORIEGN KEY , CHECK , DEFAULT , INDEX are possible constraint in SQL .
After successfully creation of schema . Its time to load data into it .This is one of the most easy task in SQL . There is a straight way command for this –
INSERT INTO table_name ( Column 1 , column 2 ,…..etc) VALUES (Val 1 , Val 2 ,……etc );
In case you want to insert values in every column of the Table , You may use-
INSERT INTO table_name VALUES (Val 1 , Val 2 ,……etc );
Now after insertion , If user / developer need to update any value in the Table , Here is the command for that –
UPDATE Table_name
SET Column_name = Value
Where condition ;
In the series of best way to learn SQL , We have reached at the place where we know how to create Database , Table and how to insert data into the Table . Its time to taste the real water . Basically above part of the post was used to give a preview of SQL prerequisites for Data scientist / Data Analyst .In this Post I will introduce you with most used query in SQL –
We use this command when we want to see the Data from the Table .There is some variation of this command –
SELECT *FROM table_name;
SELECT column_name FROM table_name;
SELECT DISTINCT column_name FROM table_name;
This statement is useful in setting order for result projection . I mean ascending and descending sorting in columns . You have to mention it and end part of select statement . I think syntax will help you understand better –
SELECT * FROM table_name WHERE condition ORDER BY column_name;
Here you can mention more than one column after ORDER BY statement . You can also add ASC or DESC after column name to make sorting more specific .
SELECT * FROM table_name WHERE condition ORDER BY column_name1 ASC ,column_name2 DESC ;
I don’t know why but it is mostly asked SQL queries in interviews. In SQL , There are some aggregate function exist like AVG(Column_name), SUM(Column_name), COUNT(Column_name) etc . Here you can group these result by specific column . Here I will give you a complete syntax for GROUP BY query integrated with HAVING query command .
SELECT AVG(column_name 1), column_name 2
FROM table_name
WHERE condition
GROUP BY column_name 2
HAVING condition
ORDER BY column_name 2;
Suppose there is a table of Reservation in Airlines where passengers are from different countries . Now you want to project no of passenger from every country on a particular journey date . Here you can achieve tour target by using GROUP BY command –
SELECT COUNT (passenge_id), country
FROM reservation
WHERE Journydate=’dd-mm-yyyy’
GROUP BY country;
If you want to make it more specific . Lets add some more complexity on it . If you need to SELECT only those country where no of passenger is greater than 1000 .Here is the query for this –
SELECT COUNT (passenge_id), country
FROM reservation
WHERE Journydate=’dd-mm-yyyy’
GROUP BY country
HAVING COUNT (passenge_id)>1000;
So here I end with essential for SQL in Data science . I have mentioned it as best way to learn SQL for Data science because this article can polish your SQL skills in five minutes . This will give 80% results in 20% efforts . If you need more assistance in understanding the requirement for SQL in Data science , Please go through the article Using SQL for Data Science : Know Why and How ?
Do not forget to make connection with us by just simple subscription in Data science Learner Newsletter . We help you to update with all recent happening in Data science world . You can also write your feed back to us about this article . We love to read the responses from reader side .
Data Science Learner Team.