SQL is one of the most essential skill for any Data scientist . Basic Data Mining , Uncovering the hidden parameters and much more daily routine task for every data scientist are easily possible with basic SQL knowledge . If you are a beginner in Data science and looking for the best way to learn SQL for Data science then it is going to be the best place for you . So enjoy the party here , I mean go through the complete article . I promise it will not take much time .
Best way to learn SQL for Data Scientist-
We have designed the complete learning into various learning steps
- Create the schema ( DDL)
- Insert /update the data into Data bases
- Complex queries and result projection
We will try to end up only fist and second step in this article . After it complex queries and result projection part will be done in next article .Before I go forward I will recommend you to go through the article Know why and how using SQL is Important in Data science
1.Create the schema ( DDL)
Make an assumption, you are doing any Data science project in which You are using relational database .Please keep in the mind that there could be two situations-
- You don’t have any data and you are getting the data from external source . Now you need to dump into some SQL based database .Here you need this Create the schema section .
- In the second situation , You have data in existing database . Only thing you have to do is run some query . In this case this section will no longer in use . You may directly jump into complex queries and result projection section .
When you don’ t have the existing database , You need to create new one .
SQL Query to create Database :
CREATE DATABASE testDB ; //Here testDB is database name
SQL Query to create Table :
CREATE TABLE test_table (
Col1 Datatype Constraint ,
Col2 Datatype Constraint ,
Col3 Datatype Constraint ,
Col4 Datatype Constraint
Here Data types are varchar, int etc .
Other Required DDL command –
There are few other important DDL Commands like-
What are Constraint in SQL –
When you create a table , There will be an option for you to add an constraint or not . Its completely situation oriented to add a constraint into DDL script . NOT NULL , UNIQUE , PRIMARY KEY , FORIEGN KEY , CHECK , DEFAULT , INDEX are possible constraint in SQL .
2.Insert /update the data into Data bases-
After successfully creation of schema . Its time to load data into it .This is one of the most easy task in SQL . There is a straight way command for this –
INSERT INTO table_name ( Column 1 , column 2 ,…..etc) VALUES (Val 1 , Val 2 ,……etc );
In case you want to insert values in every column of the Table , You may use-
INSERT INTO table_name VALUES (Val 1 , Val 2 ,……etc );
Now after insertion , If user / developer need to update any value in the Table , Here is the command for that –
SET Column_name = Value
Where condition ;
3.Complex queries and result projection-
In the series of best way to learn SQL , We have reached at the place where we know how to create Database , Table and how to insert data into the Table . Its time to taste the real water . Basically above part of the post was used to give a preview of SQL prerequisites for Data scientist / Data Analyst .In this Post I will introduce you with most used query in SQL –
We use this command when we want to see the Data from the Table .There is some variation of this command –
For selecting entire table –
SELECT *FROM table_name;
2. For selecting few column from entire table –
SELECT column_name FROM table_name;
3. From Projection of Distinct values –
SELECT DISTINCT column_name FROM table_name;
ORDER BY –
This statement is useful in setting order for result projection . I mean ascending and descending sorting in columns . You have to mention it and end part of select statement . I think syntax will help you understand better –
SELECT * FROM table_name WHERE condition ORDER BY column_name;
Here you can mention more than one column after ORDER BY statement . You can also add ASC or DESC after column name to make sorting more specific .
SELECT * FROM table_name WHERE condition ORDER BY column_name1 ASC ,column_name2 DESC ;
GROUP BY with HAVING clause –
I don’t know why but it is mostly asked SQL queries in interviews. In SQL , There are some aggregate function exist like AVG(Column_name), SUM(Column_name), COUNT(Column_name) etc . Here you can group these result by specific column . Here I will give you a complete syntax for GROUP BY query integrated with HAVING query command .
SELECT AVG(column_name 1), column_name 2
GROUP BY column_name 2
ORDER BY column_name 2;
Suppose there is a table of Reservation in Airlines where passengers are from different countries . Now you want to project no of passenger from every country on a particular journey date . Here you can achieve tour target by using GROUP BY command –
SELECT COUNT (passenge_id), country
GROUP BY country;
If you want to make it more specific . Lets add some more complexity on it . If you need to SELECT only those country where no of passenger is greater than 1000 .Here is the query for this –
SELECT COUNT (passenge_id), country
GROUP BY country
HAVING COUNT (passenge_id)>1000;
So here I end with essential for SQL in Data science . I have mentioned it as best way to learn SQL for Data science because this article can polish your SQL skills in five minutes . This will give 80% results in 20% efforts . If you need more assistance in understanding the requirement for SQL in Data science , Please go through the article Using SQL for Data Science : Know Why and How ?
Do not forget to make connection with us by just simple subscription in Data science Learner Newsletter . We help you to update with all recent happening in Data science world . You can also write your feed back to us about this article . We love to read the responses from reader side .
Data Science Learner Team.