Best Way to Learn SQL for Data Scientist : Part 1

SQL is one of the most essential skill for any Data scientist . Basic Data Mining , Uncovering the hidden parameters  and much more daily routine task for every data scientist are easily possible with basic SQL knowledge . If you are a beginner in Data science and looking for the best way to learn SQL for Data science then it is going to be the best place for you . So enjoy the party here , I mean go through the complete article . I promise it will not take much time .

Best way to learn SQL for Data Scientist-

We have designed the complete learning into various learning steps

  1. Create the schema ( DDL)
  2. Insert /update the data into Data bases
  3. Complex queries and result projection

We will try to end up only fist and second step in this article . After it complex queries and result projection  part will be done in next article .Before I go forward I will recommend you to go through the article Know why and how using SQL is Important in Data science

1.Create the schema ( DDL)

Make an assumption,  you are doing any Data science project in which You are using relational database .Please keep in the mind that there could be two situations-

  1. You don’t have any data and you are getting the data from external source . Now you need to dump into some SQL based database .Here you need this Create the schema section .
  2. In the second situation , You have data in existing database . Only thing you have to do is run some query . In this case this section will no longer in use . You may directly jump into complex queries and result projection section .

When you don’ t have the existing database , You need to create new one .

SQL Query to create Database :

CREATE DATABASE testDB ;               //Here testDB is database name

SQL Query to create Table :

CREATE TABLE test_table (

Col1  Datatype  Constraint ,

Col2  Datatype Constraint ,

Col3  Datatype Constraint ,

Col4  Datatype Constraint

);

Here Data types are varchar, int etc .

Other Required DDL command –

There are few other important DDL Commands like-

1.ALTER

2. DROP

3.TRUNCATE

What  are  Constraint in SQL –

When you create a table , There will be an option for you to add an constraint or not . Its completely situation oriented to add a constraint  into DDL script .  NOT NULL , UNIQUE , PRIMARY KEY , FORIEGN KEY , CHECK , DEFAULT , INDEX  are possible constraint  in SQL .

2.Insert /update the data into Data bases-

After successfully creation of schema . Its time to load  data into it  .This is one of the most easy task in SQL . There is a straight way command for this –

INSERT INTO table_name ( Column 1 , column 2  ,…..etc) VALUES (Val 1 , Val 2 ,……etc );

In case you want to insert values in every column of the Table , You may use-

INSERT INTO table_name VALUES (Val 1 , Val 2 ,……etc );

Now after insertion , If user / developer need to update any value in the Table , Here is the command for that –

UPDATE Table_name

SET Column_name = Value 

Where condition ;

3.Complex queries and result projection-

In the series of  best way to learn SQL , We have reached at the place where we know how to create Database , Table and how to insert data into the Table . Its time to taste the real water . Basically above part of the post was used to give a preview of SQL prerequisites for Data scientist / Data Analyst .In this Post I will introduce you with most used query in SQL –

SELECT –

We use this command when we want to see the Data from the Table .There is some variation of this command –

  1. For selecting entire table –

SELECT *FROM table_name;

   2. For selecting few column from entire table –

SELECT column_name FROM table_name;

     3. From Projection of Distinct values –

SELECT DISTINCT column_name FROM table_name;

ORDER BY –

This statement is useful in setting order for result projection . I mean ascending and descending sorting in columns . You have to mention it and end part of select statement . I think syntax will help you understand better –

SELECT * FROM table_name WHERE  condition ORDER BY column_name;

Here you can mention more than one column after ORDER BY statement . You can also add ASC or DESC after column name to make sorting more specific .

SELECT * FROM table_name WHERE  condition ORDER BY column_name1  ASC ,column_name2  DESC ;

GROUP BY with HAVING clause –

I don’t know why but it is mostly asked SQL queries in interviews. In SQL , There are some aggregate function exist like AVG(Column_name),  SUM(Column_name),  COUNT(Column_name) etc . Here you can group these result by specific column . Here I will give you a complete syntax for GROUP BY query integrated with HAVING query command .

SELECT AVG(column_name 1), column_name 2
FROM table_name
WHERE condition
GROUP BY column_name 2
HAVING condition
ORDER BY column_name 2;

Suppose there is a table of Reservation in Airlines where passengers are from different countries . Now you want to project no of passenger from every country on a particular journey date . Here you can achieve tour target by using GROUP BY command –

SELECT COUNT (passenge_id), country
FROM reservation
WHERE Journydate=’dd-mm-yyyy’
GROUP BY country;

If you want to make it more specific . Lets add some more complexity on it . If you need to SELECT only those country where no of passenger is greater than 1000 .Here is the query for this –

SELECT COUNT (passenge_id), country
FROM reservation
WHERE Journydate=’dd-mm-yyyy’
GROUP BY country

HAVING COUNT (passenge_id)>1000;

END NOTES-

So here I  end with essential for SQL in Data science . I have mentioned  it as best way to learn SQL for Data science because this article can polish your SQL skills in five minutes  . This will give 80% results in 20% efforts . If you need more assistance in understanding the requirement for SQL in Data science , Please go through the article Using SQL for Data Science : Know Why and How ?

Do not forget to make  connection with us by just simple subscription in Data science Learner Newsletter .  We help you to update with all recent happening in Data science world . You can also write your feed back to us about this article . We love to read the responses from reader side .

Data Science Learner Team.