Using SQL for Data Science: Know Why and How?

Before I start, I will justify the importance of SQL in data science, because you learn something best when you are hungry for it. This first paragraph gives a purely logical reason for using SQL in data science. So let's start thinking: if you are a data scientist, you have to work with data, right? That data can take any form. It can be structured or unstructured. We will now discuss why SQL matters for each kind of data (structured and unstructured), one by one.

Using SQL for Structured Data –

Do you know exactly what structured data is? Don't worry if your answer is no. Structured data is data that follows a predefined structure. Take a registration form: it has a fixed set of input fields, and each field always accepts a particular data type. If a user enters a value in the wrong format, it is filtered out beforehand by external validation. For this kind of data, a relational database is the best option. With SQL you can slice, filter, and combine the data in many different ways, which makes data mining tasks such as finding hidden patterns much easier. I think this is enough to convince you to learn and use SQL in data science. Let's move on to unstructured data.
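To make this concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a relational database (an assumption for illustration; the same SQL ideas apply in Oracle or MySQL). The registrations table, its CHECK constraint, and the sample rows are all hypothetical. It shows a form's fixed fields becoming typed columns, the database refusing an invalid value, and a GROUP BY query surfacing a simple pattern.

```python
import sqlite3

# In-memory SQLite database standing in for any relational database.
conn = sqlite3.connect(":memory:")

# Each field of a (hypothetical) registration form becomes a typed column.
# The CHECK constraint is an assumption added for illustration: it makes the
# database itself reject out-of-range ages, on top of any form validation.
conn.execute("""
    CREATE TABLE registrations (
        id      INTEGER PRIMARY KEY,
        name    TEXT    NOT NULL,
        age     INTEGER NOT NULL CHECK (age BETWEEN 13 AND 120),
        country TEXT    NOT NULL
    )
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO registrations (name, age, country) VALUES (?, ?, ?)",
    [
        ("Asha", 24, "India"),
        ("Ben", 31, "UK"),
        ("Chen", 27, "India"),
        ("Dana", 45, "USA"),
        ("Eli", 22, "India"),
    ],
)

# An invalid record is refused by the database.
try:
    conn.execute(
        "INSERT INTO registrations (name, age, country) VALUES (?, ?, ?)",
        ("Bad", 5, "UK"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A simple pattern query: registrations and average age per country.
summary = conn.execute("""
    SELECT country, COUNT(*) AS n, AVG(age) AS avg_age
    FROM registrations
    GROUP BY country
    ORDER BY n DESC, country
""").fetchall()

print(rejected)  # True
for country, n, avg_age in summary:
    print(country, n, round(avg_age, 1))
```

On real data the same GROUP BY pattern scales to questions like "which sign-up channel brings the youngest users?" without moving the data out of the database.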

Using SQL for Unstructured Data –

This is the turning point of the article. First, a short introduction to unstructured data. Unstructured data is data that does not have any predefined format, such as PDF text, a Facebook feed, or a video stream. Suppose you comment on a friend's birthday party photo using a smiley and some text, while another friend comments on the same photo with a GIF or a video file. When working with data like this, most data scientists prefer NoSQL databases. So you may be wondering: if NoSQL is preferred, why learn SQL? The answer is that many NoSQL databases offer query languages with syntax and concepts similar to SQL, so knowing SQL makes it easy to adapt to them. For example, Cassandra, a very popular NoSQL database, uses CQL (Cassandra Query Language), whose syntax closely resembles SQL, although it does not support joins.
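The similarity between CQL and SQL is easiest to see side by side. In the sketch below, the photo-comments table is hypothetical. The CQL statements are only stored as strings (running them needs a Cassandra cluster), while the equivalent SQL runs against Python's built-in sqlite3 as a stand-in. Apart from the type names and one ordering detail, the statements are nearly identical.

```python
import sqlite3

# CQL (Cassandra Query Language) for a hypothetical photo-comments table.
# These strings are for comparison only and are NOT executed here.
CQL_CREATE = """
CREATE TABLE comments (
    photo_id   int,
    comment_id int,
    body       text,
    PRIMARY KEY (photo_id, comment_id)
)
"""
CQL_SELECT = "SELECT body FROM comments WHERE photo_id = 42"

# The equivalent standard SQL, executed against an in-memory SQLite database.
SQL_CREATE = """
CREATE TABLE comments (
    photo_id   INTEGER,
    comment_id INTEGER,
    body       TEXT,
    PRIMARY KEY (photo_id, comment_id)
)
"""
# Cassandra returns the rows of one partition sorted by the clustering
# column (comment_id); in SQL that ordering must be requested explicitly.
SQL_SELECT = "SELECT body FROM comments WHERE photo_id = 42 ORDER BY comment_id"

conn = sqlite3.connect(":memory:")
conn.execute(SQL_CREATE)
conn.executemany(
    "INSERT INTO comments (photo_id, comment_id, body) VALUES (?, ?, ?)",
    [
        (42, 1, "Happy birthday! :)"),  # text plus a smiley
        (42, 2, "[gif] party.gif"),     # a reference to a GIF attachment
        (7, 1, "Nice view"),            # a comment on a different photo
    ],
)
bodies = [row[0] for row in conn.execute(SQL_SELECT)]
print(bodies)  # ['Happy birthday! :)', '[gif] party.gif']
```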

Big data technologies follow the same pattern. The Hadoop ecosystem includes Hive, whose query language HiveQL is SQL-like, and Pig, whose language Pig Latin offers SQL-like operations such as filtering, grouping, and joining. For a complete comparison, I recommend reading the article Relational Databases Vs Non Relational Databases.

Working with RDBMS using SQL –

Many relational database management systems (RDBMS) use SQL as their query language, for example:

  1. MySQL
  2. Microsoft SQL Server
  3. Oracle Database
  4. Microsoft Access

I will pick one of them for the rest of this article, because once you know one, you can easily use the others. Let me choose Oracle.
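How far does "know one, use the others" go? The core of the language (SELECT, FROM, WHERE, ORDER BY) is shared, but each product has its own dialect for some features. A well-known example is limiting the number of rows returned. The sketch below lists that query for several systems against a hypothetical orders table and executes only the LIMIT form, using Python's built-in sqlite3 as a stand-in.

```python
import sqlite3

# "Give me the 3 most recent orders" in several SQL dialects.
# The table name "orders" is hypothetical; only the row-limiting syntax differs.
TOP_N_QUERIES = {
    "MySQL / SQLite": "SELECT id, amount FROM orders ORDER BY id DESC LIMIT 3",
    "Microsoft SQL Server / Access": "SELECT TOP 3 id, amount FROM orders ORDER BY id DESC",
    "Oracle 12c and later": "SELECT id, amount FROM orders ORDER BY id DESC FETCH FIRST 3 ROWS ONLY",
}

# Only the LIMIT form is executed, because that is the one SQLite supports.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(1, 9.5), (2, 20.0), (3, 7.25), (4, 12.0), (5, 3.0)],
)
latest = conn.execute(TOP_N_QUERIES["MySQL / SQLite"]).fetchall()
print(latest)  # [(5, 3.0), (4, 12.0), (3, 7.25)]
```

Everything except the row-limiting clause is identical across the three versions, so moving between these systems is mostly a matter of picking up a few dialect details.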

How to Download and Install Oracle –

From my experience teaching and working in organizations, I have seen a common doubt among most beginners: they do not understand the client-server architecture. To access any database, a database server must be installed somewhere. You have three options: install it on your local system, on another external system, or use a cloud-based server. Apart from the server, you need client-side software to access it. People usually get confused because they install the server and the client on the same machine. On top of that, they use a complete package installer for the server, which automatically installs the client-side software without any extra effort.

That is why I want to point out that Oracle Database 12c comes with SQL Developer as its default client-side tool. Getting started is a three-step process:

  1. Download Oracle Database 12c.
  2. Install Oracle Database and set up your machine.
  3. Start working with Oracle using SQL.

Download Oracle Database 12c –

Oracle markets Database 12c as the world's first database designed for the cloud. To see all of its features, visit the Oracle 12c features page. You can download it from the link Download oracle 12c.


Install Oracle Database and set up your machine –

Once you have downloaded the Oracle Database setup, you can install it by following this documentation.

Start working with Oracle using SQL –

Now you need client-side software to access the database on the server. SQL Developer comes by default with Oracle Database. Here is an important point, so that you do not get confused: if you are accessing a database on some other external server, you do not need to install the Oracle server at all. Just use SQL Developer, which is also available as its own standalone package (with an .exe on Windows). Once it opens, you can create the connection as shown in the image below:

[Image: using SQL Developer]

In the host name field, enter the IP address of the server where Oracle is installed. If you installed it on your local machine, just leave it as localhost. Once the connection is established, your system is ready for running SQL.
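SQL Developer is just one client. Any client needs the same pieces of information as the connection dialog: the host name (or IP address), the port, and the SID or service name. The sketch below assumes the python-oracledb driver (installed separately with pip install oracledb) and uses placeholder values: port 1521 is Oracle's usual default listener port, while the service name orclpdb, the user, and the password are hypothetical. Only the connection-string helper is executed; the function that actually connects is defined but not called.

```python
def easy_connect_dsn(host: str = "localhost", port: int = 1521,
                     service_name: str = "orclpdb") -> str:
    """Build an Oracle Easy Connect string of the form host:port/service_name.

    The default service name "orclpdb" is a hypothetical placeholder; use the
    service name of your own database.
    """
    return f"{host}:{port}/{service_name}"


def fetch_server_time(user: str, password: str, dsn: str):
    """Open a client connection and run a trivial query on the server.

    Requires the python-oracledb package and a reachable Oracle server.
    It is defined here for illustration but not called.
    """
    import oracledb  # imported lazily so the rest of the sketch runs without it

    with oracledb.connect(user=user, password=password, dsn=dsn) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT SYSDATE FROM dual")
            return cursor.fetchone()[0]


# Server on this machine: leave the host as localhost.
local_dsn = easy_connect_dsn()
# Server on another machine: use its IP address (192.0.2.10 is a documentation example).
remote_dsn = easy_connect_dsn(host="192.0.2.10")

print(local_dsn)   # localhost:1521/orclpdb
print(remote_dsn)  # 192.0.2.10:1521/orclpdb
```

With a real server and credentials, a call such as fetch_server_time("hr", "your_password", local_dsn) would return the server's current date, which confirms that the client can reach the server.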

First Query in Oracle Using SQL –

Now that the platform is ready, it's time to brush up your SQL skills for data science. If you want a reference for learning SQL, I recommend W3Schools. My personal suggestion is to start working with some data rather than just reading. In my upcoming article, I plan to give you some data. Our team is busy making a hands-on SQL tutorial for data science beginners like you.
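A first session might look like the sketch below. It uses Python's built-in sqlite3 module in place of an Oracle connection so that it runs anywhere, and the employees table and its rows are made up for illustration. The CREATE TABLE and SELECT statements use standard types and syntax that Oracle also accepts, so you could try them in a SQL Developer worksheet as well.

```python
import sqlite3

# Python's built-in SQLite stands in for an Oracle connection here.
conn = sqlite3.connect(":memory:")

# Standard SQL types that Oracle also accepts (hypothetical table).
conn.execute("""
    CREATE TABLE employees (
        emp_id   INTEGER PRIMARY KEY,
        emp_name VARCHAR(50),
        dept     VARCHAR(20),
        salary   NUMERIC(10, 2)
    )
""")

# Made-up sample rows, inserted with Python-side placeholders.
conn.executemany(
    "INSERT INTO employees (emp_id, emp_name, dept, salary) VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Sales", 52000),
        (2, "Ben", "IT", 61000),
        (3, "Chen", "Sales", 58000),
        (4, "Dana", "HR", 45000),
    ],
)

# Query 1: filter rows and sort them.
sales = conn.execute(
    "SELECT emp_name, salary FROM employees WHERE dept = 'Sales' ORDER BY salary DESC"
).fetchall()

# Query 2: an aggregate over the whole table.
high_earners = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE salary > 50000"
).fetchone()[0]

print(sales)         # [('Chen', 58000), ('Asha', 52000)]
print(high_earners)  # 3
```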

Anyway, we cannot cover all of SQL for data science in a single article, so here is the complete road map for further reading:

  1. SQL essentials for Data science part 1 (covers everything from data insertion to manipulation)
  2. SQL essentials for Data science part 2 (some conceptual terms)
  3. Tips for Database design in DBMS for High Performance (especially when crawling data from an external source and dumping it into your own database)
  4. If you take your training data set from a database, you should follow the article machine learning datasets designing – Best Practices (recommended only for machine learning)

 

If you like this article or have any suggestions related to SQL for data science, you can write back to us. Keep reading and stay connected.

Data Science Learner Team