Before I start, let me justify why SQL matters in data science, because you learn something best when you are hungry for it. In this first paragraph I will give you a logical reason for using SQL in data science. So let's think it through: if you are a data scientist, you have to play with data, right? That data can come in any form, structured or unstructured. We will look at why SQL matters for each kind of data, one by one.
Using SQL for Structured Data –
Do you know exactly what structured data is? Don't worry if your answer is no. Structured data is data with a predefined structure. For example, a registration form has a fixed set of input fields, and each field always yields a particular data type. If a user enters a value in the wrong format, it is filtered out first by validation outside the database. For data like this, a relational database is usually the best option. With SQL you can slice and combine the data in many ways, which makes data mining tasks such as finding hidden patterns easier. I think this is enough to convince you to learn and use SQL in data science.
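To make the registration form idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a full RDBMS, and the table name, column names and sample rows are all made up for illustration. It shows fixed, typed fields, a badly formed row being rejected, and one simple pattern-finding query.

```python
import sqlite3

# In-memory SQLite database, used here only as a lightweight stand-in for an RDBMS.
conn = sqlite3.connect(":memory:")

# Hypothetical registration table: every field has a fixed name and type.
conn.execute("""
    CREATE TABLE registration (
        id      INTEGER PRIMARY KEY,
        name    TEXT    NOT NULL,
        age     INTEGER NOT NULL CHECK (age > 0),
        country TEXT    NOT NULL
    )
""")

# Made-up sample rows.
rows = [("Asha", 31, "India"), ("Ben", 27, "UK"), ("Chen", 35, "India")]
conn.executemany(
    "INSERT INTO registration (name, age, country) VALUES (?, ?, ?)", rows
)

# A row that breaks the declared structure is rejected by the database itself.
try:
    conn.execute(
        "INSERT INTO registration (name, age, country) VALUES (?, ?, ?)",
        ("Dan", -5, "UK"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A simple aggregate: number of registrations and average age per country.
per_country = conn.execute(
    "SELECT country, COUNT(*), AVG(age) FROM registration "
    "GROUP BY country ORDER BY country"
).fetchall()

print(rejected)     # True
print(per_country)  # [('India', 2, 33.0), ('UK', 1, 27.0)]
```

The GROUP BY query is the part that carries over to any relational database. The CREATE TABLE details, such as type names, differ a little from one product to another.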
Using SQL for Unstructured Data –
This is the turning point of the article. First, a short introduction to unstructured data. Unstructured data has no predefined format: PDF text, a Facebook feed, a video stream and so on. Suppose you comment on a photo from your friend's birthday party using a smiley and some text, while another friend comments on the same photo with a GIF or a video file. To work with data like this, most data scientists prefer NoSQL databases. Now you must be thinking: if NoSQL is preferable, why learn SQL? The answer is that many NoSQL databases have query languages whose syntax and concepts are similar to SQL, so knowing SQL makes it easy to adopt them. For example, Cassandra is a very popular NoSQL database with its own CQL (Cassandra Query Language), which closely resembles SQL.
Big data technologies in the Hadoop ecosystem, such as Pig and Hive, are also similar to SQL in nature; Hive's query language in particular is modeled on SQL. I recommend having a look at the article Relational Databases Vs Non Relational Databases for a complete understanding.
Working with RDBMS using SQL –
Many RDBMSs use SQL as their query language.
I will pick one of them for further discussion, because once you know one you can easily use the others. Let me choose Oracle.
How to Download and Install Oracle –
From my experience of teaching and working in organizations, I have seen one common confusion among beginners: they do not understand the client-server architecture. To access a database, a database server must be installed somewhere. You have three options: install it on your local system, on another external system, or in the cloud. Apart from that, you need client-side software to access it. People usually get confused because they install the server and the client on the same machine, and because the complete package installer they use for the server also installs the client-side software automatically, without any extra effort.
- Download Oracle Database 12c.
- Install Oracle Database and set up your machine.
- Start working with Oracle using SQL.
Download Oracle Database 12c –
Install Oracle Database and set up your machine –
Once you have downloaded the Oracle Database setup, you can install it by following this documentation.
Start working with Oracle using SQL –
Now you need client-side software to access the database on the server. SQL Developer comes with Oracle Database by default. Here is an important point, so do not get confused: if you are accessing a database on an external server, you do not need to install the Oracle server at all. Just use SQL Developer, which comes as a package with an exe file. Once it opens, you can create the connection as shown in the image below –
In the host name field, enter the IP address of the server where Oracle is installed. If you installed it on your local machine, just leave it as localhost. Once the connection is established, your system is ready for SQL.
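The same connection details can be written as a single Oracle Easy Connect string of the form host:port/service_name. The small Python sketch below only builds that string. The helper name easy_connect is made up for this article, 1521 is Oracle's default listener port, and the service name "orcl" and the IP address are assumptions you should replace with the values from your own installation.

```python
def easy_connect(host="localhost", port=1521, service_name="orcl"):
    """Build an Oracle Easy Connect string: host:port/service_name.

    The defaults are assumptions: localhost for a server on your own machine,
    1521 as Oracle's default listener port, and "orcl" as a placeholder
    service name.
    """
    return f"{host}:{port}/{service_name}"


# Server installed on the local machine.
local_dsn = easy_connect()

# Server installed on another machine (example IP address, not a real server).
remote_dsn = easy_connect(host="192.0.2.10")

print(local_dsn)   # localhost:1521/orcl
print(remote_dsn)  # 192.0.2.10:1521/orcl
```

Only the host part changes between the local and the remote case, which is the point made above about the host name field.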
First Query in Oracle using SQL –
Now that the platform is ready, it's time to brush up your SQL skills for data science. If you want a reference for learning SQL, I recommend W3Schools. My personal suggestion is to start working on some data rather than just reading. In an upcoming article I am planning to give you some data; our team is busy making a hands-on SQL tutorial for data science beginners like you.
Anyway, we cannot finish SQL training for data science in a single article, so here is the complete road map for further reading –
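Until that data arrives, here is a small sketch of what a first query can look like. It is not run against Oracle: it uses Python's built-in sqlite3 module with a made-up employees table so that you can try it without any server. A basic SELECT statement like this one should also work in SQL Developer once a table with the same columns exists there.

```python
import sqlite3


def first_query():
    """Create a tiny hypothetical table and run one basic SELECT on it."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
    )
    # Made-up sample rows.
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("Asha", "Analytics", 70000),
            ("Ben", "Sales", 50000),
            ("Chen", "Analytics", 85000),
        ],
    )
    # Pick two columns, filter the rows, and sort the result.
    query = """
        SELECT name, salary
        FROM employees
        WHERE department = 'Analytics'
        ORDER BY salary DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(first_query())  # [('Chen', 85000), ('Asha', 70000)]
```

SELECT, WHERE and ORDER BY are the three clauses you will use most when exploring a data set, so they make a good starting point.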
- SQL essentials for Data science part 1 (covers data insertion through manipulation).
- SQL essentials for Data science part 2 (some conceptual terms).
- Tips for Database design in DBMS for High Performance (especially when crawling data from an external source and dumping it into your own database).
- If you are using a training data set from a database, you should follow the article machine learning datasets designing – Best Practices (recommended only for machine learning).
If you like this article or have any suggestions related to SQL for data science, you can write back to us. Keep reading and stay connected.
Data Science Learner Team