Python Pandas Tutorial for Data Science with Examples: Part 1

Python is one of the leading programming languages for almost any task in data science, and most of our readers already know this. Still, knowledge is incomplete without the reasoning behind it, so it helps to know why. Python owes much of its success to its libraries and the community that supports them, and Pandas is one of the most useful of those libraries. This article is for anyone who wants to learn Pandas. It is the first part of a short Python Pandas tutorial.

Comparing different programming languages for data science is important, but it is outside the scope of this article. If you want to go deeper, please read Best Machine Learning Language for Data Science, which will help you choose the best machine learning language for your needs. I have seen many programmers and data scientists who are 80% sure about Python but still have doubts. They should read the article Python for Data Analysis for a clearer understanding.

Alright, let's get started with Pandas.

Introduction to Python Pandas

Pandas is a high-performance, open-source Python library for manipulating data structures and analyzing data. Here is an overview of its most notable features:

1. The DataFrame object makes data manipulation easy.

2. The Pandas API reads and writes data from in-memory sources and from files such as CSV and text files.

3. Data manipulation and handling of missing data are easy with Pandas.

4. Merging external datasets is possible and hassle-free.

So far, we have seen the main features of Python Pandas. Let's explore its data structures. Pandas supports three types of data structures:

1. Series

2. DataFrame

3. Panel

Data Structure | Dimensions | Description
Series | 1 | 1D labeled homogeneous array; size is immutable
DataFrame | 2 | 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns
Panel | 3 | 3D labeled, size-mutable array
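To make the table concrete, here is a minimal sketch that builds a Series and a DataFrame and checks their dimensions. The values and column names are made up for illustration. Panel is left out because it has been removed from recent pandas releases.

```python
import pandas as pd

# Hypothetical values, used only for illustration
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"name": ["x", "y", "z"], "score": [10, 20, 30]})

print(ser.ndim)    # 1: a Series is one-dimensional
print(df.ndim)     # 2: a DataFrame is two-dimensional
print(df.dtypes)   # each column can have its own type
```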

Of the three data structures in Pandas, data analysts and data scientists mostly prefer the DataFrame, so this Python Pandas tutorial drills down into DataFrames. Don't worry, I will introduce the other two as well. A Series is a one-dimensional, size-immutable data structure. A Panel, by contrast, is a three-dimensional, size-mutable data structure. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current releases. It is time to start coding.

Coding Aspects in Python Pandas Tutorial:

  1. How to import Pandas:

The code below imports pandas and creates an empty pandas DataFrame.

[Image: Python Pandas Tutorial 2]
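A minimal sketch of the import and the empty DataFrame described above. The alias pd is only a widely used convention, not a requirement.

```python
import pandas as pd  # "pd" is the conventional alias

# Calling the constructor with no arguments gives an empty DataFrame
df = pd.DataFrame()

print(df.empty)   # True: no rows and no columns yet
print(df.shape)   # (0, 0)
```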

In the example above, we did not pass any arguments to the constructor. Let's look at the arguments the pandas DataFrame constructor supports.

[Image: Python Pandas Tutorial 3]
[Image: Python Pandas Tutorial 4]

Here, data is the values you want to insert, index gives the row labels, and columns gives the column names. Note that these arguments are optional, so use them only when you really need them. There is no hard and fast rule that you must use every argument of this function.
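Here is a short sketch of the data, index, and columns arguments in use. The values and labels are hypothetical and chosen only to show where each argument ends up.

```python
import pandas as pd

# Hypothetical values, used only for illustration
data = [[25, "Delhi"], [31, "Mumbai"], [28, "Pune"]]

df = pd.DataFrame(
    data=data,                        # the values to insert
    index=["row1", "row2", "row3"],   # row labels
    columns=["age", "city"],          # column names
)
print(df)

# All of these arguments are optional: without index and columns,
# pandas falls back to integer labels 0, 1, 2, ...
df_default = pd.DataFrame(data)
print(df_default)
```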

In most cases, though, the data you want to analyze lives in a CSV file or some other external file, so let's see how to read data from such files. Suppose we have a file at the path (path = 'F:/data/sms.tsv'). Let's see how to read it:

[Image: Python Pandas Tutorial 5]

Let me explain the arguments used in the syntax above. Here, path is the location of the dataset. If you don't want to use a local path, you can pass a URL as the path instead. header identifies the row that holds the column names, which is usually the first row of the file. If it is None, you can supply your own column names, as we did in the code example above. One more important thing: you can also pass an argument such as [sep = '/']. Here sep stands for separator, and you can put any character or symbol inside the single quotes. The separator is what divides the column values within the file.
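The sketch below shows these arguments together with read_table. To keep it runnable anywhere, it reads from an in-memory string instead of the file at F:/data/sms.tsv. The file contents and the column names label and message are assumptions made for illustration.

```python
import io
import pandas as pd

# Stand-in for a file such as 'F:/data/sms.tsv': two tab-separated
# columns and no header row. The contents are made up.
raw = "ham\tSee you at five\nspam\tYou won a prize\nham\tCall me later\n"

sms = pd.read_table(
    io.StringIO(raw),             # a local path or a URL can be passed here instead
    sep="\t",                     # the character that separates column values
    header=None,                  # the file has no header row
    names=["label", "message"],   # hypothetical custom column names
)
print(sms.head())
```

With a real file, you would pass the path string in place of io.StringIO(raw).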

Working with Series derived from DataFrame:

I have already introduced the data structures in Pandas. Did you know that you can derive a Series from a DataFrame? A Series is nothing but a column of a DataFrame. Let's see how:

[Image: Python Pandas Tutorial 6]

In the example above, you can see that we created a Series named "ser", which is nothing but a column of sms.

Here sms is a DataFrame object. To confirm this, I used type(arg) to check the type of each data structure.
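A minimal sketch of the same idea, assuming a small sms DataFrame built by hand with hypothetical label and message columns:

```python
import pandas as pd

# Hand-built stand-in for the sms DataFrame; the contents are made up
sms = pd.DataFrame({
    "label": ["ham", "spam", "ham"],
    "message": ["See you at five", "You won a prize", "Call me later"],
})

# Selecting one column with square brackets returns a Series
ser = sms["label"]

print(type(sms))   # the DataFrame class
print(type(ser))   # the Series class
```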

The read_table function accepts many more parameters. You can find all of them in detail in its documentation (Python Pandas detail documentation). You can also use dot (.) notation to access a particular column of a DataFrame. A problem arises when a column name contains a space. In that case you have to use the style from the code above (square-bracket indexing). The dot method also fails when a column name conflicts with a built-in method. Let's look at an example of the dot method:

[Image: Python Pandas Tutorial 7]
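The following sketch shows the dot method next to the two cases where it does not work. The DataFrame and its column names are hypothetical, picked so that one name contains a space and another clashes with the built-in count method.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["ham", "spam"],
    "message text": ["See you at five", "You won a prize"],  # name contains a space
    "count": [4, 4],                                         # clashes with DataFrame.count
})

ser = df.label              # dot method, same result as df["label"]
text = df["message text"]   # a name with a space needs square brackets
counts = df["count"]        # df.count would return the method, not the column

print(callable(df.count))   # True: the dot form resolved to the method
```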

How to create a New Column or Series in Pandas DataFrame:

Suppose you have a DataFrame with five columns and you want to derive a new column from any two of them and add it to the DataFrame. Here is how to do it. Before we get into the code, let's look at the data first. It has three columns: 'Summary', 'CPC1', and 'CPC2'. I want to add a new column to the DataFrame that holds the sum of CPC1 and CPC2.

[Image: Python Pandas Tutorial 8]

First of all, we have to read the data, so let's do that:

[Image: Python Pandas Tutorial 9]

Now let's see the output:

[Image: Python Pandas Tutorial 10]
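Here is a minimal sketch of adding the derived column. It builds a small DataFrame by hand instead of reading the original file, so the rows are made up, and the new column name CPC_total is an assumption.

```python
import pandas as pd

# Hand-built stand-in for the dataset; the rows are made up
df = pd.DataFrame({
    "Summary": ["ad one", "ad two", "ad three"],
    "CPC1": [1.5, 2.0, 0.5],
    "CPC2": [0.5, 1.0, 2.5],
})

# Adding two columns is done row by row; assigning the result to a
# new key in square brackets creates the new column
df["CPC_total"] = df["CPC1"] + df["CPC2"]

print(df)
```

Note that a new column has to be created with square brackets. The dot method only works for reading columns that already exist.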

What describe() does in Python Pandas:

If a Pandas DataFrame object has numeric columns and you want to see some basic statistics for them, the describe() function is very helpful:

[Image: Python Pandas Tutorial 11]
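A short sketch of describe() on a hand-built DataFrame with made-up values. By default it summarizes only the numeric columns, so the text column is skipped.

```python
import pandas as pd

# Hand-built stand-in for the dataset; the rows are made up
df = pd.DataFrame({
    "Summary": ["ad one", "ad two", "ad three", "ad four"],
    "CPC1": [1.0, 2.0, 3.0, 4.0],
    "CPC2": [2.0, 2.0, 4.0, 4.0],
})

# One column of statistics per numeric column:
# count, mean, std, min, 25%, 50%, 75%, max
stats = df.describe()
print(stats)
```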

How to see the shape of a Pandas DataFrame Object:

A DataFrame object usually contains many rows and columns. Suppose you need a loop to iterate over the DataFrame. In that situation you need to know its upper bound. You can get it from the DataFrame's shape:

[Image: Python Pandas Tutorial 12]
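A minimal sketch of using shape as a loop bound. The DataFrame here is made up, with hypothetical column names a to d, and is only sized to match the 99 rows and 4 columns of the example.

```python
import pandas as pd

# Made-up frame with the same shape as the one in the example
df = pd.DataFrame({name: range(99) for name in ["a", "b", "c", "d"]})

# shape is an attribute (no parentheses) holding (rows, columns)
n_rows, n_cols = df.shape
print(df.shape)   # (99, 4)

# Use the row count as the upper bound of the loop
total = 0
for i in range(n_rows):
    total += df.iloc[i]["a"]
print(total)
```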

The output shows that the DataFrame has 99 rows and 4 columns. This is not the end of the Python Pandas tutorial. It is a series, so watch for the next part. You can subscribe to us to be notified about every new article.

Data Science Learner Team 

Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and text analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 