Python is one of the leading programming languages for data science tasks, and most of our readers already know this. Still, knowledge is incomplete without the theory behind it, so it helps to know the reason. Python's success rests on its libraries and the community support behind them, and Pandas is one of the most useful of those libraries. This article is for anyone who wants to learn Pandas. It is the first part of a short Python Pandas tutorial.
Comparing programming languages for data science is important, but it is outside the scope of this article. If you want to go deeper, please read Best Machine Learning Language for Data Science, which will help you work out the best machine learning language for you. I have seen many programmers and data scientists who are 80% clear about Python but still remain confused. They should read the article Python for Data Analysis for a clearer understanding.
Alright, let's learn Pandas –
Introduction to Python Pandas –
Pandas is a high-performance, open source Python library for manipulating data structures and analyzing data. Here is an overview of its most notable features –
1. The DataFrame object makes data manipulation easy.
2. The Pandas API reads and writes data between in-memory data structures and files such as CSV and text files.
3. Data manipulation and handling of missing data are easy with Pandas.
4. Merging external datasets is possible and hassle-free.
So far we have seen the main features of Python Pandas. Let's explore its data structures. Pandas supports three types of data structures –
|Data Structure|Dimensions|Description|
|---|---|---|
|Series|1|1D labeled homogeneous array; size is immutable|
|Data Frames|2|2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns|
|Panel|3|3D labeled, size-mutable array|
Of the three data structures, data analysts and data scientists mostly use the DataFrame, so this Python Pandas tutorial drills down into DataFrames. Don't worry, I will introduce the other two as well. Series is a one-dimensional, size-immutable data structure. Panel, on the other hand, is a three-dimensional, size-mutable data structure; note that it has been deprecated and removed from recent Pandas releases. It's time to start coding.
Coding Aspects In Python Pandas Tutorial :
How to import Pandas :
The code below imports pandas and creates an empty pandas DataFrame.
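A minimal sketch of this step, assuming the conventional alias pd and a variable name df chosen here for illustration:

```python
import pandas as pd

# Calling the constructor with no arguments gives an empty DataFrame.
df = pd.DataFrame()

print(df.empty)  # True
print(df.shape)  # (0, 0)
```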
In the example above we did not pass any arguments to the constructor. Let's look at the arguments that the pandas DataFrame constructor supports.
Here data holds the values you want to insert, index gives the row labels, and columns gives the column names. All of these arguments are optional, so use them only when you really need them. There is no hard and fast rule that you must pass every argument to this function.
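As a rough illustration of these three arguments, the sketch below passes all of them at once. The values and labels are made up for this example and are not taken from any dataset in this tutorial.

```python
import pandas as pd

# Illustrative values only: two rows, two columns.
df = pd.DataFrame(
    data=[[1, "a"], [2, "b"]],       # the values to insert
    index=["row1", "row2"],          # row labels
    columns=["number", "letter"],    # column names
)

print(df)
```

Leaving out index or columns makes pandas fall back to default integer labels (0, 1, 2, ...).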
In most cases, the data you want to analyze lives in a CSV file or another external file, so let's see how to read data from such files. Suppose we have a file at path = 'F:/data/sms.tsv'. Let's see how to read it –
Let me explain the arguments used in the syntax above. Here path is the location of the dataset. If you don't want to use a local path, you can pass a URL instead. header indicates which row of the file holds the column names. If it is None, the file is read as having no header row, and you can supply your own column names, as we did in the code above. One more important thing: you can also pass a separator argument such as [sep = '/']. Here sep stands for separator, and you can put any character or symbol inside the quotes. The separator tells Pandas how the column values are delimited in the file.
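The sketch below shows the same idea in a form that runs without the file on disk. It reads a small in-memory sample through io.StringIO as a stand-in for sms.tsv; the sample rows and the column names label and message are assumptions made for this example, not the real contents of the file.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for F:/data/sms.tsv.
sample = "ham\tSee you at five\nspam\tYou won a prize\n"

# header=None: the data has no header row, so we pass our own names.
sms = pd.read_table(io.StringIO(sample), header=None, names=["label", "message"])
print(sms)

# The same data with a different delimiter, read by passing sep.
piped = "ham|See you at five\nspam|You won a prize\n"
sms_piped = pd.read_table(io.StringIO(piped), sep="|", header=None,
                          names=["label", "message"])
```

With a real file you would pass the path (or a URL) in place of the io.StringIO object. read_table uses a tab as its default separator, which is why the first call needs no sep argument.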
Working with Series derived from DataFrame :
You are already familiar with the data structures in Pandas. Did you know that you can derive a Series from a DataFrame? A Series is nothing but a column of a DataFrame. Let's see how.
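A small sketch of deriving a Series with bracket notation. The DataFrame here is built by hand with made-up rows so that the example is self-contained; the column names are assumptions for illustration.

```python
import pandas as pd

# Hand-built stand-in for the sms DataFrame (illustrative rows only).
sms = pd.DataFrame({
    "label": ["ham", "spam"],
    "message": ["See you at five", "You won a prize"],
})

# Selecting one column with brackets returns a Series.
ser = sms["label"]

print(type(sms))  # <class 'pandas.core.frame.DataFrame'>
print(type(ser))  # <class 'pandas.core.series.Series'>
```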
In the example above, we created a Series named "ser", which is nothing but a column of sms.
Here sms is a DataFrame object. To confirm this, I used type(arg) to check the type of each data structure.
The read_table function has many more parameters; you can read about all of them in its documentation (Python Pandas detail documentation). You can also use dot (.) notation to access a particular column in a DataFrame. The problem comes when a column name contains a space; in that case you must use the bracket style shown above. The dot method also fails when a column name conflicts with a built-in DataFrame method or attribute. Let's look at an example of the dot method –
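The following sketch illustrates dot access and both of its pitfalls. The column names (label, msg length, count) are invented for this example; count is chosen because DataFrame already has a method with that name.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["ham", "spam"],
    "msg length": [15, 15],   # name contains a space
    "count": [1, 2],          # name clashes with DataFrame.count()
})

# Dot access works for a simple column name.
print(df.label.equals(df["label"]))  # True

# A name with a space cannot be written after a dot; use brackets.
print(df["msg length"])

# df.count refers to the built-in method, not the column.
print(callable(df.count))  # True
print(df["count"])
```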
How to create a New Column or Series in Pandas DataFrame :
Suppose you have a DataFrame object with five columns, and you want to derive a new column from two of them and add it to the same DataFrame. Here is how to achieve it. Before we go deep into the code, let's look at the data first. Here are three columns: 'Summary', 'CPC1' and 'CPC2'. I want to create a new column in the DataFrame that holds the sum of CPC1 and CPC2.
First of all we have to read the data, so let's do that –
Now let's see the output –
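To make the step concrete without the original data file, here is a sketch that builds a tiny DataFrame by hand. Only the column names Summary, CPC1 and CPC2 come from the text; the values and the new column name CPC_total are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; only the column names match the text.
df = pd.DataFrame({
    "Summary": ["ad one", "ad two", "ad three"],
    "CPC1": [1.5, 2.0, 0.5],
    "CPC2": [0.5, 1.0, 2.5],
})

# Assigning to a new key adds a column; the addition is row by row.
df["CPC_total"] = df["CPC1"] + df["CPC2"]

print(df)
```

Note that a new column has to be created with bracket notation; assigning through the dot form would only set an attribute on the object instead of adding a column.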
What describe() does in Python Pandas :
If a Pandas DataFrame has numeric columns and you want to see some basic statistics for them, the describe() function is very helpful –
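A short sketch of describe() on a hand-made DataFrame (the values are illustrative). By default, when the frame mixes numeric and text columns, only the numeric ones are summarized.

```python
import pandas as pd

df = pd.DataFrame({
    "Summary": ["a", "b", "c"],     # text column, skipped by default
    "CPC1": [1.0, 2.0, 3.0],
    "CPC2": [10.0, 20.0, 30.0],
})

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max.
stats = df.describe()
print(stats)
```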
How to see the shape of Pandas DataFrame Object :
A DataFrame usually contains many rows and columns. Suppose you need to write a loop that iterates over the DataFrame. In that situation you need to know the upper limit of the loop. For that you can use the shape attribute –
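A sketch of reading the shape and using it as a loop bound. The DataFrame below is placeholder data, sized on purpose to 99 rows and 4 columns; the column names a to d are made up.

```python
import pandas as pd

# Placeholder data: 99 rows, 4 columns.
df = pd.DataFrame({
    "a": range(99),
    "b": range(99),
    "c": range(99),
    "d": range(99),
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(df.shape)  # (99, 4)

# Use the row count as the upper limit of a loop.
total = 0
for i in range(n_rows):
    total += df.iloc[i, 0]  # value in row i, first column
```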
This shows that the DataFrame has 99 rows and 4 columns. This is not the end of the Python Pandas tutorial. It is a series, so wait for the next part. You can subscribe to us to be notified of every new article.
Data Science Learner Team