Logistic Regression is one of the most popular machine learning algorithms for predicting categorical outcomes. It is a supervised algorithm used for classification. You can think of the model as answering Yes or No questions. For example, given a customer dataset with features such as age group and city, you can build a logistic regression model to predict a binary outcome: whether the customer will buy or not. In this how-to tutorial, you will learn **How to Predict using Logistic Regression in Python**.

## Difference Between the Linear and Logistic Regression

**Linear Regression**: In linear regression you predict continuous numerical values from the training dataset. The prediction can be any number within a range.

**Logistic Regression**: Here you predict categorical or ordinal values, so the predictions are discrete.
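The difference is easy to see on a toy example. The sketch below uses made-up numbers (not the tutorial's dataset) and fits both models on the same single feature: the linear model returns real numbers, while the logistic model returns class labels and class probabilities.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (made up for illustration): one feature, six observations.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_continuous = np.array([1.5, 3.1, 4.4, 6.2, 7.4, 9.1])  # numeric target
y_binary = np.array([0, 0, 0, 1, 1, 1])                   # class labels

linear_model = LinearRegression().fit(X, y_continuous)
logistic_model = LogisticRegression().fit(X, y_binary)

new_points = np.array([[1.5], [5.5]])
linear_predictions = linear_model.predict(new_points)           # real numbers
logistic_predictions = logistic_model.predict(new_points)       # class labels
class_probabilities = logistic_model.predict_proba(new_points)  # each row sums to 1

print(linear_predictions)
print(logistic_predictions)
print(class_probabilities)
```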

## Popular Use Cases of the Logistic Regression Model

Logistic regression has many popular use cases. Some of them are:

**Purchase Behavior**: Checking whether a customer will buy or not.

**Disaster Prediction**: Predicting the possibility of hazardous events such as floods and cyclones.

**Disease Prediction**: Predicting whether a person has cancer or not.

**Handwriting recognition**

## Assumptions on the DataSet

The following assumptions should hold before you apply logistic regression. Treat them as preconditions for modeling.

- There should be no missing values in the dataset.
- The target variable must be binary (only two values) or ordinal (a categorical variable with ordered values).
- The input variables should be independent of each other, that is, they should have little or no correlation between them.
- The dataset should contain at least 50 observations for reliable results.
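Three of these checks (missing values, binary target, number of observations) are simple to automate. The helper below is a hypothetical sketch, not part of any library: the name `check_basic_assumptions`, its parameters, and the toy frame are assumptions made for illustration. The independence check is handled separately with a correlation coefficient.

```python
import pandas as pd

def check_basic_assumptions(df, target, min_rows=50):
    """Return simple pass/fail flags for three of the checks listed above."""
    return {
        "no_missing_values": not df.isnull().values.any(),
        "binary_target": df[target].nunique() == 2,
        "enough_observations": len(df) >= min_rows,
    }

# Toy frame (made-up values) with 60 rows and a 0/1 target.
toy = pd.DataFrame({
    "age": list(range(20, 80)),
    "bought": [0, 1] * 30,
})
checks = check_basic_assumptions(toy, target="bought")
print(checks)
```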

## Step by Step for Predicting using Logistic Regression in Python

### Step 1: Import the necessary libraries

Before building the model, load the necessary Python libraries such as numpy, pandas, scipy, matplotlib, and sklearn.

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pylab import rcParams
import seaborn as sb
import scipy
from scipy.stats import spearmanr
import sklearn
from sklearn import preprocessing
from sklearn.preprocessing import scale
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import sklearn.metrics as sm
```

These imports serve the following purposes:

- rcParams sets the matplotlib visualization parameters.
- spearmanr computes the Spearman rank correlation coefficient. It is used to check whether two variables are correlated.
- scale standardizes the dataset (zero mean and unit variance per column).
- train_test_split divides the data into training and test sets.
- sklearn.metrics generates the accuracy report.

### Step 2: Define the Parameters for Matplotlib

```
%matplotlib inline
rcParams["figure.figsize"] =10,5
sb.set_style("whitegrid")
```

The `%matplotlib inline` magic tells Jupyter Notebook to show all figures inline. The other two lines set the default figure size and the seaborn grid style.

### Step 3: Load the Dataset

In this step, you will load the dataset and define the target and the input variables for your model. I am using the mtcars dataset, which you can download from the GitHub URL.

```
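address = "data/mtcars.csv"
cars = pd.read_csv(address)
cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
# Input variables: drat (column 5) and carb (column 11)
data = cars.iloc[:,[5,11]].values
data_names = ["drat","carb"]
# Target variable: am (column 9), as a 1-D array so that fit() does not warn
y = cars.iloc[:,9].values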
```
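Selecting columns by position is easy to get wrong if the column order changes. As a hedged alternative, the sketch below builds a tiny stand-in frame (the name `toy_cars` and its three rows are illustrative, not the real file) and shows that selecting by column name gives the same arrays.

```python
import pandas as pd

# Three illustrative rows using the same column names as the tutorial.
columns = ['car_names', 'mpg', 'cyl', 'disp', 'hp', 'drat',
           'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
rows = [
    ['car A', 21.0, 6, 160.0, 110, 3.90, 2.62, 16.46, 0, 1, 4, 4],
    ['car B', 22.8, 4, 108.0, 93, 3.85, 2.32, 18.61, 1, 1, 4, 1],
    ['car C', 21.4, 6, 258.0, 110, 3.08, 3.215, 19.44, 1, 0, 3, 1],
]
toy_cars = pd.DataFrame(rows, columns=columns)

by_position = toy_cars.iloc[:, [5, 11]].values   # positional selection
by_name = toy_cars[['drat', 'carb']].values      # label-based selection
target = toy_cars['am'].values                   # 1-D array of 0/1 labels

print((by_position == by_name).all())
print(target)
```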

### Step 4: Check for the independence of the variables

```
drat= cars["drat"]
carb = cars["carb"]
# Find the Spearman rank correlation coefficient
spearmanr_coff, p_value = spearmanr(drat,carb)
spearmanr_coff
# A value close to 0 means little or no correlation
```

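The Spearman rank coefficient for **drat** and **carb** is negative. The sign only gives the direction of the relationship; it is the magnitude that matters. A value close to 0 indicates little or no monotonic correlation, which is the case here, so the two variables can be treated as independent of each other.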
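To see why the magnitude matters more than the sign, here is a small sketch on made-up data. Both coefficients are negative, but only the second one is close to zero.

```python
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5]

# Perfectly decreasing relationship: negative and as strong as possible.
rho_strong, _ = spearmanr(x, [5, 4, 3, 2, 1])   # -1.0

# Shuffled values: still negative, but close to zero.
rho_weak, _ = spearmanr(x, [4, 1, 5, 2, 3])     # approximately -0.1

print(rho_strong, rho_weak)
```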

### Step 5: Check for the missing values

`cars.isnull().sum()`

You can see that there are no missing values in the dataset, which is good. If you find any missing values, remove or replace them. Read the following tutorial on dealing with missing values.

Steps to Deal with the missing values.
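As a minimal sketch of the two options mentioned above (removing or replacing), the example below uses a made-up frame with missing entries. Replacing with the column median is only one possible choice and is an assumption here, not a recommendation from the original tutorial.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) with one missing entry per column.
df = pd.DataFrame({
    "drat": [3.90, np.nan, 3.85, 3.08],
    "carb": [4.0, 4.0, np.nan, 1.0],
})

missing_per_column = df.isnull().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: replace missing values with each column's median.
filled = df.fillna(df.median())

print(missing_per_column)
print(dropped)
print(filled)
```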

### Step 6: Check whether the target is binary or ordinal

`sb.countplot(x="am",data=cars,palette="hls")`

From the figure, you can see that the target variable **am** is binary: it takes only the values 0 and 1.
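The same check can be done without a plot. The sketch below uses a made-up stand-in for the am column and counts how often each value occurs.

```python
import pandas as pd

# Made-up stand-in for the "am" column.
am = pd.Series([1, 1, 0, 0, 1, 0, 0], name="am")

counts = am.value_counts().sort_index()   # occurrences of each distinct value
is_binary = set(am.unique()) == {0, 1}    # True when only 0 and 1 appear

print(counts)
print(is_binary)
```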

### Step 7: Fit the model and check its accuracy

```
x = scale(data)
LogReg = LogisticRegression()
#fit the model
LogReg.fit(x,y)
#print the score
print(LogReg.score(x,y))
```

After scaling the data, you fit the LogReg model on x and y. LogReg.score(x,y) returns the mean accuracy of the model on the given data (not an R-squared value). Note that here it is computed on the same data the model was trained on. In this case, the score is **0.8125**, which is good. You can also use sklearn metrics to generate a classification report, which shows the precision and recall of the model.

```
y_predict = LogReg.predict(x)
from sklearn.metrics import classification_report
report = classification_report(y,y_predict)
print(report)
```
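To make the numbers in the report concrete, the sketch below computes precision and recall by hand on made-up labels and compares them with sklearn's own functions. Precision is the share of predicted positives that are correct; recall is the share of actual positives that were found.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up labels for illustration.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives: 3
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives: 1
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives: 1

precision = tp / (tp + fp)  # 0.75
recall = tp / (tp + fn)     # 0.75

print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```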

The precision and recall of the above model are both 0.81, which is adequate for prediction. Remember to look for high recall and high precision when choosing the best model.
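The steps above score the model on the rows it was trained on, and the imported train_test_split is never used. A hedged sketch of a held-out evaluation is shown below. It runs on synthetic stand-in data (not mtcars) and uses StandardScaler in place of scale so that the scaling learned on the training rows can be reused on the test rows; both choices are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: two features and a binary target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the rows, keeping the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training rows only, then reuse it on the test rows.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

train_accuracy = model.score(scaler.transform(X_train), y_train)
test_accuracy = model.score(scaler.transform(X_test), y_test)

print(train_accuracy, test_accuracy)
```

The test accuracy is usually the more honest estimate of how the model will behave on new data.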

## Conclusion:

Logistic regression is a popular way to make predictions when the target is binary or ordinal. The main requirement is that the data must be clean, with no missing values. You can use it in any field where you want to predict a user's decision. Just follow the above steps and you will master it.

I hope this tutorial on How to Predict using Logistic Regression in Python has helped you fit the model on your own dataset. If you have any queries, please contact us or message us on our official Data Science Learner page.
