Logistic Regression is one of the popular Machine Learning Algorithm that predicts numerical categorical variables. It is a supervised Machine Learning Algorithm for the classification. You can think this machine learning model as Yes or No answers. For example, you have a customer dataset and based on the age group, city, you can create a Logistic Regression to predict the binary outcome of the Customer, that is they will buy or not. In this tutorial of How to, you will learn ” How to Predict using Logistic Regression in Python “.
Linear Regression: In the Linear Regression you are predicting the numerical continuous values from the trained Dataset. That is the numbers are in a certain range.
Logistic Regression: In it, you are predicting the numerical categorical or ordinal values. It means predictions are of discrete values.
There are many popular Use Cases for Logistic Regression. Some of them are the following :
Purchase Behavior: To check whether a customer will buy or not.
Disaster Prediction: Predict the possibility of Hazardous events like Floods, Cyclone e.t.c
Diseases Prediction: Possibilities of Cancer in a person or not.
Handwriting recognition
The followings assumptions are applied before doing the Logistic Regression. You must remember these as a condition before modeling.
Before doing the logistic regression, load the necessary python libraries like numpy, pandas, scipy, matplotlib, sklearn e.t.c .
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pylab import rcParams
import seaborn as sb
import scipy
from scipy.stats import spearmanr
import sklearn
from sklearn import preprocessing
from sklearn.preprocessing import scale
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split
import sklearn.metrics as sm
Here you are importing for the following purposes
%matplotlib inline
rcParams["figure.figsize"] =10,5
sb.set_style("whitegrid")
It tells the python interpreter to show all the figures inline in Jupyter Notebook.
In this step, you will load and define the target and the input variable for your model. I am using the mtcars dataset. You can download from the GitHub URL.
address = "data/mtcars.csv"
cars= pd.read_csv(address)
cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
data = cars.iloc[:,[5,11]].values
data_names = ["drat","carb"]
y = cars.iloc[:,[9]].values
drat= cars["drat"]
carb = cars["carb"]
#Find the Spearmen Cofficient.
spearmanr_coff, p_value = spearmanr(drat,carb)
spearmanr_coff
#negative no correlation
The Spearman rank’s coefficient is negative therefore we can say drat and the carb variable has no correlation. These two are independent of each other.
cars.isnull().sum()
You can see there are no missing values in the dataset that is good. If you find any missing values in the dataset then remove or replace it. Read the following tutorial for dealing with the missing values.
Steps to Deal with the missing values.
sb.countplot(x="am",data=cars,palette="hls")
From the figure, you can say the variables are binary that has only 0 and 1 values.
x = scale(data)
LogReg = LogisticRegression()
#fit the model
LogReg.fit(x,y)
#print the score
print(LogReg.score(x,y))
After scaling the data you are fitting the LogReg model on the x and y. The LogReg.score(x,y) will output the model score that is R square value. In this case, the score is 0.8125 that is good. You can use the sklearn metrics for the classification report. If there are High recall and High
y_predict = LogReg.predict(x)
from sklearn.metrics import classification_report
report = classification_report(y,y_predict)
print(report)
The precision and recall of the above model are 0.81 that is adequate for the prediction. Just remember you look for the high recall and high precision for the best model.
Logistic Regression is the popular way to predict the values if the target is binary or ordinal. Only the requirement is that data must be clean and no missing values in it. You can use it any field where you want to manipulate the decision of the user. Just follow the above steps and you will master of it.
Hope this tutorial on How to Predict using Logistic Regression in Python? benefited you in the deployment of the model on your own dataset. If you have any query regarding this then please contact or message on our official data science learner page.