Linear Regression is a very popular supervised machine learning algorithm. Supervised means you have to train the model on labeled data before making any new predictions. It finds the relationship between the variables for prediction. In this “How to” tutorial, you will learn how Linear Regression works in Machine Learning in easy steps.
You will come to know the following things after reading the entire post.
Linear Regression is of two types: simple linear regression and multiple linear regression. If a dataset has one predictor (the independent variable) and one predictand (the variable being predicted), it is simple linear regression. If there are multiple predictors and one predictand, it is multiple linear regression. In this tutorial, I will demonstrate only multiple linear regression. Once you understand it, you can easily implement the simple type.
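To make the difference between the two types concrete, here is a minimal sketch. It assumes made-up, noise-free data (not the enrollment dataset used later), and the variable names are only for illustration. The simple model gets one predictor column, the multiple model gets two.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only (not the enrollment dataset).
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=50)
x2 = rng.uniform(0, 5, size=50)

# Simple linear regression: one predictor, y = 3*x1 + 2
y_simple = 3.0 * x1 + 2.0
simple_model = LinearRegression().fit(x1.reshape(-1, 1), y_simple)

# Multiple linear regression: two predictors, y = 3*x1 - 1.5*x2 + 2
y_multi = 3.0 * x1 - 1.5 * x2 + 2.0
multi_model = LinearRegression().fit(np.column_stack([x1, x2]), y_multi)

# Because the data has no noise, the fitted values match the formulas above.
print(simple_model.coef_, simple_model.intercept_)
print(multi_model.coef_, multi_model.intercept_)
```

The only difference in the code is the number of columns passed to fit(), which is why the simple type is easy once you know the multiple type.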
You will find many use cases of Linear Regression in daily life. Some of the major use cases are:
Jumping directly into linear regression on a dataset can waste time, because you cannot use a regression model on every dataset. Therefore, you should check the following assumptions before doing regression analysis.
Let’s do the coding part to see how Linear Regression works in Machine Learning. Just follow the simple steps and keep the above assumptions in mind.
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sb
import sklearn
from sklearn.preprocessing import scale
from sklearn import datasets
from sklearn.linear_model import LinearRegression
import sklearn.metrics as sm
from collections import Counter
I am using the popular scikit-learn (sklearn) library for preprocessing and for the regression algorithm.
To display figures inline, I am using the %matplotlib inline statement and defining the figure size.
%matplotlib inline
rcParams["figure.figsize"] =20,10
sb.set_style("whitegrid")
I am using the enrollment dataset for the multiple linear regression analysis. Here all the predictor variables are numerical and continuous.
address = "C:\\Users\\skrsu\\Desktop\\Jypter\\data\\enrollment.csv"
enroll= pd.read_csv(address)
enroll.columns = ["year","roll","unemployment","grade","income"]
enroll.head()
To find the relationship between the variables, I am calling the seaborn pairplot() method. It helps you to verify the relationship visually. Then you will use the corr() method on the dataset to check how strongly the variables are correlated with each other. After that, we will scale the chosen input variables from the dataset.
sb.pairplot(enroll)
The figure clearly shows that the variables have a good linear relationship.
enroll.corr()
You will choose predictors that are linearly related to the target but not strongly correlated with each other. In this case, unemployment and grade are only weakly correlated with each other, so you will choose them as predictors.
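The selection rule above can be sketched on a small made-up DataFrame. The column names feature_a, feature_b and target are hypothetical and are not part of the enrollment dataset. The idea is to read two things from the correlation matrix: how each candidate predictor correlates with the target, and how the candidates correlate with each other.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only (not the enrollment dataset).
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "feature_a": rng.uniform(0, 10, n),
    "feature_b": rng.uniform(0, 10, n),
})
df["target"] = 2.0 * df["feature_a"] + 3.0 * df["feature_b"] + rng.normal(0, 1, n)

corr = df.corr()

# Correlation of each candidate predictor with the target (should be strong).
with_target = corr["target"].drop("target")

# Correlation between the two candidate predictors (should be weak).
between_predictors = corr.loc["feature_a", "feature_b"]

print(with_target)
print(between_predictors)
```

Here the two features are generated independently, so their mutual correlation is close to zero while both stay clearly correlated with the target.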
enroll_data = enroll.iloc[:,[2,3]]
enroll_target = enroll.iloc[:,[1]]
enroll_data_names = ["unemployment","grade"]
Now you will scale the dataset. The scale() function standardizes each column to zero mean and unit variance, so that the predictors are on a comparable scale.
# scale the data
x = scale(enroll_data)
y = enroll_target
# NaN never compares equal to anything, so use np.isnan() to find missing values
missing_values = np.isnan(x)
x[missing_values]
The output is an empty array, which shows there are no missing values in the dataset. That is great.
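If you want to see what scaling and the missing value check actually do, the following sketch may help. It uses a tiny made-up array, not the enrollment data. It also shows why NaN has to be detected with np.isnan(): a comparison with == never matches NaN.

```python
import numpy as np
from sklearn.preprocessing import scale

# Made-up values for illustration only.
raw = np.array([[1.0, 100.0],
                [2.0, 200.0],
                [3.0, 300.0],
                [4.0, 400.0]])

scaled = scale(raw)

col_means = scaled.mean(axis=0)   # approximately 0 for every column
col_stds = scaled.std(axis=0)     # approximately 1 for every column

# NaN is not equal to anything, including itself, so == cannot find it.
nan_equals_itself = (np.nan == np.nan)

# np.isnan() builds a reliable mask of missing values.
has_missing = np.isnan(scaled).any()

print(col_means, col_stds, nan_equals_itself, has_missing)
```

After scaling, both columns have the same spread even though the raw values differ by a factor of 100.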
In this step, we will call the Sklearn Linear Regression Model and fit this model on the dataset.
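# the inputs are already standardized by scale(), so no extra normalization is needed
# (the normalize argument was removed in scikit-learn 1.2)
LinReg = LinearRegression()
# fit the model
LinReg.fit(x,y)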
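If the linear model fits the data well, you will get a high score. The model's score() method returns the coefficient of determination (R²), which tells you how much of the variance in the target the model explains. A value close to 1 indicates a good fit. Note that here the score is computed on the same data that was used for training.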
print(LinReg.score(x,y))
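To see what the value returned by score() means, you can compute R² by hand and compare. The sketch below assumes made-up data and a fresh model, so the names X, y and model are illustrative and separate from the tutorial's variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + 10.0 + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
manual_r2 = 1.0 - ss_res / ss_tot

# Both numbers should agree up to floating point error.
print(manual_r2, model.score(X, y))
```

The smaller the residuals are compared to the total variation of the target, the closer the score gets to 1.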
LinReg.coef_ returns an array of coefficients for the independent variables. Here we are using two variables (unemployment and grade), so it holds two coefficients. In the same way, LinReg.intercept_ gives the intercept of the linear regression. You can also verify the predicted values using the predict() method on the dataset.
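As a rough sketch of how these three pieces fit together, the example below fits a model on made-up data and rebuilds the predictions from coef_ and intercept_. The data and names are assumptions for illustration. One difference from the tutorial: y is a one-dimensional array here, so coef_ has shape (2,), while a one-column DataFrame target gives shape (1, 2).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only.
rng = np.random.default_rng(7)
X = rng.normal(size=(60, 2))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + 3.0 + rng.normal(scale=0.1, size=60)

model = LinearRegression().fit(X, y)

coefficients = model.coef_      # one coefficient per predictor column
intercept = model.intercept_    # value predicted when all predictors are 0

# A prediction is the weighted sum of the predictors plus the intercept.
manual_pred = X @ coefficients + intercept
sklearn_pred = model.predict(X)

print(coefficients, intercept)
print(np.allclose(manual_pred, sklearn_pred))
```

In other words, predict() does nothing more than apply the fitted line equation to each row.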
Linear Regression is a very popular machine learning algorithm for analyzing numeric and continuous data. The features used for prediction should not be strongly correlated with each other. Therefore, before designing the model, you should always check the assumptions and preprocess the data for a better fit.
We hope you have learned how linear regression works in very simple steps. If you have any questions about machine learning algorithms, contact us. We are always ready to help you.
Thanks
Data Science Learner Team