Linear Regression is a very popular supervised machine learning algorithm. Supervised means the model is trained on labeled data before it makes any new predictions. It models the relationship between variables so that one of them can be predicted from the others. In this how-to tutorial, you will learn how Linear Regression works in Machine Learning in easy steps.
After reading the entire post, you will know the following things.
- Types of Linear Regression.
- Use Cases of the Linear Regression.
- Assumptions and Conditions for Regression.
- Coding Demonstration.
Types of Linear Regression
Linear Regression is of two types: Simple Linear Regression and Multiple Linear Regression. If a dataset has one predictor (input variable) and one predictand (the target variable), it is simple linear regression. If there are multiple predictors and one predictand, it is multiple linear regression. In this tutorial, I will demonstrate only multiple linear regression. Once you understand it, you will easily be able to implement the simple type.
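To make the difference between the two types concrete, here is a minimal sketch that fits both with ordinary least squares using NumPy. The helper name fit_least_squares and the noise-free data are hypothetical and are not part of the enrollment example used later; they only illustrate that the simple case is the multiple case with a single predictor column.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (intercept, coefficients) of an ordinary least squares fit."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # a single predictor becomes one column
    # Prepend a column of ones so the first fitted weight is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return weights[0], weights[1:]

# Hypothetical, noise-free data so the recovered values are easy to check.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Simple linear regression: one predictor, y = 3 + 2*x1
simple_intercept, simple_coef = fit_least_squares(x1, 3 + 2 * x1)

# Multiple linear regression: two predictors, y = 1 + 2*x1 - 0.5*x2
multi_intercept, multi_coef = fit_least_squares(
    np.column_stack([x1, x2]), 1 + 2 * x1 - 0.5 * x2
)

print(simple_intercept, simple_coef)
print(multi_intercept, multi_coef)
```

Because the data contains no noise, the fit should recover the intercepts and slopes used to generate it, up to floating point error.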
What are the Use Cases of Linear Regression?
You will find many use cases of Linear Regression in daily life. Some of the major ones are:
- Forecasting a Company's Annual Sales.
- Supply and Demand Forecasting.
- Weather Forecasting.
- Stock Market Prediction.
When to use Linear Regression?
Jumping directly into linear regression on a dataset can waste time, because the regression model does not suit every dataset. Therefore you should check the following assumptions before doing regression analysis.
- All the variables (features) should be continuous and numerical, not categorical.
- There should be no missing values and no outliers in the dataset.
- The relationship between the predictors and the predictand must be linear.
- The predictors should not be correlated with one another. That is, all predictors should be independent of each other.
- The residuals (the differences between the observed and predicted values) must be normally distributed.
Let’s do the coding part to see how Linear Regression works in Machine Learning. Just follow the simple steps and keep the above assumptions in mind.
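The residual assumption can be given a rough numerical check once a model has produced predictions. The sketch below is only an illustration: the function name residual_summary and the randomly generated residuals are hypothetical, and skewness and excess kurtosis near zero are a quick indication of normality rather than a formal test.

```python
import numpy as np

def residual_summary(y_observed, y_predicted):
    """Summarise residuals by their mean, skewness and excess kurtosis."""
    residuals = np.asarray(y_observed, dtype=float) - np.asarray(y_predicted, dtype=float)
    centered = residuals - residuals.mean()
    std = centered.std()  # population standard deviation
    return {
        "mean": residuals.mean(),
        "skewness": np.mean(centered ** 3) / std ** 3,
        "excess_kurtosis": np.mean(centered ** 4) / std ** 4 - 3.0,
    }

# Hypothetical residuals: predictions of zero, so the observations are the residuals.
rng = np.random.default_rng(0)
predicted = np.zeros(5000)
normal_like = residual_summary(rng.normal(0.0, 1.0, 5000), predicted)
skewed = residual_summary(rng.exponential(1.0, 5000), predicted)

print(normal_like)  # skewness and excess kurtosis should both be close to 0
print(skewed)       # a clearly positive skewness signals non-normal residuals
```

A histogram or a Q-Q plot of the residuals is a common visual complement to these numbers.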
Step 1: Import the necessary libraries for performing the regression.
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
from pylab import rcParams
import seaborn as sb
import sklearn
from sklearn.preprocessing import scale
from sklearn import datasets
from sklearn.linear_model import LinearRegression
import sklearn.metrics as sm
from collections import Counter
I am using the popular Sklearn (scikit-learn) library for preprocessing and for the regression algorithm.
Step 2: Define the plotting parameters for the Jupyter notebook.
To display the figures inline, I am using the %matplotlib inline magic command and defining the figure size.
%matplotlib inline
rcParams["figure.figsize"] = 20, 10
sb.set_style("whitegrid")
Step 3: Load the dataset.
I am using the enrollment dataset for the multiple linear regression analysis. Here all the predictor variables are continuous and numerical.
address = "C:\\Users\\skrsu\\Desktop\\Jypter\\data\\enrollment.csv"
enroll = pd.read_csv(address)
enroll.columns = ["year", "roll", "unemployment", "grade", "income"]
enroll.head()
Step 4: Find the relationship between the predictand and predictors.
To find the relationship between the variables, I am calling the seaborn pairplot() method, which helps you verify the relationship visually. Then you will use the corr() method on the dataset to check that the chosen variables are independent of each other. After that, we will scale the chosen input variables from the dataset.
The figure clearly shows that the variables have a good linear relationship.
You will choose variables that are independent of each other and linearly related to the predictand. In this case, unemployment and grade are only weakly correlated with each other, so you will choose them as predictors.
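enroll_data = enroll.iloc[:, [2, 3]]  # predictors: unemployment and grade
enroll_target = enroll.iloc[:, 1]  # target: roll (column 1), assumed to be the intended target
enroll_data_names = ["unemployment", "grade"]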
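The independence check on the candidate predictors can be sketched as follows. This is a minimal illustration using np.corrcoef, which computes the Pearson correlation that pandas corr() also uses by default. The function name correlated_pairs, the 0.8 threshold, and the values in candidates are all hypothetical and are not taken from the enrollment dataset.

```python
import numpy as np

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    names = list(columns)
    # np.corrcoef treats each row as one variable.
    matrix = np.corrcoef([columns[name] for name in names])
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = matrix[i, j]
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], round(float(r), 3)))
    return pairs

# Hypothetical candidate predictors: x3 is an exact linear function of x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
candidates = {
    "x1": x1,
    "x2": np.array([2.0, 1.0, 2.0, 1.0, 2.0, 1.0]),
    "x3": 2 * x1 + 1,
}

pairs = correlated_pairs(candidates)
print(pairs)  # only the x1/x3 pair is flagged
```

Of two predictors flagged this way, usually only one should be kept, since they carry nearly the same information.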
Now you will scale the predictors. Scaling standardizes each column to zero mean and unit variance, which puts the predictors on a comparable scale.
# scale the data
x = scale(enroll_data)
y = enroll_target
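To see what the scaling step does to the numbers, here is a small NumPy sketch of column-wise standardization. It is meant to mirror the default behaviour of sklearn.preprocessing.scale (centering on the mean and dividing by the population standard deviation), but it is an illustration rather than the library's implementation. The function name standardize and the sample values are hypothetical.

```python
import numpy as np

def standardize(data):
    """Center each column on 0 and rescale it to unit variance."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)  # population standard deviation (ddof=0)
    std[std == 0] = 1.0     # leave constant columns unscaled instead of dividing by zero
    return (data - mean) / std

# Hypothetical predictors measured on very different scales.
raw = np.array([[1.0, 100.0],
                [2.0, 250.0],
                [3.0, 300.0],
                [4.0, 450.0]])

scaled = standardize(raw)
print(scaled.mean(axis=0))  # approximately 0 for every column
print(scaled.std(axis=0))   # approximately 1 for every column
```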
Step 5: Find the Null or Missing Values
# NaN never compares equal to anything, so use np.isnan() instead of x == np.NaN
missing_values = np.isnan(x)
x[missing_values]
The output shows that there are no missing values in the dataset, which is great.
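A note on how missing values are detected in NumPy arrays: NaN never compares equal to anything, including itself, so an equality test reports no missing values even when some are present. The sketch below contrasts the two approaches on a small hypothetical array. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical array with one missing value.
x = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, 6.0]])

equality_mask = x == np.nan  # always all False, even though a NaN is present
nan_mask = np.isnan(x)       # True exactly where a value is missing

rows_with_nan = np.where(nan_mask.any(axis=1))[0]
missing_per_column = nan_mask.sum(axis=0)

print(equality_mask.any())  # False
print(rows_with_nan)        # index of the row that contains the NaN
print(missing_per_column)   # count of missing values in each column
```

On a pandas DataFrame, the same count is available through isnull().sum().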
Step 6: Define the Linear Regression Model and Fit on the dataset.
In this step, we will create the Sklearn LinearRegression model and fit it on the dataset.
# normalize=True is omitted: x is already scaled, and the parameter was removed in scikit-learn 1.2
LinReg = LinearRegression()
# fit the model
LinReg.fit(x, y)
Step 7: Check the accuracy and find Model Coefficients and Intercepts
If you have modeled the Linear Regression correctly, you will get a good score. You can use the model's score() method to find it. For a regression model, score() returns the coefficient of determination (R²) rather than a classification accuracy, so a value close to 1 indicates a good fit.
LinReg.coef_ returns an array of coefficients, one for each independent variable. Here we are using two variables (unemployment and grade). In the same way, LinReg.intercept_ gives the intercept of the Linear Regression. You can also verify the predicted values by calling the predict() method on the dataset.
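The calls described in this step can be tried end to end on a small example. The sketch below uses hypothetical, noise-free data instead of the enrollment dataset, so the fitted values are easy to check. It also recomputes the score by hand to show that score() is R², defined as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data: y = 10 + 3*x1 - 2*x2
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
y = 10 + 3 * X[:, 0] - 2 * X[:, 1]

model = LinearRegression().fit(X, y)

r_squared = model.score(X, y)   # coefficient of determination (R^2)
predictions = model.predict(X)  # fitted values for the training rows

# R^2 computed by hand: 1 - SS_res / SS_tot
ss_res = np.sum((y - predictions) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
manual_r_squared = 1 - ss_res / ss_tot

print(model.coef_, model.intercept_)
print(r_squared, manual_r_squared)
```

With real, noisy data the score will be below 1, and it is better measured on rows held out from training than on the training data itself.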
Linear Regression is a very popular machine learning algorithm for analyzing continuous numeric data. The features used for prediction should not be correlated with each other. Therefore, before designing the model, you should always check the assumptions and preprocess the data for better accuracy.
We hope you have learned how linear regression works in very simple steps. If you have any queries about machine learning algorithms, contact us. We are always ready to help you.
Data Science Learner Team