Step-by-Step Guide to Building a Machine Learning Pipeline Using scikit-learn

You probably already know how to build a machine learning model. It involves many steps, such as data cleaning, data reduction, and model creation. Each time you revisit the problem, you repeat all of these steps to get a better model. But did you know that you can automate them? If you already do, you may prefer our other articles. If not, this tutorial is for you: it is a step-by-step guide to building a machine learning pipeline with scikit-learn.

Steps for building the best predictive model

Before defining the steps of a pipeline, you should know which steps go into building a proper machine learning model.
Suppose you want the following steps.

1. Scaling the features of the dataset.

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Fit the scaler on the training data only, then reuse it on the test data
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

2. Data reduction using Principal Component Analysis (PCA).

# Apply PCA: fit on the training data, then project the test data
from sklearn.decomposition import PCA
pca = PCA(n_components=5)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

3. Creating the model.

# Fit a random forest regressor on the reduced training data
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=200)
regressor.fit(X_train, y_train)
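The three snippets above assume that X_train, X_test, y_train, and y_test already exist. As a minimal sketch for trying them out, the following builds a synthetic regression dataset and splits it. The use of make_regression, the sample and feature counts, and the random_state values are assumptions made for illustration and are not part of the original setup.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 200 samples, 10 features
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

Any dataset with at least 5 features should work here, because PCA(n_components=5) cannot keep more components than there are features.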

The above steps work, but you can define all of them in a single machine learning pipeline and use that instead. For this, import the Pipeline class from the sklearn.pipeline module. You pass the steps as a list of (name, estimator) pairs, one pair per step.

from sklearn.pipeline import Pipeline
# Each step is a (name, estimator) pair; the last step is the model
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=5)),
    ('randomforest', RandomForestRegressor(n_estimators=200))
])

You can see that the Pipeline() constructor has been called with all the steps inside it. Here the steps are scaling, Principal Component Analysis, and the random forest regressor, in that order. Now that all the steps are included in the pipeline, you can call the fit() method on X_train and y_train and the score() method on X_test and y_test. Pass the original, untransformed data: the pipeline applies the scaling and PCA itself.

pipe.fit(X_train, y_train)

# For a regressor, score() returns the R^2 on the given data
pipe.score(X_test, y_test)
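To make the connection between the manual steps and the pipeline concrete, here is a small sketch that runs both on the same data and compares the predictions. The synthetic dataset and the fixed random_state values are assumptions added for illustration; the seed is fixed only so that both random forests are built identically.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed only for this illustration
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Manual version: each transformer is fitted on the training data only
sc = StandardScaler()
pca = PCA(n_components=5)
regressor = RandomForestRegressor(n_estimators=200, random_state=0)
X_train_reduced = pca.fit_transform(sc.fit_transform(X_train))
X_test_reduced = pca.transform(sc.transform(X_test))
regressor.fit(X_train_reduced, y_train)
manual_pred = regressor.predict(X_test_reduced)

# Pipeline version: the same steps, fitted on the raw training data
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=5)),
    ('randomforest', RandomForestRegressor(n_estimators=200, random_state=0)),
])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# With the seed fixed, the two sets of predictions should agree
print(np.allclose(manual_pred, pipe_pred))
print(pipe.score(X_test, y_test))
```

The pipeline calls fit_transform() on each transformer during fit() and only transform() during predict() and score(), which is the same pattern the manual version follows by hand.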

Other things you can do with the pipeline

1. See all the steps inside the pipeline

Just access the pipe.steps attribute to see all the steps used in the pipeline. It is an attribute, not a method, so it takes no parentheses.
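As a short sketch of what this looks like, the following rebuilds the same pipeline and inspects its steps. It also uses named_steps, which this guide has not covered so far, to fetch one step by its name.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=5)),
    ('randomforest', RandomForestRegressor()),
])

# pipe.steps is a list of (name, estimator) tuples in execution order
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['scale', 'pca', 'randomforest']

# named_steps gives access to a single step by its name
pca_step = pipe.named_steps['pca']
print(pca_step.n_components)  # 5
```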

2. Look at all the parameters

You can use the get_params() method to look at all the parameters of the pipeline and of every step inside it.

pipe.get_params()
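The returned dictionary is fairly large, so a small sketch of how to read it may help. Parameters that belong to a step appear under keys of the form step name, double underscore, parameter name. The variable names below are only illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=5)),
    ('randomforest', RandomForestRegressor()),
])

params = pipe.get_params()

# Keep only the per-step parameters, e.g. 'pca__n_components'
step_param_names = sorted(key for key in params if '__' in key)
print(step_param_names)

# Look up one value by its '<step>__<parameter>' key
print(params['pca__n_components'])  # 5
```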

3. Change or Set the value of the parameters.

Use the set_params() method to change the value of a parameter. The parameter is addressed by the step name and the parameter name joined with a double underscore. For example, to change the number of components for the PCA to 3, use the following code.

pipe.set_params(pca__n_components=3)
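Note that set_params() only changes the configuration; a pipeline that was already fitted has to be fitted again for the change to take effect. The sketch below illustrates this on synthetic data. The dataset, the random_state values, and the second parameter being changed are assumptions added for the example.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed only for this illustration
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=5)),
    ('randomforest', RandomForestRegressor(random_state=0)),
])
pipe.fit(X, y)
fitted_before = pipe.named_steps['pca'].n_components_

# Several parameters can be changed in one call
pipe.set_params(pca__n_components=3, randomforest__n_estimators=50)

# Refit so the fitted steps reflect the new settings
pipe.fit(X, y)
fitted_after = pipe.named_steps['pca'].n_components_

print(fitted_before, fitted_after)  # 5 3
```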

Conclusion

Creating a pipeline for all the steps not only reduces the lines of code but also gives you a way to run all the steps automatically, in the right order, every time you fit or predict. You can also add other steps to it to try to make your prediction model more accurate. We hope you have understood how to build a machine learning pipeline. If you have any questions about it, you can contact us or message us on our Data Science Learner Page.