How to Improve the Accuracy of Random Forest? Tune the Classifier in 7 Steps


Random Forest is an ensemble algorithm built on decision trees. You can think of it as a collection of decision trees, each trained on a different random sample of the data. Each tree makes its own prediction, and the forest combines those predictions into the final result. But did you know that you can often improve its accuracy by tuning the parameters of the Random Forest? Rather than depending entirely on adding new data, you can tune the hyperparameters first. In this tutorial, you will learn how to improve the accuracy of a Random Forest classifier.

How Does Random Forest Work?

A Random Forest draws a random subset of the training dataset for each tree and builds a decision tree on each subset. For the Random Forest Classifier, it then aggregates the votes of the individual trees to determine the class of the test object. For the Random Forest Regressor, it averages the predictions of the individual trees instead.
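The two aggregation rules can be sketched in a few lines of plain Python. The per-tree predictions below are made-up values for illustration, and the helper names are hypothetical. Note that scikit-learn's classifier averages the class probabilities of the trees rather than counting hard votes, so treat this as the intuition, not the library's exact procedure.

```python
from collections import Counter
from statistics import mean

# Hypothetical predictions from five trees for a single test sample.
tree_class_votes = ["spam", "ham", "spam", "spam", "ham"]
tree_regression_outputs = [3.0, 2.5, 3.5, 3.0, 3.0]


def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]


def average_prediction(outputs):
    """Return the mean of the per-tree numeric predictions."""
    return mean(outputs)


forest_class = majority_vote(tree_class_votes)
forest_value = average_prediction(tree_regression_outputs)
print(forest_class, forest_value)
```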

When to use Random Forest?

There are many machine learning algorithms, and choosing the best one requires some knowledge. Here are the things you should remember before using the Random Forest algorithm:

1. Random Forest works well for both categorical targets (Random Forest Classifier) and continuous targets (Random Forest Regressor).

2. Use it to build a quick benchmark model, as it is fast to train.

3. It is very useful if your dataset has many outliers or skewed data. It can also cope with missing values, although depending on your scikit-learn version you may need to impute them first.

Behind the scenes, a Random Forest can contain hundreds of trees. Because every tree has to be evaluated, prediction takes longer than with a single tree, so it may not be the right choice for real-time predictions with strict latency requirements.
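To get a feel for the prediction cost, you could time a single-row prediction for a small and a large forest. This is a rough sketch on synthetic data from make_classification (an assumption, not the tutorial's dataset), and the numbers will vary from machine to machine.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for a real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

timings = {}
for n_trees in (10, 200):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X, y)
    start = time.perf_counter()
    model.predict(X[:1])  # predict one row, as a real-time service would
    timings[n_trees] = time.perf_counter() - start

for n_trees, seconds in timings.items():
    print(f"{n_trees} trees: {seconds * 1000:.2f} ms per prediction")
```

If the larger forest is too slow for your use case, fewer or shallower trees are the first things to try.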

Hyperparameter Tuning of Random Forest

Step 1: Import the necessary libraries.

import numpy as np
import pandas as pd
import sklearn

Step 2: Import the dataset.

train_features = pd.read_csv("train_features.csv")
train_label = pd.read_csv("train_label.csv")

You can download the dataset here. It is the same dataset used for tuning the Support Vector Machine.

Step 3: Import the Random Forest algorithms from scikit-learn.

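from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
# get_params() lists every parameter; recent scikit-learn versions
# print only the non-default ones when you print the estimator itself
print(RandomForestClassifier().get_params())
print(RandomForestRegressor().get_params())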


Step 4: Choose the parameters to be tuned.

Running step 3 shows many parameters for both the Random Forest Classifier and Regressor. I am choosing two important ones: the number of estimators/trees (n_estimators) and the maximum depth of each tree (max_depth).

Step 5: Create the classifier and list the candidate values for each parameter.

Make a dictionary of the parameters you chose in step 4, with the values to try for each one, as in this example.

rfc = RandomForestClassifier()
parameters = {
    "n_estimators": [5, 10, 50, 100, 250],
    "max_depth": [2, 4, 8, 16, 32, None],
}
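It helps to know how much work this grid implies before running the search. The sketch below enumerates the combinations with itertools.product and counts the model fits, assuming the 5-fold cross-validation used in step 6. The names candidates, n_folds, and total_fits are illustrative only.

```python
from itertools import product

parameters = {
    "n_estimators": [5, 10, 50, 100, 250],
    "max_depth": [2, 4, 8, 16, 32, None],
}

# Every pairing of one value per parameter is one candidate model.
candidates = [
    dict(zip(parameters, values)) for values in product(*parameters.values())
]

n_folds = 5  # assumed number of cross-validation folds
total_fits = len(candidates) * n_folds

print(len(candidates), "candidates,", total_fits, "model fits")
```

Adding a third parameter multiplies the count again, so grids grow quickly.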

Step 6: Use the GridSearchCV model selection for cross-validation

Pass the classifier, the parameters, and the number of cross-validation folds to GridSearchCV. In this example, I am using 5-fold cross-validation. Then fit the GridSearchCV object to the training features (train_features) and the training labels (train_label).

Please note that you have to convert the label values into a one-dimensional array. That is why we use the ravel() method.

from sklearn.model_selection import GridSearchCV
cv = GridSearchCV(rfc, parameters, cv=5)
cv.fit(train_features, train_label.values.ravel())
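To see what ravel() changes, here is a small sketch with a hypothetical single-column label frame, similar to what pd.read_csv would return for a one-column label file. The column name and values are made up.

```python
import pandas as pd

# A hypothetical single-column label file, as read_csv would return it.
train_label = pd.DataFrame({"label": [0, 1, 1, 0]})

column_shape = train_label.values.shape        # two-dimensional: 4 rows, 1 column
flat_shape = train_label.values.ravel().shape  # one-dimensional: 4 values

print(column_shape, flat_shape)
```

scikit-learn expects the one-dimensional form for a single target and warns when it receives a column vector.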

Step 7: Print the best parameters.

GridSearchCV exposes the best parameters through cv.best_params_. However, it does not print what happened during the search. That is why we define a function that prints the mean score and standard deviation for every parameter combination tried.

def display(results):
    """Print the best parameters, then the mean and std of the score for each combination."""
    print(f'Best parameters are: {results.best_params_}')
    print("\n")
    mean_score = results.cv_results_['mean_test_score']
    std_score = results.cv_results_['std_test_score']
    params = results.cv_results_['params']
    # Use a separate loop name so the params list is not shadowed
    for mean, std, param in zip(mean_score, std_score, params):
        print(f'{round(mean, 3)} +/- {round(std, 3)} for {param}')

display(cv)

It prints the result for every parameter combination, along with the best parameters. In this example, the best parameters are:

{'max_depth': 8, 'n_estimators': 250}

Use these values in your Random Forest classifier for the best cross-validated score.
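There are two common ways to reuse the winning parameters. The sketch below shows both on synthetic data with a deliberately small grid so that it runs quickly. The data, the grid, and the train/test split are assumptions for illustration, so the best parameters it finds will differ from the ones above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for train_features / train_label (an assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deliberately small grid so the sketch runs quickly.
small_grid = {"n_estimators": [10, 50], "max_depth": [4, None]}
cv = GridSearchCV(RandomForestClassifier(random_state=0), small_grid, cv=3)
cv.fit(X_train, y_train)

# Option 1: reuse the model GridSearchCV already refit with the best parameters.
best_model = cv.best_estimator_

# Option 2: build a fresh classifier from the best parameters and fit it yourself.
manual_model = RandomForestClassifier(random_state=0, **cv.best_params_)
manual_model.fit(X_train, y_train)

# Score on data the search never saw.
test_accuracy = best_model.score(X_test, y_test)
print(cv.best_params_, round(test_accuracy, 3))
```

Checking the score on held-out data is a useful habit, since the cross-validated score was itself used to pick the parameters.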


Conclusion

Parameter tuning is an effective way to improve the accuracy of a model. There are other ways too, such as adding more data, but that costs extra time and money. Therefore, I recommend starting with parameter tuning if you have sufficient data, and then moving on to adding more data.


Other Queries

Here are answers to queries asked by data science readers.

Q: How to improve the accuracy of SVM in Python?

There are many ways to improve the accuracy of a Support Vector Machine. Some of them are:

  1. Improve preprocessing.
  2. Use another kernel.
  3. Change the training instances.
  4. Change the cost function.

There is also an answer to this question on Stack Overflow that you can refer to.
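Two of these ideas, preprocessing and kernel choice, can be tried with the same GridSearchCV approach used for the Random Forest. The sketch below is one possible setup on synthetic data, not a recommended configuration: it scales the features, then searches over the kernel and over C, the penalty parameter of the SVC.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data stands in for a real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Scaling the features is one common preprocessing step for SVMs.
svm_pipeline = make_pipeline(StandardScaler(), SVC())

# make_pipeline names the SVC step "svc", so its parameters are prefixed "svc__".
svm_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [0.1, 1, 10]}

svm_search = GridSearchCV(svm_pipeline, svm_grid, cv=3)
svm_search.fit(X, y)
print(svm_search.best_params_, round(svm_search.best_score_, 3))
```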

 


Meet Sukesh ( Chief Editor ), a passionate and skilled Python programmer with a deep fascination for data science, NumPy, and Pandas. His journey in the world of coding began as a curious explorer and has evolved into a seasoned data enthusiast.
 