
How to Choose n_estimators in Random Forest

Are you looking for how to choose n_estimators in a random forest? The n_estimators hyperparameter sets the number of underlying decision trees in the Random Forest. The Random Forest algorithm is a bagging technique: it ensembles many weak learners to decrease variance. In order to tune this hyperparameter, we will use GridSearchCV. In this article, we will explore the implementation of GridSearchCV for n_estimators in random forests.

Choosing n_estimators in the random forest (Steps) –

Let’s understand the complete process step by step. We will use the sklearn library for the baseline implementation.

Step 1-

Firstly, the prerequisite for the hyperparameter tuning shown below is to import the GridSearchCV Python module.

from sklearn.model_selection import GridSearchCV

Step 2-

Secondly, we need to define the range of values for n_estimators. With GridSearchCV, we define it in a param_grid. This param_grid is an ordinary dictionary that we pass to the GridSearchCV constructor. In this dictionary, we can define various hyperparameters along with n_estimators.

param_grid = {
    'n_estimators': [100, 200, 300, 1000]
}

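As noted above, the param_grid dictionary can hold several hyperparameters at once. A minimal sketch (the extra keys and values here are illustrative choices, not recommendations):

```python
# Illustrative grid: GridSearchCV tries every combination,
# so 4 * 2 * 2 = 16 candidate models with this grid.
param_grid = {
    'n_estimators': [100, 200, 300, 1000],
    'max_depth': [None, 10],          # illustrative extra hyperparameter
    'min_samples_split': [2, 5],      # illustrative extra hyperparameter
}

# Count the candidate settings the grid search would evaluate
n_candidates = 1
for values in param_grid.values():
    n_candidates *= len(values)
print(n_candidates)  # 16
```

Keep in mind that the runtime grows multiplicatively with every key you add to the grid.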
Step 3 –

To sum up, this is the final step, where we define the model and apply GridSearchCV to it.

from sklearn.ensemble import RandomForestRegressor

random_forest_model = RandomForestRegressor()
# Instantiate the grid search model
grid_search = GridSearchCV(estimator=random_forest_model, param_grid=param_grid, cv=3, n_jobs=-1)

We invoke GridSearchCV() with the param_grid. Setting n_jobs=-1 tells it to utilize all the cores of the system.
After fitting with grid_search.fit(X, y), calling grid_search.best_params_ will give you the optimal number for n_estimators for the Random Forest. As an alternative to GridSearchCV, we may use RandomizedSearchCV for choosing n_estimators in the random forest; it samples parameter settings at random and will also give the best parameters for the Random Forest model.
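Putting the steps above together, here is a self-contained sketch that runs end to end. The synthetic dataset from make_regression and the candidate values [50, 100, 200] are assumptions chosen just to make the example runnable:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Toy regression data so the example is self-contained
X, y = make_regression(n_samples=200, n_features=5, random_state=42)

# Candidate values for n_estimators (illustrative choices)
param_grid = {'n_estimators': [50, 100, 200]}

grid_search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=3,        # 3-fold cross-validation
    n_jobs=-1,   # use all available cores
)
grid_search.fit(X, y)

# The cross-validated best value for n_estimators
print(grid_search.best_params_)
```

On real data, the winning value depends on your dataset; larger forests generally reduce variance but cost more compute, with diminishing returns past a few hundred trees.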

Random Forest Hyperparameters :

Most importantly, here is the complete signature for the Random Forest model. You can see the default value for n_estimators.

class sklearn.ensemble.RandomForestClassifier(n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None)
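You can confirm the default shown in the signature above directly from an instantiated model:

```python
from sklearn.ensemble import RandomForestClassifier

# Instantiate with no arguments and inspect the default
clf = RandomForestClassifier()
print(clf.n_estimators)  # 100 in recent sklearn versions
```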

Conclusion –

Most importantly, this implementation should have made clear how to choose n_estimators in a random forest. If you are still facing any difficulties with n_estimators and its optimal value, please comment below. Above all, if you want to keep reading articles on these bagging and boosting algorithms, please subscribe to us.

Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and text analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages the site, where he and his team share knowledge and help others learn more about data science.