
How to Choose n_estimators in Random Forest

Are you looking for how to choose n_estimators in a random forest? The n_estimators hyperparameter sets the number of underlying decision trees in the Random Forest. The Random Forest algorithm is a bagging technique: it ensembles many weak learners to decrease variance. In order to tune this hyperparameter, we will use GridSearchCV. In this article, we will explore the implementation of GridSearchCV for n_estimators in random forests.

Choosing n_estimators in the random forest (Steps) –

Let’s understand the complete process step by step. We will use the sklearn library for the baseline implementation.

Step 1-

Firstly, the prerequisite for the hyperparameter tuning shown below is to import the GridSearchCV Python module.

from sklearn.model_selection import GridSearchCV

Step 2-

Secondly, we need to define the range of values for n_estimators. With GridSearchCV, we define it in a param_grid. This param_grid is an ordinary dictionary that we pass to the GridSearchCV constructor. In this dictionary, we can define various hyperparameters along with n_estimators.

param_grid = {
    'n_estimators': [100, 200, 300, 1000]
}

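As noted above, the param_grid dictionary can hold several hyperparameters at once. A minimal sketch (the extra keys and values here are illustrative choices, not recommendations):

```python
# Illustrative grid: GridSearchCV tries every combination,
# so 4 * 2 * 2 = 16 candidate models with this grid.
param_grid = {
    'n_estimators': [100, 200, 300, 1000],
    'max_depth': [None, 10],          # illustrative extra hyperparameter
    'min_samples_split': [2, 5],      # illustrative extra hyperparameter
}

# Count the candidate settings the grid search would evaluate
n_candidates = 1
for values in param_grid.values():
    n_candidates *= len(values)
print(n_candidates)  # 16
```

Keep in mind that the runtime grows multiplicatively with every key you add to the grid.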
Step 3 –

To sum up, this is the final step, where we define the model and apply GridSearchCV to it.

from sklearn.ensemble import RandomForestRegressor

random_forest_model = RandomForestRegressor()
# Instantiate the grid search model
grid_search = GridSearchCV(estimator=random_forest_model, param_grid=param_grid, cv=3, n_jobs=-1)

We invoke GridSearchCV() with the param_grid. Setting n_jobs=-1 tells it to utilize all the cores of the system.
After fitting with grid_search.fit(X, y), calling grid_search.best_params_ will give you the optimal number for n_estimators for the Random Forest. As an alternative to GridSearchCV, we may use RandomizedSearchCV for choosing n_estimators in the random forest; it samples parameter settings at random and will also give the best parameters for the Random Forest model.
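Putting the steps above together, here is a self-contained sketch that runs end to end. The synthetic dataset from make_regression and the candidate values [50, 100, 200] are assumptions chosen just to make the example runnable:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Toy regression data so the example is self-contained
X, y = make_regression(n_samples=200, n_features=5, random_state=42)

# Candidate values for n_estimators (illustrative choices)
param_grid = {'n_estimators': [50, 100, 200]}

grid_search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=3,        # 3-fold cross-validation
    n_jobs=-1,   # use all available cores
)
grid_search.fit(X, y)

# The cross-validated best value for n_estimators
print(grid_search.best_params_)
```

On real data, the winning value depends on your dataset; larger forests generally reduce variance but cost more compute, with diminishing returns past a few hundred trees.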

Random Forest Hyperparameters :

Most importantly, here is the complete signature for the Random Forest model. You can see the default value for n_estimators.

class sklearn.ensemble.RandomForestClassifier(n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None)
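You can confirm the default shown in the signature above directly from an instantiated model:

```python
from sklearn.ensemble import RandomForestClassifier

# Instantiate with no arguments and inspect the default
clf = RandomForestClassifier()
print(clf.n_estimators)  # 100 in recent sklearn versions
```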

Conclusion –

Most importantly, this implementation should have made clear how to choose n_estimators in a random forest. If you are still facing any difficulties with n_estimators and its optimal value, please comment below. Above all, if you want to keep reading articles on these bagging and boosting algorithms, please subscribe to us.

Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and text analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages the site, where he and his team share knowledge and help others learn more about data science.