Does Random Forest Need Normalization? Get Complete Analysis


The answer to the question “Does random forest need normalization?” is no. Random Forest is a tree-based approach: every split compares a single feature against a threshold, so no distance metric between samples is ever computed. Normalization, or any other kind of feature scaling, matters mainly for ML algorithms that compute distances between samples or that are trained with gradient-based optimization. In this article we will show that Random Forest is unaffected by scaling, and we will also point out algorithms where scaling does matter.
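To see why a threshold split ignores the scale of a feature, here is a small NumPy sketch. It is not scikit-learn's actual splitter: the helpers `gini` and `best_split` are hypothetical, written only for this illustration. `best_split` tries every midpoint between neighboring values of one feature and keeps the split with the lowest weighted Gini impurity. Running it on a raw feature and on its min-max scaled copy selects the same partition of the samples, even though the numeric thresholds differ.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels (hypothetical helper)."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Hypothetical helper: try every midpoint threshold on one feature and
    return the boolean mask of samples sent left by the lowest-Gini split."""
    values = np.unique(x)                         # sorted distinct values
    thresholds = (values[:-1] + values[1:]) / 2   # midpoints between neighbors
    best_mask, best_score = None, np.inf
    for t in thresholds:
        left = x <= t
        score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / y.size
        if score < best_score:
            best_mask, best_score = left, score
    return best_mask

rng = np.random.default_rng(0)
x = rng.uniform(10, 5000, size=200)                            # feature on a large raw scale
y = ((x > 2500) ^ (rng.random(200) < 0.1)).astype(int)         # labels with 10% noise

x_scaled = (x - x.min()) / (x.max() - x.min())                 # min-max normalization

mask_raw = best_split(x, y)
mask_scaled = best_split(x_scaled, y)
print(np.array_equal(mask_raw, mask_scaled))                   # prints True
```

Only the numeric value of the threshold changes with scaling; which samples go left and which go right does not.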


Does Random Forest Need Normalization? (Practical Scenario) –

First, let's build two Random Forest models: one trained on the raw (unscaled) feature values and one trained on normalized features. We will then compare their accuracy to test our hypothesis.

Scenario 1: Random Forest without Feature Scaling (Normalization) –

To demonstrate, let's build a simple random forest on a scikit-learn dataset. We keep every parameter at its default value (apart from a fixed random_state), because our main focus is comparing the two scenarios. Here is the complete code.

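# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this snippet needs scikit-learn < 1.2.
from sklearn import metrics
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
data = load_boston()
X = data.data
y = data.target
# The target (median house value) is continuous; rounding it to whole numbers
# turns it into discrete class labels so a classifier and accuracy can be used.
y_round = y.round()
rfc = RandomForestClassifier(random_state=20)
X_train, X_test, y_train, y_test = train_test_split(X, y_round, test_size=0.20, random_state=20)
# Train and predict on the raw (unscaled) feature values.
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)
print(f'Accuracy with Absolute Value is : {metrics.accuracy_score(y_test, y_pred)}')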

Here we used the Boston housing dataset bundled with scikit-learn. Running the code above trains the random forest and prints the accuracy shown below.

[Figure: Scenario 1 output, accuracy with absolute values (Comparing the performance)]
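The Boston snippet above needs an older scikit-learn. For readers on scikit-learn 1.2 or later, the same Scenario 1 can be sketched with the bundled wine dataset (`load_wine`). This is an assumed substitute, not the dataset used in this article: its target is already a class label, so no rounding step is needed, and the accuracy will differ from the Boston result.

```python
from sklearn import metrics
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: load_wine stands in for load_boston (removed in scikit-learn 1.2).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=20)

rfc = RandomForestClassifier(random_state=20)
rfc.fit(X_train, y_train)              # raw, unscaled feature values
y_pred = rfc.predict(X_test)
acc_raw = metrics.accuracy_score(y_test, y_pred)
print(f'Accuracy with Absolute Value is : {acc_raw}')
```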


Scenario 2: Random Forest with Feature Scaling (Normalization) –

In the second scenario we keep everything identical (dataset, train/test split and model parameters) and add only one step: scaling the features. We then train the model and check its accuracy again. Please run the code below.

# Note: load_boston was removed in scikit-learn 1.2, so this snippet needs scikit-learn < 1.2.
from sklearn import metrics
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
data = load_boston()
X = data.data
y = data.target
y_round = y.round()  # discrete labels, exactly as in Scenario 1
rfc = RandomForestClassifier(random_state=20)
X_train, X_test, y_train, y_test = train_test_split(X, y_round, test_size=0.20, random_state=20)
# Fit the scaler on the training split only, then apply the same
# transformation to the test split so no test information leaks into training.
sc = MinMaxScaler()
X_train_norm = sc.fit_transform(X_train)
X_test_norm = sc.transform(X_test)
# Train and predict on the normalized features, not on X_train / X_test.
rfc.fit(X_train_norm, y_train)
y_pred = rfc.predict(X_test_norm)
print(f'Accuracy After Normalization is : {metrics.accuracy_score(y_test, y_pred)}')

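Here we scale the data with MinMaxScaler before training. The scaler learns each feature's minimum and maximum from the training split and maps the values to the range [0, 1]; the test split is transformed with those same training statistics. Finally, we check the model's accuracy.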
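For reference, MinMaxScaler with its default `feature_range=(0, 1)` applies the following transformation to each feature, where x_min and x_max are learned from the training split. Assuming x_max > x_min, the map is strictly increasing, so a split “feature ≤ t” on the raw values sends exactly the same samples left as the split “feature ≤ t′” on the scaled values:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad
x_i \le t \iff x'_i \le t', \quad \text{where } t' = \frac{t - x_{\min}}{x_{\max} - x_{\min}}
```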

[Figure: Scenario 2 output, accuracy after normalization (Comparing the performance)]
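A stricter check than comparing two accuracy numbers is to compare the predictions themselves. The sketch below again assumes the wine dataset (`load_wine`) as a stand-in for Boston and wraps MinMaxScaler and the forest in a scikit-learn pipeline via `make_pipeline`, so the scaler is fitted on the training split only and reapplied to the test split automatically. Both forests use the same `random_state`, so the scaling step is the only difference between them.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)   # assumption: stand-in for load_boston
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=20)

# Model A: forest on raw features.
raw_model = RandomForestClassifier(random_state=20).fit(X_train, y_train)

# Model B: identical forest behind a MinMaxScaler; the pipeline fits the
# scaler on X_train and reuses its min/max when transforming X_test.
scaled_model = make_pipeline(MinMaxScaler(), RandomForestClassifier(random_state=20))
scaled_model.fit(X_train, y_train)

pred_raw = raw_model.predict(X_test)
pred_scaled = scaled_model.predict(X_test)

agreement = np.mean(pred_raw == pred_scaled)
print(f'Accuracy (raw):    {np.mean(pred_raw == y_test):.3f}')
print(f'Accuracy (scaled): {np.mean(pred_scaled == y_test):.3f}')
print(f'Fraction of identical predictions: {agreement:.3f}')
```

One would expect the fraction of identical predictions to be 1.0 or very close to it: scaling keeps the order of values within each feature, so a disagreement could only come from floating-point rounding when a test value lies exactly on a split threshold.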

Final Comparison –

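Comparing the two scenarios, the accuracy is the same (or practically the same) in both. In other words, feature scaling does not affect the performance of tree models such as Random Forest. The reason is that each split is chosen by an impurity criterion such as the Gini index or information gain, which is computed from the class labels of the samples on each side of a threshold. Which side a sample falls on depends only on the order of the feature values, and min-max scaling (like any strictly increasing transformation) does not change that order, so the trees find the same partitions. By contrast, algorithms that rely on distances between samples, such as k-nearest neighbors, k-means and SVMs with an RBF kernel, and models trained by gradient descent or with regularization, such as neural networks and regularized linear or logistic regression, usually do need feature scaling.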
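For contrast, here is a sketch with a distance-based model, k-nearest neighbors, on the same assumed wine dataset (`load_wine`, not the article's data). The wine features have very different ranges (proline runs from the hundreds to over a thousand while hue stays between about 0.5 and 1.7), so without scaling the Euclidean distance is dominated by the largest-valued features. Wrapping the scaler in `make_pipeline` means it is refitted inside each cross-validation fold.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)   # assumption: illustrative dataset

# k-NN votes among the nearest training samples by Euclidean distance,
# so features with large raw values dominate unless the data is scaled.
knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_scaled = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))

acc_raw = cross_val_score(knn_raw, X, y, cv=5).mean()
acc_scaled = cross_val_score(knn_scaled, X, y, cv=5).mean()
print(f'k-NN accuracy without scaling: {acc_raw:.3f}')
print(f'k-NN accuracy with scaling:    {acc_scaled:.3f}')
```

On this data the scaled pipeline would be expected to score clearly higher, the opposite of the Random Forest result; the exact numbers depend on the scikit-learn version and the folds.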

Thanks 

Data Science Learner Team
