Assessment of variable importance by random forests

Fig. 9.8 Assessment of variable importance by random forests: the left plot shows the mean decrease in accuracy and the right the mean decrease in Gini index, both after permuting individual variable values

Furthermore, random forests appear to be very robust to changes in settings: averaging many trees also takes away a lot of the dependence on the exact value of parameters. In practice, the only parameter that is sometimes optimized is the number of trees (Efron and Hastie 2016), and even that usually has very little effect. This has caused random forests to be called one of the most powerful off-the-shelf classifiers available.

Just like the classification and regression trees seen in Sect. 7.3, random forests can also be used in a regression setting. Take the gasoline data, for instance: training a model using the default settings can be achieved with the following command.

> gasoline.rf <- randomForest(gasoline$NIR[gas.odd, ],
+                             gasoline$octane[gas.odd],
+                             importance = TRUE,
+                             xtest = gasoline$NIR[gas.even, ],
+                             ytest = gasoline$octane[gas.even])

For interpretation purposes, we have used the importance = TRUE argument, and we have provided the test samples at the same time. The results, shown in Fig. 9.9, are better than the ones from bagging:

> pl.range <- c(83, 90)
> plot(gasoline$octane[gas.odd], gasoline.rf$predicted,
+      main = "Training: OOB prediction", xlab = "True",
+      ylab = "Predicted", xlim = pl.range, ylim = pl.range)
> abline(0, 1, col = "gray")
> plot(gasoline$octane[gas.even], gasoline.rf$test$predicted,
+      main = "Test set prediction", xlab = "True",
+      ylab = "Predicted", xlim = pl.range, ylim = pl.range)
> abline(0, 1, col = "gray")

Fig. 9.9 Predictions for the gasoline data using random forests. Left plot: OOB predictions for the training data; right plot: test data

However, there seems to be a bias towards the mean: the predictions at the extremes of the range deviate too little from the mean. The RMS values also confirm that the test set predictions are much worse than the PLS and PCR estimates of 0.21:

> resids <- gasoline.rf$test$predicted - gasoline$octane[gas.even]
> sqrt(mean(resids^2))
[1] 0.63721

One of the reasons can be seen in the variable importance plot, shown in Fig. 9.10:

> rf.imps <- importance(gasoline.rf)
> plot(wavelengths, rf.imps[, 1] / max(rf.imps[, 1]),
+      type = "l", xlab = "Wavelength (nm)",
+      ylab = "Importance", col = "gray")
> lines(wavelengths, rf.imps[, 2] / max(rf.imps[, 2]), col = 2)
> legend("topright", legend = c("Error decrease", "Gini index"),
+        col = c("gray", "red"), lty = 1)

Fig. 9.10 Variable importance for modelling the gasoline data with random forests: basically, only the wavelengths just above 1200 nm seem to contribute

Both criteria are dominated by the wavelengths just above 1200 nm. Especially the Gini index leads to a sparse model, whereas the error-based importance values clearly are much more noisy. Interestingly, when applying random forests to the first derivative spectra of the gasoline data set (not shown) the same feature around 1200 nm is important, but the response at 1430 nm comes up as an additional feature. Although the predictions improve somewhat, they are still nowhere near the PLS and PCR results shown in Chap. 8.

For comparison, we also show the results of random forests on the prediction of the even samples in the prostate data set:

> prost.rf <-
+   randomForest(x = prost[prost.odd, ],
+                y = prost.type[prost.odd],
+                x.test = prost[prost.even, ],
+                y.test = prost.type[prost.even])
> prost.rfpred <- predict(prost.rf, newdata = prost[prost.even, ])
> table(prost.type[prost.even], prost.rfpred)
         prost.rfpred
          control pca
  control      30  10
  pca           4  80

Again, a slight improvement over bagging can be seen.

9.7.3 Boosting

In boosting (Freund and Schapire 1997), validation and classification are combined in a different way. Boosting, and in particular the adaBoost algorithm that we will be focusing on in this section, is an iterative algorithm that in each iteration focuses the attention on the samples misclassified in the previous step. Just as in bagging, in principle any modelling approach can be used; also similar to bagging, not all combinations will show improvements. Other forms of boosting have appeared since the original adaBoost algorithm, such as gradient boosting, popular in the statistics community (Friedman 2001; Efron and Hastie 2016). One of the most powerful new variants is XGBoost (which stands for Extreme Gradient Boosting, Chen and Guestrin 2016), available in R through package xgboost.

The main idea of adaBoost is to use weights on the samples in the training set. Initially, these weights are all equal, but during the iterations the weights of incorrectly predicted samples increase. In adaBoost, which stands for adaptive boosting, the change in the weight of object i is given by

$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if correct} \\ e^{\alpha_t} & \text{if incorrect} \end{cases} \qquad (9.21)$$

where $Z_t$ is a suitable normalization factor, and $\alpha_t$ is given by

$$\alpha_t = 0.5 \ln \left( \frac{1 - \epsilon_t}{\epsilon_t} \right) \qquad (9.22)$$

with $\epsilon_t$ the error rate of the model at iteration t. In prediction, the final classification result is given by the weighted average of the T predictions during the iterations, with the weights given by the α values.
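To make the weight update concrete, the following base R sketch applies Eqs. 9.21 and 9.22 once to ten equally weighted samples, two of which are misclassified. The function ada_update and the vector correct are hypothetical names introduced here for illustration, and the error rate is assumed to be the weighted one (the summed weights of the misclassified samples).

```r
# One adaBoost reweighting step, following Eqs. 9.21 and 9.22 (illustrative sketch)
ada_update <- function(D, correct) {
  eps <- sum(D[!correct])                            # weighted error rate (assumption)
  alpha <- 0.5 * log((1 - eps) / eps)                # Eq. 9.22
  D_new <- D * exp(ifelse(correct, -alpha, alpha))   # Eq. 9.21, before normalization
  list(alpha = alpha, D = D_new / sum(D_new))        # division by Z_t: weights sum to one
}

D0 <- rep(1 / 10, 10)                      # equal initial weights
correct <- rep(c(TRUE, FALSE), c(8, 2))    # hypothetical outcome: last two samples wrong
step <- ada_update(D0, correct)
step$alpha   # 0.5 * log(4), about 0.693
step$D       # about 0.0625 for correct samples, 0.25 for the two incorrect ones
```

After the update the two misclassified samples together carry half of the total weight, which is what forces the next model to pay attention to them.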

 The algorithm itself is very simple and easily implemented. The only parameter that needs to be set in an application of boosting is the maximal number of iterations. A number that is too large would potentially lead to overfitting, although in many cases it has been observed that overfitting does not occur (see, e.g., references in Freund and Schapire 1997). 
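As an illustration of how little code is needed, here is a minimal base R sketch of adaBoost using decision stumps (single-split trees) as the base model. It is not the implementation of any package: the toy data, the +1/-1 class coding, and the function names stump_predict, fit_stump and adaboost are all assumptions made for this example, and the sketch does not guard against a stump with zero weighted error.

```r
# Toy data (hypothetical): ten samples on one variable, classes coded as +1/-1
x <- 0:9
y <- c(1, 1, 1, -1, -1, -1, 1, 1, 1, -1)

# Decision stump: predict 'sgn' above the threshold and '-sgn' below it
stump_predict <- function(x, thr, sgn) ifelse(x > thr, sgn, -sgn)

# Exhaustive search for the stump with the smallest weighted error
fit_stump <- function(x, y, D) {
  xs <- sort(unique(x))
  grid <- expand.grid(thr = c(xs[1] - 0.5, xs[-1] - diff(xs) / 2),
                      sgn = c(-1, 1))
  errs <- mapply(function(thr, sgn) sum(D[stump_predict(x, thr, sgn) != y]),
                 grid$thr, grid$sgn)
  grid[which.min(errs), ]
}

adaboost <- function(x, y, n_iter) {
  D <- rep(1 / length(y), length(y))       # equal initial weights
  score <- numeric(length(y))              # running alpha-weighted vote
  out <- data.frame(thr = numeric(n_iter), sgn = numeric(n_iter),
                    alpha = numeric(n_iter), train_err = numeric(n_iter))
  for (t in seq_len(n_iter)) {
    st <- fit_stump(x, y, D)
    pred <- stump_predict(x, st$thr, st$sgn)
    eps <- sum(D[pred != y])               # weighted error rate at iteration t
    alpha <- 0.5 * log((1 - eps) / eps)    # Eq. 9.22
    D <- D * exp(-alpha * y * pred)        # Eq. 9.21: shrink correct, grow incorrect
    D <- D / sum(D)                        # normalization by Z_t
    score <- score + alpha * pred
    # training error of the combined classifier after t iterations
    out[t, ] <- c(st$thr, st$sgn, alpha, mean(sign(score) != y))
  }
  out
}

fit <- adaboost(x, y, n_iter = 3)
fit
```

On this toy set no single stump makes fewer than three errors, yet the weighted combination of three stumps classifies all ten training samples correctly. The n_iter argument plays the role of the maximal number of iterations mentioned above.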

Boosting trees in R is available in package ada (Michailides et al. 2006), which directly follows the algorithms described in Friedman et al. (2000). Let us revisit the prostate example, also tackled with SVMs (Sect. 7.4.1):

> prost.ada <- ada(type ~ ., data = prost.df, subset = prost.odd)
> prost.adapred <-
+   predict(prost.ada, newdata = prost.df[prost.even, ])
> table(prost.type[prost.even], prost.adapred)
         prost.adapred
          control pca
  control      30  10
  pca           3  81

The result is equal to the one obtained with bagging. The development of the errors in training and test sets can be visualized using the default plot command. In this case, we should add the test set to the ada object first (see footnote 5):

> prost.ada <- addtest(prost.ada,
+                      prost.df[prost.even, ],
+                      prost.type[prost.even])
> plot(prost.ada, test = TRUE)

5 We could have added the test set data to the original call to ada as well; see the manual page.
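To put the two prostate confusion tables reported in this section on a common scale, the counts can be converted into test set error rates. The sketch below retypes the tables by hand rather than refitting the models; error_rate is a hypothetical helper and not part of the randomForest or ada packages.

```r
# Confusion tables copied from the text: rows are true classes, columns predictions
classes <- c("control", "pca")
rf.tab <- matrix(c(30, 10, 4, 80), nrow = 2, byrow = TRUE,
                 dimnames = list(true = classes, predicted = classes))
ada.tab <- matrix(c(30, 10, 3, 81), nrow = 2, byrow = TRUE,
                  dimnames = list(true = classes, predicted = classes))

# Hypothetical helper: fraction of test samples off the diagonal
error_rate <- function(tab) 1 - sum(diag(tab)) / sum(tab)

# Error rates in percent: 14 of 124 for random forests, 13 of 124 for ada
round(100 * c(rf = error_rate(rf.tab), ada = error_rate(ada.tab)), 1)
```

In both cases most of the errors are control samples predicted as pca.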
