Stability of bagged decision trees

Yves Grandvalet

IDIAP, Switzerland

Bagging is a simple ensemble technique in which an estimator is produced by averaging predictors fitted to bootstrap samples. Bagged decision trees almost always improve on the original predictor, and it is widely believed that bagging is effective because averaging predictors reduces variance. We provide a counter-example here, and we give experimental evidence supporting the view that bagging stabilizes prediction by equalizing the influence of training examples. The influence of near-boundary points increases whenever they participate in defining the split location of a node. Highly influential examples, which carry a high weight in deciding the split direction near the root node, are down-weighted because they are absent from some of the bootstrap samples. Recent analyses relating stability to generalization error are tested empirically to see whether they account for bagging's success. We quantify hypothesis stability on several benchmarks and conclude that the influence-equalization process significantly improves stability, which in turn may improve generalization performance. Our experiments furthermore suggest that the generalization bounds based on the stability analysis are quite tight for both unbagged and bagged decision trees.
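The bagging procedure described above can be sketched in a few lines. The following is a minimal illustration, not the paper's experimental protocol: it uses scikit-learn decision trees, and the function names and toy data are our own. Note that each training example is absent from a given bootstrap sample with probability (1 - 1/n)^n ≈ e^(-1) ≈ 0.37, which is the mechanism by which influential examples are down-weighted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_estimators=25, seed=0):
    """Fit one decision tree per bootstrap sample and return the ensemble."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw n indices with replacement, so each
        # example is left out of a given sample with probability ~0.37.
        idx = rng.integers(0, n, size=n)
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagged_predict(trees, X):
    """Aggregate the ensemble by majority vote over individual predictions."""
    votes = np.stack([tree.predict(X) for tree in trees])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

For regression, the aggregation step would average predictions instead of voting; either way, the bagged predictor depends less on any single training example than the tree fitted on the full sample does.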