Estimating Prediction Error in Microarray Classification: Modifications on the .632+ Bootstrap When n < p

Jiang, W. and Chen, B. E.
Canadian Journal of Statistics.

We are interested in estimating prediction error for a classification model built on high-dimensional genomic data when the number of genes (p) greatly exceeds the number of subjects (n). We examine a distance argument supporting the conventional 0.632+ bootstrap proposed for the n > p scenario, modify it for the n < p situation, and develop learning curves to describe how the true prediction error varies with the number of subjects in the training set. The curves are then applied to define adjusted resampling estimates of the prediction error that balance bias and variability. The adjusted resampling methods are proposed as counterparts of the 0.632+ bootstrap when n < p, and are found to improve on the 0.632+ bootstrap and other existing methods in the microarray study scenario when the sample size is small and there is some level of differential expression.

KEY WORDS: Bootstrap; 0.632+ bootstrap; class prediction; cross-validation; feature selection; learning curve; microarray data; prediction error.

MSC 2010: Primary 62G09; secondary 62P10.
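
For orientation, the conventional 0.632+ bootstrap that the abstract takes as its starting point can be sketched as below. This is a minimal illustration of one common form of the standard estimator, not the adjusted resampling estimates proposed in the paper, whose details the abstract does not give. The function names `no_information_rate` and `err_632_plus` and their arguments are hypothetical, and the truncation of the leave-one-out bootstrap error at the no-information rate is an assumed convention.

```python
def no_information_rate(y_true, y_pred):
    """Error rate expected if predictions were independent of the labels."""
    n = len(y_true)
    gamma = 0.0
    for c in set(y_true) | set(y_pred):
        p = sum(1 for y in y_true if y == c) / n  # observed share of class c
        q = sum(1 for y in y_pred if y == c) / n  # predicted share of class c
        gamma += p * (1.0 - q)
    return gamma


def err_632_plus(err_resub, err_loo_boot, gamma):
    """Combine resubstitution and leave-one-out bootstrap error (0.632+ form).

    err_resub    : error on the training data itself (optimistic)
    err_loo_boot : error on subjects left out of each bootstrap sample (pessimistic)
    gamma        : no-information error rate
    """
    # Assumed convention: cap the out-of-sample error at the no-information rate.
    err1 = min(err_loo_boot, gamma)
    # Relative overfitting rate, set to 0 when it is not well defined.
    if err1 > err_resub and gamma > err_resub:
        r = (err1 - err_resub) / (gamma - err_resub)
    else:
        r = 0.0
    # The weight grows from 0.632 (no overfitting) to 1 (complete overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * err_resub + w * err1
```

When a classifier fits the training data perfectly, as often happens when n < p, `err_resub` is 0 and the estimate leans almost entirely on the leave-one-out bootstrap error, which is the setting where the paper argues a modification is needed.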
