Estimating Prediction Error in Microarray Classification: Modifications on the .632+ Bootstrap When n < p
Jiang, W. and Chen, B. E.
Canadian Journal of Statistics.
We are interested in estimating prediction error for a classification model built on high dimensional genomic data when the number of genes
(p) greatly exceeds the number of subjects (n). We examine a distance argument supporting the conventional 0.632+ bootstrap proposed for
the n > p scenario, modify it for the n < p situation, and develop learning curves to describe how the true prediction error varies with the
number of subjects in the training set. The curves are then applied to define adjusted resampling estimates for the prediction error in
order to achieve a balance in terms of bias and variability. The adjusted resampling methods are proposed as counterparts of the 0.632+
bootstrap when n < p, and are found to improve on the 0.632+ bootstrap and other existing methods in the microarray study
scenario when the sample size is small and there is some level of differential expression.
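
For orientation, the conventional 0.632+ bootstrap of Efron and Tibshirani, which serves as the baseline here, can be sketched as follows. This is a minimal illustration of the standard estimator only, not the adjusted resampling methods proposed in this paper. The function names `no_information_rate` and `err_632_plus` are hypothetical, and the sketch assumes the resubstitution error and the leave-one-out bootstrap error have already been computed by some classifier and resampling loop. It also assumes the bootstrap error is capped at the no-information rate before the relative overfitting rate is formed, so that the rate stays in [0, 1].

```python
from collections import Counter


def no_information_rate(labels, predictions):
    """Estimate gamma: the error rate expected if the predictions
    were independent of the true class labels."""
    n = len(labels)
    label_freq = Counter(labels)
    pred_freq = Counter(predictions)
    # gamma = sum over classes l of p_l * (1 - q_l), where p_l is the observed
    # proportion of class l and q_l the proportion of predictions equal to l.
    return sum((count / n) * (1 - pred_freq[cls] / n)
               for cls, count in label_freq.items())


def err_632_plus(resub_err, loo_boot_err, gamma):
    """Combine resubstitution and leave-one-out bootstrap error (0.632+ rule)."""
    # Assumption of this sketch: cap the bootstrap error at gamma first.
    err1 = min(loo_boot_err, gamma)
    # Relative overfitting rate R, set to 0 when it is not well defined.
    if err1 > resub_err and gamma > resub_err:
        r = (err1 - resub_err) / (gamma - resub_err)
    else:
        r = 0.0
    # The weight moves from 0.632 (no overfitting) to 1 (maximal overfitting).
    w = 0.632 / (1 - 0.368 * r)
    return (1 - w) * resub_err + w * err1
```

With no overfitting (R = 0) the weight is 0.632 and the estimate reduces to the ordinary 0.632 bootstrap. As R approaches 1, the estimate moves toward the leave-one-out bootstrap error alone.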