Abstract
Having constructed a data-based estimation rule,
perhaps a logistic regression or a classification tree, the statistician would
like to know its performance as a predictor of future cases. There are two main
theories concerning prediction error: (1) penalty methods such as Cp, Akaike's information
criterion, and Stein's unbiased risk estimate that depend on the covariance
between data points and their corresponding predictions; and (2)
cross-validation and related nonparametric bootstrap techniques. This article
concerns the connection between the two theories. A Rao-Blackwell
type of relation is derived in which nonparametric methods such as
cross-validation are seen to be randomized versions of their covariance penalty
counterparts. The model-based penalty methods offer substantially better
accuracy, assuming that the model is believable.