|
Abstract
Using 1998 and 1999 singleton birth data from the State of Florida, we study the stability of classification trees. Tree stability depends on both the learning algorithm and the specific data set. In this study, test samples are used in statistical learning to evaluate both stability and predictive performance. We also use the bootstrap resampling technique, which can be regarded as data self-perturbation, to evaluate the sensitivity of the modeling algorithm to the specific data set. We demonstrate that the selection of the cost function plays an important role in stability. In particular, classifiers with equal misclassification costs and equal priors are less stable than those with unequal misclassification costs and equal priors.
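The bootstrap stability check described above can be illustrated with a minimal sketch. It is not the study's procedure: a one-split "stump" on a single synthetic feature stands in for a full classification tree, the data are simulated rather than the Florida birth records, and the names `fit_stump`, `bootstrap_instability`, `make_synthetic`, `cost_fn`, and `cost_fp` are all hypothetical. Instability is measured here, as one possible choice, by the mean pairwise disagreement of predictions on a fixed test sample across bootstrap refits.

```python
import random
from itertools import combinations


def fit_stump(xs, ys, cost_fn=1.0, cost_fp=1.0):
    """Fit a one-split tree on 1-D data that predicts 1 when x > threshold.

    The threshold minimizes total misclassification cost, where cost_fn is
    charged for a missed positive and cost_fp for a false alarm.
    """
    candidates = sorted(set(xs))
    best_t, best_cost = None, float("inf")
    # A threshold below every point lets the stump predict 1 everywhere.
    for t in [candidates[0] - 1.0] + candidates:
        cost = 0.0
        for x, y in zip(xs, ys):
            pred = 1 if x > t else 0
            if pred == 0 and y == 1:
                cost += cost_fn
            elif pred == 1 and y == 0:
                cost += cost_fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


def predict(threshold, xs):
    """Apply a fitted stump to new points."""
    return [1 if x > threshold else 0 for x in xs]


def bootstrap_instability(xs, ys, test_xs, n_boot=30, seed=0, **costs):
    """Refit the stump on bootstrap resamples and return the mean pairwise
    disagreement rate of the resulting predictions on test_xs."""
    rng = random.Random(seed)
    n = len(xs)
    all_preds = []
    for _ in range(n_boot):
        # Resample the training data with replacement (self-perturbation).
        idx = [rng.randrange(n) for _ in range(n)]
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx], **costs)
        all_preds.append(predict(t, test_xs))
    rates = [
        sum(a != b for a, b in zip(p, q)) / len(test_xs)
        for p, q in combinations(all_preds, 2)
    ]
    return sum(rates) / len(rates)


def make_synthetic(n, seed):
    """Simulated noisy data: the chance of class 1 rises with x (assumption)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < 0.1 + 0.5 * x else 0 for x in xs]
    return xs, ys


train_xs, train_ys = make_synthetic(200, seed=1)
test_xs, _ = make_synthetic(100, seed=2)
equal = bootstrap_instability(train_xs, train_ys, test_xs)
unequal = bootstrap_instability(train_xs, train_ys, test_xs, cost_fn=5.0)
print(f"equal costs: {equal:.3f}, unequal costs: {unequal:.3f}")
```

Comparing the two printed values shows how the cost setting can change the measured instability on one simulated data set; the sketch makes no claim about which setting is more stable in general.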
|
|
Abstract
Consider a sequence of dependent random variables X1,X2,…,Xn, where X1 has distribution F (or probability measure P), and the distribution of Xi+1 given X1,…,Xi and other covariates and environmental factors depends on F and the previous data, i=1,…,n-1. General repair models give rise to such random variables as the failure times of an item subject to repair. Nonparametric non-Bayes methods of estimating F exist in the literature; see, for instance, Whitaker and Samaniego [1989. Estimating the reliability of systems subject to imperfect repair. J. Amer. Statist. Assoc. 84, 301–309], Hollander et al. [1992. Nonparametric methods for imperfect repair models. Ann. Statist. 20, 879–896] and Dorado et al. [1997. Nonparametric estimation for a general repair model. Ann. Statist. 25, 1140–1160]. Typically these methods apply only to special repair models and also require repair data on N independent items until exactly one item is left awaiting a “perfect repair”.
In this paper, we define a general model for dependent random variables taking values in a general space, which includes most of the repair models in the literature. We describe nonparametric Bayesian methods to estimate P, without making any assumptions about when data collection stops. To do this we introduce a new class of priors called partition-based (PB) priors and show that it is conjugate for a large class of our general repair models. We also define a subclass of such priors called partition-based Dirichlet (PBD) priors, which also forms a conjugate family of priors. For a special case of the repair model called the aging repair model, we obtain an easily computable Bayes estimate of P under a Dirichlet prior. The Bayes estimates are smoother than the non-Bayes estimates of Whitaker and Samaniego. Graphical comparisons show that the Bayes and non-Bayes estimates tend to be close. |