Serial publication
Analytic References
Authors: Foster, Dean P. ; Stine, Robert A.
Title: Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy
Pages/Collation: pp. 303-313
URL: http://lysander.asa.catchword.org/vl=1993585/cl=87/nw=1/rpsv/cw/vhosts/asa/01621459/v99n466/contp1-1.htm
Journal of the American Statistical Association, Vol. 99, No. 466, June 2004

Abstract
We predict the onset of personal bankruptcy using least squares regression. Although well publicized, only 2,244 bankruptcies occur in our dataset of 2.9 million months of credit-card activity. We use stepwise selection to find predictors of these from a mix of payment history, debt load, demographics, and their interactions. This combination of rare responses and over 67,000 possible predictors leads to a challenging modeling question: How does one separate coincidental from useful predictors? We show that three modifications turn stepwise regression into an effective methodology for predicting bankruptcy. Our version of stepwise regression (1) organizes calculations to accommodate interactions, (2) exploits modern decision theoretic criteria to choose predictors, and (3) conservatively estimates p-values to handle sparse data and a binary response. Omitting any one of these leads to poor performance. A final step in our procedure calibrates regression predictions. With these modifications, stepwise regression predicts bankruptcy as well as, if not better than, recently developed data-mining tools. When sorted, the largest 14,000 resulting predictions hold 1,000 of the 1,800 bankruptcies hidden in a validation sample of 2.3 million observations. If the cost of missing a bankruptcy is 200 times that of a false positive, our predictions incur less than 2/3 of the costs of classification errors produced by the tree-based classifier C4.5.
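
The figures quoted at the end of the abstract can be turned into a few summary rates. The sketch below is only an illustration built from those quoted counts, not the authors' procedure. The function name `total_cost` and the decision threshold are assumptions: the threshold is the standard cost-minimizing cutoff for a calibrated probability, and the abstract does not say that the 14,000-case cut is where the authors evaluate cost.

```python
# Counts quoted in the abstract (validation sample).
N_OBS = 2_300_000      # observations in the validation sample
N_BANKRUPT = 1_800     # bankruptcies hidden in that sample
N_FLAGGED = 14_000     # largest predictions examined
N_CAUGHT = 1_000       # bankruptcies found among the flagged cases

# Cost ratio quoted in the abstract: one miss costs 200 false positives.
COST_MISS = 200.0
COST_FALSE_POSITIVE = 1.0


def total_cost(n_flagged: int, n_caught: int) -> float:
    """Hypothetical helper: cost of flagging n_flagged cases, n_caught of them bankrupt."""
    missed = N_BANKRUPT - n_caught
    false_positives = n_flagged - n_caught
    return missed * COST_MISS + false_positives * COST_FALSE_POSITIVE


recall = N_CAUGHT / N_BANKRUPT      # share of bankruptcies captured
precision = N_CAUGHT / N_FLAGGED    # share of flagged cases that are bankruptcies
base_rate = N_BANKRUPT / N_OBS      # bankruptcy rate in the whole sample
lift = precision / base_rate        # improvement over flagging at random

# Assumption: given a calibrated probability p, flagging is the cheaper action
# when p * COST_MISS > (1 - p) * COST_FALSE_POSITIVE, i.e. when p exceeds this.
threshold = COST_FALSE_POSITIVE / (COST_FALSE_POSITIVE + COST_MISS)

print(f"recall    = {recall:.3f}")
print(f"precision = {precision:.4f}")
print(f"base rate = {base_rate:.6f}")
print(f"lift      = {lift:.1f}")
print(f"threshold = {threshold:.5f}")
print(f"cost, flag top 14,000 = {total_cost(N_FLAGGED, N_CAUGHT):,.0f}")
print(f"cost, flag nothing    = {total_cost(0, 0):,.0f}")
```

Under these assumptions, a threshold near 1/201 shows why calibration matters: with so lopsided a cost ratio, cases with well under a 1% predicted chance of bankruptcy are already worth flagging.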

UCLA - Biblioteca de Ciencias y Tecnologia Felix Morales Bueno