Comparison of Methods and Software for Modeling Nonlinear Dependencies: A Fraud Application

Richard Derrig, Louise Francis, United States

In recent years, a number of approaches for modeling data containing nonlinear and other complex dependencies have appeared in the literature. These procedures include classification and regression trees, neural networks, regression splines and naïve Bayes. Viaene et al. (2002) compared several of these procedures, as well as a classical linear model, logistic regression, for prediction accuracy on a small fixed data set of fraud indicators or “red flags”. They found that simple logistic regression did as well at predicting expert opinion as the more sophisticated procedures. In this paper we introduce some of the common data mining approaches available and explain how they are used to model nonlinear dependencies in insurance claim data. We investigate the relative performance of several software products in predicting the key claim variables for the decision to investigate for excessive and/or fraudulent practices in a large claim database. The software programs we investigate include MARS, CART, S-PLUS, TREENET and Insightful Miner. The data used for this analysis are the approximately 500,000 auto injury claims reported to the Detailed Claim Database (DCD) of the Automobile Insurers Bureau of Massachusetts from accident years 1995 through 1997. The modeling targets are the decision to order an independent medical examination and the decision to order a special investigation for fraud. We find that all of the methods provide some predictive value, or lift, from the available DCD variables, with significant differences among the methods and between the two targets. All modeling outcomes are compared to logistic regression, as in Viaene et al., with some model/software combinations doing significantly better than the logistic model.
Date: 29 May - Time: 14:30 to 16:00 - Room: 251
Theme: 1.A. Stochastic dependence