In: American Statistical Association, eds. Proceedings of the Section on Statistics in Epidemiology. Alexandria, VA, USA: American Statistical Association:42-49.
Department of General Internal Medicine, Boston University School of Medicine, Boston, MA, USA
A common problem in statistical modeling is determining how well models “work.” For predicting a continuous outcome, the traditional R² is often reported. However, when even the largest achievable R² values are small and implementing a more powerful model may be costly, potential users need to assess the practical implications of small differences in R².
In this paper we explore alternative measures and graphical methods for describing and comparing models that predict the expected costs of people who enroll in health plans (such as HMOs). The intended application is calculating payments to plans that are adjusted for the health care costs of the particular people they enroll. Models based on age and sex can reliably distinguish population subgroups whose costs differ by as much as 10 to 1 (for, say, people over 65 versus 10-year-olds), yet R² values for age-sex models rarely exceed 0.02. Health-based predictions explain more of the variation, detect subgroups whose costs differ more, and have much larger, but still small, R² values.
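To see how a model can separate subgroups whose mean costs differ tenfold and still have a very small R², consider the following minimal simulation sketch. Everything in it is an illustrative assumption rather than data or code from this paper: the group sizes, the mean costs of 500 and 5,000, the lognormal cost distribution with shape parameter 2, and the helper names `simulate_costs` and `r_squared`. The "model" simply predicts each person's group mean.

```python
import math
import random


def simulate_costs(n, mean_cost, sigma, rng):
    """Draw n skewed (lognormal) annual costs with the given expected value.

    Hypothetical helper: the lognormal shape is an assumption chosen only
    to mimic the heavy right tail typical of health care costs.
    """
    # For a lognormal, E[X] = exp(mu + sigma^2 / 2), so solve for mu.
    mu = math.log(mean_cost) - sigma ** 2 / 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def r_squared(actual, predicted):
    """Traditional R^2: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed population: 90% low-cost group, 10% high-cost group.
young = simulate_costs(180_000, 500.0, 2.0, rng)
old = simulate_costs(20_000, 5000.0, 2.0, rng)

# The two-group "model" predicts the observed mean cost of each group.
young_mean = sum(young) / len(young)
old_mean = sum(old) / len(old)

actual = young + old
predicted = [young_mean] * len(young) + [old_mean] * len(old)

ratio = old_mean / young_mean
r2 = r_squared(actual, predicted)

print(f"ratio of group mean costs: {ratio:.1f}")
print(f"R^2 of the two-group model: {r2:.3f}")
```

Under these assumptions the ratio of group means comes out near 10 while R² stays close to zero, because the variation among individuals within each group dwarfs the variation between the group means.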
We develop an illustrative range of numerical summaries and graphical displays that can be used to create rich pictures of model performance. These ideas are useful at the most basic level for understanding the strengths and weaknesses of any imperfect model. They may be particularly important when choosing which of two rather similar models to use for a particular purpose.