Beyond classical tests: A distributional approach to missingness
M. Serguei Rouzinov, Faculté des sciences sociales et politiques, Université de Lausanne, Lausanne
Missing data occur in almost all surveys, and handling them correctly requires knowing their type. Missing data are generally divided into three types (or generating mechanisms): missing completely at random, missing at random, and missing not at random. The generating mechanism can also be a mixture of these single mechanisms. The first step in identifying the type generally consists of testing whether the missing data are missing completely at random. Several tests have been developed for that purpose, but they are not appropriate for mixed missing data mechanisms and have difficulty with non-continuous variables. Our approach uses a regression model and a distribution test to assess whether the missing data are missing completely at random or missing at random. Formally, for a variable with missing data, we compare the distribution of the regression model's predictions for cases where the variable is observed with that for cases where it is unobserved. Our simulation results show that, compared with the standard Little test, our method is at least as powerful for single mechanisms and clearly better for mixed mechanisms.
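The comparison of predictions for observed and unobserved cases can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the regression is a one-covariate least-squares line, the distribution test is a two-sample Kolmogorov-Smirnov statistic with a permutation p-value, and the function names (`fit_line`, `ks_statistic`, `missingness_test`, `simulate`) and simulation settings are hypothetical. With a single covariate the predictions are a monotone transform of that covariate, so the sketch shows only the mechanics of the idea.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def missingness_test(x, y, n_perm=199, seed=0):
    """Return (KS statistic, permutation p-value); missing y values are None."""
    observed = [i for i, yi in enumerate(y) if yi is not None]
    # Fit the regression on complete cases only, then predict for every case.
    a, b = fit_line([x[i] for i in observed], [y[i] for i in observed])
    preds = [a + b * xi for xi in x]
    is_missing = [yi is None for yi in y]

    def stat(labels):
        return ks_statistic(
            [p for p, m in zip(preds, labels) if not m],
            [p for p, m in zip(preds, labels) if m],
        )

    d_obs = stat(is_missing)
    # Permutation null: shuffling the missingness labels mimics MCAR.
    rng = random.Random(seed)
    labels = is_missing[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if stat(labels) >= d_obs:
            hits += 1
    return d_obs, (1 + hits) / (1 + n_perm)


def simulate(mechanism, n=400, seed=1):
    """Simulate (x, y) with y missing under 'MCAR' or 'MAR' (illustrative settings)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
    for i, xi in enumerate(x):
        if mechanism == "MCAR":
            p_miss = 0.3  # constant probability, unrelated to x
        else:
            p_miss = 1 / (1 + math.exp(-2 * xi))  # depends on observed x
        if rng.random() < p_miss:
            y[i] = None
    return x, y


d_mcar, p_mcar = missingness_test(*simulate("MCAR"))
d_mar, p_mar = missingness_test(*simulate("MAR"))
print(f"MCAR: D = {d_mcar:.3f}, p = {p_mcar:.3f}")
print(f"MAR:  D = {d_mar:.3f}, p = {p_mar:.3f}")
```

In this sketch a small p-value suggests that missingness depends on the observed covariate, which is evidence against missing completely at random. The permutation p-value was chosen here only to avoid relying on an asymptotic distribution; any two-sample distribution test could be substituted.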