Published online by Cambridge University Press: 01 March 2019
In insurance underwriting, misrepresentation is a type of insurance fraud in which an applicant purposely makes a false statement on a risk factor that may lower his or her cost of insurance. In the insurance ratemaking context, we propose using the expectation-maximization (EM) algorithm to perform maximum likelihood estimation of the regression effects and the prevalence of misrepresentation for the misrepresentation model proposed by Xia and Gustafson [(2016) The Canadian Journal of Statistics, 44, 198–218]. To apply the EM algorithm, we treat the unobserved misrepresentation status as a latent variable in the complete-data likelihood function. We derive the iterative formulas for the EM algorithm and obtain the analytical form of the Fisher information matrix for frequentist inference on the parameters of interest for lognormal losses. We implement the algorithm and demonstrate that valid inference can be obtained on the risk effect despite the unobserved status of misrepresentation. Applying the proposed algorithm, we perform a loss severity analysis with the Medical Expenditure Panel Survey data. The analysis reveals not only the potential impact that misrepresentation may have on the risk effect but also statistical evidence of misrepresentation in the self-reported insurance status.
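To make the role of the latent variable concrete, the following is a minimal Python sketch of an EM iteration of this kind, not the iterative formulas derived in the paper. It rests on several simplifying assumptions that are ours: a single binary risk factor V with no further covariates, lognormal losses with log(Y) ~ N(a0 + a1 V, sigma^2), and one-sided misreporting in which a reported status of 1 is always true while a reported status of 0 conceals a true status of 1 with probability q. The function names `simulate` and `em_misrepresentation` and all parameter names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate(n, a0, a1, sigma, theta, p):
    """Simulate log-losses and reported statuses (illustrative assumptions).

    theta is P(V = 1); p is the probability that a true 1 is reported as 0.
    """
    log_y, reported = [], []
    for _ in range(n):
        v = 1 if random.random() < theta else 0
        v_star = 0 if (v == 1 and random.random() < p) else v
        log_y.append(random.gauss(a0 + a1 * v, sigma))
        reported.append(v_star)
    return log_y, reported


def em_misrepresentation(log_y, reported, n_iter=200, tol=1e-8):
    """EM for the two-component model described above.

    Returns (a0, a1, sigma, q), where q is the estimated share of
    reported zeros whose true status is 1.
    """
    n = len(log_y)
    n0 = sum(1 for r in reported if r == 0)
    # Naive starting values: group means by reported status.
    mu0 = sum(x for x, r in zip(log_y, reported) if r == 0) / n0
    mu1 = sum(x for x, r in zip(log_y, reported) if r == 1) / (n - n0)
    sigma = math.sqrt(
        sum((x - (mu1 if r == 1 else mu0)) ** 2
            for x, r in zip(log_y, reported)) / n
    )
    q = 0.1
    for _ in range(n_iter):
        # E-step: posterior probability that the true status is 1.
        w = []
        for x, r in zip(log_y, reported):
            if r == 1:
                w.append(1.0)  # reported 1 is assumed truthful
            else:
                num = q * normal_pdf(x, mu1, sigma)
                den = num + (1.0 - q) * normal_pdf(x, mu0, sigma)
                w.append(num / den)
        # M-step: weighted means, pooled variance, and mixing weight.
        sw = sum(w)
        new_mu1 = sum(wi * x for wi, x in zip(w, log_y)) / sw
        new_mu0 = sum((1.0 - wi) * x for wi, x in zip(w, log_y)) / (n - sw)
        new_sigma = math.sqrt(
            sum(wi * (x - new_mu1) ** 2 + (1.0 - wi) * (x - new_mu0) ** 2
                for wi, x in zip(w, log_y)) / n
        )
        new_q = sum(wi for wi, r in zip(w, reported) if r == 0) / n0
        change = max(abs(new_mu0 - mu0), abs(new_mu1 - mu1),
                     abs(new_sigma - sigma), abs(new_q - q))
        mu0, mu1, sigma, q = new_mu0, new_mu1, new_sigma, new_q
        if change < tol:
            break
    return mu0, mu1 - mu0, sigma, q


if __name__ == "__main__":
    random.seed(0)
    data = simulate(3000, a0=1.0, a1=1.5, sigma=0.5, theta=0.5, p=0.3)
    print(em_misrepresentation(*data))
```

In this sketch the observations with a reported status of 1 anchor the second mixture component, which is what lets the risk effect a1 be estimated even though the true status of each reported zero is never observed. Standard errors would require the Fisher information matrix, which the sketch does not attempt.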