Spotting hidden data-quality issues

The growing market for longevity risk-transfer means that takers of the risk are keenly interested in the mortality characteristics of the portfolio concerned. The first thing requested by the risk-taker is therefore detailed data on the portfolio's recent mortality experience. This is ideally data extracted on a policy-by-policy basis. Once received, the careful analyst checks that the data are sound. Failure to spot data problems at the start will at best waste time, and at worst lead to concluding a deal on bad terms. There is therefore tremendous value in simple checks of data quality.

We saw in an earlier post how survival models can reveal data problems. However, these issues can sometimes be spotted even more easily using the estimator proposed by Kaplan & Meier (1958). As an example of this, consider Figures 1–4, which plot the Kaplan-Meier functions for males and females in a number of different European portfolios we have analysed in recent years. In Figures 1–3, females have a clearly higher survival probability than males at all ages, irrespective of whether the portfolio is Dutch, French or German. This emphasizes the wide applicability of the Kaplan-Meier estimator.
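For readers who want the definition behind these plots, the estimator can be written as below. The notation is our own shorthand for this post: x_i are the distinct ages at death, d_i is the number of deaths at age x_i, and r_i is the number of lives under observation immediately before age x_i.

```latex
\hat{S}(x) = \prod_{x_i \le x} \left( 1 - \frac{d_i}{r_i} \right)
```

Because annuitants and pensioners typically enter observation at different ages, r_i counts only those lives who have already entered and not yet left, and the resulting curve is conditional on survival to the youngest age observed.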

However, Figure 4 suggests that there is something wrong with the data in the UK annuity portfolio. This is not because there is something special about either the UK or annuities, because Kaplan-Meier functions for other UK annuity portfolios look just like those for the Dutch, French and German portfolios in Figures 1–3. In our experience, the sort of pattern exhibited in Figure 4 is sometimes a result of data corruption relating to the processing of benefits for a surviving spouse.

Figure 1. Kaplan-Meier function for Dutch private-sector occupational pension scheme. Source: Own calculations.

Figure 2. Kaplan-Meier function for French public-sector top-up pension scheme. Source: Own calculations.

Figure 3. Kaplan-Meier function for German public-sector top-up pension scheme. Source: Richards, Kaufhold and Rosenbusch (2013).

Figure 4. Kaplan-Meier function for UK annuity portfolio. Source: Own calculations.

One particularly important aspect of Figure 4 is that this kind of data problem cannot be detected from a simple comparison of actual against expected deaths (A/E) using a standard table. To spot this kind of issue you must either plot the Kaplan-Meier function or else fit a statistical model and look for suspicious parameter values for the gender differential. In practical day-to-day work, however, we find that the graphical nature of the Kaplan-Meier check means it is immediately understood by non-statisticians.
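As a rough illustration of the check, the sketch below computes a Kaplan-Meier curve by age for each sex in plain Python. It is not the code behind Figures 1–4: the record layout (sex, entry age, exit age, death flag), the function names and the tiny data set are all assumptions made up for this example, and it assumes every life exits at an older age than it entered.

```python
from collections import defaultdict


def kaplan_meier(records):
    """Product-limit survival curve by age, allowing for left truncation.

    records: iterable of (entry_age, exit_age, died) tuples; this field
    layout is a hypothetical one chosen for the sketch. Assumes
    exit_age > entry_age for every life.
    Returns [(age, survival), ...] at each distinct age at death,
    conditional on survival to the youngest entry age.
    """
    records = list(records)
    deaths = defaultdict(int)
    for _, exit_age, died in records:
        if died:
            deaths[exit_age] += 1

    curve, survival = [], 1.0
    for age in sorted(deaths):
        # Lives at risk: those under observation just before this age.
        at_risk = sum(1 for entry, exit_age, _ in records
                      if entry < age <= exit_age)
        survival *= 1.0 - deaths[age] / at_risk
        curve.append((age, survival))
    return curve


def kaplan_meier_by_sex(rows):
    """Split (sex, entry_age, exit_age, died) rows and fit one curve per sex."""
    groups = defaultdict(list)
    for sex, entry, exit_age, died in rows:
        groups[sex].append((entry, exit_age, died))
    return {sex: kaplan_meier(recs) for sex, recs in groups.items()}


# Tiny made-up data set, purely to show the call pattern.
rows = [
    ("M", 60, 65, True), ("M", 60, 70, False),
    ("M", 62, 68, True), ("M", 66, 72, True),
    ("F", 60, 71, True), ("F", 61, 75, False),
]
curves = kaplan_meier_by_sex(rows)
```

Plotting the male and female curves on the same axes then gives the visual check described above: in a sound portfolio we would expect the female curve to lie above the male one across the age range.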

UPDATE on 2024-11-19: A later blog gives some R code to calculate the Kaplan-Meier estimate for left-truncated ages. It has an example data file for a UK pension scheme.

References:

Kaplan, E. L. and Meier, P. (1958) Nonparametric estimation from incomplete observations, Journal of the American Statistical Association 53, 457–481.

Richards, S. J., Kaufhold, K. and Rosenbusch, S. (2013) Creating portfolio-specific mortality tables: a case study, European Actuarial Journal 3, 295–319. doi:10.1007/s13385-013-0076-6

Written by: Stephen Richards

Publication Date: 03 November 2013

Last Updated: 19 November 2024

Services: Survival Modelling

Tags: data validation, Kaplan-Meier

Kaplan-Meier in Longevitas

Longevitas users can choose whether or not to generate Kaplan-Meier curves with each model fitted. The default option is to have Kaplan-Meier curves generated, but it can be controlled in the Advanced Options section of the modelling screen. The Kaplan-Meier curves themselves can be plotted in the Curves tab of the model report.
