Great Expectations

When fitting statistical models, users commonly make a number of assumptions. Chief amongst these is that the expected number of events according to the model will equal the actual number in the data. This strikes most people as a thoroughly reasonable expectation. Reasonable, but often wrong.

For example, in the field of Generalised Linear Models (GLMs), the user has a choice of so-called link functions to specify the model. For binomial data, the default is the canonical link, the logit, which gives the following function for the rate of mortality, qx:

qx = exp(α + βx) / (1 + exp(α + βx))

This is known to actuaries as a simplified version of Perks Law when applied to mortality data. However, there are several other choices of link function. The interesting thing about these link functions is that only one of them guarantees that the GLM produces the same number of expected deaths as were actually observed. The following table summarises the results for five alternative GLMs fitted to the same data with 4739 deaths:

GLM link function        Expected deaths
---------------------    ---------------
Logit                    4739
Log                      4733.503
Cauchy                   4797.986
Complementary log-log    4738.649
Probit                   4741.33
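
The effect can be illustrated with a minimal sketch in plain Python. Note the assumptions: the ages, lives and deaths below are hypothetical figures invented for illustration, not the data behind the table, and the fitting routine is a hand-written Fisher-scoring loop for a two-parameter binomial GLM, not the routine R uses. Only the logit and complementary log-log links are shown.

```python
import math

# Hypothetical data for illustration only (NOT the data behind the table).
ages = [60, 65, 70, 75, 80, 85, 90, 95]
lives = [1000, 900, 800, 650, 500, 300, 150, 50]
deaths = [8, 13, 20, 30, 41, 42, 33, 17]
x = [(a - 75) / 10 for a in ages]  # centred and scaled age

# For each link: inverse link q(eta), derivative dq/deta, and link g(q).
LINKS = {
    "logit": (
        lambda eta: 1 / (1 + math.exp(-eta)),
        lambda eta, q: q * (1 - q),
        lambda q: math.log(q / (1 - q)),
    ),
    "cloglog": (
        lambda eta: 1 - math.exp(-math.exp(eta)),
        lambda eta, q: math.exp(eta) * (1 - q),
        lambda q: math.log(-math.log(1 - q)),
    ),
}

def fit(link, max_iter=100, tol=1e-12):
    """Fisher scoring for q = inverse_link(alpha + beta * x)."""
    inv, deriv, g = LINKS[link]
    alpha = g(sum(deaths) / sum(lives))  # start at the overall rate
    beta = 0.0
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, n, d in zip(x, lives, deaths):
            eta = alpha + beta * xi
            q = min(max(inv(eta), 1e-12), 1 - 1e-12)  # guard against 0 or 1
            dq = deriv(eta, q)
            var = q * (1 - q)
            s = (d - n * q) * dq / var   # contribution to the score
            w = n * dq * dq / var        # contribution to the Fisher information
            u0 += s
            u1 += s * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        alpha += da
        beta += db
        if abs(da) + abs(db) < tol:
            break
    return alpha, beta

def expected_deaths(link):
    alpha, beta = fit(link)
    inv = LINKS[link][0]
    return sum(n * inv(alpha + beta * xi) for xi, n in zip(x, lives))

if __name__ == "__main__":
    print("actual deaths:", sum(deaths))
    for name in LINKS:
        print(name, "expected deaths:", expected_deaths(name))
```

Under these assumptions the logit fit should reproduce the actual total to within rounding error, while the complementary log-log fit should be close but not identical.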

We can see that only the logit link produces the same number of expected deaths as in the actual data. What's going on with the rest? Why don't they all produce the same number of expected deaths? The answer lies in the fact that GLMs use the log-likelihood function to determine maximum-likelihood estimates (MLEs) for the model parameters. Only some forms of the log-likelihood yield MLEs which reproduce the observed number of events. For binomial data this holds for the canonical logit link when the model includes an intercept such as α, but it is not guaranteed for the other links.
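
The reasoning can be sketched as a short derivation. The notation is an assumption introduced here, not taken from the original: d_i deaths are observed among n_i lives at age x_i, and q_i is the fitted rate with linear predictor η_i = α + βx_i.

```latex
% Binomial log-likelihood (up to an additive constant)
\ell(\alpha, \beta) = \sum_i \left[ d_i \log q_i + (n_i - d_i) \log(1 - q_i) \right]

% Score equation for the intercept, valid for any link function
\frac{\partial \ell}{\partial \alpha}
  = \sum_i \frac{d_i - n_i q_i}{q_i (1 - q_i)} \, \frac{d q_i}{d \eta_i} = 0

% Logit link: dq_i/d\eta_i = q_i (1 - q_i), so the weights cancel
\sum_i (d_i - n_i q_i) = 0
  \quad \Longleftrightarrow \quad
  \sum_i d_i = \sum_i n_i q_i
```

For any other link the factor (dq_i/dη_i) / (q_i(1 - q_i)) does not cancel, so the MLE only sets a weighted sum of the differences d_i - n_i q_i to zero, and the unweighted totals need not agree.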

If you want to experiment with this yourself, you can use the accompanying file, expectations.r, to reproduce the results in R, a freely available statistical package which fits a variety of GLMs.

Written by: Stephen Richards

expectations.r
