Smooth Models Meet Lumpy Data
Most of the survival models used by actuaries are smooth or piecewise smooth; think of a Gompertz model for the hazard rate, or constant hazard rates at individual ages. When we need a cumulative quantity, we use an integral, as in the cumulative hazard function, \(\Lambda_x(t)\):
\[ \Lambda_x(t) = \int_0^t \mu_{x+s} \, ds. \qquad (1) \]
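To make expression (1) concrete, here is a minimal Python sketch under a Gompertz hazard, \(\mu_{x+t} = e^{\alpha + \beta(x+t)}\); the parameter values \(\alpha\) and \(\beta\) below are purely illustrative, not fitted to anything:

```python
# A minimal sketch of expression (1) under a Gompertz hazard,
# mu_{x+t} = exp(alpha + beta*(x + t)).  The values of ALPHA and
# BETA are hypothetical, chosen only for illustration.
import math

ALPHA, BETA = -12.0, 0.12  # illustrative Gompertz parameters

def gompertz_mu(x, t):
    """Hazard rate at exact age x + t."""
    return math.exp(ALPHA + BETA * (x + t))

def cumulative_hazard(x, t):
    """Closed-form integral of the Gompertz hazard over [0, t]."""
    return (math.exp(ALPHA + BETA * (x + t)) - math.exp(ALPHA + BETA * x)) / BETA

# Check the closed form against a crude trapezoidal approximation.
x, t, n = 70.0, 10.0, 100_000
h = t / n
trapezoid = h * (gompertz_mu(x, 0) / 2 + gompertz_mu(x, t) / 2
                 + sum(gompertz_mu(x, i * h) for i in range(1, n)))
print(cumulative_hazard(x, t), trapezoid)  # should agree to several decimals
```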
Mortality data, on the other hand, are nearly always lumpy. A finite number of people, \(d_{x+t_i}\) say, die at a discrete time \(t_i\), one of a set of observed times of death \(t_1, t_2, \ldots, t_r\). Then when we need a cumulative quantity, we use a sum. We saw in a previous blog that if \(l_{x+t_i^-}\) was the number of persons being observed just before time \(t_i\), then the sum:
\[ \hat{\Lambda}_x(t) = \sum_{t_i \le t} \frac{d_{x+t_i}}{l_{x+t_i^-}} \qquad (2) \]
was an empirical estimate of \(\Lambda_x(t)\) called the Nelson-Aalen estimator.
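As a toy illustration, the following Python sketch evaluates the sum in (2) directly; the death times, death counts and numbers at risk are invented purely for the example:

```python
# A minimal sketch of the Nelson-Aalen estimator (2).  All the data
# below are invented for illustration only.
death_times = [0.8, 1.5, 2.3, 3.1]   # observed times of death t_1 < ... < t_r
deaths      = [1,   2,   1,   1]     # d_{x+t_i}: deaths at each t_i
at_risk     = [100, 98,  95,  90]    # l_{x+t_i^-}: observed just before t_i

def nelson_aalen(t):
    """Sum d/l over all death times up to and including t."""
    return sum(d / l for ti, d, l in zip(death_times, deaths, at_risk) if ti <= t)

print(nelson_aalen(2.0))  # 1/100 + 2/98: only the first two death times count
print(nelson_aalen(4.0))  # all four terms accumulated
```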
How do we reconcile the smooth model in equation (1) with the lumpy data in equation (2)? With one observation and three simple definitions, we can unify sums and integrals, with gratifying results. The observation is that the quantity \(l_{x+t_i^-}\) is defined not only at the death times \(t_i\), but at all times \(t\) — we always know how many people we are observing, whether anyone dies or not. Thus we have a function \(l_{x+t^-}\) for all \(t \ge 0\). The first definition is this: denote by \(N_x(t)\) the number of persons observed to die up to and including time \(t\). That is:
\[ N_x(t) = \sum_{t_i \le t} d_{x+t_i}. \qquad (3) \]
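With the invented toy data from the sketch above, \(N_x(t)\) is just a right-continuous step function:

```python
# A minimal sketch of the counting process N_x(t) in (3), using the
# same invented death times and death counts as before.
DEATH_TIMES = (0.8, 1.5, 2.3, 3.1)
DEATHS      = (1,   2,   1,   1)

def N(t):
    """Number of deaths observed up to and including time t."""
    return sum(d for ti, d in zip(DEATH_TIMES, DEATHS) if ti <= t)

print([N(s) for s in (0.5, 0.8, 1.0, 1.5, 4.0)])  # [0, 1, 1, 3, 5]
```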
The process \(N_x(t)\) is called a counting process — it counts the number of deaths up to time \(t\). Next, define \(dN_x(t)\) to be the increment of \(N_x(t)\) at time \(t\):
- At a death time \(t_i\), the increment of \(N_x(t)\) is \(dN_x(t_i) = d_{x+t_i}\).
- At any other time \(t\), the increment of \(N_x(t)\) is \(dN_x(t) = 0\).
Finally, define the integral of the increments \(dN_x(t)\) to be as follows:
\[ \int_0^t dN_x(s) = N_x(t). \qquad (4) \]
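A short sketch, again with the invented toy data, confirms that accumulating the increments recovers \(N_x(t)\) exactly as (4) says:

```python
# A minimal sketch of (4): dN_x(s) is d_{x+t_i} at a death time and
# zero otherwise, so accumulating the increments recovers N_x(t).
# Toy data as in the previous sketches.
DEATH_TIMES = (0.8, 1.5, 2.3, 3.1)
DEATHS      = (1,   2,   1,   1)

def dN(s):
    """Increment of N_x at time s: the death count at a death time, else 0."""
    for ti, d in zip(DEATH_TIMES, DEATHS):
        if s == ti:
            return d
    return 0

# The "integral" of the increments over [0, 2] is a finite sum: 1 + 2 = 3,
# matching N(2.0) from the previous sketch.
print(sum(dN(ti) for ti in DEATH_TIMES if ti <= 2.0))
```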
Although written in integral form, expression (4) is, by construction, the same as expression (3), namely a finite sum over discrete death times. So where are the gratifying results? Look at expression (2). It is the accumulation of the function \(1/l_{x+t^-}\), defined for all \(t\), but weighted by:
- the number of deaths at each death time, and
- zero everywhere else;
in other words, the increments \(dN_x(t)\). Thus it makes sense to write:
\[ \hat{\Lambda}_x(t) = \sum_{t_i \le t} \frac{d_{x+t_i}}{l_{x+t_i^-}} = \sum_{t_i \le t} \frac{dN_x(t_i)}{l_{x+t_i^-}} = \int_0^t \frac{dN_x(s)}{l_{x+s^-}}. \qquad (5) \]
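Read concretely, the integral on the right is a Stieltjes-type sum: integrating any function against \(dN_x\) just means summing its values at the jump times, weighted by the numbers of deaths. A minimal sketch with the toy data from before:

```python
# A minimal sketch of the integral in (5): integrating f against
# dN_x collapses to the finite sum of f(t_i) * d_{x+t_i} over the
# jump times.  Toy data as in the earlier sketches.
DEATH_TIMES = (0.8, 1.5, 2.3, 3.1)
DEATHS      = (1,   2,   1,   1)
AT_RISK     = {0.8: 100, 1.5: 98, 2.3: 95, 3.1: 90}  # l_{x+t_i^-}

def integrate_dN(f, t):
    """Integral of f(s) dN_x(s) over [0, t]: a finite sum over jump times."""
    return sum(f(ti) * d for ti, d in zip(DEATH_TIMES, DEATHS) if ti <= t)

# Taking f(s) = 1 / l_{x+s^-} recovers the Nelson-Aalen sum in (2).
print(integrate_dN(lambda s: 1.0 / AT_RISK[s], 4.0))
```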
The last expression on the right is still a finite sum, but now — the big idea! — it is expressed as the integral of a function between limits 0 and \(t\). Just like expression (1), in fact. A natural question, since \(\hat{\Lambda}_x(t)\) estimates \(\Lambda_x(t)\), is how good an estimator it might be. Put another way, what are the statistical properties of the discrepancy \(\hat{\Lambda}_x(t) - \Lambda_x(t)\)? We are now able to write:
\begin{eqnarray*}\hat{\Lambda}_x(t) - \Lambda_x(t) & = & \int_0^t \frac{dN_x(s)}{l_{x+s^-}} - \int_0^t \mu_{x+s} \, ds \\ & = & \int_0^t \frac{dN_x(s)}{l_{x+s^-}} - \int_0^t \frac{l_{x+s^-}}{l_{x+s^-}} \, \mu_{x+s} \, ds \\ & = & \int_0^t \frac{1}{l_{x+s^-}} \big( dN_x(s) - l_{x+s^-} \, \mu_{x+s} \, ds \big) \qquad (6)\end{eqnarray*}
provided, of course, that \(l_{x+s^-}\) is never zero on \([0,t]\). This tells us two useful things:
- The properties of the Nelson-Aalen estimator must depend on \(l_{x+s^-}\) everywhere, not just at the death times.
- The process:
\[ N_x(t) - \int_0^t l_{x+s^-} \, \mu_{x+s} \, ds \qquad (7) \]
i.e. the integral of the term in parentheses in expression (6), must be playing a fundamental role.
In fact, we have the essence of the modern counting-process approach to survival models, in which expression (7) does indeed play the starring role. The reason is that it reveals the fundamental connection between the smooth model and the lumpy data that underlies all the statistics of a survival model. Chapter 17 of our book describes this approach in more detail.
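Under the model, expression (7) is a mean-zero process (a martingale, in the language of Chapter 17). As a closing illustration, here is a small Monte Carlo sketch under a purely illustrative constant hazard \(\mu\), for which \(\int_0^t l_{x+s^-} \, \mu \, ds\) is just \(\mu\) times the total time lived on \([0,t]\):

```python
# A Monte Carlo sketch of expression (7) under a constant hazard mu
# (an illustrative choice: lifetimes are then exponential with rate mu).
# Averaged over many simulated cohorts, N_x(t) minus the integrated
# intensity should be close to zero.
import random

random.seed(42)
MU, T_END, N_LIVES, N_SIMS = 0.05, 10.0, 100, 2000

total = 0.0
for _ in range(N_SIMS):
    lifetimes = [random.expovariate(MU) for _ in range(N_LIVES)]
    n_deaths = sum(1 for s in lifetimes if s <= T_END)
    # int_0^t l_{x+s^-} mu ds = mu * total person-time lived on [0, T_END]
    exposure = sum(min(s, T_END) for s in lifetimes)
    total += n_deaths - MU * exposure

print(total / N_SIMS)  # close to zero, as the mean-zero property requires
```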
References:
Macdonald, A. S., Richards, S. J. and Currie, I. D. (2018) Modelling Mortality with Actuarial Applications, Cambridge University Press.