Stopping the clock on the Poisson process

"The true nature of the Poisson distribution will become apparent only in connection with the theory of stochastic processes"

Feller (1950)

In a previous blog, we showed how survival data lead inexorably toward a Poisson-like likelihood. This explains the common assumption that if we observe D_x deaths among n individuals, given E_x^c person-years exposed-to-risk, and we assume a constant hazard rate \mu, then D_x is a Poisson random variable with parameter E_x^c\mu. But then \Pr[D_x>n]>0. That is, an impossible event has non-zero probability, even if it is negligibly small. What is going on?
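
To see how small this probability typically is, here is a minimal numerical sketch in Python. All figures are hypothetical, with a deliberately exaggerated hazard so the tail probability is visible: under the assumption that D_x is Poisson with parameter E_x^c\mu, the probability of observing more deaths than there are individuals is tiny but strictly positive.

    import math

    n = 10           # hypothetical number of individuals
    exposure = 10.0  # hypothetical central exposure E_x^c, in person-years
    mu = 0.3         # deliberately exaggerated constant hazard, for illustration

    lam = exposure * mu   # the Poisson parameter E_x^c * mu

    # Pr[D_x > n] under the Poisson assumption: 1 minus the Poisson CDF at n.
    tail = 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n + 1))
    print(f"Pr[D_x > {n}] = {tail:.3g}")   # small, but not zero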

Physicists are ever alert to the tiniest difference between a model's predictions and empirical reality. Likewise, the tiniest non-zero probability of an impossible event ought to invite us to dig deeper. A puzzle about aggregated data is often clarified by looking at the individual data. Our most basic assumption is that, if the hazard rate at age x is \mu, then the probability that a person alive at age x will die in time dt is:

\Pr[{\rm Death\ in\ }dt|{\rm Alive\ at\ }x]=\mu dt+o(dt)\qquad (1)

(see also equation (4) in Stephen's earlier blog). This is so fundamental that it is the equivalent, if you like, of the physicist's model of the electron. It is also the fundamental assumption underlying a Poisson process with parameter \mu, if we replace 'death' with 'the process jumps'. For i=1,\ldots,n let \tilde{N}_i(t) be a Poisson process with parameter \mu. Suppose all the \tilde{N}_i(t) are mutually independent, and define \tilde{N}(t)=\sum_{i=1}^n \tilde{N}_i(t) (also a Poisson process). Then the following are true for any time t\ge 0 chosen in a non-random way:

  • \tilde{N}_i(t) is a Poisson random variable with parameter t \mu, a non-random quantity.
  • \tilde{N}_i(t) can take any non-negative integer value.
  • The total time exposed-to-risk is nt, a non-random quantity.
  • The total number of jumps, \tilde{N}(t), is a Poisson random variable with parameter nt\mu, a non-random quantity. So \Pr[\tilde{N}(t)>n]>0; compare this with \Pr[D_x>n]> 0 above.
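
The last point is easy to check by simulation. The following Python sketch (hypothetical parameters, again with an exaggerated hazard) draws the counts \tilde{N}_i(t) for n independent Poisson processes over a fixed, non-random time t, and confirms that the total \tilde{N}(t) behaves like a Poisson random variable with parameter nt\mu and, in particular, can exceed n.

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n, t, mu = 10, 1.0, 0.3          # hypothetical values, with an exaggerated hazard
    n_sims = 200_000

    counts = rng.poisson(mu * t, size=(n_sims, n))   # draws of tilde{N}_i(t), i = 1..n
    totals = counts.sum(axis=1)                      # tilde{N}(t) = sum of the tilde{N}_i(t)

    print("mean of tilde{N}(t):", totals.mean(), " (theory: n*t*mu =", n * t * mu, ")")
    print("estimated Pr[tilde{N}(t) > n]:", (totals > n).mean())   # small, but not zero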

Note the emphasis given to non-random in the above. Replace the time t with a random variable and the resulting process and associated random variables are no longer Poisson.

To make the link between these Poisson processes and a survival model, we would like to define \tilde{N}_i(t) to be the number of times the i^{\rm th} of n individuals has died by time t, but to prevent each process from jumping more than once. A neat way to do this is to define an indicator process, Y_i(t), as follows:

Y_i(t) = \begin{cases} 1 & \mbox{if the } i^{\rm th}\mbox{ individual is alive just before time } t \\ 0 & \mbox{otherwise} \end{cases} \qquad (2)

and to replace the constant hazard rate \mu with the hazard process Y_i(t)\mu. This has the following effect:

  • While the i^{\rm th} individual is alive, the hazard rate is 'switched on' and the individual is at risk of dying.
  • As soon as the i^{\rm th} individual dies, the hazard rate is 'switched off' and they are no longer at risk of dying (again).
  • The resulting process, denoted by N_i(t), can take only the values 0 or 1. That makes it a suitable model for the mortality of an individual.
  • N_i(t) is not a Poisson process; it is a Poisson process 'stopped' after the first jump, a different thing altogether. As a result, the total number of observed deaths, N(t) = \sum_{i=1}^n N_i(t), cannot exceed n and is not a Poisson random variable.
  • The time spent exposed-to-risk by the i^{\rm th} individual up to time t is then E_i^c = \int_0^t Y_i(s)ds. This is a random variable, as is the total exposed-to-risk, E_x^c. However, if we treat E_x^c as non-random, as we did at the start of this blog, then D_x does have a Poisson distribution, but the cost of this assumption is that \Pr[D_x>n]>0.
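
Continuing the simulation sketch above with the same hypothetical parameters, but now with the hazard switched off at the first jump: under a constant hazard \mu the time to that single jump is exponentially distributed with rate \mu, so each 'stopped' process N_i(t) is 0 or 1, the total N(t) never exceeds n, and the exposed-to-risk is random.

    import numpy as np

    rng = np.random.default_rng(seed=2)
    n, t, mu = 10, 1.0, 0.3          # same hypothetical parameters as above
    n_sims = 200_000

    # With a constant hazard mu, the time to the single possible jump (death)
    # is exponentially distributed with rate mu (mean 1/mu).
    death_times = rng.exponential(scale=1.0 / mu, size=(n_sims, n))

    deaths   = (death_times <= t).sum(axis=1)          # N(t): can never exceed n
    exposure = np.minimum(death_times, t).sum(axis=1)  # E_x^c = sum_i integral of Y_i(s) ds

    print("largest N(t) seen:", deaths.max(), " (cannot exceed n =", n, ")")
    print("mean exposure:", exposure.mean(), " (random; compare the fixed n*t =", n * t, ")")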

The mean time between jumps of a Poisson process with parameter \mu is 1/\mu; if \mu is very small, then the probability of any of the original \tilde{N}_i(t) jumping more than once is also very small. This is why the Poisson distribution is usually a good approximation for the number of deaths D_x in a survival model. But, like the physicist, we cannot ignore that tiny gap between model and reality. Further progress depends on studying the 'stopped' processes N_i(t), not the Poisson processes \tilde{N}_i(t), and certainly not the aggregate Poisson process \tilde{N}(t).
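
To put a number on 'very small', consider the probability that a single process \tilde{N}_i(t) jumps more than once in [0,t]:

\Pr[\tilde{N}_i(t)\ge 2] = 1 - e^{-\mu t} - \mu t\,e^{-\mu t} = 1-(1+\mu t)e^{-\mu t} \approx \tfrac{1}{2}(\mu t)^2

For a hypothetical \mu t = 0.001, say, this is about 5\times 10^{-7}.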

The indicator process Y_i(t) is important in its own right, and it appears in several places in our forthcoming book Modelling Mortality with Actuarial Applications. As just one example, if we tweak definition (2) slightly, so that Y_i(t) = 1 if the i^{\rm th} individual is alive and under observation just before time t, we have included both left-truncation and right-censoring in the model.
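
As an illustrative sketch only (hypothetical function names, not code from the book), the tweaked indicator and the exposure it generates might look like this in Python:

    def indicator(t, entry_time, exit_time, death_time=None):
        """Y_i(t): 1 if the individual is alive and under observation just before time t."""
        alive = (death_time is None) or (t <= death_time)
        observed = entry_time < t <= exit_time
        return 1 if (alive and observed) else 0

    def central_exposure(t, entry_time, exit_time, death_time=None):
        """E_i^c = integral of Y_i(s) ds up to time t: time spent alive and observed."""
        end = min(exit_time, t) if death_time is None else min(exit_time, t, death_time)
        return max(0.0, end - max(entry_time, 0.0))

    # Hypothetical example: enters observation at age 60.5 and is still alive
    # and under observation just before age 61.
    print(indicator(61.0, entry_time=60.5, exit_time=65.0))         # 1
    print(central_exposure(61.0, entry_time=60.5, exit_time=65.0))  # 0.5 person-years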

References:

Feller, W. (1950). An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley and Sons, New York.

Macdonald, A.S., Richards, S.J. and Currie, I.D. Modelling Mortality with Actuarial Applications. Cambridge University Press (forthcoming).

Written by: Angus Macdonald
