Boundless confidence?
We've talked repeatedly about a key advantage of statistical models over deterministic ones: they provide confidence intervals in addition to a best estimate. These bounds allow us to decide how certain we can be about predictions made by the model (preferably before, say, publishing any conclusions in the national press). Confidence intervals, sadly, seldom influence the headlines. In the past few weeks we have seen reports of a life expectancy of 105 for females in part of Cramlington, Northumberland. More recently there has been excitement around male lifespan outstripping female, most notably by 13 years in Broadfield in Crawley, West Sussex. See related reports here and here.
Carl Sagan popularised the phrase "extraordinary claims require extraordinary evidence", and it is certainly legitimate to ask whether the interpretations of the evidence underpinning these media reports would pass his baloney detection kit. Since none of the journalists involved cites anything more specific than "Public Health England", the exact dataset used isn't obvious. However, a plausible source is PHE's Local Health site. The most granular data available appears to be defined at Ward (2011) level, and from here we can extract the population indicators relevant to some of the claims, as in Table 1.
Ward | Female LE | Female 95% Bounds | Male LE | Male 95% Bounds | Pop. | Pop. |
---|---|---|---|---|---|---|
Cramlington North ED | 105.0 | 85.3–124.7 | 90.5 | 77.1–103.9 | 5,392 | <27 |
Broadfield North Ward | 83.3 | 79.3–87.4 | 87.9 | 77.6–98.1 | 6,798 | <28 |
Knightsbridge and Belgravia | 93.7 | 89.9–97.6 | 97.7 | 91.8–103.6 | 9,346 | <169 |
So what does Table 1 reveal? For both Cramlington and Broadfield we appear to have life-expectancy predictions based on small numbers of long-lived individuals. Furthermore, in Cramlington the confidence bounds around female life expectancy are shockingly wide: we are 95% confident the true life expectancy lies in the near-forty-year interval from 85 to 125.
Meanwhile, in Broadfield the data supports males living only 4.6 years longer than females, not 13 years as claimed. Since this is the only ward in Crawley where the gender gap is reversed, and the largest positive male-female difference in the entire data set, it does appear we have a clear inconsistency, perhaps even a data restatement. Equally troubling, the Broadfield confidence interval for male life expectancy contains the entire interval for female life expectancy. In other words, our 95% bounds permit male life expectancy to be either lower or higher than that of females, wherever in its own interval the female value actually lies.
The Knightsbridge ward exhibits a larger elderly population, but still a fairly pronounced overlap in the confidence intervals (which are again wider for males). Taken together with Broadfield, these headline-grabbing outliers constitute too weak a signal to revise our expectations of the longevity gender effect.
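The comparisons above can be checked directly from the figures in Table 1. The following is a minimal sketch rather than PHE's method: the names `Estimate`, `overlaps`, `gap_z` and `summarise` are ours, and the z-score assumes the published bounds are symmetric normal intervals and that the male and female estimates are independent, neither of which the source confirms.

```python
from dataclasses import dataclass
from math import sqrt

Z95 = 1.96  # two-sided 95% quantile of the standard normal


@dataclass(frozen=True)
class Estimate:
    le: float     # central life-expectancy estimate
    lower: float  # lower 95% bound
    upper: float  # upper 95% bound

    @property
    def width(self) -> float:
        return self.upper - self.lower

    @property
    def se(self) -> float:
        # ASSUMPTION: the bounds are a symmetric normal interval.
        return self.width / (2 * Z95)


# (female, male) figures copied from Table 1.
TABLE_1 = {
    "Cramlington North ED": (Estimate(105.0, 85.3, 124.7), Estimate(90.5, 77.1, 103.9)),
    "Broadfield North Ward": (Estimate(83.3, 79.3, 87.4), Estimate(87.9, 77.6, 98.1)),
    "Knightsbridge and Belgravia": (Estimate(93.7, 89.9, 97.6), Estimate(97.7, 91.8, 103.6)),
}


def overlaps(a: Estimate, b: Estimate) -> bool:
    """True if the two confidence intervals share any values."""
    return a.lower <= b.upper and b.lower <= a.upper


def gap_z(female: Estimate, male: Estimate) -> float:
    """Approximate z-score for the male-minus-female gap.

    ASSUMPTION: the two estimates are independent.
    """
    return (male.le - female.le) / sqrt(female.se ** 2 + male.se ** 2)


def summarise(table: dict) -> dict:
    """Per-ward gap, interval widths, overlap flag and z-score."""
    return {
        ward: {
            "gap": round(m.le - f.le, 1),
            "female_width": round(f.width, 1),
            "male_width": round(m.width, 1),
            "overlap": overlaps(f, m),
            "z": gap_z(f, m),
        }
        for ward, (f, m) in table.items()
    }


if __name__ == "__main__":
    for ward, row in summarise(TABLE_1).items():
        print(ward, row)
```

Under these assumptions the male and female intervals overlap in all three wards, and no gap reaches the conventional 1.96 threshold in absolute value, which is consistent with treating these outliers as a weak signal.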
A final point: the media reports talk of the reversal of the gender effect in 100 areas, which sounds impressive. In the current ward-level data we find the phenomenon in only 86 of 7,689 wards, with the projected difference being less than one year in the majority of these cases. We have no data to analyse in a further 18 wards. Given differences in some of the reported area names and the fact that Voice of Russia (yes, the story really has travelled) reports reversal in 100 postcodes, we have to acknowledge the possibility that these reports are based on data for even smaller geographical areas. If that is the case then the modelling will be drawing upon smaller populations with a commensurate widening of the confidence bounds, and that 40-year span in Cramlington already seems more than wide enough to draw our conclusions!
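To put rough numbers on this, the sketch below computes the share of wards showing a reversal and illustrates how an interval might widen for a smaller area. The function names are ours, and the widening uses the common rule of thumb that a standard error scales with one over the square root of the population; this is an assumption for illustration, not the method behind the published bounds.

```python
from math import sqrt

REVERSED_WARDS = 86   # wards where male LE exceeds female LE
TOTAL_WARDS = 7_689   # wards in the ward-level data


def reversed_share(reversed_wards: int = REVERSED_WARDS,
                   total: int = TOTAL_WARDS) -> float:
    """Fraction of wards in which the gender effect is reversed."""
    return reversed_wards / total


def scaled_width(width: float, population_ratio: float) -> float:
    """Approximate interval width after shrinking the population.

    ASSUMPTION: standard error is proportional to 1 / sqrt(population),
    so the width is divided by sqrt(population_ratio).
    """
    if population_ratio <= 0:
        raise ValueError("population_ratio must be positive")
    return width / sqrt(population_ratio)


if __name__ == "__main__":
    print(f"Reversal share: {reversed_share():.1%}")
    # Cramlington's female interval (39.4 years wide) at a quarter of the population.
    print(f"Width at quarter population: {scaled_width(39.4, 0.25):.1f} years")
```

On that rule of thumb, an area with a quarter of Cramlington North's population would have bounds roughly twice as wide.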