What's in a (file)name?

The upcoming EU General Data Protection Regulation (GDPR) focuses on the potential for personal data exposures to create a risk to the rights of natural persons. The best way to reduce such risk is to minimise the ability to identify individuals from the data you use in your analysis. Thankfully, not all data used for modelling runs the risk of identifying individuals. Group data, such as that used by Longevitas group count survival models, or the grouped death and exposure formats used within the Projections Toolkit service, are not personal data under the terms of the GDPR. Such data carry no risk of identifying individuals. However, individual data used within mortalityrating.com and within Longevitas individual-level survival models may, depending on content, be classified as personal data.

There are various technical measures adopted within software to minimise the risk that individuals can be identified. Such mechanisms, including encryption, multi-factor authentication, and pseudonymisation, are all valuable. A more fundamental technique to guard against personal data risk is to remove unnecessary data elements to reduce (preferably to zero!) the number of ways an individual might be traced from the data shared and processed. This might be thought of as a variation on the popular security concept of Need to Know. If a calculation, such as a rating or a survival model, doesn't require a piece of knowledge, then our goal should be to remove that knowledge from the process. How can you avoid combining postcodes and dates of birth? How can you avoid combining names and sensitive codes? Questions such as these were the focus of our previous blog on the latest release of mortalityrating.com, and the Transform on Download feature available since February 2016.
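
As a rough illustration of the Need to Know idea, the sketch below strips a record down to only the fields a calculation requires before the data is shared. The field names and the `minimise` helper are hypothetical and chosen purely for illustration; they do not describe how any Longevitas service is implemented.

```python
import csv
import io

# Hypothetical list of the only fields a given calculation needs.
REQUIRED_FIELDS = ["gender", "year_of_birth", "pension_band"]


def minimise(rows, required=REQUIRED_FIELDS):
    """Yield records holding only the required fields; everything else is dropped."""
    for row in rows:
        yield {name: row[name] for name in required}


# Illustrative input containing fields the calculation does not need.
raw = io.StringIO(
    "name,postcode,gender,year_of_birth,pension_band\n"
    "A. Example,AB1 2CD,F,1950,3\n"
)

reduced = list(minimise(csv.DictReader(raw)))
```

Here the name and postcode never leave the source system, so they cannot be combined with the remaining fields later on.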

However, we should not forget aspects that are seemingly more mundane. What knowledge is encoded in uploaded file names and file descriptions? Clearly, if we use publicly recognisable references for pension schemes or annuity portfolios, that piece of context may, when combined with other fields in the dataset, make it easier to identify individuals. Identifying the dataset member who is oldest, youngest or has the highest or lowest pension may be made easier by knowing the source of their annuity, and is certainly made easier with knowledge of the organisation paying their defined-benefit pension. For this reason our latest GDPR updates focus on such details in two ways:

  1. On file upload the system will propose a random, neutral description that can be retained or overtyped.
  2. The system will discard all knowledge of the original file name and rely only upon the user-supplied description.
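
The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual implementation: the word lists, the `propose_description` and `register_upload` functions and the record layout are all hypothetical.

```python
import secrets
from typing import Optional

# Hypothetical neutral vocabulary with no connection to any client or scheme.
ADJECTIVES = ["amber", "brisk", "calm", "dusky", "eager", "faint"]
NOUNS = ["harbour", "meadow", "orchard", "plateau", "river", "summit"]


def propose_description() -> str:
    """Return a random, neutral label that says nothing about the data source."""
    adjective = secrets.choice(ADJECTIVES)
    noun = secrets.choice(NOUNS)
    number = secrets.randbelow(10_000)
    return f"{adjective}-{noun}-{number:04d}"


def register_upload(original_filename: str, content: bytes,
                    description: Optional[str] = None) -> dict:
    """Record an upload by description only.

    The original file name is accepted but deliberately never stored.
    """
    if description and description.strip():
        label = description.strip()      # the user overtyped the proposal
    else:
        label = propose_description()    # the user kept the random proposal
    return {"description": label, "size_bytes": len(content)}
```

For example, `register_upload("AcmePensionScheme_2018.csv", data)` would return a record whose description looks like `calm-river-0421`, with no trace of the original name. A user who overtypes the proposal remains responsible for keeping the replacement text neutral.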

These changes are already in place for the latest releases of mortalityrating.com, Longevitas and the Projections Toolkit. Contact us if you need further information.

Written by: Gavin Ritchie

GDPR

Not all Longevitas services work at the individual level or process potentially personal data. However, the software incorporates a number of features and techniques to minimise the need for personal data even within individual modelling and rating calculations. Key features like "Transform on Download" and "Postcode Proxies" can anonymise postcodes, names and dates of birth. This retains the benefits of modelling individual lifetimes, but without uploading records that can identify individuals. And of course, our services operate strong authentication and encryption of uploaded data along with a variety of other technical measures. 
