Rewriting the rulebook

It is an unfortunate fact of life that, over time, every portfolio acquires data artefacts that make risk analysis trickier. Policyholder duplication is one example, and claims archival that breaks the time series is another. Data errors introduced during servicing are perhaps the most commonplace of all, and this post describes how validation rules can protect the modelling stage from them.

The first class of issue is generic data corruption, so called because these problems occur with the same characteristics in more or less every portfolio you work with. Generic validation rules are critical here, screening out such problems before modelling commences. These issues include invalid or inconsistent birth, commencement or end dates, corrupt gender codes, and missing or invalid benefit amounts, along with a variety of other elephant traps.
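To make the generic checks above concrete, here is a minimal Java sketch of a handful of them. The `Policy` record, the accepted gender codes `M` and `F`, the extract date and the problem messages are all assumptions made for illustration; a real generic screening suite would cover far more cases.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of generic screening checks. Policy, the gender codes and the
// messages are hypothetical; they are not taken from Longevitas.
class GenericChecks {

    // Hypothetical policy record; a null endDate means the policy is still in force.
    record Policy(LocalDate birthDate, LocalDate commencementDate, LocalDate endDate,
                  String gender, Double benefit) {}

    // Assumed set of valid gender codes for this illustration.
    static final Set<String> VALID_GENDERS = Set.of("M", "F");

    // Returns a list of problems; an empty list means the record passes.
    static List<String> problems(Policy p, LocalDate extractDate) {
        List<String> issues = new ArrayList<>();
        if (p.birthDate() == null || p.commencementDate() == null) {
            issues.add("missing birth or commencement date");
        } else {
            if (!p.birthDate().isBefore(p.commencementDate())) {
                issues.add("birth date not before commencement");
            }
            if (p.commencementDate().isAfter(extractDate)) {
                issues.add("commencement after extract date");
            }
            if (p.endDate() != null && p.endDate().isBefore(p.commencementDate())) {
                issues.add("end date before commencement");
            }
        }
        // Null check first: Set.of(...).contains(null) would throw.
        if (p.gender() == null || !VALID_GENDERS.contains(p.gender())) {
            issues.add("invalid gender code");
        }
        // !(x > 0) rejects zero, negatives and NaN alike.
        if (p.benefit() == null || !(p.benefit() > 0)) {
            issues.add("missing or non-positive benefit");
        }
        return issues;
    }

    public static void main(String[] args) {
        LocalDate extract = LocalDate.of(2024, 12, 31);
        Policy ok = new Policy(LocalDate.of(1950, 5, 1), LocalDate.of(2015, 6, 1), null, "M", 12000.0);
        Policy bad = new Policy(LocalDate.of(1980, 1, 1), LocalDate.of(1975, 1, 1), null, "X", 0.0);
        System.out.println(problems(ok, extract));  // []
        System.out.println(problems(bad, extract));
        // [birth date not before commencement, invalid gender code, missing or non-positive benefit]
    }
}
```

Returning a list of problems, rather than a single pass/fail flag, keeps the reason for each rejection available for later investigation.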

The second class of issue is portfolio-specific corruption, which often masquerades as valid data. It occurs most commonly where a servicing team has adopted conventions that place special marker values in standard fields (such as postcode, surname or birthdate) to record conditions that the administration system cannot otherwise capture adequately.

You can trap portfolio-specific issues by looking for common occurrences, that is, values that appear far more often than chance would suggest. A portfolio with 20,000 members probably shouldn't contain 2,000 people with the same birthdate; 1900-01-01 is a popular example. And finding 150 members who all have the surname "TESTDATA" seems to be telling us something as well! Attribute misuse like this usually precludes modelling with the affected data.
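Looking for common occurrences amounts to counting how often each value appears in a field and flagging values whose share of the portfolio is implausibly high. The Java sketch below does just that. The `Member` record and the `maxShare` threshold are hypothetical; in practice you would pick a threshold per field (in the birthdate example above, 2,000 out of 20,000 members is a 10% share).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: flag field values shared by an implausibly large share of members.
// Member and the threshold are illustrative assumptions, not Longevitas code.
class CommonOccurrences {

    // Hypothetical minimal member record; fields assumed non-null
    // (groupingBy rejects null keys).
    record Member(String surname, String birthDate, String postcode) {}

    // Returns each value of the chosen field whose count exceeds
    // maxShare * (number of members), most frequent first.
    static Map<String, Long> suspiciousValues(List<Member> members,
                                              Function<Member, String> field,
                                              double maxShare) {
        // Count how many members share each value of the field.
        Map<String, Long> counts = members.stream()
                .collect(Collectors.groupingBy(field, Collectors.counting()));
        long limit = (long) Math.floor(maxShare * members.size());
        // Keep only values above the limit, ordered by descending count.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > limit)
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<Member> members = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            members.add(new Member("SMITH" + i, "1950-03-" + (10 + i), "AB1 " + i + "CD"));
        }
        for (int i = 0; i < 4; i++) {
            members.add(new Member("TESTDATA", "1900-01-01", "SPOUSE"));
        }
        System.out.println(suspiciousValues(members, Member::birthDate, 0.2)); // {1900-01-01=4}
    }
}
```

Running the same scan over `Member::surname` or `Member::postcode` would surface the "TESTDATA" and "SPOUSE" markers. The scan only proposes candidates; deciding whether a frequent value really is a marker remains a judgement call.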

Where common occurrences like these suggest portfolio-specific screening rules, it makes sense to add them to an automatic ruleset that can be reused whenever related portfolios are handled. After all, many models are revisited at least annually, and some even more often. To support this aspect of validation, Longevitas now allows you to create rulesets for validating specific data files. Here is a small ruleset in CSV format that would catch the issues discussed in the previous paragraph:

Sample Ruleset

In Longevitas, rules are Java-like Boolean expressions that evaluate to true or false. When a rule is applied to a record, a true result leaves the data usable, while false causes the record to be rejected and excluded from modelling. Such rulesets let you automate the detection and removal of portfolio-specific data problems. Because they always run after comprehensive generic screening rules, the rules you create only have to deal with very specific conditions, which makes them much simpler to write.
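The accept/reject behaviour described above can be mimicked in plain Java by treating each rule as a `Predicate`. This is only a sketch of the semantics, not Longevitas's own rule syntax: `Member`, `Rule` and the two example rules are hypothetical, and reporting the first failing rule as the rejection reason is an assumption. The rules dereference fields without null checks, relying on generic screening having run first.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of ruleset semantics: true keeps a record, false rejects it.
// All types and rules here are hypothetical illustrations.
class RulesetSketch {

    // Fields assumed non-null because generic screening has already run.
    record Member(String surname, LocalDate birthDate, String postcode) {}

    // A named rule; the name doubles as the rejection reason.
    record Rule(String name, Predicate<Member> condition) {}

    // A rejected record paired with the name of the first rule it failed.
    record Rejection(Member member, String reason) {}

    record Result(List<Member> accepted, List<Rejection> rejected) {}

    static Result apply(List<Rule> ruleset, List<Member> members) {
        List<Member> accepted = new ArrayList<>();
        List<Rejection> rejected = new ArrayList<>();
        for (Member m : members) {
            String failedRule = null;
            for (Rule r : ruleset) {
                if (!r.condition().test(m)) { // false => reject
                    failedRule = r.name();
                    break;                    // report the first failing rule
                }
            }
            if (failedRule == null) {
                accepted.add(m);
            } else {
                rejected.add(new Rejection(m, failedRule));
            }
        }
        return new Result(accepted, rejected);
    }

    public static void main(String[] args) {
        List<Rule> ruleset = List.of(
                new Rule("NoTestData", m -> !m.surname().equals("TESTDATA")),
                new Rule("NoPlaceholderBirthDate",
                        m -> !m.birthDate().equals(LocalDate.of(1900, 1, 1))));
        List<Member> members = List.of(
                new Member("JONES", LocalDate.of(1948, 7, 2), "EH1 1AA"),
                new Member("TESTDATA", LocalDate.of(1960, 1, 1), "EH1 1AA"),
                new Member("BROWN", LocalDate.of(1900, 1, 1), "G1 1AA"));
        Result result = apply(ruleset, members);
        System.out.println(result.accepted().size() + " accepted"); // 1 accepted
        for (Rejection r : result.rejected()) {
            System.out.println(r.member().surname() + " rejected by " + r.reason());
        }
        // TESTDATA rejected by NoTestData
        // BROWN rejected by NoPlaceholderBirthDate
    }
}
```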

No discussion of validation rules is complete without considering the validation of rules. To that end, Longevitas checks each ruleset against randomly generated data when it is first introduced to the system, and portfolio data is only tested against rules that prove valid. In addition, you can easily inspect all portfolio data rejected by any rule to make sure everything behaves as you expect. After all, if this area teaches us anything, it's that you can't be too careful!
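One plausible reading of checking a rule against randomly generated data is a smoke test: evaluate the candidate rule on many random, well-formed records and refuse to use it if it ever throws. The sketch below is an assumption about the idea rather than a description of what Longevitas actually does; the `Member` record, the generator, the sample count and the seed are all illustrative.

```java
import java.time.LocalDate;
import java.util.Random;
import java.util.function.Predicate;

// Sketch: smoke-test a rule on random records before trusting it with
// portfolio data. Everything here is a hypothetical illustration.
class RuleSmokeTest {

    record Member(String surname, LocalDate birthDate, String postcode) {}

    // Builds a random but well-formed record, as generic screening would guarantee.
    static Member randomMember(Random rng) {
        String surname = "NAME" + rng.nextInt(1000);
        LocalDate birth = LocalDate.of(1900 + rng.nextInt(100), 1 + rng.nextInt(12), 1 + rng.nextInt(28));
        String postcode = "PC" + rng.nextInt(100); // always 3 or 4 characters
        return new Member(surname, birth, postcode);
    }

    // True only if the rule evaluates without throwing on every sample.
    static boolean isUsable(Predicate<Member> rule, int samples, long seed) {
        Random rng = new Random(seed);
        for (int i = 0; i < samples; i++) {
            try {
                rule.test(randomMember(rng));
            } catch (RuntimeException e) {
                return false; // rule crashed: keep it away from portfolio data
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Predicate<Member> good = m -> !m.surname().equals("TESTDATA");
        // Broken rule: substring(0, 5) throws on postcodes shorter than five characters.
        Predicate<Member> broken = m -> !m.postcode().substring(0, 5).equals("SPOUS");
        System.out.println(isUsable(good, 1000, 42L));   // true
        System.out.println(isUsable(broken, 1000, 42L)); // false
    }
}
```

Catching a fault like the broken `substring` call on synthetic data is far better than discovering it part-way through a live portfolio.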

Written by: Gavin Ritchie

Validation in Longevitas

Longevitas validates all uploaded data and provides analysis of "common occurrences" to let you decide if ostensibly valid values (like birthdate "1900-01-01" or postcode "SPOUSE") are being abused to mark some other data condition. You can enhance the generic validation by creating sets of bespoke rules to screen data any way you like. Bespoke rules can also be used to enhance the data by creating your own factors.

Extracts of rejected data (with rejection reason appended) are always available to allow further investigation of data issues at source.
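As a rough illustration of such an extract, the Java sketch below writes rejected rows back out as CSV with the rejection reason appended as a final column. The `RejectionReason` column name and the RFC 4180-style quoting are assumptions, not the Longevitas export format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Sketch of a rejected-data extract in CSV form. The column name and
// quoting rule are assumptions, not the Longevitas export format.
class RejectionExtract {

    // One rejected input row (original field values) plus why it was rejected.
    record RejectedRow(List<String> fields, String reason) {}

    // Quote a field containing a comma, double quote or line break,
    // doubling any embedded quotes (RFC 4180 style).
    static String csvField(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    static String csvLine(List<String> fields) {
        StringJoiner line = new StringJoiner(",");
        for (String f : fields) {
            line.add(csvField(f));
        }
        return line.toString();
    }

    // Original header plus a reason column, then each row with its reason appended.
    static String extract(List<String> header, List<RejectedRow> rows) {
        StringBuilder out = new StringBuilder();
        List<String> fullHeader = new ArrayList<>(header);
        fullHeader.add("RejectionReason");
        out.append(csvLine(fullHeader)).append('\n');
        for (RejectedRow row : rows) {
            List<String> fields = new ArrayList<>(row.fields());
            fields.add(row.reason());
            out.append(csvLine(fields)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> header = List.of("Surname", "BirthDate", "Postcode");
        List<RejectedRow> rows = List.of(
                new RejectedRow(List.of("TESTDATA", "1960-01-01", "EH1 1AA"), "NoTestData"),
                new RejectedRow(List.of("BROWN", "1900-01-01", "G1 1AA"), "NoPlaceholderBirthDate"));
        System.out.print(extract(header, rows));
        // Surname,BirthDate,Postcode,RejectionReason
        // TESTDATA,1960-01-01,EH1 1AA,NoTestData
        // BROWN,1900-01-01,G1 1AA,NoPlaceholderBirthDate
    }
}
```

Keeping the original columns intact means the extract can be handed straight back to the servicing team for correction at source.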

