The value of contingency planning
When it all nearly goes wrong!
It turned out alright in the end, with Alex Yee overhauling Hayden Wilde in the closing stages of the run to take the gold medal in the Olympic Triathlon in Paris. But 24 hours earlier it all looked so different, with heavy rainfall leaving the water quality in the Seine too low to hold the competition. With the schedule already having been pushed back, organisers were contemplating the unthinkable and looking at reducing the triathlon to a run-bike-run duathlon.
With the Seine having been too polluted to swim in for the last 100 years, and rain a far from unusual feature of the Parisian summer, it feels as though the events of the last week (rain leading to unsafe swimming conditions) should have been considered well within the boundaries of a “reasonable worst case” scenario. The lack of a well-prepared contingency plan came perilously close to causing an outbreak of red faces!
What makes a good Plan B?
Contingency planning has long been recognised as a critical part of risk management in IT change and systems delivery projects. The Project Management Institute categorises contingency planning as “defining action steps to be taken if an identified risk event should occur”. These action steps should be well defined, with a clear trigger for executing them if the risk materialises. The last thing that you want during a moment of crisis is a lack of clarity over who should be doing what, and whether you should be enacting your contingency at all.
But how do you decide which risks are worth mitigating, and what makes a viable contingency? There is a set of existential risks that, unless you are a national government, it is unlikely to make sense to plan for, e.g. widespread natural disaster, war, or the zombie apocalypse. The common-sense test here is whether anyone will care about the success of your system upgrade project if one of these scenarios occurs.
Of course, this calculation can change over time – until a couple of weeks ago many enterprises may have considered a widespread global outage of major cloud services as sufficiently unlikely that a contingency solution was unnecessary…
During the Covid pandemic we were all introduced to the concept of Reasonable Worst Case Scenario planning – a generic representation of a challenging yet plausible manifestation of a risk – and this is a good starting point for developing a well-rounded set of contingency responses for a systems project.
Experience vs prediction
We have a strong bias towards believing that the future will be like the past. Our collective experiences of what has gone well and what has gone badly in previous projects strongly inform our view of what might go wrong in the future, and of the preventative or reactive steps that could be taken in mitigation. For these risks we will often have a good view of the potential impact, and therefore of what a proportionate mitigation strategy and set of costs might be.
Our contingency planning should not be limited to these familiar events, however. It should also include an analysis of more extreme possibilities: worst-credible-case scenarios. In doing so we are attempting to minimise the potential for surprises, and to make it possible to balance the cost of mitigation against the likelihood of an event occurring. Once everything is on the table, decisions can be made about the amount of time and investment that will go into exploring each potential risk, and the level of mitigation that will be applied.
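As a rough illustration of balancing mitigation cost against likelihood, the sketch below compares the expected loss of each risk with and without a contingency in place. The risk names, probabilities, and monetary figures are all invented for the example, and the simple expected-value rule is only one common way of framing the trade-off, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    # All fields are hypothetical inputs a project team would estimate.
    name: str
    likelihood: float       # probability of the event during the project (0 to 1)
    impact: float           # cost if the event occurs with no contingency
    mitigation_cost: float  # up-front cost of preparing the contingency
    residual_impact: float  # cost if the event occurs with the contingency ready

    def expected_loss_unmitigated(self) -> float:
        return self.likelihood * self.impact

    def expected_loss_mitigated(self) -> float:
        # The mitigation cost is paid whether or not the event occurs.
        return self.mitigation_cost + self.likelihood * self.residual_impact

    def worth_mitigating(self) -> bool:
        return self.expected_loss_mitigated() < self.expected_loss_unmitigated()


# Illustrative figures only.
risks = [
    Risk("Cutover fails during go-live weekend", 0.10, 500_000, 20_000, 50_000),
    Risk("Regional data centre destroyed", 0.0001, 5_000_000, 250_000, 1_000_000),
]

for risk in risks:
    verdict = "mitigate" if risk.worth_mitigating() else "accept"
    print(f"{risk.name}: {verdict}")
```

In practice the estimates themselves are the hard part, and a purely financial comparison will understate reputational damage, so a calculation like this is better treated as a prompt for discussion than as a decision rule.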
Where the rubber hits the road
One hopes that contingency plans will never be executed, but most people who have been involved in large-scale, complex system delivery will recognise the moment over a go-live weekend when one of the team tells you that the cutover plan hasn’t worked as expected!
At this point you don’t have the luxury of time – the business may well have signed off a limited outage window, or you may already be incurring financial or reputational damage.
A good contingency plan should enable quick and clear decision making:
Has the threshold been met to trigger its execution?
Who needs to approve the execution?
How long will the contingency take to execute?
Is the contingency “rolling back” to a previous good state, or is it an alternative path to delivering the desired systems change?
What impact will the contingency plan have on stakeholder communities and what communications will be distributed?
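One way to make sure these questions have answers before go-live is to record them in a structured form. The sketch below is a minimal, hypothetical example: the `ContingencyPlan` fields, the `decide` function, and its three outcomes are assumptions made for illustration rather than an established template.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ContingencyPlan:
    # Hypothetical record mirroring the questions listed above.
    name: str
    trigger_description: str   # the threshold that triggers execution
    approver: str              # who needs to approve execution
    execution_time: timedelta  # how long the contingency takes to execute
    is_rollback: bool          # True = roll back; False = alternative path
    stakeholder_message: str   # communication to distribute when invoked

    def can_execute_within(self, remaining_window: timedelta) -> bool:
        return self.execution_time <= remaining_window


def decide(plan: ContingencyPlan, trigger_met: bool,
           remaining_window: timedelta) -> str:
    """Return 'continue', 'invoke', or 'escalate' for the current situation."""
    if not trigger_met:
        return "continue"
    if not plan.can_execute_within(remaining_window):
        # The plan no longer fits the agreed outage window.
        return "escalate"
    return "invoke"


# Illustrative values only.
plan = ContingencyPlan(
    name="Roll back to previous release",
    trigger_description="Cutover validation checks still failing at the agreed checkpoint",
    approver="Programme sponsor",
    execution_time=timedelta(hours=4),
    is_rollback=True,
    stakeholder_message="Go-live postponed; existing system remains in service.",
)

print(decide(plan, trigger_met=True, remaining_window=timedelta(hours=6)))
```

The "escalate" outcome reflects the point that a plan which cannot complete inside the remaining outage window needs a decision from someone with the authority to extend it.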
So if you are in the middle of an IT change programme, or are about to initiate one, why not spend some time reviewing your risk management approach and contingency plans? Don’t be left without a river to swim in!