Controlled Model and Software Releases

How can you be confident that your model or software release is doing what it was intended to do and not inadvertently causing damage at the same time? This article is a case study explaining my journey in achieving this in a major financial institution. My intention is to help professionals in the tech and data space achieve higher quality by sharing my experience.

 

Introduction
Like teams at many financial institutions in recent decades, our team found a series of errors – unrelated to each other – in how pricing was implemented and how discounts were applied.

 

When an error is discovered, an incident is raised and rated based on the estimated number of customers affected and the dollar impact. Then begins the analysis to determine the precise population of customers affected and the impact on each customer. The total refunds can be hundreds of thousands, or even in the millions, of dollars. An error in the customer’s favour is considered zero impact – a financial institution typically does not ask a customer to pay a refund due to the financial institution’s error.

 

Causes include the following:

 

Testing gaps, where the full testing, documentation and signoff cycle was not followed.

 

An ‘implement the fix and close the ticket’ mentality, probably driven by SLAs and the lack of any procedure requiring more, which sometimes resulted in minimal testing – such as isolated unit testing alone – in the belief that this was a ‘quick and simple change’. There may have been no regression testing. The quick and simple fix introduced an error, which was discovered years later, after it had affected thousands or tens of thousands of customers.

 

Lack of Legal and Risk scrutiny to ensure that the logic implemented to apply discounts, cashbacks or rebates aligns with the public statement of the offer. It is possible that the question of who else needs to review this change was never asked. Or, if it was asked, there was a quick ‘at a glance’ assessment of ‘no legal impact’ or ‘no risk impact’, and hence no legal or risk review.

 

One reason for the lack of Legal and Risk scrutiny may have been that Legal and Risk were not given the mandate or resources to seek out the areas of the company that were making changes and ask what review and testing was being done before changes were deployed to production.

 

Case Study
This case study relates to implementing price changes in production so that a pricing model takes customer parameters or ‘rating factors’ as input and generates a price.

 

Our team, with every bit of encouragement and support from our leadership, updated and refined the price change release process so that it was far more rigorous, while we also discovered and remediated errors made in the distant past.

 

I was in charge of delivering price changes into the production system that calculates prices for customers. The financial analysts had a process to create the new rates, explain the drivers of the changes, document their analysis and obtain signoff. The process was transparent and had several internal and external controls.

 

Between the financial analysts and the customer, errors had been introduced, causing unintended outcomes.

 

In our current price changes, the risk of errors was in my peripheral vision, but I was mostly focussed on ‘getting the price changes done’. At this stage, there had been no known implementation errors while I was in the role, but a small number of past errors had been discovered before I joined the team and were being investigated for remediation.

 

Then a new error hit. It was in the quotes: the price the customer was quoted was higher than the actual price. This was a blessing of sorts, because the customer got a nice surprise when they completed their quote and decided to buy – the actual price was slightly less than they expected. The amount by which the quote was higher was minimal, and so unlikely to tip the customer’s decision to buy, but who can say. Still, it was a production error. Then another error was found, in the taxes component of the price, arising from an implementation done while I was in the team; again, it affected the quote only.

 

This was like a lightning bolt. While investigating, I realised that many implementation steps lived in people’s heads and were not written down. If I asked a question about a detail – such as ‘Where are the taxes calculated?’ or ‘Where are the tax rates stored?’ – I got a verbal answer and it was left at that. People in the team knew the answers well. This doesn’t mean they knew the cause of the errors, but as a team we could map out the process to do a root cause analysis.

 

Having the process presented in a structured way was essential to identifying control weaknesses, so not having it written down was itself a weakness. The answer being ‘in the code’ might be sufficient in some contexts. I had read books about Microsoft explaining that, at least in part of Microsoft’s history, documentation was not considered all that critical: if you need to know how something works, go and read the code. But this does not work in financial services. We need to be able to show our process and controls in writing to anyone who asks, and be ready for audit at any time. Coming out of the Financial Services Royal Commission, we even had to be ready for a public spotlight on our code and process.

 

Our process had to be accessible and transparent to the Risk team, most of whom do not read code because they have a Risk, Audit and perhaps Legal background, not a software or engineering background. Even for those who do, it is not acceptable to refer business professionals to the code, and in any case process steps cannot be expressed in code.

 

So when a team member explained something to me, I always thought, ‘But where is this written down?’ I embarked on a mission of documenting our process, with references to our technical details, on Confluence. If I had a question, I didn’t want a verbal answer; I wanted to be able to read the answer on a permanent page, so that anyone with access to our internal Confluence site could read how our process works and get a good working understanding of our steps. This also became the basis for repeatable steps that the team followed exactly, rather than the steps as remembered and understood by individuals.

 

For example, ‘How often do we update our postcode list, which systems is the list updated in, and how is it done?’ If it wasn’t clearly documented, it needed to be.

 

By this stage, I had realised that just getting the price changes done and remediating existing known errors was far from enough. The possibility of new errors, and controlling those risks, needed my focus. As a team, we found that there had been many more errors over the previous 10-15 years, which were now being discovered and costing millions of dollars in customer refunds.

 

A well-funded remediation team was stood up to analyse the problems and ensure the right customers were refunded the right amounts for around 20 incidents.

 

Due to the number and size of the errors, I naturally wondered how other industries avoid them. Our industry is not about saving lives, but it is about protecting lives and showing the utmost integrity in financial matters. Other industries are about saving and protecting lives in a much more direct way, so their error rate cannot be at the level ours had been. How do they control risk and avoid errors? This led me to Dedicated Systems, a company specialising in mission-critical systems. By coincidental good timing, they were running a seminar in Brisbane about a month later. I decided to attend at my own cost. It was a chance for my family and me to visit Brisbane, where we had not been before. For a short part of the trip, I attended the free-of-charge daytime conference on software quality.

 

There were presentations from aviation, health and other industries that control their software for errors. One presentation explained how Airbus controls for software errors and which methods it uses to quality-check its code. It also covered VxWorks, the real-time operating system used in fighter jets.

 

They explained the certification process for aviation software, including the certified quality check, which costs USD 500 or more per line of code. Aviation software is required to be submitted to this quality certification process.

 

In the breaks, I spoke to people from Queensland’s small space industry working at a university, and to people in healthcare, about their applications of software quality control. At that seminar I was the only person from financial services.

 

But submitting our code to quality review or adopting VxWorks were not solutions to our problem. Indeed, none of the tools or processes mentioned above fitted my problem in financial services. Nevertheless, just by being there, I gained ‘intellectual permission’ to take quality seriously. In the several financial services companies I had worked in, I had never been to an event, meeting or conference that brought such resources and intellectual firepower to the problem of software and system quality as I experienced at that Brisbane seminar.

 

By reading about medical quality checks, I stumbled upon the iconic article ‘The Checklist’ by Atul Gawande in The New Yorker (10 Dec 2007). The article begins with a breathtaking story of what was involved in saving the life of a three-year-old who had fallen into an icy fishpond in a small town in the Austrian Alps. She was airlifted by helicopter to a nearby hospital and, through a vast array of medical processes executed flawlessly by caring and highly skilled professionals, the little girl was saved. These processes and technologies are used daily in nearly every country in the world. If just one of the panoply of steps is not completed on time and correctly, a patient may die or suffer lifelong complications.

 

I devoured the literature on checklists, starting with Gawande’s book The Checklist Manifesto: How to Get Things Right. Checklists were a simple, almost cost-free approach, entirely fit for our problem at the time.

 

To ensure that checklists add value, discipline and enforcement are required.

 

We discovered that in a space launch, a long checklist is implemented through the countdown, which starts 3-4 days before launch. Different parameters need to be just right at various times before launch. If a part is not at the right temperature 3 hours before launch, the launch may need to be called off; and if that same part is not at the correct temperature – which by now is a different required temperature – 30 seconds before launch, the launch will be called off.

 

To explain and promote this thinking, we gave internal presentations, including a short video showing how the checklist is a critical part of the launch process. This showed everyone that checklists are central to any mission-critical process, and got us all excited that, in implementing our checklist processes, we were doing what cutting-edge engineers do. The presentation also explained how checklists save lives in hospitals and operating theatres every day, thanks largely to Gawande’s organising and communication work.

 

A world-leading heart surgeon can forget to remove a cotton swab from the patient before sewing them up. A nurse or any assistant can enforce a checklist over the authority and reputation of that surgeon. This saves lives and, more often, reduces the chance of complications that add cost and pain to the lives of patients and their families, and spares the hospital future effort dealing with the consequences of the error.

 

We implemented a checklist in our standard operating procedure (SOP). Whether we followed our SOP was subject to review by the Risk team, which held us accountable. Nobody wanted an incident raised by the Risk team: being named as a cause of an incident affects your bonus and your chances of a salary rise or promotion.

 

We created the steps as a team, added signoff steps, and made the checklist part of the price change signoff pack. The financial analysis was always thorough and incontrovertible because it was based on numbers and mathematics. There were uncertainties, but these were covered by generally accepted probabilistic methods.
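As an illustrative sketch only (the step names, fields and structure are my own inventions, not our actual SOP), the enforcement idea can be expressed in code: a release is blocked until every step has recorded evidence and a signoff.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    description: str
    evidence: Optional[str] = None       # e.g. a link to test results on Confluence
    signed_off_by: Optional[str] = None

    def is_complete(self) -> bool:
        # A step only counts when both evidence and a signoff are recorded.
        return bool(self.evidence) and bool(self.signed_off_by)

@dataclass
class Checklist:
    steps: List[Step] = field(default_factory=list)

    def incomplete_steps(self) -> List[str]:
        return [s.description for s in self.steps if not s.is_complete()]

    def release_approved(self) -> bool:
        return not self.incomplete_steps()

checklist = Checklist([
    Step("Unit test results reconciled against the simulator"),
    Step("Regression test passed in UAT"),
    Step("Legal/Risk review of offer wording"),
])
checklist.steps[0].evidence = "confluence/unit-test-results"
checklist.steps[0].signed_off_by = "analyst"

# The release stays blocked until every step is evidenced and signed off.
assert not checklist.release_approved()
```

The point of the sketch is that completion is defined by evidence, not by memory: a verbal ‘yes, that’s done’ does not satisfy `is_complete()`.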

 

By contrast, implementation errors were an unknown and a risk that needed to be managed. Now, the CFO wanted to know why we were confident that there would be no errors when we implemented. What was our testing process? Hence the checklist and testing steps now received at least as much attention as the financial analysis and business drivers of the price change.

 

The project flow at a high level (figure 1) was largely unchanged.

 

Figure 1: Price change process

 


 

One level down, though, it was entirely different. There were checklists at several stages of the process where previously the steps relied on the knowledge of experienced people.

 

We still needed experienced people, but it was easy to leave a step out when you received 20 phone calls and 50 emails per day.

 

It is the manager’s job to protect the team’s time. However, the team had pre-existing relationships across the organisation, and these ‘favours’ – e.g. a data extract for John, a quick data refresh for Julie, a small piece of analysis for Jodie – were a major source of distraction. It took time to taper these off.

 

In this busy environment, mistakes can easily happen despite everyone’s best efforts. Enforcing a checklist ensures that steps are not missed. While I trusted the team – they were professionals, and I was relatively new – it was a delicate matter to ask whether a step had been done right. But if a nurse or intern can hold a heart surgeon to account, then it was my job not to hesitate when completion of a step was not evidenced.

 

The culture changed in favour of accuracy. When technical people know that quality is paramount, the team finds every error, down to fractions of a cent. It’s a beautiful thing. We could make our own assessment of what was material and what wasn’t, and the assessment was documented and supported. We were never satisfied, because the more we looked the more we found. Still, I began to look forward to the Risk reviews, because the reviewers were frequently wowed by our rigour and made informal comments about how we compared with other teams they reviewed.

 

Testing steps
The testing steps will be familiar to you (figure 2).

 

Figure 2: Testing steps in the implementation process

 


 

We conduct the empirical check by reconciling the results on a large dataset, and perhaps on the entire dataset (figure 3). This gives confidence that the implementation is correct.

 

The comparisons done are shown in Figure 3.

 

Figure 3: Empirical tests

 

(a) The output of the simulator is compared against the business and financial expectation of the distribution of pricing across the various pricing segments. This tests that the price change distribution ‘meets expectations’; it is not a like-for-like check.
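A minimal sketch of this kind of check – the segment names, expected bands and simulated figures below are hypothetical, not our actual pricing – flags any segment whose simulated average price change falls outside the band the financial analysis expects:

```python
# Hypothetical expected bands (min, max) for the average price change per segment,
# taken from the financial analysis, versus the simulator's output.
expected_bands = {"young drivers": (0.04, 0.08), "low risk": (-0.01, 0.02)}
simulated_change = {"young drivers": 0.06, "low risk": 0.05}

# Segments whose simulated change falls outside the expected band.
out_of_band = {
    seg: change
    for seg, change in simulated_change.items()
    if not (expected_bands[seg][0] <= change <= expected_bands[seg][1])
}
assert out_of_band == {"low risk": 0.05}  # flagged for investigation
```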

 


 

(b) The output of the UAT environment, which is PROD-like, is compared with the unit testing results to ensure they are the same.

 


 

(c) The output of PROD is compared with UAT to ensure they are the same. Anomalies are investigated in the pricing simulator, because the precise impact of each rating factor can be traced to find the cause of the error – e.g. ‘postcode 44xx’ is not being picked up because we haven’t updated our postcode table in production.
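Check (c) can be sketched as a simple reconciliation, assuming each environment’s prices can be exported keyed by quote ID (the function, the IDs and the tolerance are my illustrative assumptions, not our actual tooling):

```python
from typing import Dict, List

def reconcile(uat: Dict[str, float], prod: Dict[str, float],
              tolerance: float = 0.0) -> List[str]:
    """Return quote IDs whose PROD price differs from UAT beyond the tolerance."""
    anomalies = []
    for quote_id, uat_price in uat.items():
        prod_price = prod.get(quote_id)
        if prod_price is None or abs(prod_price - uat_price) > tolerance:
            anomalies.append(quote_id)
    # Quotes present in PROD but missing from UAT are also anomalies.
    anomalies.extend(q for q in prod if q not in uat)
    return anomalies

uat = {"Q1": 512.40, "Q2": 733.10, "Q3": 128.95}
prod = {"Q1": 512.40, "Q2": 733.15, "Q3": 128.95}
assert reconcile(uat, prod) == ["Q2"]  # Q2 goes to the simulator for tracing
```

Each flagged quote is then traced through the simulator, rating factor by rating factor, to locate the cause.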

 


 

Empirical checks versus process integrity
Empirical checks – comparing one set of results with another – are not enough. (DEV and UAT are the development and user acceptance testing environments.)

 

For example, the result sets in test might both be wrong, so even if they reconcile perfectly, the outcome might be incorrect, giving false confidence.

 

Also, production tests, called business post-implementation verification (BPIV), might not cover all the scenarios that tests in the UAT and DEV environments covered. So production could be wrong even when the test returns a positive result, because the scenarios that trigger the error are not in our test case population.
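This coverage gap can be made visible with a simple set difference over scenario labels (the labels below are hypothetical examples, not our actual test catalogue):

```python
# Scenarios exercised in UAT versus those the production BPIV can cover.
uat_scenarios = {"standard", "multi-policy discount", "new postcode 44xx", "tax-exempt"}
bpiv_scenarios = {"standard", "multi-policy discount"}

# Scenarios verified in UAT but never exercised by production checks:
uncovered = uat_scenarios - bpiv_scenarios
assert uncovered == {"new postcode 44xx", "tax-exempt"}
```

Any scenario in `uncovered` is one where a clean BPIV result says nothing, which is exactly why the process checklist matters alongside the empirical checks.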

 

The reconciliation approach is essential, but it is ‘blind’ and can give unwarranted confidence. Therefore, we also need the process checklist to be clear and to be followed, so that we know our careful steps are not introducing flaws that poison all the environments.

 

Checklists certainly do not allow you or your team to switch to autopilot. However, there is a list of things that every software or technical change release requires, and without following these steps you are, in effect, being unprofessional. The more detailed your checklist, and the more actively you engage with it – zooming out and in, and back out then in again – the more likely you are to find the unknown unknowns. More detail in your checklist undoubtedly adds rigour, but the checklist needs to be practical, not so long and detailed as to be unwieldy.

 

Monthly retrospective control in production
During and between price change releases, we also performed a monthly check of all prices in production against the simulator. This ensured that where unusual new customer scenarios arose that our overall system did not take into account, we identified them early. As a manager, I needed to ensure that 0.5 FTE was dedicated to running the monthly comparison, and to identifying, documenting and investigating discrepancies.

 

It would have been so easy to deprioritise this in favour of new deliverables, since the monthly check involves – over and over again – scrutinising the existing prices returned by our past implementations. It is surprising how many anomalies are found when tens of thousands of customers are completing quotes every month. Most are small and immaterial, but we were always learning about our model and our end-to-end automated production pricing process through these checks.

 

Some discrepancies, amounting to cents, were the result of limitations in the base Python libraries we were using for our simulator. We documented these discrepancies and their cause so that Risk would know we had left no stone unturned.
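To illustrate how cent-level discrepancies can arise from binary floating point – and how Python’s standard decimal module avoids them – here is a minimal example (the figures are illustrative, not from our simulator):

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent most decimal fractions exactly,
# so chains of float operations can drift by a cent after rounding.
premium = 0.1 + 0.2            # 0.30000000000000004, not 0.3
assert premium != 0.3

# Decimal arithmetic with explicit rounding keeps results exact to the cent.
exact = (Decimal("0.1") + Decimal("0.2")).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
assert exact == Decimal("0.30")
```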

 

Despite the cost in time, our commitment to Line 1 Risk made it impossible to deprioritise. There was a six-monthly operational effectiveness test of the monthly check, which was a formal control in our control register. These operational effectiveness tests are where the Risk team really earns its money, because it holds the team to account, right up the management chain. Activities that have no obvious profit or revenue motive – but are essential for the integrity of the organisation and the welfare of customers – are forced to the top of the priority list.

 

Conclusion
Making quality paramount in financial services has been a rewarding journey for me.

 

I would say there is no single thing that ensures our development and release process is as robust as required. A range of process, practice and organisational factors reduce the risk of an error creeping in. But even if you lack one of those factors and don’t have the ability to introduce it – e.g. you don’t have a well-resourced Line 1 Risk team – you can create something similar within your sphere of influence, such as controls written in a visible register.

 

References
Dedicated Systems Tech Days 2018. Accessed 26 Sep 2024 at https://dedicatedsystems.com.au/tech-days-2018/seminars-brisbane/

Gawande, A. ‘The Checklist’. The New Yorker, 10 Dec 2007. Accessed 26 Sep 2024 at https://www.newyorker.com/magazine/2007/12/10/the-checklist
Gawande, A. The Checklist Manifesto: How to Get Things Right. Metropolitan Books, 2009.

About Dan Misra
My approach to transparency and success in managing data and tech related projects is encapsulated in ‘Total Ownership’. I am always happy to share on this topic. Find me at https://www.linkedin.com/in/danmisra/

My habits, skills and knowledge developed over 20 years in data and tech-related projects, mostly in banking and insurance in Sydney Australia.

My tertiary studies are:

• Research: UWarwick MPhil computer science with four publications (*), and UNSW MSc history of mathematics on calculus and G.W. Leibniz.
• Mathematics: USyd BSc Hons I pure mathematics.
• Law: USyd LLB, and QMUL Postgraduate Diploma international dispute resolution.
(*) https://www.researchgate.net/scientific-contributions/Kundan-Misra-70373972

 
