The Use of the Word ‘Robust’ to Describe Software Code
This post was first published on 25/06/2019.
Author: Stephen Mason.
In 1997, the Law Commission decided, in effect, that writers of software code wrote perfect code, because it introduced a presumption that covered computers by implication (or, more accurately, digital data): ‘In the absence of evidence to the contrary, the courts will presume that mechanical instruments were in order at the material time’. Politicians replicated the presumption for criminal proceedings by passing section 129(2) of the Criminal Justice Act 2003.
No evidence was put forward by the Law Commission to substantiate the assertion that computers were ‘reliable’ (the word that is often used), and proposals for reform have not been taken up (‘Electronic evidence: A proposal to reform the presumption of reliability and hearsay’, Computer Law and Security Review, Volume 30, Issue 1 (February 2014), 80–84).
This presumption illustrates the cognitive dissonance of the Law Commission and judges (for which also see ‘Artificial intelligence: Oh really? And why judges and lawyers are central to the way we live now – but they don’t know it’, Computer and Telecommunications Law Review, 2017, Volume 23, Issue 8, 213–225).
They accept that computers are ‘reliable’, yet they allow companies that write software code to include a term in the software licence stating plainly that writers of software code are not perfect. Here is an example:
The Licensee acknowledges that software in general is not error free and agrees that the existence of such errors shall not constitute a breach of this Licence
So, who is correct?
Is it the Law Commission and judges who agree that software is ‘reliable’ (whatever that means – no judge has ever determined what ‘reliability’ is, for which see chapter 6 in Electronic Evidence)?
Or is it the people responsible for writing software code, who explicitly state that software is generally not error free?
In the current trial of Bates v Post Office Limited TLQ17/0455 before Mr Justice Fraser in London (a transcript of the trial is available at www.postofficetrial.com), the Post Office claimed that the software code was ‘robust’.
Below is what Anthony de Garr Robinson QC said in his opening speech:
Day 1: 11 March 2019, 87
MR JUSTICE FRASER: I’m not in any way being difficult, I think we may as well just deal with it upfront at the beginning. Am I to read “robust” as meaning “extremely unlikely to be the cause”, or is there more meaning to “robust” than that? Because I think whatever it is, we all have to make sure we are using the word the correct way, or the same way.
MR DE GARR ROBINSON: The concept of robustness is a concept which involves reducing to an appropriate low level of risk, the risk of problems in Horizon causing shortfalls which have a more than transient effect on branches. So it involves both measures to prevent bugs arising in the first place but those measures are never going to be perfect and it includes measures which operate once a bug has actually occurred and triggered a result. It is both aspects of the equation. I don’t say that the word “robust” necessarily means “extremely low level of risk”, but what we say is that if you have a robust system it produces a result in which the system works well in the overwhelming majority of cases and when it doesn’t work well there are measures and controls in place to reduce to a very small level the risk of bugs causing non-transient lasting shortfalls in any given set of branch accounts.
Day 1: 11 March 2019, 101 – 103
Now, before addressing the expert reports on robustness it is worth noting the large measure of agreement that now exists between the experts. There is no dispute about the architecture or capabilities of Horizon. There’s no suggestion that Horizon lacks important capabilities or that it doesn’t generally perform satisfactorily. There is no suggestion of any systemic problem lurking in Horizon.
In short, it is accepted that Horizon works well for the overwhelming majority of cases and consistently with that it is now common ground between the experts that Horizon is robust and that its robustness has improved over time and your Lordship already has the reference, it is the joint statement, the third joint statement, page 2, {D1/4/2}.
Now, what does relatively robust mean? It means robust as compared with comparable systems — big systems, systems that keep aircraft in the air, that run power stations and that run banks.
My Lord, by the same token it is common ground that the Horizon is not infallible. It has and will continue to suffer faults every now and then. Sometimes, in a really small number of cases, those faults will have an effect on branch accounts, but it should be remembered that robustness is not just about preventing bugs from appearing in the first place, it is also about limiting the lasting detrimental effects when they do appear.
Your Lordship will hear evidence that bugs affecting branch accounts are given a high priority when they are addressed by Fujitsu. They are not ignored. And, my Lord, the evidence also shows that bugs which have an effect on branch accounts occur only very rarely indeed. There is a dispute between the experts as to precisely how rarely, but in the context of a huge system that’s been in continuous operation for 20 years, that dispute in my submission does not have a material bearing on the outcome of this trial. In the overwhelming majority of cases, branch accounts will not contain a shortfall caused by a bug and the scale of bugs that would be needed to undermine that simple fact would be enormous.
Putting the point another way, the difference now being played out between the experts is at the margins. They accept that there are imperfections in the Horizon system with the result that in some rare cases bugs affecting branch accounts occur and will not be immediately fixed. The issue between them is how slight are the relevant imperfections.
A number of people have kindly responded to my request to comment on these remarks; their observations are set out below with their agreement.
Professor Martyn Thomas, CBE, Fellow and former IT Livery Company Professor of Information Technology at Gresham College; Visiting Professor in Software Engineering at the Universities of Manchester and Aberystwyth, and formerly Oxford and Bristol (www.gresham.ac.uk/professors-and-speakers/professor-martyn-thomas-cbe), comments:
It seems that De Garr Robinson is using the word ‘robust’ tautologically, in that the software is asserted to be robust (i.e. not to have caused serious problems) and therefore it didn’t cause serious problems. The excerpt at pp 101-103 is again tautological. Bugs are only ‘slight imperfections’ if their consequences are trivial. This is the central issue in the case, as I understand it. It is also true that bugs are only fixed when they have been detected and determined to be important. If the PO is arguing that there were no bugs or that any bugs would have been fixed, their argument is either meaningless or circular.
Mr De Garr Robinson says:
Now, what does relatively robust mean? It means robust as compared with comparable systems — big systems, systems that keep aircraft in the air, that run power stations and that run banks.
But safety-critical systems that keep aircraft in the air are built to rigorous standards that far exceed the normal practices of commercial software developers and that are unlikely to have been followed for Horizon. Commercial systems, such as those used by some banks, fail uncomfortably often as the customers of TSB discovered to their cost in 2018.
Little reliance should be placed on the failure frequency in a long period of service because new defects can be introduced any time that the software is changed (whether to correct defects that have caused failures or for other reasons). A bug that caused aircraft to be grounded at Heathrow in December 2014 was in a flight data processing system that was written in the 1960s. The defect was introduced in a modification made in the 1990s that was not found in extensive testing or subsequent use but that was triggered in 2014 by particular data.
Most complex software contains many latent defects that will only cause failures under specific and rare combinations of data. It is perfectly possible that Horizon could contain defects that are not triggered by most branch transactions but that were triggered by some others.
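Professor Thomas's point about latent defects can be made concrete. What follows is a minimal, hypothetical sketch in Python: the function, the figures and the off-by-one fault are all invented for illustration and have no connection with the actual Horizon code. It shows how a defect can produce correct results for almost every input, survive ordinary testing, and still be triggered by one rare combination of data:

```python
# Hypothetical illustration of a latent defect (not drawn from Horizon).
# The settlement logic is correct for almost every input, but an
# off-by-one fault is triggered only by one rare value.

def settle_transaction(amount_pence: int, refund: bool, items: int) -> int:
    """Return the amount to post to the branch account, in pence."""
    total = -amount_pence if refund else amount_pence
    # The per-item levy was meant to apply to baskets of 100 items or
    # more, but the boundary case was coded separately and wrongly.
    if items > 100:
        total += items            # correct levy for large baskets
    elif items == 100:
        total += items + 1        # latent fault: overstates by a penny
    return total

# The overwhelming majority of cases are correct ...
assert settle_transaction(5000, False, 3) == 5000
assert settle_transaction(5000, False, 101) == 5101
# ... but exactly 100 items produces a silent one-penny discrepancy.
assert settle_transaction(5000, False, 100) == 5101   # spec says 5100
```

Testing with representative data would almost never exercise the faulty branch, which is why a long period of apparently failure-free service says so little about the defects that remain.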
Professor Peter Bishop, School of Mathematics, Computer Science & Engineering, Department of Computer Science, City, University of London (www.city.ac.uk/people/academics/peter-bishop) observed:
It [robustness] could be broader and apply to the system as a whole, e.g. ‘A system is robust if abnormal behaviour can be detected and rectified’. This is a personal definition, not a broadly agreed term, but I think it captures the idea that software is never going to be perfect, but we can live with it if there is some means of reducing the impact of failures.
So for Horizon we could ask:
(i) What means exist for detecting abnormal behaviour?
(ii) What processes exist to rectify the consequences?
(iii) What means exist to identify the cause of abnormal behaviour?
(iv) What processes exist to prevent a recurrence of abnormal behaviour?
I have some experience with electronic fund transfer systems, and what I see there are separate journal logs (e.g. for individual banks and the central bank) with some form of periodic ‘reconciliation’, i.e. money sent from A to B should agree in both A and B journals. For Horizon, we could ask:
(i) Is there an independent (tamper-proof) journal for each sub-post office?
(ii) Can this journal be reconciled against the amount recorded within Horizon?
(iii) Is there a composite journal for the central Horizon system that can be checked for consistency against the sub-post office journals?
(iv) Is there a test environment where journals can be re-run to identify the cause of a discrepancy?
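To illustrate the kind of periodic reconciliation Professor Bishop describes, here is a minimal sketch, assuming hypothetical journal records of the form (transaction id, branch id, amount in pence); the record layout and function names are invented for illustration and imply nothing about how Horizon actually stores its data:

```python
# Two independently written journals should agree: every transaction
# should appear in both, with the same amount.

def reconcile(branch_journal, central_journal):
    """Return (transaction id, branch amount, central amount) for
    every entry on which the two journals disagree."""
    branch = {txn_id: amount for txn_id, _, amount in branch_journal}
    central = {txn_id: amount for txn_id, _, amount in central_journal}
    return [
        (txn_id, branch.get(txn_id), central.get(txn_id))
        for txn_id in sorted(branch.keys() | central.keys())
        if branch.get(txn_id) != central.get(txn_id)
    ]

branch_log = [("T1", "B42", 5000), ("T2", "B42", 1250)]
central_log = [("T1", "B42", 5000), ("T2", "B42", 1300), ("T3", "B42", 700)]

for txn_id, b, c in reconcile(branch_log, central_log):
    # A mismatched amount ("T2") or a one-sided entry ("T3") is exactly
    # the abnormal behaviour a robust system must detect and rectify.
    print(f"{txn_id}: branch recorded {b}, centre recorded {c}")
```

If the branch journal is tamper-proof and genuinely independent of the central system, a discrepancy found this way also narrows down where to look for the cause.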
Professor Derek Partridge, Professor Emeritus, past Chair of Computer Science at the University of Exeter (http://emps.exeter.ac.uk/computer-science/staff/dpartrid) commented:
I do not think that “robustness” is a particularly pertinent term. It usually refers to the ability of a software system to stand up to misuse (i.e. users entering wrong commands and/or inappropriate data) and not crash (as so many do) or deliver spurious results. A robust system should be able to take what’s thrown at it, continue working smoothly and request the user (ideally with some guidance) to enter an appropriate command or valid data.
This is very different from, what seems to me to be, the ‘correctness’ of the system, i.e., is it always functioning exactly as it should be (which, ideally, is defined in the original system specifications)?
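The distinction Professor Partridge draws can be shown in a few lines. The sketch below is hypothetical (the fee rule and its ‘specification’ are invented for illustration): the function is robust in his sense, because misuse is caught and reported rather than crashing the program, yet it is not correct, because one valid input silently produces a result the specification does not intend:

```python
def percentage_fee(amount_pence: int) -> int:
    """Specification: return a 2% handling fee, rounded down,
    subject to a minimum fee of 1 penny."""
    if not isinstance(amount_pence, int) or amount_pence < 0:
        # Robustness: misuse is rejected with guidance, not a crash.
        raise ValueError("amount must be a non-negative whole number of pence")
    # Incorrect: the minimum-fee rule was never implemented.
    return amount_pence * 2 // 100

print(percentage_fee(5000))   # 100 pence: robust and correct
print(percentage_fee(10))     # 0 pence: robust but NOT correct (spec says 1)
try:
    percentage_fee(-5)
except ValueError as err:
    print(err)                # misuse handled smoothly, without crashing
```

A system can pass every robustness test of this kind and still quietly deliver spurious results on valid data.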
It seems to me like the very difficult issue of what appears to be a subtle error that is either activated rarely by an unknown condition, or is possibly always active but only compounds into an obvious problem on odd occasions.
The Post Office Horizon system is vastly more complex than a cash machine, which must broaden the scope for subtle errors that are either generally (but not always) self-correcting or occur only very rarely, perhaps very small errors that compound into significance.
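One general mechanism by which very small errors compound, offered purely as an illustration of the phenomenon and not as a claim about how Horizon performs its arithmetic, is the use of binary floating point for currency: one penny (0.01) has no exact binary representation, so every addition carries a tiny error that accumulates over many transactions:

```python
from decimal import Decimal

# Sum one penny a million times, two ways.
n = 1_000_000
float_total = sum(0.01 for _ in range(n))   # binary floating point
exact_total = Decimal("0.01") * n           # exact decimal arithmetic

print(float_total)            # something like 10000.000000223, not 10000.0
print(exact_total)            # 10000.00
print(float_total == 10000)   # False: the tiny errors have compounded
```

Each individual error here is far below a penny, yet over enough operations the total visibly drifts, which is the sense in which very small errors can ‘compound into significance’.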
Roger Porkess, Chief Executive of Mathematics in Education and Industry (MEI) for 20 years, author (with Sophie Goldie) of Cambridge International AS and A Level Mathematics Pure Mathematics 2 and 3 (Hodder Education, 2010), and author or co-author of national reports on mathematics and statistics, including ‘A world full of data’ (Royal Statistical Society), as well as a very large number of mathematics and statistics textbooks, noted:
As far as I can see there are two strands to Mr de Garr Robinson’s argument. Neither is valid.
Strand 1
Malfunctions occur only rarely so the system is robust.
Since the system is robust, the malfunctions cannot be the fault of the system.
This is obviously a circular argument.
Strand 2
If the system was to blame, the number of software errors would be so large as to be unrealistic.
The figures used to support this argument are fallacious.
Specific comments:
It is both aspects of the equation.
This is the first of several places where Mr de Garr Robinson uses language imprecisely, something that I find very surprising when presenting a legal case. Something like “In both strands …” would have been better than “… both aspects of the equation …”. There is no equation in sight.
There is no suggestion of any systemic problem lurking in Horizon.
I do not think this statement is true. Clearly some problems do remain.
systems that keep aircraft in the air.
This is a highly inappropriate analogy. Excepting the new 737, aircraft do not fall out of the sky because their systems are, in effect, 100 per cent robust: no level of failure, however rare, is acceptable. By contrast, the evidence shows that the Horizon software is not completely robust, for whatever reason.
In the overwhelming majority of cases, branch accounts will not contain a shortfall caused by a bug and the scale of bugs that would be needed to undermine that simple fact would be enormous.
The calculation of the number of bugs does not hold up.
The hearing of evidence has now ended, and the leading barristers will present their closing speeches to the judge next week.
This guest post was written by Stephen Mason. This post therefore reflects the views of the author, and not those of the IALS.