Using predictive coding technology in England – encouragement for sceptical litigators

Pyrrho & Anr v MWB Property Ltd & Ors [2016] EWHC 256 (Ch)1 was the first United Kingdom court decision to approve the use of predictive coding to review electronically stored information (ESI) in civil disclosure. As commercial litigators are well aware, disclosure (discovery in the United States) is often the most protracted and contested aspect of a case, and usually the largest single element of the pre-trial costs budget.

Disclosure in the UK

Disclosure is governed by the Civil Procedure Rules and Practice Directions,2 which must be applied in accordance with the ‘overriding objective’ enabling the court to deal with cases ‘justly and at proportionate cost’. While there are a number of different variants, perhaps the most common type of disclosure used in commercial cases is ‘standard disclosure’. The rules of standard disclosure require each party to make a ‘reasonable search’ for relevant documents, which must then be listed and disclosed (subject to privilege) to the opposing party for inspection. A ‘document’ includes a computer file.

It may be reasonable to search ESI by means of keyword searches, or ‘other automated methods of searching’, if a full manual review would be unreasonable. The practice direction (31B) recognises that keyword searches may be unsuitable if they find excessive quantities of irrelevant documents (eg, by duplication of documents in email and ‘cc’ chains), or fail to find important documents which ought to be disclosed.

In such circumstances, the parties should consider supplementing automated searches with ‘additional techniques’ (such as individual review of key documents), and taking ‘such other steps as may be required to justify the selection to the court’. In Pyrrho, Master Matthews accepted that predictive coding technology could be used as an ‘additional technique’ contemplated by the practice direction.

The use of predictive coding or other types of technology-assisted review is not mandatory. Nevertheless, where one party suggests it and the other refuses, the refusing party may now need to justify that refusal to the court. In a subsequent contested UK predictive coding case, David Brown v BCA Trading [2016] EWHC 1464 (Ch), Registrar Jones ordered the use of predictive coding despite the objections of the opposing party.

Predictive coding

Predictive coding is a type of technology-assisted review which uses technological, statistical and legal input to ‘train’ computer software to look for common concepts and themes within documents, to give an indication of likely relevance. Complex algorithms underpin the software.

While there are various different workflows available, the basic principle is that the software is trained by manual review of relatively small subsets of documents from within the ‘document universe’ available. Ultimately, the computer software is able, based on the training, to search the entire document universe of potentially disclosable material, so as to prioritise documents to be reviewed in order of relevance (or even, in some cases, to actually select documents to be disclosed subject only to some quality checks). There are major savings of time, labour and costs as a result. A full explanation of how the process works is found in the case report of Pyrrho.
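The basic principle described above (train on a small, manually coded subset, then rank the whole document universe by likely relevance) can be illustrated with a deliberately simplified sketch. It is a toy word-weighting model written for this article, not the algorithm used by any commercial product or by the parties in Pyrrho; the function names, seed documents and document identifiers are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(seed_set):
    """Learn per-word weights from (text, is_relevant) pairs coded by a human reviewer."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in seed_set:
        (relevant if is_relevant else irrelevant).update(tokenize(text))
    vocab = set(relevant) | set(irrelevant)
    rel_total = sum(relevant.values()) + len(vocab)
    irr_total = sum(irrelevant.values()) + len(vocab)
    # Add-one smoothing: a word seen in only one class still gets a finite weight.
    return {
        word: math.log((relevant[word] + 1) / rel_total)
        - math.log((irrelevant[word] + 1) / irr_total)
        for word in vocab
    }

def score(weights, text):
    """Sum the weights of known words; a higher score suggests likely relevance."""
    return sum(weights.get(token, 0.0) for token in tokenize(text))

def rank(weights, universe):
    """Return document ids ordered from most to least likely relevant."""
    return sorted(universe, key=lambda doc_id: score(weights, universe[doc_id]), reverse=True)

# Hypothetical seed set: a small sample coded manually by a lawyer.
seed_set = [
    ("lease termination notice served on tenant", True),
    ("rent arrears and breach of lease covenant", True),
    ("office party catering menu", False),
    ("car park permit renewal reminder", False),
]

# Hypothetical document universe that nobody has reviewed yet.
universe = {
    "DOC-1": "reminder about the office party",
    "DOC-2": "draft notice of breach of the lease",
    "DOC-3": "catering invoice for lunch",
}

weights = train(seed_set)
ranking = rank(weights, universe)
print(ranking)
```

In this toy example the draft breach notice is ranked first for review. A real workflow would repeat the training rounds and apply statistical quality checks before any ranking was relied upon.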

UK lawyers and e-disclosure technology

A study by Hitesh Chowdhry for Kroll noted that, in general, UK lawyers were risk averse, mistrustful of the technology’s accuracy, and feared the cost implications.3 As one of the lawyers involved in the Pyrrho litigation, and the smallest firm of all of the defendants, we have been able to see the potential benefits of the technology; we now consider using it as a litigation tool right from the start for suitable cases.

This article notes some of our insights, which might assist a cautious lawyer in making the decision whether to use predictive coding in any particular piece of litigation.

Using the court, the rules and the technology to get the best results from predictive coding

There are many variants of the technology, and some can be retrained as the case evolves and more information is added. The technology can therefore be used right from the start of anticipated litigation to assess the key issues, find evidential trails and possible witnesses and, of course, identify likely documents for disclosure.

Once the ESI has been uploaded, it can be searched quickly and efficiently, allowing the parties to determine and refine their litigation strategy. In cases where there are both ESI and paper documents, the latter can be uploaded using optical character recognition technology, so that all material can be reviewed as a whole.

For large corporate clients anticipating litigation, a service provider can work with the company’s document management policies to address issues of document preservation and document organisation. We are now considering using predictive coding in far more modest cases than Pyrrho, where we estimate the savings in fee earner time to be greater than the hosting, processing and review costs.

One of the features in Pyrrho which has been remarked on, particularly by US lawyers, is the degree of cooperation between the seven parties (there were two claimants and five co-defendants). The claimants estimated that there were potentially 17 million documents in electronic format to be disclosed, and even after de-duplication, this number was only reduced to 3.7 million documents. It was estimated that reviewing all of these documents manually could lead to costs of almost £15m.

However, the parties were able to reach a case management agreement, subject to the approval of the court, providing for the use of predictive coding, which offered the potential to substantially reduce the number of documents to be manually reviewed and, therefore, also the associated costs (which, it was estimated, might be reduced to around £400,000).

There were several reasons for the cooperation between all of the parties in the Pyrrho case. Firstly, the sheer size of the task, as well as potentially rampant costs, led to a pragmatic decision to at least consider the possibilities of using predictive coding as an option, in advance of the case management conference (CMC) before the Master.

Secondly, the active role of the court in managing disclosure encouraged consensus between the parties. The Civil Procedure Rules on case management include a requirement for the court to give consideration to the use of technology (CPR 1.4(2)(k)). This usually first occurs at the CMC. Both the Pyrrho and BCA decisions were interlocutory rulings made at this stage of the proceedings. In Pyrrho, there was a second CMC purely to consider disclosure, at which the predictive coding ruling was made.

The CMC takes place at the close of pleadings when all of the issues between the parties have been set out. In the Pyrrho case, the claim was issued in March 2013, so the issues and disclosure points had been well-rehearsed by the parties before they came before the Master in early 2016.

Ahead of the CMC, all parties must conduct relevant disclosure scoping enquiries and set out their positions and proposals on various issues in prescribed forms. The parties are also encouraged to prepare an electronic documents questionnaire (EDQ) – a nine-page form listing details of the ESI which each side has and proposals for how it will be searched. The parties must also liaise and try to agree as much of the electronic disclosure process as possible.

The importance of cooperation at this stage is emphasised by the Technology and Construction Court (TCC) in London, which publishes an e-disclosure protocol and guidelines4 (which are not binding, but were referred to in the Pyrrho case), suggesting that parties use the EDQ ‘as an opportunity to kickstart the dialogue process’. The Master in Pyrrho referred to the fact that EDQs had been exchanged a year before his ruling and ‘commendably, several rounds of correspondence between the parties have resulted in large measures of agreement’.

Where the parties are unable to reach consensus themselves, it is open to the court to assist them, as in the BCA case where the Registrar noted that a ‘successful outcome from the use of predictive coding must, at least to some extent, depend upon the success of the parties having been able first to narrow down the issues and therefore the categories/types of documents relevant to the disclosure process. This must be the first stage and I have made proposals for that process in the form of directions for the identification of issues with the aim of narrowing down what needs to be the borders for the searches.’

A further reason for consensus in Pyrrho was that the use of predictive coding software enabled the disclosure process, despite the outstanding issues, to be addressed with a degree of flexibility. This reassured all the parties that their particular interests would be addressed. The Master accepted that ‘there are or will be some areas of disagreement which will or may have to be determined later’. This illustrates that the courts will encourage the parties to be constructive in seeking solutions to outstanding issues, and it may not be necessary to resolve every aspect before an agreement in principle as to how to address e-disclosure is reached.

In Pyrrho it was appreciated by the parties and the Master that once the software has been trained, then in principle the cost of applying the software against further documents can be minimal (so long as the initial training of the software does not need to be adjusted or altered in some way). It is therefore possible in principle, with relatively limited impact on costs, to increase the number of custodians (email addresses) involved, broaden the date range of the search, and/or use search terms to select further documents from a wider or different pool of data to be analysed by the software. Master Matthews said: ‘unlike with human review, the cost does not increase at the same rate as the number of documents to be reviewed increases. So doubling the number of documents does not double the cost’.
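The Master's point about cost scaling can be made concrete with a simple model: manual review costs the same for every additional document, whereas trained software carries a one-off training cost plus a small processing cost per document. The sketch below is illustrative only. All of the figures are hypothetical, are not taken from Pyrrho, and ignore the human review of whatever documents the software surfaces.

```python
def manual_review_cost(num_docs, pence_per_doc):
    """Human review: every extra document adds the same cost, so cost scales linearly."""
    return num_docs * pence_per_doc

def predictive_coding_cost(num_docs, training_pence, marginal_pence_per_doc):
    """Trained software: a one-off training cost plus a small per-document processing cost."""
    return training_pence + num_docs * marginal_pence_per_doc

# Hypothetical figures in pence, chosen only to illustrate the shape of the two curves.
PENCE_PER_DOC_MANUAL = 400
TRAINING_PENCE = 10_000_000
MARGINAL_PENCE_PER_DOC = 5

for num_docs in (1_000_000, 2_000_000):
    manual = manual_review_cost(num_docs, PENCE_PER_DOC_MANUAL)
    coded = predictive_coding_cost(num_docs, TRAINING_PENCE, MARGINAL_PENCE_PER_DOC)
    print(f"{num_docs:,} documents: manual £{manual / 100:,.0f}, predictive coding £{coded / 100:,.0f}")

# How much does the cost grow when the document count doubles?
manual_ratio = (manual_review_cost(2_000_000, PENCE_PER_DOC_MANUAL)
                / manual_review_cost(1_000_000, PENCE_PER_DOC_MANUAL))
coded_ratio = (predictive_coding_cost(2_000_000, TRAINING_PENCE, MARGINAL_PENCE_PER_DOC)
               / predictive_coding_cost(1_000_000, TRAINING_PENCE, MARGINAL_PENCE_PER_DOC))
print(f"Doubling the documents multiplies manual cost by {manual_ratio:.2f} "
      f"and predictive coding cost by {coded_ratio:.2f}")
```

On these assumed figures, doubling the document count doubles the manual cost but increases the predictive coding cost by only about a third, because the training cost is incurred once.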

Furthermore, the parties in Pyrrho were well aware of the serious cost consequences of getting disclosure wrong. In most divisions of the courts, at the CMC the parties are, subject to some exceptions, required to have costs estimates prepared for all aspects of the case which will be critically examined by the court. If a party’s costs exceed the estimate, it will not recover its costs on an assessment, even if it is successful. Civil practitioners are highly motivated to put the necessary work into disclosure to narrow and refine the issues (such as search terms and date ranges) at this stage of the proceedings, in order to determine what is actually necessary and reasonably likely to assist their client’s case. The likely cost of hosting, processing and reviewing documents identified by predictive coding must be considered and assessed at the outset of litigation as part of any budgeting exercise. Where a party is insured or third-party funded, the insurers or funders will need to be brought on board to support the ongoing costs of this.

Another aspect is that the court will seek to avoid parties using electronic disclosure as a litigation tactic where there is an imbalance in the respective document repositories – for example, where a party seeks to make the disclosure process as broad and expensive as possible for their opponent, while their own disclosure obligations remain limited because they do not (for whatever reason) have as many documents.

In the BCA case, the party advocating the use of predictive coding was in possession of the majority of the documents and thereby would be the main beneficiary of the reduced costs associated with its use, and the Registrar gave particular weight to this fact. In Pyrrho it was ‘common ground that the bulk of the relevant documents are likely to be in the control of the second claimant [who] controls back up tapes on which data from email accounts used by the 2nd to 5th defendants are stored’. In any event, the losing side ultimately is likely to bear the costs of disclosure, so there is an incentive on all parties to reduce the costs to the minimum.

Law firms must establish a good relationship with litigation support services and staff able to provide a full range of e-disclosure services, including predictive coding. In Pyrrho, the Master referred to the support given to lawyers by their IT providers. Of course, lawyers and paralegals will still be required to interrogate the sample set and make important review decisions (particularly when training the software), and to work with their external service providers; the courts will not hesitate to make adverse costs orders for poorly executed disclosure exercises,5 whether as a result of technical or human error.6

The question of privileged documents was not addressed in the Pyrrho judgment but, in theory, they could be excluded by an initial trawl using searches to find potentially privileged domain names and email addresses of individuals or lawyers – documents responding to such terms could then be ring-fenced for consideration in the usual way. The TCC protocol (para 7.2) suggests including a provision for the parties to ‘claw back’ inadvertently disclosed privileged documents without loss of privilege, without needing the consent of the court, but we are not aware of this having been tested in the UK courts.
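The initial privilege trawl suggested above might look something like the following sketch, which sets aside any document that mentions an email address on a list of lawyers or at a listed law firm domain. It is a minimal illustration under stated assumptions: the domains, addresses and document contents are invented, the email pattern is a rough approximation, and a hit only flags a document for human privilege review; it does not decide whether privilege applies.

```python
import re

# Rough pattern for email addresses; an approximation, not a full RFC 5322 parser.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_potentially_privileged(text, legal_domains, legal_addresses):
    """Return True if the text mentions a listed lawyer's address or a listed law firm domain."""
    for address in EMAIL_PATTERN.findall(text):
        address = address.lower()
        domain = address.rsplit("@", 1)[1]
        if address in legal_addresses or domain in legal_domains:
            return True
    return False

def ring_fence(documents, legal_domains, legal_addresses):
    """Split document ids into (ring_fenced, remaining) for separate handling."""
    fenced, remaining = [], []
    for doc_id, text in documents.items():
        if is_potentially_privileged(text, legal_domains, legal_addresses):
            fenced.append(doc_id)
        else:
            remaining.append(doc_id)
    return fenced, remaining

# Hypothetical search terms and documents, for illustration only.
legal_domains = {"lawfirm.example"}
legal_addresses = {"general.counsel@client.example"}
documents = {
    "DOC-1": "From: partner@lawfirm.example To: ceo@client.example Subject: advice on claim",
    "DOC-2": "From: ceo@client.example To: cfo@client.example Subject: quarterly rent roll",
    "DOC-3": "From: General.Counsel@client.example To: ceo@client.example Subject: draft defence",
}

fenced, remaining = ring_fence(documents, legal_domains, legal_addresses)
print("Ring-fenced for privilege review:", fenced)
print("Released to the main review pool:", remaining)
```

Matching is case-insensitive, so an address typed with capitals is still caught. The ring-fenced set would then be considered by a lawyer in the usual way.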


Predictive coding may not be suitable for every disclosure exercise, since the hosting and review costs can, of themselves, be significant. However, for the right cases, it can offer an exciting opportunity for lawyers to manage litigation effectively, efficiently and with significant costs savings to the client.


1 See
2 CPR 31, especially 31.7, and PD 31A and PD 31B.
3 Hitesh Chowdhry, Rage against the Machine: Attitudes to Predictive Coding amongst UK lawyers.
4 e-disclosure protocol and guidelines, published by the TCC Solicitors’ Association (TeCSA), the Technology and Construction Bar Association (TECBAR) and the Society for Computers and Law, 9 January 2015.
5 West African Gas Pipeline Co Ltd v Willbros Global Hldgs Inc [2012] EWHC 396 (TCC) – wasted costs order for electronic disclosure failures.
6 In 2015, the High Court criticised City law firms after outsourcing caused poor management of the disclosure process. See process.