#Miranda: Where is the UK Government getting its numbers from?
by Naomi Colvin
A few days ago I blogged on hints Glenn Greenwald made about witness testimony the UK Government was due to give in court about its grounds for continuing examination of electronic material confiscated from David Miranda.
In that blog, I suggested that if the UK Government really had only managed to decrypt “something like 75 documents”, it cast their assertions about the number of documents Miranda was carrying in a rather different light. Many news organisations have taken the “58,000 documents” figure as fact. But what is it really based on?
The court hearing was heard yesterday afternoon and, at its conclusion, Government lawyers released the testimony of Oliver Robbins, a senior civil servant who has held intelligence related positions in the Cabinet Office under the present and last governments. His is the securocrat’s voice par excellence.
At the outset, it should be noted that Robbins’ testimony isn’t the court filing Greenwald was referring to in the comment that prompted my last blog. That, it transpires, was a separate statement by Detective Superintendent Caroline Goode, from the Metropolitan Police’s Counter-Terrorism Command. Goode’s statement has not been released in full, but sections from it have been reported in the press. The fullest account of Goode’s statement, from which many of the others are drawn, is this Reuters piece.
Let’s look at what we know of Goode’s reported statement first.
Caroline Goode’s evidence
Use of TrueCrypt
Detective Superintendent Goode said that the information on the external hard drive was encrypted by a system called “True Crypt [sic],” which she said “renders the material extremely difficult to access.”
This is useful information. First of all, note the use of the word “access” to mean “access in readable form” and that Goode’s comments relate to just one of the devices taken from Miranda.
TrueCrypt is widely used encryption software that is free to use and download; many of those reading this blog will be familiar with its features. For those who aren’t, the TrueCrypt homepage describes what this software does (I’ve preserved the hyperlinks to more detailed resources on the Truecrypt website for those who want to read further):
Creates a virtual encrypted disk within a file and mounts it as a real disk.
Encrypts an entire partition or storage device such as USB flash drive or hard drive.
Encrypts a partition or drive where Windows is installed (pre-boot authentication)
Knowing what TrueCrypt does is useful because it gives us a good basis on which to assess the validity of subsequent statements. Note that TrueCrypt encrypts entire hard drives, or portions of them, rather than individual files. An area of a hard drive that has been encrypted with TrueCrypt is very much like a container you can drop files into. You need a password to open the container before you can access the files within it. This container is often called a TrueCrypt file but it can also be called a TrueCrypt volume.
60 GB of data and only a third of it “accessed”
Goode said the hard drive contained around 60 gigabytes of data, “of which only 20 have been accessed to date.” She said that she had been advised that the hard drive contains “approximately 58,000 UK documents which are highly classified in nature, to the highest level.”
Note first of all that Goode is still discussing only one of David Miranda’s electronic devices – an external hard drive . She then notes that only a 20GB portion of that external hard drive has been “accessed” – which either means that the remaining 40GB data is inaccessible (presumably because it is contained within one or more encrypted TrueCrypt volumes), or that the police simply haven’t got around to examining them. Given that Goode’s colleagues have now had access to that external hard drive for nearly two weeks, the former possibility is presumably the more likely of the two.
Incidentally, there is nothing in Goode’s statement to say that we’re dealing with a 60GB hard drive. The external hard drive could just as well be one of larger capacity holding only 60GB of data.
Finally, Goode “has been advised” about what the hard drive as a whole contains. This is not knowledge that she has determined herself, independently, from access to those 20GB of data. It seems odd that Goode’s reported statement about the content of the drive, including the 40GB of data she has not been able to “access”, does not rely to any extent on the 20GB she has.
“Only 75 documents have been reconstructed“
Goode said the process to decode the material was complex and that “so far only 75 documents have been reconstructed since the property was initially received.”
This is the statement that Glenn hinted at earlier this week.
“Reconstructed” is a strange word for Goode to use. The most natural interpretation is to see “reconstructed” as a synonym for “decrypted” or “put into a form that can be read”, although this doesn’t really fit in with the idea of a “complex” process. They may not have the technical nous of Edward Snowden, but I assume that Counter Terrorism Command are familiar with the process of mounting an encrypted TrueCrypt volume and typing in a password.
So what else could Goode mean here? It’s easy to exclude a few possibilities: even if the Met and GCHQ were trying very hard to open an encrypted volume by brute force, they wouldn’t be able to individually decrypt the files within it one by one.
What Goode could mean is that analysts have been able to recover deleted files from unallocated space on the hard drive (space that isn’t being used for data now, but may have been in the past). That, at least, is more of a fit for the idea of a “complex process.”
Let’s leave the vagueness about where the files came from to one side for the moment. Are there any other insights we can draw from Goode’s statement?
The first thing to note is that 75 documents out of an estimated total of 58,000 is an absolutely tiny proportion. It is difficult to see how such a minute sample could give a true indication of the entire collection of material held unless one or more of those decrypted files served as a kind of index to the whole. Indeed, if the files have been reconstructed from unallocated space – meaning they had previously been deleted – then they may tell you even less about what is currently on the drive.
There’s a further ambiguity when Goode talks about “the property” – is she referring to the external hard drive here, or Miranda’s confiscated belongings as a whole? If the latter is the case, then it is by no means certain that the “accessed” 20GB portion of the external hard drive contains any documents at all – those 75 could have been obtained from elsewhere.
If we take the opposing view and suppose that Goode’s “the property” means only the external hard drive discussed previously, then those 75 documents came from the “accessible” 20GB portion of the external hard drive or were recovered from unallocated space. Caroline Goode’s evidence could just as easily mean one of these scenarios as the other: it is remarkable for the range of possibilities it does not exclude.
Summary of Caroline Goode’s evidence
Caroline Goode’s evidence suggests that David Miranda’s hard drive contains a TrueCrypt volume or volumes of a total size of 40GB that UK police have no access to. The 20GB encrypted portion of Miranda’s external hard drive that the police have been able to access contains, at most, 75 files. It is possible that some – or even all – of those files came from other devices, or from unallocated space on the same device.
Goode’s statements about the remainder of the documents do not seem to be based on insights gained from the 75. This would tend to support Glenn Greenwald’s assertion that UK police have not been able to access anything sensitive. It certainly does not clarify how the total figure of 58,000 documents the Home Office has asserted is on Miranda’s external hard drive has been arrived at.
Oliver Robbins’ evidence
What follows is a close analysis of Oliver Robbins’ testimony – and I do think it deserves to be looked at very closely indeed. There is much in Robbins’ statement that deserves detailed analysis but, for the purposes of this blog post, I will restrict my attention to Robbins’ comments on the UK Government’s access to, and analysis of, the Miranda data.
Indefinite room for ambiguity.
[in justifying why the Government needs "continuing access" to the material seized from Miranda] … no information that has so far been analysed by Her Majesty’s Government (“HMG”) has identified a journalist source or has contained any items prepared by a journalist with a view to publication. The information that has been accessed consists entirely of misappropriated material in the form of approximately 58,000 highly classified intelligence documents. [para 6]
The first thing to note here is that Robbins’ use of the word “accessed” is different from Goode’s. As we saw above, when Goode talks about data “accessed” she means data that can be accessed in readable form. Robbins’ use of the word is broader because his witness statement is making an argument about the Government’s need for “continuing access” [para 5] to all the material seized from Miranda, including that which has not been decrypted. Robbins’ use of “access” therefore more closely corresponds to the idea of physical access to the devices themselves. This is confusing.
Robbins goes on to talk about a subset of the information that has been “analysed.” We are not told whether this means analysis of encrypted information, but given that he goes on to make statements as to the content of this information, it is likely to be the case that this information can be read in some form. What Robbins says about this analysed material is that none of it “has identified a journalist source” and neither does it contain “items prepared by a journalist with a view to publication.”
Of course, Robbins’ purpose here is to reject the idea that the Miranda material contains anything that should be withheld from examination, but It’s worth noting that the category of data which meets those two stipulations of his is quite a wide one: it includes shopping lists, youtube videos of cats and many other items of limited relevance to national security.
What Robbins says next is interesting: he moves straight from a limited description of a small subset of data to make a claim about the entirety of the Miranda material (“that has been accessed”). Putting to one side for the moment the ambiguity about whether Robbins is really talking about Goode’s external hard drive here or the Miranda devices in total, It is not at all clear on what he is basing this rather striking claim.
Let’s think about this situation in a different context. Imagine if you had a bookcase that, apart from a couple of volumes, consisted only of books with unopened pages. What Robbins says would be like asserting that all the books in the bookcase are illustrated, purely on the basis that, of the two books you can examine without a penknife, neither was printed in London or inscribed with the owner’s name. It is certainly a claim that can be made, but not one that deserves to be taken particularly seriously.
Wait, so it’s not your assertion after all?
I am advised that the data recovered from the claimant is almost certain to contain some of the material passed by Mr Snowden to Ms Poitras and Mr Greenwald. Much of the material is encrypted. However, among the unencrypted documents recovered from the claimant was a piece of paper that included the password for decrypting one of the encrypted files on the external hard drive recovered from the claimant. I have been briefed that the authorities have therefore been able to examine the data contained in this file. They have been able to determine that the external hard drive contains approximately 58,000 highly classified UK intelligence documents. Work continues to access the content of the other files on the hard drive and the USB sticks. [para 13]
There’s a lot in this paragraph, so let’s take it line by line. The first sentence seems to answer the question posed in the previous section: Robbins’ assertion about the content of the Miranda data is second hand after all (“I am advised”). It is also indefinite (“almost certain”) which seems to contradict the conclusive phrasing (“the data that has been accessed… consists entirely of”) of the previous paragraph.
Once again, this is confusing – so let’s try to resolve the contradiction. Is it possible that, when Robbins talks about “the data that has been accessed” in paragraph 6 he is slipping between the broad interpretation of the word “accessed” he has used in his previous sentences and the narrower sense – that of data that can be read and analysed – used by Caroline Goode? It’s much easier, after all, to be definite about the content of documents you’re able to read than ones you cannot.
I’m not sure this works either. Goode testified that the material “accessed” in the sense that it could be “analysed” amounted to a 20GB portion of an external hard drive, which may contain all, or maybe only some, of a total of 75 documents. To say this consists “entirely of misappropriated material in the form of approximately 58,000 highly classified intelligence documents” is just a nonsense. Robbins must therefore be using the word “accessed” in his usual sense and what he says is inconsistent with his previous paragraph.
Does the rest of paragraph 13 make things any clearer? Certainly, the next three sentences are straightforward. We know that “much of the information” carried by Miranda was encrypted and that Caroline Goode and her colleagues were able to decrypt one encrypted file on the external hard drive. By Goode’s own account, she and her colleagues were able to examine the data contained within this file. These sentences are consistent both with Robbins’ own statement and those of others.
What follows is much more troublesome. “They [the authorities] have been able to determine that the external hard drive contains approximately 58,000 highly classified UK intelligence documents.” The analysis of Goode’s statement shows that she and and her colleagues could not derive the presence of “58,000… documents” from what she found – and she didn’t claim to have done.
But have I missed something here? Could it be that Robbins’ “they” isn’t referring to Goode and her police colleagues at all? Could he be referring to different “authorities” altogether? Might they be the same authorities who “advised” both Robbins and Goode of “58,000 documents” figure and on whom both rely? I think that is likely and, although a casual reader may feel that the two sentences below bear a logical connection, in fact they do not:
I have been briefed that the authorities have therefore been able to examine the data contained in this file. They have been able to determine that the external hard drive contains approximately 58,000 highly classified UK intelligence documents.
In my opinion, this comes close to being a misleading statement. Oliver Robbins could equally well have expressed himself as follows:
I have been briefed that the authorities have therefore been able to examine the shopping lists and pictures of cats contained in this file. Independently of this, others have been able to determine that the external hard drive contains approximately 58,000 highly classified UK intelligence documents.
And what of that troublesome “58,000… documents” claim? The source for Robbins’ second authority becomes clearer in his next paragraph:
On the basis of GCHQ assessments, the totality of UK intelligence documents that would potentially have been accessible to Mr Snowden while we was working at the NSA is consistent with the volume of documents which we know to be on the external hard drive. [para 14]
This appears to be the best candidate for what the “58,000 documents” figure is actually based on. But what does it amount to? Let’s turn to “the volume of documents which we know to be on the external hard drive” first.
What we know about the external hard drive is that it is divided into at least two encrypted files, one of 20GB which the police are able to access and a further encrypted file (maybe more than one) of 40GB size. Because the police have access to the decrypted 20GB file, they can make an assessment about the number of documents within it (a maximum of 75). All that can be said about the other file(s) is that they have a total size of 40GB.
An encrypted file’s size is not dependent on the amount of data it contains. A 10GB encrypted file could contain 10kb data or 6 GB data – unless you can decrypt the file, you have no way of telling which is the case.
As such, GCHQ’s statement is almost meaningless. You could say that the maximum volume of documents an encrypted file could contain is 40GB – but that’s something you could say of any 40GB encrypted file. GCHQ’s assertion about “the volume of contents which we know to be on the external hard drive” appears to play on an ambiguity in the word volume (one can talk about a volume of documents, but it’s also a synonym for an encrypted file) in order to hide that it has no basis in fact.
In essence, what GCHQ seems to be saying here is that what it assesses to be “the totality of UK intelligence documents… potentially accessible to Mr Snowden” would fit on a 40 GB hard drive. That logic, if applied widely, could lead to an awful lot of Schedule 7 detentions at our airports and it’s an assessment made entirely independently of the Miranda data.
So, where does that leave the “58,000 documents” figure? Nowhere good. It looks like nothing more than a worst-case scenario GCHQ based on guesswork but presented as indubitable fact.
Neither of the witness statements presented by the UK Government in Home Office v Miranda are adequately precise about the matters they raise. Cryptographers have developed a vocabulary that is adequate to expressing these subjects with clarity – when they talk about “plain text” and “cypher text”, others understand what they mean. In contrast, when Caroline Goode and Oliver Robbins use terms like “access” and “analysis” in their statements, there is significant ambiguity in what they mean. This ambiguity leaves real potential for confusion; it also presents unacceptable opportunities for others to be misled.
I am concerned by the extent of the ambiguity in the statements presented in Home Office v Miranda. The UK Government has represented itself in language that is so vague that it may not have a case at all, yet it has presented its case in the strongest way possible – and has been accepted as such, without much demur, in much of the media.
I think it’s worth taking a moment to reflect on this. If a group of witness statements took a similar approach to legal issues as these have to technical ones, if they had eschewed technical terms in favour of ambiguous natural language and took advantage of that fact to obfuscate as these have, I think those imaginary witness statements would have received a much more critical reception. I am concerned that our courtrooms and our newsrooms may not be equipped to cut through some of this confusion and dubious statements may be allowed to stand without receiving proper scrutiny. It is not difficult to see how parties could take advantage of this, if they wished to do so.
While I know what TrueCrypt is, I am by no means a technical expert. My intention in this piece is to show how ambiguous the UK Government’s statements are, rather than put together a definitive account of what happened – I’m not sure that’s even possible on the evidence available.
The Q&As that follow below are an outlet for some of the fun speculative stuff I couldn’t justify putting in this post.
If there’s something you think I’ve got wrong in this piece, I’d be very interested to hear about it. Please email me or leave a comment below.
Have Greenwald, Miranda and Poitras been guilty of “very poor judgement in their security arrangements”?
Travelling with a password written on a piece of paper isn’t great. Transiting through Heathrow may have been inadvisable. But, if – as seems very possible – nothing of significance has been compromised you have to say that, on the face it it, not really.
Given that the Cabinet Office expressed its worries to the Guardian in terms of their ability to protect information from cyber attack, I think it’s relatively clear why the Government would like to cast doubt on others’ security practices if possible.
Is the 20GB encrypted file on the external hard drive a dummy volume intended to be surrendered without cost?
The thought has crossed my mind: it would certainly make it easier to explain why David Miranda was found in possession of an encryption key in a UK transit area. I am not sure it is possible to say for sure on the evidence of the statements presented, but I think this falls within the range of possibilities.
Is it possible that one of the 75 files the police have is an index to the rest?
It is possible – and if the case would make the “58,000 documents” figure much more credible – but I think on the balance of probabilities it is unlikely.
Were GCHQ just plucking a number out of the air with that “58,000 documents” thing?
Not entirely. One possibility is that they’ve plucked a number out of the Guardian.
On 2 August, the Guardian printed a fascinating feature article that is based partly on GCHQ’s internal “GCWiki”, making reference to this and many other GCHQ documents. That, and the discussions we know the Cabinet Office have had with the Guardian may have formed the starting point for GCHQ’s worst-case estimate.
Are you sure? They must know what Snowden has!
If the NSA doesn’t know what Snowden has, there’s no reason why GCHQ should.
Oh come on. if we’ve learned anything from the Snowden files it’s that GCHQ and the NSA have other ways of acquiring this kind of information.
Of course. Whether surveillance information is admissible in court is another matter, though, and one we should probably leave to David Miranda’s capable legal team.
Have the media been negligent in reporting the “58,000 documents” figure as fact?
This post proved to be quite a popular one, with 7250 page views yesterday alone. It also provoked quite a bit of discussion – I’d like to thank all of those whose contributions prompted me to make the following additions to my Q&A section.
Do you think Miranda was using a hidden volume?
It’s certainly a possibility and the first (pre-publication) draft of this post did in fact make that suggestion. Why did I leave it out? Because while the facts in Goode and Robbins’ statements do not exclude the possibility of a hidden volume, they also do not exclude a number of other possibilities. There’s nothing in the statements analysed to rule out the possibility that, for instance, police found a 20GB .tc file and a 40GB .tc file on that external hard drive but can only open the former.
Of course, this is yet another example of how the two witness statements are not adequately precise.
Why do you rule out the possibility that one of the files police have been able to access is an index to the rest?
I don’t rule it out, I say that – on the balance of probabilities – it is unlikely. Some of the reasons why I continue to think this are covered in this storify. Other very relevant points have been made in the comments section below.
Which media sources have used the 58,000 documents claim?
That’s an easy question to answer. A very cursory examination of articles published on this subject will reveal sources which take the “58,000 documents” claims as fact without even mentioning that they originated from a government witness statement (one, two, three, four). The number of sources which note the origins of the claim without subjecting it to any critical assessment is even higher. Critical scrutiny of the Government claims has in fact been strikingly absent, until now.
Has anyone else cast doubt on the Government’s story?
They have – although, as far as I am aware, mine is the only account which goes through the Government witness statements in detail. Links which I could have included in my original post include this piece from Alan Rusbridger and Friday’s statement from David Miranda’s legal team.