Are Covid-19 contact apps data safe?
The NHS contact-tracing app is causing controversy. It departs from the contact-tracing tools Apple and Google have developed, which come with strings attached that the NHS and some governments would like to avoid.
Contact-tracing apps running on smartphones have the potential to provide excellent support to traditional, manual epidemiological contact tracing. To achieve this, they need to perform two useful tasks:
- They will detect and log when they are in close proximity to another device, such that a level of exposure can be calculated.
- They will provide contact tracers with a log of all the places that the "infected" party has been in the last few weeks, in order to track down further interactions or exposures.
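To make the first task concrete, here is a minimal sketch of how a phone might turn its proximity log into a "level of exposure". It assumes, hypothetically, that the phone records each Bluetooth sighting with the other device's anonymised ID and a signal strength (RSSI), scans at a fixed interval, and counts only strong signals as "close". The names `Sighting`, `exposure_minutes` and `exposed_contacts` and all three threshold values are illustrative assumptions, not taken from any real app.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical parameters: a real app would tune these empirically.
SCAN_INTERVAL_MINUTES = 5         # assumed: one Bluetooth scan every 5 minutes
NEAR_RSSI_THRESHOLD = -65         # assumed: signals stronger than this count as "close"
EXPOSURE_THRESHOLD_MINUTES = 15   # assumed: minimum close-contact time worth flagging


@dataclass
class Sighting:
    other_id: str   # the other device's anonymised Bluetooth identifier
    timestamp: int  # minutes since an arbitrary epoch
    rssi: int       # received signal strength in dBm (higher is roughly closer)


def exposure_minutes(log: list[Sighting]) -> dict[str, int]:
    """Estimate minutes of close contact per device from a proximity log."""
    totals: dict[str, int] = defaultdict(int)
    for s in log:
        # Each close-range sighting stands in for one scan interval of contact.
        if s.rssi >= NEAR_RSSI_THRESHOLD:
            totals[s.other_id] += SCAN_INTERVAL_MINUTES
    return dict(totals)


def exposed_contacts(log: list[Sighting]) -> set[str]:
    """Return the IDs whose estimated close contact meets the threshold."""
    return {
        device
        for device, minutes in exposure_minutes(log).items()
        if minutes >= EXPOSURE_THRESHOLD_MINUTES
    }


log = [
    Sighting("a", 0, -60), Sighting("a", 5, -58), Sighting("a", 10, -62),
    Sighting("b", 0, -80), Sighting("b", 5, -59),
]
print(exposure_minutes(log))  # {'a': 15, 'b': 5}
print(exposed_contacts(log))  # {'a'}
```

Signal strength is only a rough proxy for distance, which is one reason the thresholds that define "an exposure" need tuning, a point the article returns to below.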
These tasks could greatly speed up what is usually a highly laborious contact-tracing process, which achieves the same aims through detailed interviews with cases and various other methods of investigation. In an ideal world, smartphone contact-tracing apps would perform both of these tasks. But we do not live in an ideal world, and many people are rightly worried about misuse of the data that these apps gather about their activities and contacts. And this is where the headaches begin.
The NHS, and the UK government more broadly, would like to perform both of the above tasks (contact identification and location tracking) using their Covid-19 tracking app. They would also like to be able to access all of the data centrally in order to study it, map the spread of the virus across the country, spot trends and tweak the algorithms that define what constitutes "an exposure". But, again, others view this as dangerous and a prime opportunity for the data to be stolen, leaked and abused. As a result, two categories of contact-tracing app are beginning to emerge:
- Category A: Focusses on reliable and robust contact tracing. These apps perform both the proximity and localisation tasks, and they store data centrally, both to allow data analysis and to prevent the loss of any individual's exposure records if their phone is lost, stolen or broken.
- Category B: Focusses on privacy at the expense of the app's usefulness for contact tracing. These apps perform only the first task (proximity detection) and work in a decentralised manner: you hold your own data at all times, and it cannot be used for data analysis, model tuning or improvements to the system. Because they lack location data, these apps are fundamentally less useful for contact tracing than Category A apps.
Google and Apple favour a Category B system, and they are using their control of their platforms to ensure that only Category B systems can be built on their devices. They have developed new protocols for their phones that allow Bluetooth-based proximity detection to run more effectively in the background, saving battery life and ensuring that devices keep scanning for each other as intended. However, they will prevent any app that uses these new protocols from accessing the phone's location capabilities. This is unprecedented: a restriction that prevents a future app from accessing location has never been imposed before.
For now, the NHS are using workarounds to provide a Category A contact-tracing app, albeit with reduced use of location data: they want to know only the first few characters of your postcode so that they can understand how the disease is spreading across the country. It remains to be seen how long this fight lasts before the NHS gives in and converts to a Category B app. Perhaps the NHS should simply create both apps and let each member of the public decide which one to use.
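As an illustration of how coarse the postcode data can be, the sketch below assumes that "the first few characters" means the outward code of a UK postcode (the part before the space, such as "SW1A" in "SW1A 1AA"), truncated on the phone before anything is sent. The inward code is always three characters, so dropping the last three non-space characters yields the outward code. The helper names `outward_code` and `cases_by_district` are hypothetical and not part of any NHS system.

```python
from collections import Counter


def outward_code(postcode: str) -> str:
    """Return the outward code (district part) of a full UK postcode.

    The inward code is always three characters (e.g. "1AA"), so stripping
    spaces and dropping the last three characters works even when the
    user omits the space.
    """
    compact = postcode.replace(" ", "").upper()
    # The shortest full UK postcodes (e.g. "M1 1AA") have five characters.
    if len(compact) < 5:
        raise ValueError(f"not a full UK postcode: {postcode!r}")
    return compact[:-3]


def cases_by_district(postcodes: list[str]) -> Counter:
    """Count reported cases per postcode district (hypothetical aggregation)."""
    return Counter(outward_code(p) for p in postcodes)


print(outward_code("sw1a 1aa"))  # SW1A
print(cases_by_district(["SW1A 1AA", "SW1A2AB", "M1 1AE"]))
# Counter({'SW1A': 2, 'M1': 1})
```

A district typically covers thousands of addresses, which is why this is far coarser than a GPS trace, although, as discussed below, even district-level data links infections to places.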
Let's say Bob runs both types of app on their phone and then tests positive for coronavirus. Here's how the future unfolds for each app category:
- Category A: Bob presses the "I'm symptomatic" button on their app. This notification goes to the central database, along with the encrypted records, stored on Bob's phone, of the Bluetooth IDs of every other phone it has seen in the last two weeks. The central database then determines which phones were exposed to Bob for long enough to be a concern, and contacts those people to give them advice. The central authority, such as the NHS, can analyse the data to understand more about the spread of the contagion, use it to feed machine learning tools that make the contact-tracing apps smarter, and so on. For apps that also store location, the central database can use this information to determine which parts of the country are seeing more spread than others, potentially deploying more ventilators, PPE or even staff to regions that are seeing a spike in exposures and contacts.
- Category B: Bob presses the "I'm symptomatic" button on their app. This notification still goes to a central server, but all this server does is rebroadcast it (Bob's anonymised Bluetooth Covid identifier is now flagged as infected) to all of the other app users. Each phone then searches its own records to see whether it has seen Bob's anonymised ID before. In this scheme, therefore, each person's record of the devices they have encountered is known only to their own app, not to a central database. The central server needed is very basic: it relays messages and allocates unique identifiers, and it need not store any data.
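The two flows can be contrasted in a short simulation. This is a simplified sketch, not how either the NHS app or the Apple/Google protocol is implemented: identifiers are plain strings rather than rotating random IDs, and the "long enough to be a concern" duration check is omitted. The classes `Phone`, `CentralServer` and `RelayServer` and the function `check_exposure` are hypothetical names. The point to notice is where the matching happens and what each server ends up knowing.

```python
from dataclasses import dataclass, field


@dataclass
class Phone:
    own_id: str                                       # this phone's (anonymised) identifier
    seen_ids: set[str] = field(default_factory=set)   # IDs this phone has encountered
    notified: bool = False


class CentralServer:
    """Category A: the server receives contact logs and decides whom to notify."""

    def __init__(self, registry: dict[str, Phone]):
        # The server can address specific devices by ID.
        self.registry = registry
        # Uploaded logs are retained for analysis: useful, but a privacy risk.
        self.uploaded_logs: dict[str, set[str]] = {}

    def report_positive(self, phone: Phone) -> list[str]:
        self.uploaded_logs[phone.own_id] = set(phone.seen_ids)
        notified = []
        for contact_id in phone.seen_ids:
            contact = self.registry.get(contact_id)
            if contact is not None:
                contact.notified = True  # the server knows exactly who was told
                notified.append(contact_id)
        return sorted(notified)


class RelayServer:
    """Category B: the server only relays the list of infected IDs."""

    def __init__(self):
        self.infected_ids: list[str] = []

    def report_positive(self, phone: Phone) -> None:
        # Only the infected user's own ID is sent; their contact log stays on the phone.
        self.infected_ids.append(phone.own_id)

    def broadcast(self) -> list[str]:
        # Every phone receives the same list, so the server learns nothing about contacts.
        return list(self.infected_ids)


def check_exposure(phone: Phone, infected_ids: list[str]) -> bool:
    """Category B matching, done entirely on the device."""
    phone.notified = any(i in phone.seen_ids for i in infected_ids)
    return phone.notified


# Category A
bob = Phone("bob", {"alice", "carol"})
phones = [bob, Phone("alice", {"bob"}), Phone("carol", {"bob"}), Phone("dave", {"erin"})]
central = CentralServer({p.own_id: p for p in phones})
print(central.report_positive(bob))  # ['alice', 'carol']

# Category B
relay = RelayServer()
alice, dave = Phone("alice", {"bob"}), Phone("dave", {"erin"})
relay.report_positive(Phone("bob", {"alice", "carol"}))
ids = relay.broadcast()
print(check_exposure(alice, ids), check_exposure(dave, ids))  # True False
```

In the first case the server holds Bob's contact list and a record of which devices it notified; in the second it holds only the list of flagged IDs, which is exactly the privacy difference discussed next.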
Can Bob be specifically identified in either case? Every device will need a unique identifier of some kind to function properly, but in Category B apps, where the full list of infected but anonymous IDs is regularly broadcast to everyone, there is no need to link this identifier to the phone number or to the person using the phone. In Category A apps, however, the central server determines which devices have been exposed to the infection and notifies only those devices. This step is a potential privacy issue, because the notifications are sent to specific, potentially identifiable, devices. Furthermore, the use of location by Category A apps, whether fine-scale GPS traces or coarse location computed from partial postcodes, raises a further privacy concern by linking the presence of infections to specific places. Category A apps give the central authority much more information with which to prevent or cope with the spread of the disease. But ensuring the security of that data is very important if we believe malicious parties could exploit the information, were they to gain access to it.
What is being overlooked here is the usefulness of the app to the end users, and the level of trust they will be able to place in it. Too many false positive reports, poor messaging as to how the app works, and bad publicity (rightly or wrongly) will all be factors in how successful the app is for contact tracing, regardless of which category is employed.