Cambridge Analytica, GDPR and the Future of APIs
Anyone who follows the news will have heard the story of Cambridge Analytica, which collected the data of 50 million Facebook users without their knowledge or consent. A lot has been written about it, with different actors putting their own spin on the story and portraying Facebook as either a victim of or an accomplice to Cambridge Analytica. As a result, people might become more mindful of how they use Facebook and how much data they share with the company. Some even go as far as to #DeleteFacebook, but those will probably be few (we’re not leaving either; the CloudObjects Facebook page is still there). You can also expect governments to tighten regulation of the social media giant, as well as of companies with a similar business model and wealth of data.
I believe it’s important to follow the chain of events from a technical perspective, to gain an understanding of what went wrong and how to stop these kinds of privacy violations.
So here is the gist of what happened: On behalf of Cambridge Analytica, researchers developed a Facebook app, a personality quiz. Users signed in to this app with their Facebook accounts and, in the process, permitted it to read much of their data, including detailed information about their friends, through the Facebook Graph API. In itself, this was neither a hack nor a data breach. It was a legitimate use of Facebook’s platform through an official channel at the time: Facebook’s API let users grant apps access not only to their own data but also, in effect, give consent on behalf of their friends. This possibility no longer exists to the same extent, as Facebook has continued to restrict the data available through the API about friends who do not use an app themselves. When Facebook noticed that the app made excessive use of this possibility, it contacted the researchers. They falsely claimed that they used the data for academic purposes only, and Facebook accepted that. Later, when Facebook found out about this lie, it requested deletion of the data, but apparently did not follow up on this request and did not inform the affected users or the general public about the potential abuse.
There were a few published reactions to the scandal that motivated me to write this article. Obviously, I’m very excited about APIs in general (otherwise I wouldn’t build CloudObjects and write this blog) and their potential to empower businesses as well as individuals. In a nutshell, APIs are all about exchanging data with as little friction as possible, and as such, they are a tool whose applications we can consider either good or bad. Good, if they empower “data subjects” to gain insight into the information that organizations collect about them and, with their consent, put this data to use for their individual benefit. Bad, if vast amounts of personal data are moved around without transparency and end up in the wrong hands.
OAuth is a technology that establishes consent around APIs: once the user approves, an application receives an access token with a narrow and explicit scope. Through a hint from Aaron Parecki, an IndieWeb advocate and maintainer of the OAuth specification, I came across an article in The Herald Scotland that could leave people unfamiliar with the technology with the impression that OAuth itself is part of the problem.
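To make the idea of a narrow and explicit scope more concrete, here is a minimal sketch of how an app might build an OAuth 2.0 authorization request. The endpoint, the client ID, the redirect URI and the scope name "profile:read" are all made up for illustration and do not belong to any real provider; only the parameter names (response_type, client_id, redirect_uri, scope, state) come from the OAuth 2.0 specification.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical authorization endpoint, for illustration only.
AUTHORIZE_URL = "https://provider.example/oauth/authorize"


def build_authorization_url(client_id, redirect_uri, scopes):
    """Build an authorization-code request limited to the given scopes."""
    state = secrets.token_urlsafe(16)  # random value the app checks on return (CSRF guard)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # OAuth 2.0 scopes are space-delimited
        "state": state,
    }
    return AUTHORIZE_URL + "?" + urlencode(params), state


# The app asks only for what it needs; the user sees and approves exactly this.
url, state = build_authorization_url(
    "quiz-app", "https://quiz.example/callback", ["profile:read"]
)
requested_scope = parse_qs(urlparse(url).query)["scope"][0]
print(requested_scope)
```

The point of the sketch is that the requested scope is visible in the request itself, so the provider can show it to the user and refuse anything beyond it.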
Another sentiment, voiced in a tweet from Parker Thompson as well as in an article by German blogger Marcel Weiß, was that the whole affair in fact strengthens Facebook’s position as a silo that locks down personal data against potential new and smaller players in the market.
It’s true: If you have followed the API programs of large social networks over the years, you will have observed a trend of steadily limiting access to data via APIs. Ironically, they can sell this as a privacy improvement because there are fewer options for others to build on their data and potentially abuse it, while at the same time they keep all the “Big Data” to themselves and prevent third parties from developing innovative add-ons.
So, what does the GDPR have to do with this? Starting May 25th, the General Data Protection Regulation (GDPR) applies to personal data from citizens of the European Union and people located on European soil, no matter where the data processing takes place. If companies do not want to exclude half a billion people in one of the world’s largest markets or implement two completely separate systems for handling personal data, they have to deal with the GDPR and apply it to their whole platform.
The regulation forces companies to be more transparent about the information they collect and process, and they must explain everything to their users in clear language instead of hiding behind privacy policies written in legalese. Developers should apply these requirements to OAuth dialogs and educate their users about what happens to ensure that they give informed consent, even if it might hurt conversion rate in the short term. Also, if a breach occurs, corporations can no longer keep it to themselves but must be transparent about this, too.
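As a rough sketch of what clear language in a consent dialog could look like in practice, an app might keep a plain-language description for every scope it can request and refuse to show a dialog for any scope it cannot explain. The scope names and the wording below are hypothetical.

```python
# Hypothetical scopes with plain-language explanations shown to the user.
SCOPE_DESCRIPTIONS = {
    "profile:read": "See your name and profile picture.",
    "friends:read": "See the list of your friends who also use this app.",
}


def describe_consent(app_name, scopes):
    """Return the consent text for a dialog, one readable line per scope."""
    unknown = [scope for scope in scopes if scope not in SCOPE_DESCRIPTIONS]
    if unknown:
        # Never ask for consent to something we cannot explain.
        raise ValueError(f"No description for scopes: {', '.join(unknown)}")
    lines = [f"{app_name} is asking for permission to:"]
    lines += [f"- {SCOPE_DESCRIPTIONS[scope]}" for scope in scopes]
    return "\n".join(lines)


print(describe_consent("Personality Quiz", ["profile:read"]))
```

Failing loudly on an undescribed scope is a deliberate choice in this sketch: it keeps new permissions from slipping into the dialog without an explanation.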
Another requirement of the GDPR is to ensure insight into, removability of, and portability of personal data, which means not only offering export facilities but also assisting users in moving their data from one site to another, including competitors. These are perfect scenarios for consuming and providing APIs. Along with another European directive, PSD2, which makes it mandatory for banks to offer APIs, the GDPR could be beneficial for the whole API ecosystem. An active API ecosystem implies more possibilities for consumers to mix and match services from different providers and offers opportunities for smaller players to build on top of large platforms as well as each other. Parker’s suggestion that Facebook will “strengthen its walls” in the wake of the recent revelations may be correct on its own, but I assume that the GDPR will be their antagonist rather than their ally with this approach.
One thing that is not as clear at the moment, however, is how the rights of an individual to their data will be balanced with the rights of other data subjects such as their friends and colleagues. For example: Who’s the rightful owner of my friend list on Facebook? Is it just me, or do all of my friends need to consent to their inclusion when I export a file or make an API call? A lot of personal information is associated with more than one individual. Facebook users can already opt out of all platform usage and never appear in API data, but very few do. What will happen if it’s turned off by default and users must opt in? There are a lot of questions that do not have answers right now, and there’s a chance that Marcel is correct, at least in the short term, when he assumes that the GDPR makes it harder to export a social graph, often the most valuable asset in any social web service. In the long term, however, I hope that both the GDPR and the latest scandal help us have a conversation about privacy and security that, after a few bumps along the road, leads to beneficial outcomes for both businesses and consumers. It’s essential, though, that people, organizations and politicians know what’s at stake and refrain from rash decisions that are well-intended but end up having the opposite effect.
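The difference between opt-out and opt-in is easy to underestimate, so here is a small sketch of it. It assumes a made-up data model in which each friend has a sharing preference that is either set explicitly or left untouched; it is not how Facebook or any other platform actually stores this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Friend:
    name: str
    # True or False if the friend chose explicitly, None if they never touched the setting.
    allows_platform_sharing: Optional[bool] = None


def exportable_friends(friends, default_allow):
    """Return the names that may appear in an export or API response.

    default_allow=True models opt-out (shared unless refused),
    default_allow=False models opt-in (hidden unless approved).
    """
    names = []
    for friend in friends:
        choice = friend.allows_platform_sharing
        allowed = default_allow if choice is None else choice
        if allowed:
            names.append(friend.name)
    return names


friends = [Friend("Ana", True), Friend("Ben"), Friend("Cem", False)]
print(exportable_friends(friends, default_allow=True))   # opt-out default
print(exportable_friends(friends, default_allow=False))  # opt-in default
```

Since very few people ever change such a setting, the default decides for almost everyone, which is why flipping it could shrink an exported social graph so drastically.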
As Kin Lane, the API Evangelist, reminds us in one of his recent posts, it’s an excellent time for API providers to gain some insight into which data they collect and what the implications for their users’ privacy are.