Dries Buytaert

The coming era of data and software transparency

Algorithms are shaping what we see and think, and even what our futures hold. The order of Google's search results, the people Twitter recommends we follow, or the way Facebook filters our news feed can all influence our perception of the world and drive our actions. But think about it: we have very little insight into how these algorithms work or what data they use. Given that algorithms guide so much of our lives, how do we know they aren't biased, don't withhold information, or don't have bugs with negative consequences for individuals or society? This is a problem we aren't talking about enough, and one we have to address in the next decade.

Open-sourcing software quality

In the past several weeks, Volkswagen's emissions scandal has raised new concerns about "cheating algorithms" and the need to validate the trustworthiness of companies. One of the many suggestions for solving this problem was to open-source the software used in emissions and automobile safety testing (Dave Bollier's post about the dangers of proprietary software is particularly good). While open-sourcing alone will not fix software's accountability problems, it's certainly a good start.

As self-driving cars emerge, checks and balances on software quality will become even more important. Companies like Google and Tesla are the benchmarks of this next wave of automotive innovation, but it will take only one safety incident to intensify scrutiny of software-driven versus human-driven cars. The idea of "autonomous things" has ignited a huge discussion around regulating artificially intelligent algorithms. Elon Musk went so far as to call artificial intelligence our biggest existential threat, and he has donated millions to make it safer.

While making important algorithms available as Open Source does not guarantee security, transparency can only make the software more secure, not less. As Eric S. Raymond famously put it, "given enough eyeballs, all bugs are shallow." When more people look at code, mistakes are corrected faster, and the software grows stronger and more secure.

Less "Secret Sauce" please

Automobiles aside, a possibly larger-scale, hidden controversy is brewing on the web. Proprietary algorithms and data are big revenue generators for companies like Facebook and Google, whose services are used by billions of internet users around the world. With that kind of reach comes big potential for manipulation, whether intentional or not.

There is no shortage of examples. Recently, Politico reported on Google's ability to influence presidential elections: Google can build bias into the results returned by its search engine simply by tweaking its algorithm, so certain candidates appear more prominently than others. Research has shown that Google could shift voting preferences by 20 percent or more (up to 80 percent in certain demographic groups), potentially flipping the margins of close elections worldwide. The scary part is that these voters have no idea it is happening.
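To make the mechanism concrete, here is a minimal, entirely hypothetical sketch in Python. The titles, scores, and bias weight are all invented, and this is in no way how Google's actual ranking works; the point is simply that a single hidden nudge to a scoring function is enough to reorder what a searcher sees first.

```python
# Hypothetical illustration only; nothing here reflects any real ranking code.
# A result's final score mixes its relevance with a hidden, per-result bias
# weight. A tiny tweak to that weight silently reorders the results.

results = [
    {"title": "Candidate A campaign site", "relevance": 0.90},
    {"title": "Candidate B campaign site", "relevance": 0.88},
]

# A hypothetical, invisible nudge toward one result.
bias = {"Candidate B campaign site": 0.05}

def score(result, biased):
    nudge = bias.get(result["title"], 0.0) if biased else 0.0
    return result["relevance"] + nudge

neutral = sorted(results, key=lambda r: score(r, biased=False), reverse=True)
tweaked = sorted(results, key=lambda r: score(r, biased=True), reverse=True)

print([r["title"] for r in neutral])  # Candidate A ranks first
print([r["title"] for r in tweaked])  # Candidate B ranks first
```

The searcher sees the same clean results page either way; nothing on the surface reveals that the order was nudged.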

And when Facebook's 2014 "emotional contagion" mood manipulation study was exposed, people were outraged at the thought of being experimented on, at the mercy of a secret algorithm. Researchers manipulated the news feeds of 689,003 users to see whether more negative-sounding news led to an increase in negative posts (it did). Although the experiment was found to comply with Facebook's terms of service, there was a tremendous outcry over the ethics of manipulating people's moods with an algorithm.

In theory, providing greater transparency into algorithms through an Open Source approach could avert such crises. In practice, however, this shift is unlikely to happen on its own, since these companies profit from keeping their algorithms proprietary. A middle ground might be to allow regulatory organizations to periodically check the effects of these algorithms and determine whether they're harming society. It's not crazy to imagine that governments will one day require organizations to give others access to key parts of their data and algorithms.
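What might such a periodic check look like? Here is a minimal sketch, with everything invented for illustration: the stand-in search function, the queries, and the 5 percent threshold are all hypothetical, not a real auditing API. The idea is that an auditor can sample a black-box service from the outside and flag skewed exposure, without ever needing full source access.

```python
# Hypothetical black-box audit sketch; the function, queries, and
# threshold below are invented for illustration, not a real API.
import random

random.seed(42)

def top_result(query):
    """Stand-in for the system under audit; a real audit would
    query the live service and record who appears first."""
    return random.choice(["Candidate A", "Candidate B"])

queries = ["election news", "debate highlights", "candidate policies"]
samples_per_query = 1000

counts = {"Candidate A": 0, "Candidate B": 0}
for query in queries:
    for _ in range(samples_per_query):
        counts[top_result(query)] += 1

share_a = counts["Candidate A"] / sum(counts.values())
print(f"Candidate A's share of top results: {share_a:.1%}")

# Flag for human review when exposure skews past an agreed threshold.
if abs(share_a - 0.5) > 0.05:
    print("Exposure imbalance exceeds threshold; flag for review.")
```

An outcome-based check like this sidesteps the trade-secret objection: the regulator measures effects, not implementation details.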

Ethical early days

The explosion of software and data can have either horribly negative effects or transformative positive ones. The key to the ethical use of algorithms is giving consumers, academics, governments, and other organizations access to data and source code so they can study how and why their data is used, and why it matters. This could mean that, despite the huge success and impact of Open Source and Open Data, we're still in the early days. There are few things about which I'm more convinced.

— Dries Buytaert