The coming era of data and software transparency

Algorithms are shaping what we see and think -- even what our futures hold. The order of Google's search results, the people Twitter recommends we follow, or the way Facebook filters our newsfeed can impact our perception of the world and drive our actions. But think about it: we have very little insight into how these algorithms work or what data they use. Given that algorithms guide much of our lives, how do we know they don't have a bias, withhold information, or have bugs with negative consequences for individuals or society? This is a problem we aren't talking about enough, and one we have to address in the next decade.

Open Sourcing software quality

In the past several weeks, Volkswagen's emissions crisis has raised new concerns around "cheating algorithms" and the overall need to validate the trustworthiness of companies. One of the many suggestions to solve this problem was to open-source the software around emissions and automobile safety testing (Dave Bollier's post about the dangers of proprietary software is particularly good). While open-sourcing alone will not fix software's accountability problems, it's certainly a good start.

As self-driving cars emerge, checks and balances on software quality will become even more important. Companies like Google and Tesla are the benchmarks of this next wave of automotive innovation, but all it will take is one safety incident to intensify the scrutiny of software-driven versus human-driven cars. The idea of "autonomous things" has ignited a huge discussion around regulating artificially intelligent algorithms. Elon Musk went so far as to call artificial intelligence our biggest existential threat, and donated millions to make artificial intelligence safer.

Making important algorithms available as Open Source does not guarantee security, but it can only make the software more secure, not less. As Eric S. Raymond famously stated, "given enough eyeballs, all bugs are shallow". When more people look at code, mistakes are corrected faster, and software gets stronger and more secure.

Less "Secret Sauce" please

Automobiles aside, there is possibly a larger scale, hidden controversy brewing on the web. Proprietary algorithms and data are big revenue generators for companies like Facebook and Google, whose services are used by billions of internet users around the world. With that type of reach, there is big potential for manipulation -- whether intentional or not.

There are many examples of why this matters. Politico recently reported on Google's ability to influence presidential elections. Google can build bias into the results returned by its search engine simply by tweaking its algorithm, so that certain candidates display more prominently than others in search results. Research has shown that Google can shift voting preferences by 20 percent or more (up to 80 percent in certain groups), and potentially flip the margins of close elections worldwide. The scary part is that none of these voters know what is happening.

And when Facebook's 2014 "emotional contagion" mood manipulation study was exposed, people were outraged at the thought of being tested at the mercy of a secret algorithm. Researchers manipulated the news feeds of 689,003 users to see if more negative-appearing news led to an increase in negative posts (it did). Although the experiment was found to comply with Facebook's terms of service, there was a tremendous outcry over the ethics of manipulating people's moods with an algorithm.

In theory, providing greater transparency into algorithms using an Open Source approach could avoid a crisis. However, in practice, it's not very likely this shift will happen, since these companies profit from the use of these algorithms. A middle ground might be allowing regulatory organizations to periodically check the effects of these algorithms to determine whether they're causing society harm. It's not crazy to imagine that governments will require organizations to give others access to key parts of their data and algorithms.

Ethical early days

The explosion of software and data can have either horribly negative effects or transformative positive effects. The key to the ethical use of algorithms is providing consumers, academics, governments, and other organizations access to data and source code so they can study how and why their data is used, and why it matters. This could mean that despite the huge success and impact of Open Source and Open Data, we're still in the early days. There are few things about which I'm more convinced.

Comments

Matt (not verified):

Did the Facebook study really attempt to manipulate moods, or did it just measure the degree to which moods were being manipulated by posts users would have seen anyway?

Ivan Boothe (not verified):

Let me Google that for you:

"How Facebook can alter your mood: Concerns raised over study of ‘massive-scale emotional contagion’"
http://www.geekwire.com/2014/facebook-can-alter-mood-concerns-raised-st…

"...scientists commissioned by social media behemoth Facebook, in the US, have revealed research they say proves our emotions can be toyed with in the virtual world, on social media, where so many of us spend so much of our time."
http://www.news.com.au/technology/can-we-still-be-friends-with-facebook…

"A newly published paper reveals that scientists at Facebook conducted a massive psychological experiment on hundreds of thousands of users by tweaking their feeds and measuring how they felt afterward. In other words, Facebook decided to try to manipulate some people's emotional states -- for science."
http://www.huffingtonpost.com/2014/06/29/facebook-experiment-psychologi…

"Facebook manipulated the emotions of hundreds of thousands of its users, and found that they would pass on happy or sad emotions, it has said."
http://www.independent.co.uk/life-style/gadgets-and-tech/facebook-manip…

"The study intentionally tried to manipulate people’s emotional states, which for academic researchers constitutes 'human subject research': a systematic investigation to develop generalizable knowledge by gaining data about living individuals through intervention."
http://codingconduct.tumblr.com/post/90242838320/frame-clashes-or-why-t…

"A study recently published by researchers at Facebook and Cornell suggests that social networks can manipulate the emotions of their users by tweaking what is allowed into a user’s news feed."
http://www.nytimes.com/2014/07/01/opinion/jaron-lanier-on-lack-of-trans…

I'm not sure how you missed that pretty basic part of the whole debacle. Facebook researchers very specifically tried to alter people's moods by showing more or fewer posts that they judged to be positive or negative -- not posts "they would have seen anyway." They manipulated the posts users saw, and they were very clear that their objective was to manipulate people's moods as a result. They then measured the extent to which their manipulation was successful.

Alla (not verified):

I agree with your idea that the whole trend of data usage, and the predictions drawn from it, should somehow be verified. However, behind any organization or business are people, and it's important to make sure that those who verify algorithms are society-conscious. Otherwise, we will end up with organizations holding the legitimate power to 'verify' algorithms and make the final call. Balancing the level of transparency with the level of power is quite a challenge.

Dries:

Well said, Alla. Finding the right balance between all the competing forces (transparency, intellectual property, making money, etc.) is the real challenge.

John Walling (not verified):

Algorithms are the common denominator, but the dividend is huge: medicine, elections, stock trading, robots, war machines, car controls, marketing, supply chains, utilities, navigation, pollution controls... It boggles the mind.

Dries:

This recent article, "Convicted by Code," provides more examples of software that could benefit from being Open Source. The article talks about the growing number of investigative and forensic software and hardware tools, from DNA testing to facial recognition to breathalyzers, that can't be audited. Because no software is immune to bugs, the result could be forensic evidence that is potentially unjust and that negatively impacts defendants. Is there any reason why software like this shouldn't be Open Source, or at least publicly auditable?