Making Hard Choices: The Quest for Ethics in Machine Learning
November 23, 2016
Editor’s note: We are just starting to recognize the pervasive, yet subtle, ways that machine learning algorithms influence our lives, from job recommendations to self-driving cars. The use of these algorithms introduces difficult ethical problems for those involved in the creation of these systems: How can developers know that their systems are ethical and don’t reinforce the biases already present in society or introduce new, unique biases? How can companies create an environment that encourages the exploration of these ethical questions? What lessons can software engineers borrow from other professions to inform their own understanding of these issues?
In this primer, LinkedIn VP of Engineering Igor Perisic outlines some initial steps that can be taken to instill a culture of ethical behavior among machine learning researchers and discusses some of the hard questions that will need to be addressed in the future.
In Silicon Valley, many companies aspire to the ideal of an ethical company. You can see this in company mottos, such as “Don’t Be Evil,” or in the social responsibility efforts espoused by many peer tech companies. On a deeper level, though, the behavior of companies like Google, Facebook, LinkedIn, and others is increasingly governed by the machine-learned systems they build to run their businesses. These companies are now starting to ask themselves how they can make an informed decision about how they operate their machine learning systems in an ethical manner, instead of being driven solely by revenue or some more abstract success metric.
But we, as developers, are not off the hook. Since our code makes it all happen, we are in a situation where we need to consider the ethics of what we are building and running. If you needed to be convinced that these kinds of discussions are important to you as a developer—and to our society—then you should look no further than the Apple versus the FBI debate about iPhone encryption.
Ethics, as software “eats the world”
From search engines to self-driving cars, artificial intelligence and machine learning (and by extension: bots, autonomous agents, etc.) will impact the society around you in some way. However, when you look at any of the curricula for machine learning or computer science majors, ethics and philosophy courses are notably absent. Going back through college, how many computer science programs offered or required courses on these subjects? Thinking further back in your education, how many high schools actually had or required you to take a course on these subjects?
A contrast could be drawn, I think, to other fields where ethics is very much a part of the conversation at the undergraduate or graduate level. In biology or the medical sciences, for instance, ethics courses are mandatory. There it makes sense: “first, do no harm” is a phrase that has been used by doctors since the 5th century B.C. It’s widely accepted that a doctor doing drug testing needs to understand that the process is not just about finding a new treatment, but that it also needs to be respectful of the dignity of human (or animal) nature.
Journalists, lawyers, and many other professions have a similar grounding in ethical foundations. In many countries, bar associations are given the power both to set a standard for ethical behavior among lawyers and to disbar lawyers who violate that code of conduct. Journalistic ethics vary more widely by country and news organization, but certain practices (such as the protection of sources) are widely accepted as de facto standards, and in some cases, are protected under international law. Notably, these ethical standards have co-evolved with societal expectations for professionals in these roles. However, expectations for ethics in the practice of computer science are only beginning to emerge. Many developers, instead of making their own ethical judgements, defer that responsibility to their employers.
Ethical questions are not easy to answer, and can wind us into a knot. To resolve them, one cannot hide behind standard, widely-accepted practices. On a personal note, reading the “withholding lifesaving treatment” arguments that surrounded the famous Harvard Neonatal ECMO (extracorporeal membrane oxygenation) clinical trial is how I learned this hard lesson.
In our own profession, we now have a situation where many individuals who are creating the systems that will shape society are not themselves always informed about the way their actions impact the world and others. Is an algorithm still just an algorithm when it can recommend a given job to millions of people…or not? Until very recent times, even practicing philosophers could not agree that the use of software creates unique ethical dilemmas, in contrast to those posed by weapons or medicine—topics that have been discussed for thousands of years. But with the ubiquity of software-led decision-making, filtering, and other relevance models, today’s software can have a similar impact on the lives of everyday people. Also, the datasets these systems may leverage can often reflect societal trends and biases in the real world.
While I am not advocating for one specific ethical stance over another, I am advocating for the requirement of being able to reason in this space. It’s true that an education in ethics is not enough to ensure that engineers build ethical systems, but it provides a foundation from which they can understand these issues.
Making algorithms accountable
At LinkedIn, we have the opportunity to positively influence the career decisions of more than 460 million members. Among the many uses of machine learning at LinkedIn is presenting professionals with an optimized list of jobs that may interest them, based on their career, job title, interests, and many other features in our data set. For us, the power of an algorithm to show a given job to one population and not another has led to many discussions and much soul-searching as to how we debias our algorithms. In fact, we have made conscious efforts not to track information (for instance, gender or political orientation) that we felt could be used at some point to bias the decision-making models that recommend jobs to members.
But rendering an algorithm “blind” does not mean that it has been made free from bias. Recently at LinkedIn, there was an unfortunate controversy where our search was showing “male” spelling suggestions for searches using stereotypically female names (an error that was regrettable, and which we quickly corrected). I know first-hand that the head of the search engineering team at LinkedIn who developed these algorithms is a very principled scientist. Furthermore, I can attest to his fundamental belief that the “first, do no harm” principle is an important part of the ethics of running these kinds of systems. However, having this algorithm serve suggestions based solely on word search frequency, without being aware of gender, actually resulted in biased results. In retrospect it is obvious that, by removing gender from our consideration, our algorithms were actually blind to it. Since we weren’t tracking that information in the first place, we couldn’t use it to verify that the output of the algorithms was, in fact, unbiased.
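To see how a rule that never looks at gender can still produce skewed output, consider a minimal sketch of a frequency-only suggestion rule. This is not LinkedIn's search code: the function names, the query counts, and the thresholds below are all invented for illustration.

```python
from collections import Counter

# Hypothetical query-log counts, invented for illustration only.
QUERY_COUNTS = Counter({"andrew jones": 900, "andrea jones": 300})


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(query, counts, max_distance=1, ratio=2.0):
    """Return a near-identical query that is searched much more often, else None.

    The rule sees only spelling and frequency; it has no notion of gender.
    """
    best = None
    for candidate, n in counts.items():
        if candidate == query:
            continue
        close = edit_distance(query, candidate) <= max_distance
        much_more_frequent = n >= ratio * counts.get(query, 0)
        if close and much_more_frequent:
            if best is None or n > counts[best]:
                best = candidate
    return best


print(suggest("andrea jones", QUERY_COUNTS))
print(suggest("andrew jones", QUERY_COUNTS))
```

Nothing in `suggest` refers to gender, yet on these toy counts a search for the less frequent name is redirected to the more frequent one, while the reverse never happens. Detecting that asymmetry requires knowing which names are associated with which gender, which is exactly the information the rule was designed not to see.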
An additional problem is that there is no way to simply examine a machine learning algorithm in isolation and determine if it will deliver biased results. Computer code is necessarily an abstraction. Bias itself exists only within a broader cultural context and must be examined through that same lens. So how do we create a more ethical system? While we do not yet have the full answer, we are looking at solutions.
One key area of struggle remains the desire to not track information that could cause bias if used in building models (such as gender, sexual orientation, age, etc.). This stems from the realization that once you track or derive features from such signals, you have opened a potential Pandora’s Box. Quite simply, it would be extremely difficult to track down where such signals are used to train models, and what effects that type of data has on those models. The conundrum, however, is that this same data may be required in order to validate that the output from your machine-learned models is, in fact, free from bias.
For example, let’s say that a deep learning system ended up extracting age as a valuable feature that could be used to predict future CEOs. After all, the likelihood that a new college grad would take a CEO job is lower than for individuals who have more years of experience. The consequence is that when it comes to two equally-qualified “experienced” candidates, the system might tend to recommend the older of the two, thereby discriminating based on age. This is obviously not good. However, it would be unlikely that this bias would ever be identified, unless researchers were to go back and test the system’s results against a dataset that included candidate age information.
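A retrospective check of this kind might look like the following sketch. It assumes a hypothetical audit set in which each record pairs the system's output with an age group that was held out of training; the record type, the group labels, and the counts are all invented for illustration, and comparing recommendation rates is only one of several possible checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    recommended: bool  # what the system decided for this candidate
    age_group: str     # held-out attribute, never used as a model feature


def recommendation_rates(records):
    """Fraction of candidates recommended within each age group."""
    totals, hits = {}, {}
    for r in records:
        totals[r.age_group] = totals.get(r.age_group, 0) + 1
        hits[r.age_group] = hits.get(r.age_group, 0) + int(r.recommended)
    return {group: hits[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Toy audit set of equally qualified candidates (invented numbers).
audit = (
    [AuditRecord(True, "under_40")] * 3 + [AuditRecord(False, "under_40")] * 7
    + [AuditRecord(True, "40_plus")] * 6 + [AuditRecord(False, "40_plus")] * 4
)

rates = recommendation_rates(audit)
print(rates)
print(rate_ratio(rates))
```

On this toy data the older group is recommended twice as often as the younger one. The gap is visible only because the audit set carries the age attribute, even though the model itself never needs to consume it.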
While there are no easy answers, it’s clear that good intentions are not enough. If we look again at the history of other professions (doctors, lawyers, journalists), their ethical foundations were built on a long and sometimes messy struggle with the role they played within a larger society. At the very least, it is time for computer scientists and machine learners to begin asking themselves the same hard questions that these other professions have already addressed.