Joi Ito's Web

Applied Ethical and Governance Challenges in Artificial Intelligence (AI)

Part 3: Intervention

We recently completed the third and final section of the course that I co-taught with Jonathan Zittrain and that Samantha Bates, John Bowers and Natalie Saltiel TA'ed. The plan was to bring the discussion of diagnosis and prognosis in for a landing and figure out how to intervene.

The first class of this section (the eighth class of the course) looked at the use of algorithms in decision making. One paper that we read was the most recent in a series of papers by Jon Kleinberg, Sendhil Mullainathan and Cass Sunstein supporting the use of algorithms in decision making such as pretrial risk assessments - the particular paper we read focused on using algorithms to measure the bias of the decision making. Sendhil Mullainathan, one of the authors of the paper, joined us in class. The second paper was by Rodrigo Ochigame, a doctoral student in history and science, technology, and society (STS), who criticized the fundamental premise of reducing notions such as "fairness" to "computational formalisms" such as algorithms. The discussion, which at points took the form of a lively debate, was extremely interesting and helped us and the students see how important it is to question the framing of the questions and the assumptions we often make when we begin working on a solution without coming to a societal agreement on the problem.

In the case of pretrial risk assessments, the basic question about whether rearrests are more of an indicator of policing practice or the "criminality of the individual" fundamentally changes whether the focus should be on the "fairness" and accuracy of the prediction of the criminality of the individual or whether we should be questioning the entire system of incarceration and its assumptions.

At the end of the class, Sendhil agreed to return to have a deeper and longer conversation with my Humanizing AI in Law (HAL) team to discuss this issue further.

In the next class, we discussed the history of causal inference and how statistics and correlation have dominated modern machine learning and data analysis. We discussed the difficulties and challenges in validating causal claims but also the importance of causal claims. In particular, we looked at how legal precedent has from time to time made references to the right to individualized sentencing. Clearly, risk scores used in sentencing that are protected by trade secrets and confidentiality agreements challenge the right to due process as expressed in the Wisconsin v. Loomis case as well as the right to an individualized sentence.

The last class focused on adversarial examples and technical debt - which helped us think about when and how policies and important "tests" and controls can and should be put in place versus when, if ever, we should just "move quickly and break things." I'm not sure if it was the consensus of the class, but I felt that we need a new design process that allows for the creation of design stories and "tests" developed by the users and members of the affected communities and integrated into the development process - participatory design deeply integrated into something that looks like the story and test development practices of agile development. Fairness and other contextual parameters are dynamic and can only be managed through interactions with the systems in which the algorithms are deployed. Figuring out a way to integrate the dynamic nature of the social system seems like a possible approach for mitigating a category of technical debt and avoiding systems untethered from the normative environments in which they are deployed.

Throughout the course, I observed students learning from one another, rethinking their own assumptions, and collaborating on projects outside of class. We may not have figured out how to eliminate algorithmic bias or come up with a satisfactory definition of what makes an autonomous system interpretable, but we did find ourselves having conversations and coming to new points of view that I don't think would have happened otherwise.

It is clear that integrating humanities and social science into the conversation about law, economics and technology is required for us to navigate ourselves out of the mess that we've created and to chart a way forward into our uncertain future with our increasingly algorithmic societal systems.

- Joi

Syllabus Notes

By Samantha Bates

In our final stage of the course, the intervention stage, we investigated potential solutions to the problems we identified earlier in the course. Class discussions included consideration of the various tradeoffs of implementing potential solutions and places to intervene in different systems. We also investigated the balance between waiting to address potential weaknesses in a given system until after deployment versus proactively correcting deficiencies before deploying the autonomous system.

Class Session 8: Intervening on behalf of fairness

This class was structured as a conversation involving two guests, University of Chicago Booth School of Business Professor Sendhil Mullainathan and MIT PhD student Rodrigo Ochigame. As a class we debated whether elements of the two papers were reconcilable given their seemingly opposite viewpoints.

  • "Discrimination in the Age of Algorithms" by Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, and Cass R. Sunstein (February 2019).

  • [FORTHCOMING] "The Illusion of Algorithmic Fairness" by Rodrigo Ochigame (2019)

The main argument in "Discrimination in the Age of Algorithms" is that algorithms make it easier to identify and prevent discrimination. The authors point out that current obstacles to proving discrimination are primarily caused by opacity around human decision making. Human decision makers can make up justifications for their decisions after the fact or may be influenced by bias without even knowing it. The authors argue that by making algorithms transparent, primarily through the use of counterfactuals, we can determine which components of the algorithm are causing a biased outcome. The paper also suggests that we allow algorithms to consider personal attributes such as race and gender in certain contexts because doing so could help counteract human bias. For example, if managers consistently give higher performance ratings to male workers over female workers, the algorithm won't be able to figure out that managers are discriminating against women in the workplace if it can't incorporate data about gender. But if we allow the algorithm to be aware of gender when calculating work productivity, it may be able to uncover existing biases and prevent them from being perpetuated.
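To make the counterfactual idea concrete, here is a minimal Python sketch of the kind of test the authors describe: train a model on data where one group is systematically favored, then compare the model's scores with the protected attribute set one way versus the other. The data, model, and feature names here are hypothetical illustrations, not the authors' method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: column 0 is a protected attribute (e.g., gender, 0/1),
# column 1 is years of experience; the label encodes a biased "high rating".
X = np.column_stack([rng.integers(0, 2, 1000), rng.normal(5, 2, 1000)])
y = ((0.5 * X[:, 1] + 1.5 * X[:, 0] + rng.normal(0, 1, 1000)) > 3).astype(int)

model = LogisticRegression().fit(X, y)

def counterfactual_gap(model, X, protected_col=0):
    """Average predicted score with the protected attribute set to 1 vs. set to 0,
    holding every other feature fixed."""
    X1, X0 = X.copy(), X.copy()
    X1[:, protected_col] = 1
    X0[:, protected_col] = 0
    return float(np.mean(model.predict_proba(X1)[:, 1] - model.predict_proba(X0)[:, 1]))

# A gap far from zero suggests the protected attribute itself is driving decisions.
print("score gap from flipping the protected attribute:", counterfactual_gap(model, X))
```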

The second assigned reading, "The Illusion of Algorithmic Fairness," demonstrates that attempts to reduce elements of fairness to mathematical equations have persisted throughout history. Discussions about algorithmic fairness today mirror many of the same points of contention reached in past debates about fairness, such as whether we should optimize for utility or optimize for fair outcomes. Consequently, fairness debates today have inherited some assumptions from these past discussions. In particular, we "take many concepts for granted including probability, risk, classification, correlation, regression, optimization, and utility." The author argues that despite our technical advances, fairness remains "irreducible to a mathematical property of algorithms, independent from specific social contexts." He shows that any attempt at formalism will ultimately be influenced by the social and political climate of the time. Moreover, researchers frequently use misrepresentative historical data to create "fair" algorithms. The way that the data is framed and interpreted can be misrepresentative and frequently reinforces existing discrimination (for example, predictive policing algorithms predict future policing, not future crime).

These readings set the stage for a conversation about how we should approach developing interventions. While "Discrimination in the Age of Algorithms" makes a strong case for using algorithms (in conjunction with counterfactuals) to improve the status quo and make it easier to prove discrimination in court, "The Illusion of Algorithmic Fairness" cautions against trying to reduce components of fairness to mathematical properties. The "Illusion of Algorithmic Fairness" paper shows that this is not a new endeavor: humans have tried to standardize the concept of fairness since as early as 1700, and we have proved time and again that determining what is fair and what is unfair is much too complicated and context dependent to model in an algorithm.

Class Session 9: Intervening on behalf of interpretability

In our second to last class, we discussed causal inference, how it differs from correlative machine learning techniques, and its benefits and drawbacks. We then considered how causal models could be deployed in the criminal justice context to generate individualized sentences and what an algorithmically informed individualized sentence would look like.

The Book of Why describes the emerging field of causal inference, which attempts to model how the human brain works by considering cause and effect relationships. The introduction delves a little into the history of causal inference and explains that the field took time to develop because it was nearly impossible for scientists to express causal relationships in mathematical terms. We've now devised ways to model what the authors call "the do-operator" (which indicates that there was some action or form of intervention that makes the relationship causal rather than correlative) through diagrams, mathematical formulas and lists of assumptions.

One main point of the introduction and the book is that "data are dumb" because they don't explain why something happened. A key component of causal inference is the creation of counterfactuals to help us understand what would have happened had certain circumstances been different. The hope with causal inference is that it will be less impacted by bias because causal inference models do not look for correlations in data, but rather focus on the "do-operator." A causal inference approach may also make algorithms more interpretable because counterfactuals will offer a better way to understand how the AI makes decisions.
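As a rough illustration of what the "do-operator" buys you, here is a small Python sketch of my own (not from The Book of Why): in a toy system with a confounder, simply conditioning on observed data gives a misleading answer, while simulating an intervention recovers the true causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate(do_x=None):
    """Toy structural model: confounder Z -> X, Z -> Y, and X -> Y (true effect 2.0)."""
    z = rng.normal(size=n)                      # confounder
    x = (z + rng.normal(size=n) > 0).astype(float) if do_x is None else np.full(n, do_x)
    y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # outcome depends on both x and z
    return x, y

# Observational contrast E[Y | X=1] - E[Y | X=0] is confounded by Z ...
x_obs, y_obs = simulate()
observational = y_obs[x_obs == 1].mean() - y_obs[x_obs == 0].mean()

# ... while the interventional contrast E[Y | do(X=1)] - E[Y | do(X=0)] recovers
# the true causal effect, because setting X breaks its dependence on Z.
_, y_do1 = simulate(do_x=1.0)
_, y_do0 = simulate(do_x=0.0)
interventional = y_do1.mean() - y_do0.mean()

print(f"observational difference: {observational:.2f}")   # inflated by confounding
print(f"interventional difference: {interventional:.2f}") # close to 2.0
```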

The other assigned reading, State of Wisconsin v. Eric Loomis, is a 2016 case about the use of risk assessment tools in the criminal justice system. In Loomis, the court used a risk assessment tool, COMPAS, to determine the defendant's risk of pretrial recidivism, general recidivism, and violent recidivism. The key question in this case was whether the judge should be able to consider the risk scores when determining a defendant's sentence. The Wisconsin Supreme Court decided that judges could consider the risk score because they also take into account other evidence when making sentencing decisions. For the purposes of this class, the case provided a lead-in to a discussion about the right to an individualized sentence and whether risk assessment scores can result in more fair outcomes for defendants. However, it turns out that risk assessment tools should not be employed if the goal is to produce individualized sentences. Despite their appearance of generating unique risk scores for defendants, risk assessment scores are not individualized: they compare information about an individual defendant to data about similar groups of offenders to determine that individual's recidivism risk.

Class Session 10: Intervening against adversarial examples and course conclusion

We opened our final class with a discussion about adversarial examples and technical debt before wrapping up the course with a final reflection on the broader themes and findings of the course.

The term "technical debt" refers to the challenge of keeping machine learning systems up to date. While technical debt is a factor in any type of technical system, machine learning systems are particularly susceptible to collecting a lot of technical debt because they tend to involve many layers of infrastructure (code and non code). Technical debt also tends to accrue more in systems that are developed and deployed quickly. In a time crunch, it is more likely that new features will be added without deleting old ones and that the systems will not be checked for redundant features or unintended feedback loops before they are deployed. In order to combat technical debt, the authors suggest several approaches including, fostering a team culture that encourages simplifying systems and eliminating unnecessary features and creating an alert system that signals when a system has run up against pre-programmed limits and requires review.

During the course retrospective, students identified several overarching themes of the class, including the effectiveness and importance of interdisciplinary learning, the tendency of policymakers and industry leaders to emphasize short term outcomes over the long term consequences of decisions, the challenge of teaching engineers to consider the ethical implications of their work during the development process, and the lack of input from diverse groups in system design and deployment.

Credits

Syllabus Notes by Samantha L. Bates

Syllabus by Samantha Bates, John Bowers and Natalie Saltiel

This is the second of three parts of the syllabus and summaries prepared by Samantha Bates, who TAs the Applied Ethical and Governance Challenges in Artificial Intelligence course that I co-teach with Jonathan Zittrain. John Bowers and Natalie Saltiel are also TAs for the course. I posted Part I earlier in the month.

My takeaways:

In Part I, we defined the space and tried to frame and understand some of the problems. We left with concerns about the reductionist, poorly defined and oversimplified notions of fairness and explainability in much of the literature. We also left feeling quite challenged by how the technical community will face new risks such as adversarial attacks and similar techniques.

In Part II, we continue our journey into a sense of despair about AI ethics and governance. In Solon Barocas and Andrew D. Selbst's paper "Big Data's Disparate Impact," they walk us through the state of the law around discrimination and fairness using Title VII of the US Civil Rights Act as an example. The authors show us that while it was enacted to address discrimination concerns raised by the civil rights movement, the law has evolved away from trying to correct societal inequities through remedies such as affirmative action. Instead, the law has focused more and more on fairness of processes and less on redistribution or on resolving historical inequity. As a result, the law has adopted a more technical notion of fairness - a kind of actuarial "all lives matter" sort of approach. During Part I, when we discussed the biased Amazon hiring tool, one of the proposed remedies was to "put our thumb on the scale" and just boost the scores of women and minorities. The Barocas and Selbst paper demonstrates that this type of solution is no longer supported by the law. The sense is that the engineers thought, "of course there must be a law prohibiting discrimination, we can use that." In fact, that law punts on redistribution or societal inequity. Jonathan pointed out that treatment of social inequality in Tort law is similar. If you run over a rich person and a poor person at the same time, you have to pay the rich family more - the calculation of damages is based on the victim's future earning power. Tort law, like Title VII, says, "there may be societal inequities, but we're not solving that problem here."

Sandra Wachter's paper proposing counterfactuals as a way to provide explainability presents an excellent idea and feels like one way forward in the explainability debate. However, even Sandra seems concerned about whether laws such as the GDPR will actually be able to require companies to provide such explanations. We also had some concerns about the limits of counterfactuals in identifying biases or providing the "best" answer depending on the person - limits Sandra identifies in her paper.

Finally, we take adversarial attacks from the theoretical to a specific example in a recent paper that Jonathan and I wrote with John Bowers, Samuel Finlayson, Andrew L. Beam, and Isaac S. Kohane about the risks of adversarial attacks on medical AI systems.

Please see Samantha's summaries and links to the readings below for a more complete overview of the three classes in Part II.

- Joi

Part 2: Prognosis

By Samantha Bates

Syllabus Notes: Prognosis Stage

Welcome to part 2 of our Ethical and Governance Challenges in AI syllabus! In part 1, the assigned readings and class discussion focused on understanding how the social, technical, and philosophical roots of autonomous systems contribute to problems related to fairness, interpretability, and adversarial examples. In the second stage of the course, the prognosis stage, the class considered the social implications of these problems. Perhaps the most significant takeaway from this stage was the realization that many of these problems are social or political problems and cannot be addressed solely through a legal or technical approach.

Class Session 5: Prognosticating the impacts of unfair AI

Solon Barocas, an Assistant Professor at Cornell University, joined the class for the first day of the prognosis stage. We discussed his paper, "Big Data's Disparate Impact," which offered a legal and technical perspective on the use of algorithms in employment.

On the first day of the prognosis stage, the focus of the class shifted from examining the technical mechanisms underlying autonomous systems to looking at the societal impact of those systems. The Barocas and Selbst paper discusses how data collection and data labeling can perpetuate existing biases both intentionally and unintentionally. The authors outline five main ways that datasets can be discriminatory:

  1. Our own human biases may be integrated into a dataset when a human data miner determines the parameters that an autonomous system will use to make decisions.

  2. The training data might already be biased depending on how it was collected and how it was labeled.

  3. Data mining models consider a limited number of data points and thus may draw conclusions about an individual or a group of people based on data that is not representative of the subject.

  4. As Cathy O'Neil mentioned, prejudice may be introduced if the data points the model uses to make decisions are proxies for class membership.

  5. Discriminatory data mining could be intentional. However, the authors argue that unintentional discrimination is more common and harder to identify.

While there is legal doctrine that addresses discrimination in employment, the authors demonstrate that it is difficult to apply in practice, particularly in the data mining context. Title VII creates liability for intentional discrimination (disparate treatment) and for unintentional discrimination (disparate impact), but it is difficult to prove either type. For example, in order to hold employers liable for unintentional discrimination, the plaintiff must show that an alternative, nondiscriminatory method exists that will accomplish the same goals as the discriminatory practice. They must also prove that when presented with the alternative, the employer refused to consider it. Typically, an employer can mount a successful defense if they can prove they were unaware of the alternative or if there is a legitimate business reason for policies that may be discriminatory (the business necessity defense).

Bias in data mining is so difficult to identify, prove, and rectify in part because as a society, we have not determined the role of the law in addressing discrimination. According to one theory, the anticlassification theory, the law has an obligation to ensure that decision makers do not discriminate against protected classes in society. The opposing theory, the antisubordination theory, advocates a more hands-on approach and states that the law should work to "eliminate status-based inequality" at the societal level by actively improving the lives of marginalized groups. Our current society favors the anticlassification approach in part because the court established early on that antidiscrimination law was not solely intended to improve access to opportunities for protected classes. And while the authors demonstrate how data mining can exacerbate existing biases in the hiring context, there is a societal trade-off between prioritizing efficient decision making and eliminating bias.

This reading also raises the question of who is responsible for fixing the problem. Barocas and Selbst emphasize that the majority of data mining bias is unintentional and that it may be very difficult to identify bias and employ technical fixes to eliminate it. At the same time, there are political and social factors that make fixing this problem in the legal system equally difficult, so who should be in charge of addressing it? The authors suggest that as a society, we may need to reconsider how we approach discrimination issues more generally.

Class Session 6: Prognosticating the impacts of uninterpretable AI

For our sixth session, the class talked with Sandra Wachter, a lawyer and research fellow at the Oxford Internet Institute, about the possibility of using counterfactuals to make autonomous systems interpretable.

In our last discussion about interpretability, the class concluded that it is impossible to define the term "interpretability" because it greatly depends on the context of the decision and the motivations for making the model interpretable. The Sandra Wachter et al. paper essentially says that defining "interpretability" is not important and that instead we should focus on providing a way for individuals to learn how to change or challenge a model's output. While the authors point out that making these automated systems more transparent and devising some way to hold them accountable will improve the public's trust in AI, they primarily consider how to design autonomous models that will meet the explanation requirements of the GDPR. The paper's proposed solution is to generate counterfactuals for individual decisions (both positive and negative) that "provide reasons why a particular decision was received, offer grounds to contest it, and provide limited 'advice' on how to receive desired results in the future."

The authors argue that counterfactuals would not only exceed the explainability requirements of the GDPR but also lay the groundwork for a legally binding right to explanation. Due to the difficulty of explaining the technical workings of an automated model to a lay person, legal concerns about protecting trade secrets and IP, and the danger of violating the privacy of data subjects, it has been challenging to provide more transparency around AI decision making. However, counterfactuals can serve as a workaround to these concerns because they indicate how a decision would change if certain inputs had been different rather than disclose information about the internal workings of the model. For example, a counterfactual for a bank loan algorithm might tell someone who was denied a loan that if their annual income had been $45,000 instead of $30,000, they would have received the loan. Without explaining any of the technical workings of the model, the counterfactual in this example can tell the individual the rationale behind the decision and how they can change the outcome in the future. Note that counterfactuals are not a sufficient solution to problems involving bias and unfairness. It may be possible for counterfactuals to provide evidence that a model is biased. However, because counterfactuals only show dependencies between a specific decision and particular external facts, they cannot be relied upon to expose all potential sources of bias or confirm that a model is not biased.
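A minimal sketch of the loan example, assuming a hypothetical model and features (illustrative only, not Wachter et al.'s implementation): search for the smallest income change that flips the decision and report that to the applicant.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: [annual_income, debt], label = loan approved.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(20_000, 120_000, 2000), rng.uniform(0, 40_000, 2000)])
y = (X[:, 0] - 1.5 * X[:, 1] > 15_000).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)

def income_counterfactual(model, applicant, step=1_000, max_income=200_000):
    """Return the smallest income (in steps of `step`) at which the model approves."""
    candidate = applicant.copy()
    while candidate[0] <= max_income:
        if model.predict([candidate])[0] == 1:
            return candidate[0]
        candidate[0] += step
    return None

applicant = np.array([30_000.0, 10_000.0])   # denied at $30,000 annual income
print("approved at an income of roughly:", income_counterfactual(model, applicant))
```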

The optional reading, "Algorithmic Transparency for the Smart City," investigates the transparency around the use of big data analytics and predictive algorithms by city governments. The authors conclude that poor documentation and disclosure practices as well as trade secrecy concerns frequently prevented city governments from getting the information they needed to understand how the model worked and its implications for the city. The paper expands upon the barriers to understanding an autonomous model that are mentioned in the Watcher et. al. paper and also presents a great example of scenarios in which counterfactual explanations could be deployed.

Class Session 7: Prognosticating the impacts of adversarial examples

In our third prognosis session, the class continued its discussion about adversarial examples and considered potential scenarios, specifically in medical insurance fraud, in which they could be used to our benefit and detriment.

  • "Adversarial attacks on artificial intelligence systems as a new healthcare policy consideration" by Samuel Finlayson, Joi Ito, Jonathan Zittrain et al., preprint (2019)

  • "Law and Adversarial Machine Learning" by Ram Shankar Siva Kumar et al., ArXiv (2018)

In our previous session about adversarial examples, the class discussion was primarily focused on understanding how adversarial examples are created. The readings delve more into how adversarial examples can be used to our benefit and also to our detriment. "Adversarial attacks on artificial intelligence systems as a new healthcare policy consideration" considers the use of adversarial examples in health insurance fraud. The authors explain that doctors sometimes use a practice called "upcoding," in which they submit insurance claims for procedures that are more serious than those actually performed in order to receive greater compensation. Adversarial examples could exacerbate this problem. For instance, a doctor could make slight perturbations to an image of a benign mole that cause an insurance company's autonomous billing code infrastructure to misclassify it as a malignant mole. Even as insurance companies start to require additional evidence that insurance claims are valid, adversarial examples could be used to trick their systems.

While insurance fraud is a serious problem in medicine, upcoding is not always clearly fraudulent. There are also cases when doctors might use upcoding to improve a patient's experience by making sure they have access to certain drugs or treatments that would ordinarily be denied by insurance companies. Similarly, the "Law and Adversarial Machine Learning" paper encourages machine learning researchers to consider how the autonomous systems they build can both benefit individual users and also be used against them. The authors caution researchers that oppressive governments may use the tools they build to violate the privacy and free speech of their people. At the same time, people living in oppressive states could employ adversarial examples to evade the state's facial recognition systems. Both of these examples demonstrate that deciding what to do about adversarial examples is not straightforward.

The papers also make recommendations for crafting interventions for problems caused by adversarial examples. In the medical context, the authors suggest that the "procrastination principle," a concept from the early days of the internet that argued against changing the internet's architecture to preempt problems, might be applicable to adversarial examples as well. The authors caution that addressing problems related to adversarial examples in healthcare too early could create ineffective regulation and prevent innovation in the field. Instead, the authors propose extending existing regulations and taking small steps, such as creating "fingerprint" hashes of the data submitted as part of an insurance claim, to address concerns about adversarial examples.
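For the "fingerprint" idea, a minimal sketch (mine, with hypothetical field names, not the paper's detailed proposal) might hash the submitted image and claim metadata so that any later perturbation of the evidence is detectable:

```python
import hashlib
import json

def fingerprint_claim(image_bytes: bytes, claim_fields: dict) -> str:
    """Return a SHA-256 fingerprint of the image plus the claim metadata."""
    digest = hashlib.sha256()
    digest.update(image_bytes)
    digest.update(json.dumps(claim_fields, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()

# Hypothetical usage: record the fingerprint when the claim is filed ...
claim = {"billing_code": "11102", "provider_id": "A-1234"}
submitted = fingerprint_claim(b"raw image bytes as filed", claim)

# ... and later verify that the evidence on file still matches what was submitted.
on_file = fingerprint_claim(b"raw image bytes as filed", claim)
print("evidence unchanged:", submitted == on_file)
```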

In the "Law and Adversarial Machine Learning" paper, the authors emphasize that lawyers and policymakers need help from machine learning researchers to create the best machine learning policies possible. As such, they recommend that machine learning developers assess the risk of adversarial attacks and evaluate existing defense systems on their effectiveness in order to help policymakers understand how laws may be interpreted and how they should be enforced. The authors also suggest that machine learning developers build systems that make it easier to determine whether an attack has occurred, how it occurred and who might be responsible. For example, designers could devise a system that can "alert when the system is under adversarial attack, recommend appropriate logging, construct playbooks for incident response during an attack, and formulate a remediation plan to recover from an attack." Lastly, the authors remind machine learning developers to keep in mind how machine learning and adversarial examples may be used to both violate and protect civil liberties.

Credits

Notes by Samantha Bates

Syllabus by Samantha Bates, John Bowers and Natalie Saltiel

Jonathan Zittrain and I are co-teaching a class for the third time. This year, the title of the course is Applied Ethical and Governance Challenges in Artificial Intelligence. It is a seminar, which means that we invite speakers for most of the classes and usually talk about their papers and their work. The speakers and the papers were mostly curated by our amazing teaching assistant team - Samantha Bates, John Bowers and Natalie Saltiel.

One of the things that Sam does to help prepare for each class is summarize the papers and the flow of the discussion, and I realized that it was a waste for this work to serve only as crib notes for the instructors. I asked Sam for permission to publish the notes and the syllabus on my blog as a way for people to learn some of what we are learning and to start potentially interesting conversations.

The course is structured as three sets of three classes on three focus areas. Previous classes were more general overviews of the space, but as the area of research matured, we realized that it would be more interesting to go deep in key areas than to go over what a lot of people probably already know.

We chose three main topics: fairness, interpretability, and adversarial examples. We then organized the classes to hit each topic three times, starting with diagnosis (identifying the technical root of the problem), then prognosis (exploring the social impact of those problems), and then intervention (considering potential solutions to the problems we've identified while taking into account the costs and benefits of each proposed solution). See the diagram below for a visual of the structure.

The students in the class are half MIT and half Harvard students with diverse areas of expertise including software engineering, law, policy and other fields. The class has really been great and I feel that we're going deeper on many of the topics than I've ever gone before. The downside is that we are beginning to see how difficult the problems are. Personally, I'm feeling a bit overwhelmed by the scale of the work we have ahead of us to try to minimize the harm to society by the deployment of these algorithms.

We just finished the prognosis phase and are about to start intervention. I hope that we find something to be optimistic about as we enter that phase.

Please find below the summary and the syllabus for the introduction and the first phase - the diagnosis phase - by Samantha Bates along with links to the papers.

The tl;dr summary of the first phase is... we have no idea how to define fairness and it probably isn't reducible to a formula or a law, but it is dynamic. Interpretability sounds like a cool word, but as Zachary Lipton said in his talk to our class, it is a "wastebasket taxon," like the word "antelope": we call anything that sort of looks like an antelope an antelope, even if it has no real relationship to other antelopes. A bunch of students from MIT made it very clear to us that we are not prepared for adversarial attacks and that it was unclear whether we could build algorithms that were both robust against these attacks and still functionally effective.

Part 1: Introduction and Diagnosis

By Samantha Bates

Syllabus Notes: Introduction and Diagnosis Stage

This first post summarizes the readings assigned for the first four classes, which encompass the introduction and the diagnosis stage. In the diagnosis stage, the class identified the core problems in AI related to fairness, interpretability, and adversarial examples and considered how the underlying mechanisms of autonomous systems contribute to those problems. As a result, our class discussions involved defining terminology and studying how the technology works. Included below is the first part of the course syllabus along with notes summarizing the main takeaways from each of the assigned readings.

Class Session 1: Introduction

In our first class session, we presented the structure and motivations behind the course, and set the stage for later class discussions by assigning readings that critique the current state of the field.

Both readings challenge the way Artificial Intelligence (AI) research is currently conducted and talked about, but from different perspectives. Michael Jordan's piece is mainly concerned with the need for more collaboration across disciplines in AI research. He argues that we are experiencing the creation of a new branch of engineering that needs to incorporate non-technical as well as engineering challenges and perspectives. "Troubling Trends in Machine Learning Scholarship" focuses more on falling standards and non-rigorous research practices in the academic machine learning community. The authors rightly point out that academic scholarship must be held to the highest standards in order to preserve public and academic trust in the field.

We chose to start out with readings that critique the current state of the field because they encourage students to think critically about the papers they will read throughout the semester. Just as the readings show that the use of precise terminology and explanation of thought are particularly important to prevent confusion, we challenge students to carefully consider how they present their own work and opinions. The readings set the stage for our deep dives into specific topic areas (fairness, interpretability, adversarial AI) and also set some expectations about how students should approach the research we will discuss throughout the course.

Class Session 2: Diagnosing problems of fairness

For our first class in the diagnosis stage, the class was joined by Cathy O'Neil, a data scientist and activist who has become one of the leading voices on fairness in machine learning.

Cathy O'Neil's book, Weapons of Math Destruction, is a great introduction to predictive models, how they work, and how they can become biased. She refers to flawed models that are opaque, scalable, and have the potential to damage lives (frequently the lives of the poor and disadvantaged) as Weapons of Math Destruction (WMDs). She explains that despite good intentions, we are more likely to create WMDs when we don't have enough data to draw reliable conclusions, use proxies to stand in for data we don't have, and try to use simplistic models to understand and predict human behavior, which is much too complicated to accurately model with just a handful of variables. Even worse, most of these algorithms are opaque, so the people impacted by these models are unable to challenge their outputs.

O'Neil demonstrates that the use of these types of models can have serious unforeseen consequences. Because WMDs are a cheap alternative to human review and decision-making, WMDs are more likely to be deployed in poor areas, and thus tend to have a larger impact on the poor and disadvantaged in our society. Additionally, WMDs can actually lead to worse behavior. In O'Neil's example of the Washington D.C. School District's model that used student test scores to identify and root out ineffective teachers, some teachers changed their students' test scores in order to protect their jobs. Although the WMD in this scenario was deployed to improve teacher effectiveness, it actually had the opposite effect by creating an unintended incentive structure.

The optional reading, "The Scored Society: Due Process for Automated Predictions," discusses algorithmic fairness in the credit scoring context. Like Cathy O'Neil, the authors contend that credit scoring algorithms exacerbate existing social inequalities and argue that our legal system has a duty to change that. They propose opening the credit scoring and credit sharing process to public review while also requiring that credit scoring companies educate individuals about how different variables influence their scores. By attacking the opacity problem that Cathy O'Neil identified as one of three characteristics of WMDs, the authors believe the credit scoring system can become more fair without infringing on intellectual property rights or requiring that we abandon the scoring models altogether.

Class Session 3: Diagnosing problems of interpretability

Zachary Lipton, an Assistant Professor at Carnegie Mellon University who is working intensively on defining and addressing problems of interpretability in machine learning, joined the class on Day 3 to discuss what it means for a model to be interpretable.

Class session three was our first day discussing interpretability, so both readings consider how best to define interpretability and why it is important. Lipton's paper asserts that interpretability reflects a number of different ideas and that its current definitions are often too simplistic. His paper primarily raises stage-setting questions: What is interpretability? In what contexts is interpretability most necessary? Does creating a model that is more transparent or can explain its outputs make it interpretable?

Through his examination of these questions, Lipton argues that the definition of interpretability depends on why we want a model to be interpretable. We might demand that a model be interpretable so that we can identify underlying biases and allow those affected by the algorithm to contest its outputs. We may also want an algorithm to be interpretable in order to provide more information to the humans involved in the decision, to give the algorithm more legitimacy, or to uncover possible causal relationships between variables that can then be tested further. By clarifying the different circumstances in which we demand interpretability, Lipton argues that we can get closer to a working definition of interpretability that better reflects its many facets.

Lipton also considers two types of proposals to improve interpretability: increasing transparency and providing post-hoc explanations. The increasing transparency approach can apply to the entire model (simulatability), meaning that a user should be able to reproduce the model's output if given the same input data and parameters. We can also improve transparency by making the different elements of the model (the input data, parameters, and calculations) individually interpretable, or by showing that during the training stage, the model will come to a unique solution regardless of the training dataset. However, as we will discuss further during the interventions stage of the course, providing more transparency at each level does not always make sense depending on the context and the type of model employed (for example a linear model vs. a neural network model). Additionally, improving the transparency of a model may decrease the model's accuracy and effectiveness. A second way to improve interpretability is to require post-hoc interpretability, meaning that the model must explain its decision-making process after generating an output. Post-hoc explanations can take the form of text, visuals, saliency maps, or analogies that show how a similar decision was reached in a similar context. Although post-hoc explanations can provide insight into how individuals affected by the model can challenge or change its outputs, Lipton cautions that these explanations can be unintentionally misleading, especially if they are influenced by our human biases.
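To give a flavor of the post-hoc approaches Lipton describes, here is a small, hypothetical Python sketch of a gradient-based saliency computation for a simple differentiable model; the weights and feature names are invented for illustration, and the technique stands in for the richer saliency-map methods used with neural networks.

```python
import numpy as np

# Hypothetical trained logistic-regression weights for three input features.
weights = np.array([1.8, -0.04, 0.05])
bias = -0.7
features = ["prior_arrests", "age", "income_decile"]

def predict_proba(x):
    """Probability output of the toy model."""
    return 1.0 / (1.0 + np.exp(-(weights @ x + bias)))

def saliency(x):
    """Gradient of the predicted probability with respect to each input feature,
    i.e. how sensitive the score is to a small change in that feature."""
    p = predict_proba(x)
    return p * (1.0 - p) * weights  # derivative of sigmoid(w.x + b) w.r.t. x

x = np.array([2.0, 30.0, 4.0])
for name, s in sorted(zip(features, saliency(x)), key=lambda t: -abs(t[1])):
    print(f"{name:>14}: {s:+.4f}")
```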

Ultimately, Lipton's paper concludes that it is extremely challenging to define interpretability given how much it depends on external factors like context and the motivations for making a model interpretable. Without a working definition of the term it remains unclear how to determine whether a model is interpretable. While the Lipton paper focuses more on defining interpretability and considering why it is important, the optional reading, "Towards a rigorous Science of Interpretable Machine Learning," dives deeper into the various methods used to determine whether a model is interpretable. The authors define interpretability as the "ability to explain or present in understandable terms to a human" and are particularly concerned about the lack of standards for evaluating interpretability.

Class Session 4: Diagnosing vulnerabilities to adversarial examples

In our first session on adversarial examples, the class was joined by LabSix, a student-run AI research group at MIT that is doing cutting-edge work on adversarial techniques. LabSix gave a primer on adversarial examples and presented some of its own work.

The Gilmer et al. paper is an accessible introduction to adversarial examples that defines them as "inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake." The main thrust of the paper is an examination of the different scenarios in which an attacker may employ adversarial examples. The authors develop a taxonomy to categorize these different types of attacks: "indistinguishable perturbation, content-preserving perturbation, non-suspicious input, content-constrained input, and unconstrained input." For each category of attack, the authors explore the different motivations and constraints of the attacker. By gaining a better understanding of the different types of attacks and the tradeoffs of each type, the authors argue that the designers of machine learning systems will be better able to defend against them.
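To make the definition concrete, here is a tiny Python sketch of my own (not from the paper) showing the core mechanics on a hypothetical linear classifier: a small, deliberately chosen perturbation of each feature is enough to flip the prediction, which is essentially what happens, at much higher dimension, with image classifiers.

```python
import numpy as np

# Hypothetical trained linear classifier: score > 0 means the model answers "benign".
w = np.array([0.9, -0.6, 0.3, 0.8])
b = 0.1

def predict(x):
    return "benign" if w @ x + b > 0 else "malignant"

x = np.array([0.4, 0.3, 0.1, 0.3])   # an input the model classifies as benign
epsilon = 0.25                        # per-feature perturbation budget

# Move each feature slightly in the direction that most decreases the score
# (the sign of the gradient of the score with respect to the input).
x_adv = x - epsilon * np.sign(w)

print("original:  ", predict(x),     " score =", round(float(w @ x + b), 3))
print("perturbed: ", predict(x_adv), " score =", round(float(w @ x_adv + b), 3))
print("largest per-feature change:", float(np.max(np.abs(x_adv - x))))
```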

The paper also includes an overview of the perturbation defense literature, which the authors criticize for failing to consider adversarial example attacks in plausible, real-world situations. For example, a common hypothetical situation posed in the defense literature is an attacker perturbing the image of a stop sign in an attempt to confuse a self-driving car. The Gilmer et al. paper, however, points out that the engineers of the car would have considered and prepared for naturally occurring misclassification errors caused by the system itself or real world events (for example, the stop sign could be blown over by the wind). The authors also argue that there are likely easier, non-technical methods that the attackers could use to confuse the car, so the hypothetical is not the most realistic test case. The authors' other main critique of the defense literature is that it does not acknowledge how improving certain aspects of a system's defense structure can make other aspects of the system less robust and thus more vulnerable to attack.

The recommended reading by Christian Szegedy et al. is much more technical and requires some machine learning background to understand all of the terminology. Although it is a challenging read, we included it in the syllabus because it introduced the term "adversarial examples" and laid some of the foundation for research on this topic.



Credits

Figure and Notes by Samantha Bates

Syllabus by Samantha Bates, John Bowers and Natalie Saltiel

During the Long Hot Summer of 1967, race riots erupted across the United States. The 159 riots--or rebellions, depending on which side you took--were mostly clashes between the police and African Americans living in poor urban neighborhoods. The disrepair of these neighborhoods before the riots began and the difficulty in repairing them afterward was attributed to something called redlining, an insurance-company term for drawing a red line on a map around parts of a city deemed too risky to insure.

In an attempt to improve recovery from the riots and to address the role redlining may have played in them, President Lyndon Johnson created the President's National Advisory Panel on Insurance in Riot-Affected Areas in 1968. The report from the panel showed that once a minority community had been redlined, the red line established a feedback cycle that continued to drive inequity and deprive poor neighborhoods of financing and insurance coverage--redlining had contributed to creating poor economic conditions, which already affected these areas in the first place. There was a great deal of evidence at the time that insurance companies were engaging in overtly discriminatory practices, including redlining, while selling insurance to racial minorities, and would-be home- and business-owners were unable to get loans because financial institutions require insurance when making loans. Even before the riots, people there couldn't buy or build or improve or repair because they couldn't get financing.

Because of the panel's report, laws were enacted outlawing redlining and creating incentives for insurance companies to invest in developing inner-city neighborhoods. But redlining continued. To justify their discriminatory pricing or their refusal to sell insurance in urban centers, insurance companies developed sophisticated arguments about the statistical risks that certain neighborhoods presented.

The argument insurers used back then--that their job was purely technical and that it didn't involve moral judgments--is very reminiscent of the arguments made by some social network platforms today: That they are technical platforms running algorithms and should not be, and are not, involved in judging the content. Insurers argued that their job was to adhere to technical, mathematical, and market-based notions of fairness and accuracy and provide what was viewed--and is still viewed--as one of the most essential financial components of society. They argued that they were just doing their jobs. Second-order effects on society were really not their problem or their business.

Thus began the contentious career of the notion of "actuarial fairness," an idea that would spread in time far beyond the insurance industry into policing and paroling, education, and eventually AI, igniting fierce debates along the way over the push by our increasingly market-oriented society to define fairness in statistical and individualistic terms rather than relying on the morals and community standards used historically.

Risk spreading has been a central tenet of insurance for centuries. Risk classification has a shorter history. The notion of risk spreading is the idea that a community such as a church or village could pool its resources to help individuals when something unfortunate happened, spreading risk across the group--the principle of solidarity. Modern insurance began to assign a level of risk to an individual so that others in the pool with her had roughly the same level of risk--an individualistic approach. This approach protected individuals from carrying the expense of someone with a more risk-prone and costly profile. This individualistic approach became more prevalent after World War II, when the war on communism made anything that sounded too socialist unpopular. It also helped insurance companies compete in the market. By refining their risk classifications, companies could attract what they called "good risks." This saved them money on claims and forced competitors to take on more expensive-to-insure "bad risks."

(A research colleague of mine, Rodrigo Ochigame, who focuses on algorithmic fairness and actuarial politics, directed me to historian Caley Horan, who is working on an upcoming book titled Insurance Era: The Privatization of Security and Governance in the Postwar United States that will elaborate on many of the ideas in this article, which is based on her research.)

The original idea of risk spreading and the principle of solidarity was based on the notion that sharing risk bound people together, encouraging a spirit of mutual aid and interdependence. By the final decades of the 20th century, however, this vision had given way to the so-called actuarial fairness promoted by insurance companies to justify discrimination.

While discrimination was initially based on outright racist ideas and unfair stereotypes, insurance companies evolved and developed sophisticated-seeming calculations to show that their discrimination was "fair." Women should pay more for annuities because statistically they lived longer, and blacks should pay more for damage insurance when they lived in communities where crime and riots were likely to occur. While overt racism and bigotry still exist across American society, in insurance they have been integrated into and hidden from the public behind mathematics and statistics that are so difficult for nonexperts to understand that fighting back becomes nearly impossible.

By the late 1970s, women's activists had joined civil rights groups in challenging insurance redlining and risk-rating practices. These new insurance critics argued that the use of gender in insurance risk classification was a form of sex discrimination. Once again, insurers responded to these charges with statistics and mathematical models. Using gender to determine risk classification, they claimed, was fair; the statistics they used showed a strong correlation between gender and the outcomes they insured against.

And many critics of insurance inadvertently bought into the actuarial fairness argument. Civil rights and feminist activists in the late 20th century lost their battles with the insurance industry because they insisted on arguing about the accuracy of certain statistics or the validity of certain classifications rather than questioning whether actuarial fairness--an individualistic notion of market-driven pricing fairness--was a valid way of structuring a crucial and fundamental social institution like insurance in the first place.

But fairness and accuracy are not necessarily the same thing. For example, when Julia Angwin pointed out in her ProPublica report that risk scores used by the criminal justice system were biased against people of color, the company that sold the algorithmic risk score system argued that its scores were fair because they were accurate. The scores accurately predicted that people of color were more likely to reoffend. This likelihood of reoffense, called the recidivism rate, is the likelihood that someone recommits a crime after being released, and it is calculated primarily using arrest data. But this correlation contributes to discrimination, because using arrests as a proxy for recommitting a crime means the algorithm is codifying biases in arrests, such as a police officer's bias toward arresting more people of color or toward patrolling more heavily in poor neighborhoods. This risk of recidivism is used to set bail and determine sentencing and parole, and it informs predictive policing systems that direct police to neighborhoods likely to have more crime.

There are several obvious problems with this. If you believe the risk scores are accurate in predicting the future outcomes of a certain group of people, then it means it's "fair" that a person is more likely to spend more time in jail simply because they are black. This is actuarially "fair" but clearly not "fair" from a social, moral, or anti-discrimination perspective.

The other problem is that there are fewer arrests in rich neighborhoods, not because people there aren't smoking as much pot as in poor neighborhoods but because there is less policing. Obviously, one is more likely to be rearrested if one lives in an overpoliced neighborhood, and that creates a feedback loop--more arrests mean higher recidivism rates. In very much the same way that redlining in minority neighborhoods created a self-fulfilling prophecy of uninsurable communities, overpolicing and predictive policing may be "fair" and "accurate" in the short term, but the long-term effects on communities have been shown to be negative, creating self-fulfilling prophecies of poor, crime-ridden neighborhoods.
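The feedback loop is easy to see in a toy simulation (mine, with made-up numbers): two neighborhoods with identical underlying offense rates, but with patrols allocated according to last year's arrests, produce persistently unequal arrest data that then "justifies" the unequal patrols.

```python
import numpy as np

rng = np.random.default_rng(0)
offense_rate = 0.05                 # identical true offense rate in both neighborhoods
population = np.array([10_000, 10_000])
patrol_share = np.array([0.7, 0.3])  # neighborhood A starts out overpoliced

for year in range(5):
    offenses = rng.binomial(population, offense_rate)
    # Arrests depend on offenses AND on how heavily each place is patrolled.
    arrests = rng.binomial(offenses, np.clip(patrol_share, 0.05, 0.95))
    # "Predictive" allocation: next year's patrols follow this year's arrests,
    # so the initial disparity reproduces itself in the data year after year.
    patrol_share = arrests / arrests.sum()
    print(f"year {year}: arrests={arrests.tolist()}  "
          f"next patrol share={np.round(patrol_share, 2).tolist()}")
```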

Angwin also showed in a recent ProPublica report that, despite regulations, insurance companies charge minority communities higher premiums than white communities, even when the risks are the same. The Spotlight team at The Boston Globe reported that the median household net worth in the Boston area was $247,500 for whites and $8 for nonimmigrant blacks--the result of redlining and unfair access to housing and financial services. So while redlining for insurance is not legal, when Amazon decides to provide Amazon Prime free same-day shipping to its "best" customers, it's effectively redlining--reinforcing the unfairness of the past in new and increasingly algorithmic ways.

Like the insurers, large tech firms and the computer science community also tend to frame "fairness" in a depoliticized, highly technical way involving only mathematics and code, which reinforces a circular logic. AI is trained to use the outcomes of discriminatory practices, like recidivism rates, to justify continuing practices such as incarceration or overpolicing that may contribute to the underlying causes of crime, such as poverty, difficulty getting jobs, or lack of education. We must create a system that requires long-term public accountability and understandability of the effects on society of policies developed using machines. The system should help us understand, rather than obscure, the impact of algorithms on society. We must provide a mechanism for civil society to be informed and engaged in the way in which algorithms are used, optimizations set, and data collected and interpreted.

The computer scientists of today are more sophisticated in many ways than the actuaries of yore, and they often sincerely are trying to build algorithms that are fair. The new literature on algorithmic fairness usually doesn't simply equate fairness with accuracy, but instead defines various trade-offs between fairness and accuracy. The problem is that fairness cannot be reduced to a simple self-contained mathematical definition--fairness is dynamic and social and not a statistical issue. It can never be fully achieved and must be constantly audited, adapted, and debated in a democracy. By merely relying on historical data and current definitions of fairness, we will lock in the accumulated unfairnesses of the past, and our algorithms and the products they support will always trail the norms, reflecting past norms rather than future ideals and slowing social progress rather than supporting it.