Joi Ito's Web

Joi Ito's conversation with the living web.


Like most parents of young children, I've found that determining how best to guide my almost 2-year-old daughter's relationship with technology--especially YouTube and mobile devices--is a challenge. And I'm not alone: One 2018 survey of parents found that overuse of digital devices has become the number one parenting concern in the United States.

Empirically grounded, rigorously researched advice is hard to come by. So perhaps it's not surprising that I've noticed a puzzling trend in my friends who provide me with unsolicited parenting advice. In general, my most liberal and tech-savvy friends exercise the most control and are weirdly technophobic when it comes to their children's screen time. What's most striking to me is how many of their opinions about children and technology are not representative of the broader consensus of research, but seem to be based on fearmongering books, media articles, and TED talks that amplify and focus on only the especially troubling outcomes of too much screen time.

I often turn to my sister, Mimi Ito, for advice on these issues. She has raised two well-adjusted kids and directs the Connected Learning Lab at UC Irvine, where researchers study children and technology extensively. Her opinion is that "most tech-privileged parents should be less concerned with controlling their kids' tech use and more about being connected to their digital lives." Mimi is glad that the American Academy of Pediatrics (AAP) dropped its famous 2x2 rule--no screens for the first two years, and no more than two hours a day until a child hits 18. She argues that this rule fed into stigma and parent-shaming around screen time at the expense of what she calls "connected parenting"--guiding and engaging in kids' digital interests.

One example of my attempt at connected parenting is watching YouTube together with Kio, singing along with Elmo as Kio shows off the new dance moves she's learned. Every day, Kio has more new videos and favorite characters that she is excited to share when I come home, and the songs and activities follow us into our ritual of goofing off in bed as a family before she goes to sleep. Her grandmother in Japan is usually part of this ritual in a surreal situation where she is participating via FaceTime on my wife's iPhone, watching Kio watching videos and singing along and cheering her on. I can't imagine depriving us of these ways of connecting with her.

The (Unfounded) War on Screens

The anti-screen narrative can sometimes read like the War on Drugs. Perhaps the best example is Glow Kids, in which Nicholas Kardaras tells us that screens deliver a dopamine rush rather like sex. He calls screens "digital heroin" and uses the term "addiction" when referring to children unable to self-regulate their time online.

More sober (and less breathlessly alarmist) assessments by child psychologists and data analysts offer a more balanced view of the impact of technology on our kids. Psychologist and baby observer Alison Gopnik, for instance, notes: "There are plenty of mindless things that you could be doing on a screen. But there are also interactive, exploratory things that you could be doing." Gopnik highlights how feeling good about digital connections is a normal part of psychology and child development. "If your friends give you a like, well, it would be bad if you didn't produce dopamine," she says.

Other research has found that the impact of screens on kids is relatively small, and even the conservative AAP says that cases of children who have trouble regulating their screen time are not the norm, representing just 4 percent to 8.5 percent of US children. This year, Andrew Przybylski and Amy Orben conducted a rigorous analysis of data on more than 350,000 adolescents and found a nearly negligible effect on psychological well-being at the aggregate level.

In their research on digital parenting, Sonia Livingstone and Alicia Blum-Ross found widespread concern among parents about screen time. They posit, however, that "screen time" is an unhelpful catchall term and recommend that parents focus on quality and joint engagement rather than just quantity. The Connected Learning Lab's Candice Odgers, a professor of psychological sciences, reviewed the research on adolescents and devices and found as many positive as negative effects. She points to the consequences of paying disproportionate attention to the negative ones: "The real threat isn't smartphones. It's this campaign of misinformation and the generation of fear among parents and educators."

We need to immediately begin rigorous, longitudinal studies on the effects of devices and the underlying algorithms that guide their interfaces and their interactions with and recommendations for children. Then we can make evidence-based decisions about how these systems should be designed, optimized for, and deployed among children, and not put all the burden on parents to do the monitoring and regulation.

My guess is that for most kids, this issue of screen time is statistically insignificant in the context of all the other issues we face as parents--education, health, day care--and for those outside my elite tech circles even more so. Parents like me, and other tech leaders profiled in a recent New York Times series about tech elites keeping their kids off devices, can afford to hire nannies to keep their kids off screens. Our kids are the least likely to suffer the harms of excessive screen time. We are also the ones least qualified to be judgmental about other families who may need to rely on screens in different ways. We should be creating technology that makes screen entertainment healthier and fun for all families, especially those who don't have nannies.

I'm not ignoring the kids and families for whom digital devices are a real problem, but I believe that even in those cases, focusing on relationships may be more important than focusing on controlling access to screens.

Keep It Positive

One metaphor for screen time that my sister uses is sugar. We know sugar is generally bad for you and has many side effects and can be addictive to kids. However, the occasional bonding ritual over milk and cookies might have more benefit to a family than an outright ban on sugar. Bans can also backfire, fueling binges and shame as well as mistrust and secrecy between parents and kids.

When parents allow kids to use computers, they often use spying tools, and many teens feel parental surveillance is invasive to their privacy. One study showed that using screen time to punish or reward behavior actually increased net screen time use by kids. Another study by Common Sense Media shows what seems intuitively obvious: Parents use screens as much as kids. Kids model their parents--and have a laserlike focus on parental hypocrisy.

In Alone Together, Sherry Turkle describes the fracturing of family cohesion because of the attention that devices get and how this has disintegrated family interaction. While I agree that there are situations where devices are a distraction--I often declare "laptops closed" in class, and I feel that texting during dinner is generally rude--I do not feel that iPhones necessarily draw families apart.

In the days before the proliferation of screens, I ran away from kindergarten every day until they kicked me out. I missed more classes than any other student in my high school and barely managed to graduate. I also started more extracurricular clubs in high school than any other student. My mother actively supported my inability to follow rules and my obsessive tendency to pursue my interests and hobbies over those things I was supposed to do. In the process, she fostered a highly supportive trust relationship that allowed me to learn through failure and sometimes get lost without feeling abandoned or ashamed.

It turns out my mother intuitively knew that it's more important to stay grounded in the fundamentals of positive parenting. "Research consistently finds that children benefit from parents who are sensitive, responsive, affectionate, consistent, and communicative," says education professor Stephanie Reich, another member of the Connected Learning Lab who specializes in parenting, media, and early childhood. One study shows measurable cognitive benefits from warm and less restrictive parenting.

When I watch my little girl learning dance moves from every earworm video that YouTube serves up, I imagine my mother looking at me while I spent every waking hour playing games online, which was my pathway to developing my global network of colleagues and exploring the internet and its potential early on. I wonder what wonderful as well as awful things will have happened by the time my daughter is my age, and I hope a good relationship with screens and the world beyond them can prepare her for this future.

This is the second of three parts of the syllabus and summaries prepared by Samantha Bates, who TAs the Applied Ethical and Governance Challenges in Artificial Intelligence course, which I co-teach with Jonathan Zittrain. John Bowers and Natalie Saltiel are also TAs for the course. I posted Part I earlier in the month.

My takeaways:

In Part I, we defined the space and tried to frame and understand some of the problems. We left with concerns about the reductionist, poorly defined and oversimplified notions of fairness and explainability in much of the literature. We also left feeling quite challenged by how the technical community will face new risks such as adversarial attacks and similar approaches.

In Part II, we continue our journey into a sense of despair about AI ethics and governance. In Solon Barocas and Andrew D. Selbst's paper "Big Data's Disparate Impact," they walk us through the state of the law around discrimination and fairness using Title VII of the US Civil Rights Act as an example. The authors show us that while it was enacted to address discrimination concerns raised by the civil rights movement, the law has evolved away from trying to correct societal inequities through remedies such as affirmative action. Instead, the law has focused more and more on fairness of processes and less on redistribution or on resolving historical inequity. As a result, the law has adopted a more technical notion of fairness - a kind of actuarial "all lives matter" sort of approach. During Part I, when we discussed the biased Amazon hiring tool, one of the proposed remedies was to "put our thumb on the scale" and just boost the scores of women and minorities. The Barocas and Selbst paper demonstrates that this type of solution is no longer supported by the law. The sense is that the engineers thought, "of course there must be a law prohibiting discrimination, we can use that." In fact, that law punts on redistribution or societal inequity. Jonathan pointed out that the treatment of social inequality in tort law is similar. If you run over a rich person and a poor person at the same time, you have to pay the rich family more - the calculation of damages is based on the victim's future earning power. Tort law, like Title VII, says, "there may be societal inequities, but we're not solving that problem here."

Sandra Wachter's paper proposing counterfactuals as a way to provide explainability is an excellent idea and feels like one way forward in the explainability debate. However, even Sandra seems concerned about whether laws such as the GDPR will actually be able to require companies to provide such explanations. We also had some concerns about the limits of counterfactuals in identifying biases or providing the "best" answer depending on the person - limits Sandra identifies in her paper.

Finally, we take adversarial attacks from the theoretical to a specific example in a recent paper that Jonathan and I wrote with John Bowers, Samuel Finlayson, Andrew L. Beam, and Isaac S. Kohane about the risks of adversarial attacks on medical AI systems.

Please see Samantha's summaries and links to the readings below for a more complete overview of the three classes in Part II.

- Joi

Part 2: Prognosis

By Samantha Bates

Syllabus Notes: Prognosis Stage

Welcome to part 2 of our Ethical and Governance Challenges in AI syllabus! In part 1, the assigned readings and class discussion focused on understanding how the social, technical, and philosophical roots of autonomous systems contribute to problems related to fairness, interpretability, and adversarial examples. In the second stage of the course, the prognosis stage, the class considered the social implications of these problems. Perhaps the most significant takeaway from this stage was the realization that many of these problems are social or political problems and cannot be addressed solely through a legal or technical approach.

Class Session 5: Prognosticating the impacts of unfair AI

Solon Barocas, an Assistant Professor at Cornell University, joined the class for the first day of the prognosis stage. We discussed his paper, "Big Data's Disparate Impact," which offered a legal and technical perspective on the use of algorithms in employment.

On the first day of the prognosis stage, the focus of the class shifted from examining the technical mechanisms underlying autonomous systems to looking at the societal impact of those systems. The Barocas and Selbst paper discusses how data collection and data labeling can perpetuate existing biases both intentionally and unintentionally. The authors outline five main ways that datasets can be discriminatory:

  1. Our own human biases may be integrated into a dataset when a human data miner determines the parameters that an autonomous system will use to make decisions.

  2. The training data might already be biased depending on how it was collected and how it was labeled.

  3. Data mining models consider a limited number of data points and thus may draw conclusions about an individual or a group of people based on data that is not representative of the subject.

  4. As Cathy O'Neil mentioned, prejudice may be introduced if the data points the model uses to make decisions are proxies for class membership.

  5. Discriminatory data mining could be intentional. However, the authors argue that unintentional discrimination is more common and harder to identify.
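
The fourth point, proxies for class membership, can be made concrete with a toy sketch. Everything in it is hypothetical: the applicant records, the zip codes, and the screening rule are invented for illustration and are not drawn from the Barocas and Selbst paper. The rule never looks at group membership, yet selection rates differ by group because zip code happens to be correlated with it.

```python
from collections import defaultdict

# Hypothetical applicants: (group, zip_code). The screening rule never sees group.
applicants = [
    ("A", "10001"), ("A", "10001"), ("A", "10001"), ("A", "20002"),
    ("B", "20002"), ("B", "20002"), ("B", "20002"), ("B", "10001"),
]

# Hypothetical rule "learned" from past hiring data: it only looks at zip code.
PREFERRED_ZIPS = {"10001"}


def screen(zip_code):
    """Return True if the applicant passes the screen."""
    return zip_code in PREFERRED_ZIPS


def selection_rates(records):
    """Compute the fraction of each group that passes the screen."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for group, zip_code in records:
        total[group] += 1
        passed[group] += screen(zip_code)  # True counts as 1
    return {group: passed[group] / total[group] for group in total}


rates = selection_rates(applicants)
print(rates)
```

In this made-up data, three of four applicants in group A pass and only one of four in group B does, even though group is never an input. Removing the protected attribute from a model does not, on its own, remove the disparity.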

While there is legal doctrine that addresses discrimination in employment, the authors demonstrate that it is difficult to apply in practice, particularly in the data mining context. Title VII creates liability for intentional discrimination (disparate treatment) and for unintentional discrimination (disparate impact), but it is difficult to prove either type. For example, in order to hold employers liable for unintentional discrimination, the plaintiff must show that an alternative, nondiscriminatory method exists that will accomplish the same goals as the discriminatory practice. They must also prove that when presented with the alternative, the employer refused to consider it. Typically, an employer can mount a successful defense if they can prove they were unaware of the alternative or if there is a legitimate business reason for policies that may be discriminatory (the business necessity defense).

Bias in data mining is so difficult to identify, prove, and rectify in part because as a society, we have not determined the role of the law in addressing discrimination. According to one theory, the anticlassification theory, the law has an obligation to ensure that decision makers do not discriminate against protected classes in society. The opposing theory, the antisubordination theory, advocates a more hands-on approach and states that the law should work to "eliminate status-based inequality" at the societal level by actively improving the lives of marginalized groups. Our current society favors the anticlassification approach in part because the court established early on that antidiscrimination law was not solely intended to improve access to opportunities for protected classes. And while the authors demonstrate how data mining can exacerbate existing biases in the hiring context, there is a societal trade-off between prioritizing efficient decision making and eliminating bias.

This reading also raises the question of who is responsible for fixing the problem. Barocas and Selbst emphasize that the majority of data mining bias is unintentional and that it may be very difficult to identify bias and employ technical fixes to eliminate it. At the same time, there are political and social factors that make fixing this problem in the legal system equally difficult, so who should be in charge of addressing it? The authors suggest that as a society, we may need to reconsider how we approach discrimination issues more generally.

Class Session 6: Prognosticating the impacts of uninterpretable AI

For our sixth session, the class talked with Sandra Wachter, a lawyer and research fellow at the Oxford Internet Institute, about the possibility of using counterfactuals to make autonomous systems interpretable.

In our last discussion about interpretability, the class concluded that it is impossible to define the term "interpretability" because it greatly depends on the context of the decision and the motivations for making the model interpretable. The Wachter et al. paper essentially says that defining "interpretability" is not important and that instead we should focus on providing a way for individuals to learn how to change or challenge a model's output. While the authors point out that making these automated systems more transparent and devising some way to hold them accountable will improve the public's trust in AI, they primarily consider how to design autonomous models that will meet the explanation requirements of the GDPR. The paper's proposed solution is to generate counterfactuals for individual decisions (both positive and negative) that "provide reasons why a particular decision was received, offer grounds to contest it, and provide limited 'advice' on how to receive desired results in the future."

Not only would counterfactuals exceed the explainability requirements of the GDPR, the authors argue that counterfactuals would lay the groundwork for a legally binding right to explanation. Due to the difficulty of explaining the technical workings of an automated model to a lay person, legal concerns about protecting trade secrets and IP, and the danger of violating the privacy of data subjects, it has been challenging to provide more transparency around AI decision making. However, counterfactuals can serve as a workaround to these concerns because they indicate how a decision would change if certain inputs had been different rather than disclose information about the internal workings of the model. For example, a counterfactual for a bank loan algorithm might tell someone who was denied a loan that if their annual income had been $45,000 instead of $30,000, they would have received the loan. Without explaining any of the technical workings of the model, the counterfactual in this example can tell the individual the rationale behind the decision and how they can change the outcome in the future. Note that counterfactuals are not a sufficient solution to problems involving bias and unfairness. It may be possible for counterfactuals to provide evidence that a model is biased. However, because counterfactuals only show dependencies between a specific decision and particular external facts, they cannot be relied upon to expose all potential sources of bias or confirm that a model is not biased.
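
The loan example can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical black-box `approve_loan` rule with made-up numbers and a brute-force search over income alone. The method in the Wachter et al. paper is more general and is not reproduced here.

```python
def approve_loan(income, debt):
    """Hypothetical black-box model: the caller only sees True or False."""
    return income - 0.5 * debt >= 40_000


def income_counterfactual(income, debt, step=1_000, max_income=200_000):
    """Find the smallest income (in `step` increments) that flips a denial.

    Returns None if the applicant is already approved, or if no income up
    to `max_income` changes the decision.
    """
    if approve_loan(income, debt):
        return None
    candidate = income + step
    while candidate <= max_income:
        if approve_loan(candidate, debt):
            return candidate
        candidate += step
    return None


needed = income_counterfactual(income=30_000, debt=10_000)
print(f"Denied at $30,000; would have been approved at ${needed:,}")
```

The search only queries the model's decisions, so nothing about its internals is disclosed. It also shows the limitation noted above: the answer describes one dependency for one applicant and says nothing about whether the rule is biased overall.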

The optional reading, "Algorithmic Transparency for the Smart City," investigates the transparency around the use of big data analytics and predictive algorithms by city governments. The authors conclude that poor documentation and disclosure practices as well as trade secrecy concerns frequently prevented city governments from getting the information they needed to understand how the model worked and its implications for the city. The paper expands upon the barriers to understanding an autonomous model that are mentioned in the Wachter et al. paper and also presents a great example of scenarios in which counterfactual explanations could be deployed.

Class Session 7: Prognosticating the impacts of adversarial examples

In our third prognosis session, the class continued its discussion about adversarial examples and considered potential scenarios, specifically in medical insurance fraud, in which they could be used to our benefit and detriment.

  • "Adversarial attacks on artificial intelligence systems as a new healthcare policy consideration" by Samuel Finlayson, Joi Ito, Jonathan Zittrain et al., preprint (2019)

  • "Law and Adversarial Machine Learning" by Ram Shankar Siva Kumar et al., ArXiv (2018)

In our previous session about adversarial examples, the class discussion was primarily focused on understanding how adversarial examples are created. The readings delve more into how adversarial examples can be used to our benefit and also to our detriment. "Adversarial attacks on artificial intelligence systems as a new healthcare policy consideration" considers the use of adversarial examples in health insurance fraud. The authors explain that doctors sometimes use a practice called "upcoding," in which they submit insurance claims for procedures that are much more serious than were actually performed, in order to receive greater compensation. Adversarial examples could exacerbate this problem. For instance, a doctor could make slight perturbations to an image of a benign mole that cause an insurance company's autonomous billing code infrastructure to misclassify it as a malignant mole. Even as insurance companies start to require additional evidence that insurance claims are valid, adversarial examples could be used to trick their systems.
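
To make the mole example concrete, here is a toy sketch of how a small perturbation can flip a classifier's output. It assumes a hypothetical four-pixel "image" and a hand-written linear scorer, with a sign-of-gradient step in the style of the fast gradient sign method. Real attacks target deep networks and real images, and none of the numbers below come from the paper.

```python
# Hypothetical linear "malignancy" scorer over four pixel intensities in [0, 1].
WEIGHTS = [0.9, -0.6, 0.8, -0.7]
BIAS = -0.2


def score(pixels):
    """A positive score means the toy model labels the image malignant."""
    return sum(w * p for w, p in zip(WEIGHTS, pixels)) + BIAS


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def perturb(pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that raises the score.

    For a linear model the gradient of the score with respect to the input
    is the weight vector, so the step is epsilon * sign(weight).
    """
    shifted = [p + epsilon * sign(w) for p, w in zip(pixels, WEIGHTS)]
    return [min(1.0, max(0.0, p)) for p in shifted]  # keep valid intensities


benign = [0.4, 0.5, 0.4, 0.5]
adversarial = perturb(benign, epsilon=0.1)

print("original labeled malignant:", score(benign) > 0)
print("perturbed labeled malignant:", score(adversarial) > 0)
```

No pixel moves by more than 0.1, but because every pixel moves in the direction the model is most sensitive to, the small changes add up and the label flips.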

While insurance fraud is a serious problem in medicine, upcoding is not always clearly fraudulent. There are also cases when doctors might use upcoding to improve a patient's experience by making sure they have access to certain drugs or treatments that would ordinarily be denied by insurance companies. Similarly, the "Law and Adversarial Machine Learning" paper encourages machine learning researchers to consider how the autonomous systems they build can both benefit individual users and also be used against them. The authors caution researchers that oppressive governments may use the tools they build to violate the privacy and free speech of their people. At the same time, people living in oppressive states could employ adversarial examples to evade the state's facial recognition systems. Both of these examples demonstrate that deciding what to do about adversarial examples is not straightforward.

The papers also make recommendations for crafting interventions for problems caused by adversarial examples. In the medical context, the authors suggest that the "procrastination principle," a concept from the early days of the internet that argued against changing the internet's architecture to preempt problems, might be applicable to adversarial examples as well. The authors caution that addressing problems related to adversarial examples in healthcare too early could create ineffective regulation and prevent innovation in the field. Instead, the authors propose extending existing regulations and taking small steps, such as creating "fingerprint" hashes of the data submitted as part of an insurance claim, to address concerns about adversarial examples.
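
The "fingerprint" idea might look something like the sketch below. The choice of SHA-256, the function names, and the assumption that a fingerprint is recorded when the data is first captured are all mine for illustration. The paper does not prescribe a particular scheme.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw claim data."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, recorded_fingerprint: str) -> bool:
    """Check submitted data against the fingerprint recorded at capture time."""
    return fingerprint(data) == recorded_fingerprint


# Hypothetical stand-in for image bytes recorded when the scan was taken.
original_scan = bytes([10, 20, 30, 40])
recorded = fingerprint(original_scan)

# A one-value change, like the slight perturbation an adversarial edit makes.
tampered_scan = bytes([10, 20, 31, 40])

print("original matches:", is_unmodified(original_scan, recorded))
print("tampered matches:", is_unmodified(tampered_scan, recorded))
```

A hash of this kind only shows that the data changed after the fingerprint was recorded. It does not say who changed it or why, and it offers nothing if the perturbation is applied before the fingerprint is taken.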

In the "Law and Adversarial Machine Learning" paper, the authors emphasize that lawyers and policymakers need help from machine learning researchers to create the best machine learning policies possible. As such, they recommend that machine learning developers assess the risk of adversarial attacks and evaluate existing defense systems on their effectiveness in order to help policymakers understand how laws may be interpreted and how they should be enforced. The authors also suggest that machine learning developers build systems that make it easier to determine whether an attack has occurred, how it occurred and who might be responsible. For example, designers could devise a system that can "alert when the system is under adversarial attack, recommend appropriate logging, construct playbooks for incident response during an attack, and formulate a remediation plan to recover from an attack." Lastly, the authors remind machine learning developers to keep in mind how machine learning and adversarial examples may be used to both violate and protect civil liberties.

Credits

Notes by Samantha Bates

Syllabus by Samantha Bates, John Bowers and Natalie Saltiel

Jonathan Zittrain and I are co-teaching a class for the third time. This year, the title of the course is Applied Ethical and Governance Challenges in Artificial Intelligence. It is a seminar, which means that we invite speakers for most of the classes and usually talk about their papers and their work. The speakers and the papers were mostly curated by our amazing teaching assistant team - Samantha Bates, John Bowers and Natalie Saltiel.

One of the things that Sam does is help prepare for the class by summarizing the paper and the flow of the class, and I realized that it was a waste for this work to just be crib notes for the instructors. I asked Sam for permission to publish the notes and the syllabus on my blog as a way for people to learn some of what we are learning and potentially start interesting conversations.

The course is structured as three sets of three classes on three focus areas. Previous classes were more general overviews of the space, but as the area of research matured, we realized that it would be more interesting to go deep in key areas than to go over what a lot of people probably already know.

We chose three main topics: fairness, interpretability, and adversarial examples. We then organized the classes to hit each topic three times, starting with diagnosis (identifying the technical root of the problem), then prognosis (exploring the social impact of those problems), then intervention (considering potential solutions to the problems we've identified while taking into account the costs and benefits of each proposed solution). See the diagram below for a visual of the structure.

The students in the class are half MIT and half Harvard students with diverse areas of expertise including software engineering, law, policy and other fields. The class has really been great and I feel that we're going deeper on many of the topics than I've ever gone before. The downside is that we are beginning to see how difficult the problems are. Personally, I'm feeling a bit overwhelmed by the scale of the work we have ahead of us to try to minimize the harm to society by the deployment of these algorithms.

We just finished the prognosis phase and are about to start intervention. I hope that we find something to be optimistic about as we enter that phase.

Please find below the summary and the syllabus for the introduction and the first phase - the diagnosis phase - by Samantha Bates along with links to the papers.

The tl;dr summary of the first phase is... we have no idea how to define fairness and it probably isn't reducible to a formula or a law, but it is dynamic. Interpretability sounds like a cool word, but as Zachary Lipton said in his talk to our class, it is a "wastebasket taxon" like the word "antelope," where we call anything that sort of looks like an antelope an antelope, even if it really has no relationship with other antelopes. A bunch of students from MIT made it very clear to us that we are not prepared for adversarial attacks and that it was unclear whether we could build algorithms that were both robust against these attacks and still functionally effective.

Part 1: Introduction and Diagnosis

By Samantha Bates

Syllabus Notes: Introduction and Diagnosis Stage

This first post summarizes the readings assigned for the first four classes, which encompass the introduction and the diagnosis stage. In the diagnosis stage, the class identified the core problems in AI related to fairness, interpretability, and adversarial examples and considered how the underlying mechanisms of autonomous systems contributed to those problems. As a result, our class discussions involved defining terminology and studying how the technology works. Included below is the first part of the course syllabus along with notes summarizing the main takeaways from each of the assigned readings.

Class Session 1: Introduction

In our first class session, we presented the structure and motivations behind the course, and set the stage for later class discussions by assigning readings that critique the current state of the field.

Both readings challenge the way Artificial Intelligence (AI) research is currently conducted and talked about, but from different perspectives. Michael Jordan's piece is mainly concerned with the need for more collaboration across disciplines in AI research. He argues that we are experiencing the creation of a new branch of engineering that needs to incorporate non-technical as well as engineering challenges and perspectives. "Troubling Trends in Machine Learning Scholarship" focuses more on falling standards and non-rigorous research practices in the academic machine learning community. The authors rightly point out that academic scholarship must be held to the highest standards in order to preserve public and academic trust in the field.

We chose to start out with readings that critique the current state of the field because they encourage students to think critically about the papers they will read throughout the semester. Just as the readings show that the use of precise terminology and explanation of thought are particularly important to prevent confusion, we challenge students to carefully consider how they present their own work and opinions. The readings set the stage for our deep dives into specific topic areas (fairness, interpretability, adversarial AI) and also set some expectations about how students should approach the research we will discuss throughout the course.

Class Session 2: Diagnosing problems of fairness

For our first class in the diagnosis stage, the class was joined by Cathy O'Neil, a data scientist and activist who has become one of the leading voices on fairness in machine learning.

Cathy O'Neil's book, Weapons of Math Destruction, is a great introduction to predictive models, how they work, and how they can become biased. She refers to flawed models that are opaque, scalable, and have the potential to damage lives (frequently the lives of the poor and disadvantaged) as Weapons of Math Destruction (WMDs). She explains that despite good intentions, we are more likely to create WMDs when we don't have enough data to draw reliable conclusions, use proxies to stand in for data we don't have, and try to use simplistic models to understand and predict human behavior, which is much too complicated to accurately model with just a handful of variables. Even worse, most of these algorithms are opaque, so the people impacted by these models are unable to challenge their outputs.

O'Neil demonstrates that the use of these types of models can have serious unforeseen consequences. Because WMDs are a cheap alternative to human review and decision-making, WMDs are more likely to be deployed in poor areas, and thus tend to have a larger impact on the poor and disadvantaged in our society. Additionally, WMDs can actually lead to worse behavior. In O'Neil's example of the Washington D.C. School District's model that used student test scores to identify and root out ineffective teachers, some teachers changed their students' test scores in order to protect their jobs. Although the WMD in this scenario was deployed to improve teacher effectiveness, it actually had the opposite effect by creating an unintended incentive structure.

The optional reading, "The Scored Society: Due Process for Automated Predictions," discusses algorithmic fairness in the credit scoring context. Like Cathy O'Neil, the authors contend that credit scoring algorithms exacerbate existing social inequalities and argue that our legal system has a duty to change that. They propose opening the credit scoring and credit sharing process to public review while also requiring that credit scoring companies educate individuals about how different variables influence their scores. By attacking the opacity problem that Cathy O'Neil identified as one of three characteristics of WMDs, the authors believe the credit scoring system can become more fair without infringing on intellectual property rights or requiring that we abandon the scoring models altogether.

Class Session 3: Diagnosing problems of interpretability

Zachary Lipton, an Assistant Professor at Carnegie Mellon University who is working intensively on defining and addressing problems of interpretability in machine learning, joined the class on Day 3 to discuss what it means for a model to be interpretable.

Class session three was our first day discussing interpretability, so both readings consider how best to define interpretability and why it is important. Lipton's paper asserts that interpretability reflects a number of different ideas and that its current definitions are often too simplistic. His paper primarily raises stage-setting questions: What is interpretability? In what contexts is interpretability most necessary? Does creating a model that is more transparent or can explain its outputs make it interpretable?

Through his examination of these questions, Lipton argues that the definition of interpretability depends on why we want a model to be interpretable. We might demand that a model be interpretable so that we can identify underlying biases and allow those affected by the algorithm to contest its outputs. We may also want an algorithm to be interpretable in order to provide more information to the humans involved in the decision, to give the algorithm more legitimacy, or to uncover possible causal relationships between variables that can then be tested further. By clarifying the different circumstances in which we demand interpretability, Lipton argues that we can get closer to a working definition of interpretability that better reflects its many facets.

Lipton also considers two types of proposals to improve interpretability: increasing transparency and providing post-hoc explanations. The increasing transparency approach can apply to the entire model (simulatability), meaning that a user should be able to reproduce the model's output if given the same input data and parameters. We can also improve transparency by making the different elements of the model (the input data, parameters, and calculations) individually interpretable, or by showing that during the training stage, the model will come to a unique solution regardless of the training dataset. However, as we will discuss further during the interventions stage of the course, providing more transparency at each level does not always make sense depending on the context and the type of model employed (for example, a linear model vs. a neural network model). Additionally, improving the transparency of a model may decrease the model's accuracy and effectiveness.

A second way to improve interpretability is to require post-hoc interpretability, meaning that the model must explain its decision-making process after generating an output. Post-hoc explanations can take the form of text, visuals, saliency maps, or analogies that show how a similar decision was reached in a similar context. Although post-hoc explanations can provide insight into how individuals affected by the model can challenge or change its outputs, Lipton cautions that these explanations can be unintentionally misleading, especially if they are influenced by our human biases.
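To make the transparency ideas above a little more concrete, here is a minimal sketch of a model that a person could plausibly simulate by hand. It is not taken from Lipton's paper: the feature names, weights, and applicant values are all hypothetical, and the model is a plain weighted sum chosen only because every step of it can be recomputed on paper and every term can be read on its own.

```python
# Hypothetical linear scoring model. All names and numbers are illustrative
# assumptions, not values from any reading discussed in the course.
WEIGHTS = {"income": 0.5, "debt": -0.8, "years_employed": 0.3}
BIAS = 0.1


def predict(features):
    """Return the score as a weighted sum that a user can recompute by hand."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def contributions(features):
    """Return each feature's term in the sum, largest magnitude first."""
    terms = {name: WEIGHTS[name] * value for name, value in features.items()}
    return sorted(terms.items(), key=lambda item: abs(item[1]), reverse=True)


applicant = {"income": 2.0, "debt": 1.5, "years_employed": 1.0}
score = predict(applicant)
ranking = contributions(applicant)

print(f"score = {score:.2f}")
for name, term in ranking:
    print(f"{name}: {term:+.2f}")
```

With only three inputs, the whole calculation fits in a reader's head. The same approach stops being simulatable once a model has thousands of weights or nonlinear layers, which is one reason transparency does not mean the same thing for a linear model and a neural network.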

Ultimately, Lipton's paper concludes that it is extremely challenging to define interpretability given how much it depends on external factors like context and the motivations for making a model interpretable. Without a working definition of the term, it remains unclear how to determine whether a model is interpretable. While the Lipton paper focuses more on defining interpretability and considering why it is important, the optional reading, "Towards A Rigorous Science of Interpretable Machine Learning," dives deeper into the various methods used to determine whether a model is interpretable. The authors define interpretability as the "ability to explain or present in understandable terms to a human" and are particularly concerned about the lack of standards for evaluating interpretability.

Class Session 4: Diagnosing vulnerabilities to adversarial examples

In our first session on adversarial examples, the class was joined by LabSix, a student-run AI research group at MIT that is doing cutting-edge work on adversarial techniques. LabSix gave a primer on adversarial examples and presented some of its own work.

The Gilmer et al. paper is an accessible introduction to adversarial examples that defines them as "inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake." The main thrust of the paper is an examination of the different scenarios in which an attacker may employ adversarial examples. The authors develop a taxonomy to categorize these different types of attacks: "indistinguishable perturbation, content-preserving perturbation, non-suspicious input, content-constrained input, and unconstrained input." For each category of attack, the authors explore the different motivations and constraints of the attacker. By gaining a better understanding of the different types of attacks and the tradeoffs of each type, the authors argue that the designers of machine learning systems will be better able to defend against them.
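For readers who have not seen one before, the following is a toy sketch of the small-perturbation kind of adversarial example. It is not code from the paper or from LabSix. The classifier, its weights, the input, and the budget of 0.1 per feature are all hypothetical, and the attack simply nudges each feature in the direction that pushes the score across the decision boundary (for a linear model, that direction is given by the sign of each weight).

```python
# Hypothetical linear classifier. Weights and input are illustrative only.
W = [0.9, -0.4, 0.2, -0.7]
B = 0.0


def score(x):
    """Weighted sum of the input features plus a bias."""
    return sum(w * xi for w, xi in zip(W, x)) + B


def classify(x):
    """Return class 1 when the score is positive, otherwise class 0."""
    return 1 if score(x) > 0 else 0


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def perturb(x, epsilon):
    """Move each feature by at most epsilon toward the opposite class."""
    direction = -1 if classify(x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(W, x)]


x = [0.5, 0.6, 0.3, 0.2]
x_adv = perturb(x, 0.1)

print(classify(x), round(score(x), 2))
print(classify(x_adv), round(score(x_adv), 2))
```

No single feature moves by more than 0.1, yet the predicted class flips, because the small changes all push the score the same way. Real attacks on image classifiers work in far higher dimensions, where the same accumulation effect lets each individual change be much smaller.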

The paper also includes an overview of the perturbation defense literature, which the authors criticize for failing to consider adversarial example attacks in plausible, real-world situations. For example, a common hypothetical situation posed in the defense literature is an attacker perturbing the image of a stop sign in an attempt to confuse a self-driving car. The Gilmer et al. paper, however, points out that the engineers of the car would have considered and prepared for naturally occurring misclassification errors caused by the system itself or real-world events (for example, the stop sign could be blown over by the wind). The authors also argue that there are likely easier, nontechnical methods that the attackers could use to confuse the car, so the hypothetical is not the most realistic test case. The authors' other main critique of the defense literature is that it does not acknowledge how improving certain aspects of a system's defense structure can make other aspects of the system less robust and thus more vulnerable to attack.

The recommended reading by Christian Szegedy et al. is much more technical and requires some machine learning background to understand all of the terminology. Although it is a challenging read, we included it in the syllabus because it introduced the term "adversarial examples" and laid some of the foundation for research on this topic.



Credits

Figure and Notes by Samantha Bates

Syllabus by Samantha Bates, John Bowers and Natalie Saltiel

Supposedly 'Fair' Algorithms Can Perpetuate Discrimination »

How the use of AI runs the risk of re-creating the insurance industry's inequities of the previous century.

The Quest to Topple Science-Stymying Academic Paywalls »

Scientific publishers charge so much that even Harvard can’t afford it anymore. A new publishing infrastructure could help.

What the California Wildfires Can Teach Us About Data Sharing »

Citizen collection of radiation information after Fukushima and of air quality information after the recent fires serve as a model for everyone.

What the Boston School Bus Schedule Can Teach Us About AI »

An MIT team built an algorithm to optimize bell times and bus routes. The furor around the plan offers lessons in how we talk to people when we talk to them about artificial intelligence.

The Next Great (Digital) Extinction »

How today's internet is rapidly and indifferently killing off many systems while allowing new types of organizations to emerge.

The Educational Tyranny of the Neurotypicals »

The current school system is too rigid, and it’s designed for a different world anyway.

Why Westerners Fear Robots and the Japanese Do Not »

The hierarchies of Judeo-Christian religions mean that those cultures tend to fear their overlords. Beliefs like Shinto and Buddhism are more conducive to having faith in peaceful coexistence.

Blog DOI enabled »

As part of my work in developing the Knowledge Futures Group collaboration with the MIT Press, I'm doing a deep dive into trying to understand the world of academic publishing. One of the interesting things that I discovered as I navigated the different protocols and platforms was the Digital Object Identifier (DOI). There is a foundation that manages DOIs and coordinates a federation of registration agencies. DOIs are used for many things, but the general idea is to create a persistent identifier for some digital object like a dataset or a publication and manage it at a meta-level to the...

Fake Meat, Served Six Ways »

Cellular agriculture has the potential to protect animal welfare and curb global warming; Joi Ito, a former vegan, grapples with the future of meat.