
Data for justice: Faculty across disciplines explore equity in AI


The multidisciplinary team of (L-R) Michal Arbilly, John W. Patty, Lauren Klein and Maggie Penn is co-teaching a class focused on the emerging field of data justice and the concepts of bias, fairness, discrimination and ethics inherent within data science and automated systems.

— Emory Photo Video

If red-light cameras in Chicago and other cities are supposed to be race-neutral, why do more Black and Latino drivers get busted there?

As data reflects and guides our actions, and society grapples with systemic inequality, four Emory College professors from diverse disciplines are leading 40 students to seek, gather and analyze data with justice in mind.

A new course this semester, QTM 310: Introduction to Data Justice, is not only about quantitative theory and methods (though juniors and seniors in that major quickly snapped up almost every available slot). The faculty team for the Emory College of Arts and Sciences course draws on biology (Michal Arbilly), English (Lauren Klein) and political science (John W. Patty and Elizabeth Maggie Penn).

As an emerging field, data justice “considers how questions about data, its collection and its use are connected to broader social and political concerns, and how data-driven systems can be designed more equitably,” the four said in their syllabus. “Such data is expansive and expanding, and serves as the basis for automated systems that range from resume screening to voting redistricting, predictive policing to cell-phone autocomplete.”

What we talk about when we talk about justice

Inspired to teach the class after the Black Lives Matter movement, the four instructors knew things would get interesting because their fields define power, fairness and justice differently. Could these concepts even be measured accurately with the methods in use? Explicit and implicit biases surface whenever data is collected and analyzed.

“The one thing that has surprised me the most was learning about how ingrained biases could be with a scientist or researcher even with them being aware of how biases can affect the groups being studied,” says Christina Chance, a math and computer science major who talked her way into the class. “There are natural biases in how we are taught to approach problems, and the bias of inference and making connections or assumptions. Those were things that I had never thought about until this class.”

No measurements are neutral

Students in this class are learning that within data that influences technical and policy decisions, choices (and trade-offs) are everywhere. In examples like red-light cameras in Chicago, seemingly objective algorithmic systems may not be “neutral” in their effects. Many systems rely upon data — or were designed to achieve goals — that reflect embedded, global biases and inequalities.

Another example: An artificial intelligence (AI) system is only as accurate as the record it learns from, such as the history of a beauty contest in which most past winners were white. Asked to predict winners among a new batch of contestants, the AI almost immediately made categorizations based on race. It reproduced the bias in the system instead of producing unbiased results.

It’s not just theory; students learn data justice by doing too. They use software, platforms and programming languages in the course lab, seeing for themselves whether designing an unbiased research project is possible. The final includes their recommendations on the issues that they think future students will want to address.

“We want students to believe they can do justice-oriented data science and to tell themselves that it is possible for each of them to do this work,” Klein says. “Nothing that we are doing in the course, computationally or statistically, is outside the scope of any Emory student’s abilities, whether they’ve had prior experience with quantitative analysis or not. One of my core beliefs is that we need everyone contributing, from all of the disciplines, because these problems are so complex and multifaceted, and so deeply entrenched in every area of our society, that they demand expertise from all areas of society.”

Dani Roytburg took QTM 310 after “seeing how people are leaving a digital footprint that can be used with data analytics. In my mind, the mapping of human and social interaction is of paramount importance.” Roytburg has learned multiple ways of modeling inequality and other abstract ideas, and developed “a perspective on how both the humanities and STEM interact with one another. And I think that's the point of the QTM department writ large, to bridge that gap.”

Is DNA data a predictor of behavior?

Students dived into misuses of population genetics with Michal Arbilly, a lecturer in biology and QTM. Research has attempted to link behavior and intelligence to genes, and these supposed links have been used to justify eugenics. Today, genetic sequencing is easy, and abundant genetic data is available for such association studies. In the future, will people’s genes be used in determining, for example, their educational opportunities?

“We know that there's some heritable basis to a lot of our traits, but to what extent is it meaningful?” Arbilly asks. “Something can be heritable and still very sensitive to the environment. Often we can make a big difference by changing the environment, even for highly heritable traits.”

Her message to students: “You have the ability to use all these incredible quantitative tools. You have to use them properly. When you're reading the literature, always ask yourself, ‘What’s the bias here? Is this the proper inference that can be made based on the data?’”

Is fairness even achievable?

Facial recognition software has been shown to be inaccurate in identifying people of color. The example shows how structural inequalities, and racism in particular, enter into datasets and get amplified when those datasets are used to train data systems.

QTM 310 students replicated a facial recognition audit and identified how biased training data left the system able to identify only white faces. But they also learned about the harms that come with the oversurveillance of people of color. As a result, students learned to reject quick technical fixes and “create frameworks that help people imagine alternatives,” says Klein, the Winship Distinguished Research Professor of English and Quantitative Methods.
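The article does not describe the students' audit code, but the core step of such an audit can be sketched as comparing accuracy across groups. The snippet below is a minimal illustration under stated assumptions: the function name, the group labels and every record in `toy_records` are invented for this example and are not results from the class or from any real system.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return the share of correct identifications for each group.

    Each record is a (group, predicted_id, true_id) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted_id, true_id in records:
        total[group] += 1
        if predicted_id == true_id:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Invented toy data (assumption): not drawn from any real audit.
toy_records = [
    ("group_a", "id1", "id1"),
    ("group_a", "id2", "id2"),
    ("group_a", "id3", "id3"),
    ("group_a", "id4", "id4"),
    ("group_b", "id5", "id5"),
    ("group_b", "id9", "id6"),  # misidentified
    ("group_b", "id7", "id7"),
    ("group_b", "id2", "id8"),  # misidentified
]

rates = accuracy_by_group(toy_records)
# The audit question: how far apart are the best and worst group?
accuracy_gap = max(rates.values()) - min(rates.values())
print(rates, accuracy_gap)
```

A gap like this measures only the accuracy problem. As the class discussion suggests, closing it does not address whether the system should be deployed at all.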

From Penn, professor of QTM and political science and scholar of social choice theory, the students learned that valid statistical inference can still lead to discrimination. She has researched fair representation in the design of electoral systems and questioned the students about whether truly fair representation and political power can be quantified through “algorithms of fair division.”

“If you try to achieve fairness with respect to one criterion, you're necessarily going to violate it with respect to another,” she says. “And the criteria are very, very sensible.”
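One well-known instance of this kind of conflict comes from the algorithmic fairness literature, though it is not necessarily the result Penn has in mind. For any yes/no classifier, the false positive rate is tied to the group's base rate p, the positive predictive value (PPV) and the false negative rate (FNR) by the identity FPR = p/(1-p) × (1-PPV)/PPV × (1-FNR). The sketch below, with an illustrative function name and made-up numbers, shows that two groups with different base rates cannot share all three error measures.

```python
def implied_false_positive_rate(base_rate, ppv, fnr):
    """False positive rate forced by the confusion-matrix identity
    FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical numbers (assumption): both groups get the same PPV and FNR,
# but the outcome being predicted is more common in one group.
ppv, fnr = 0.8, 0.2
fpr_low_base_rate = implied_false_positive_rate(0.2, ppv, fnr)
fpr_high_base_rate = implied_false_positive_rate(0.5, ppv, fnr)
print(fpr_low_base_rate, fpr_high_base_rate)
```

With these inputs the implied false positive rates are about 0.05 and 0.20. Equalizing PPV and FNR across the two groups therefore forces unequal false positive rates, even though each criterion is sensible on its own.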

Penn urged the students to think creatively about measuring power, which some in her field define as the ability to cast a pivotal vote or unilaterally change an outcome.

“There are all sorts of indices of power, and there are a lot of kind of natural ways of measuring that in many applications, like in network analysis and natural language processing,” Penn says. “I think it gave the students some food for thought, and that’s what I want them to get.”
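One standard index built on the "pivotal vote" idea is the Banzhaf index, which counts how often a voter turns a losing coalition into a winning one. The article does not say which indices the class used, so the following is a minimal sketch under that assumption. The voters, weights and quota are hypothetical.

```python
from itertools import combinations


def banzhaf_swings(weights, quota):
    """Count, for each voter, the coalitions in which that voter is pivotal."""
    voters = list(weights)
    swings = {voter: 0 for voter in voters}
    for voter in voters:
        others = [u for u in voters if u != voter]
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                total = sum(weights[u] for u in coalition)
                # Pivotal: the coalition loses without the voter, wins with them.
                if total < quota <= total + weights[voter]:
                    swings[voter] += 1
    return swings


def banzhaf_index(weights, quota):
    """Normalize swing counts so the indices sum to 1."""
    swings = banzhaf_swings(weights, quota)
    total = sum(swings.values())
    return {v: (s / total if total else 0.0) for v, s in swings.items()}


# Hypothetical weighted vote (assumption): 51 of 100 votes needed to win.
weights = {"A": 50, "B": 49, "C": 1}
quota = 51
swings = banzhaf_swings(weights, quota)
index = banzhaf_index(weights, quota)
print(swings, index)
```

With these made-up weights, A is pivotal in three coalitions and B and C in one each. B's 49 votes carry no more power than C's single vote, which is the kind of gap between raw numbers and actual influence that a power index is meant to expose.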

Choosing the rules that an algorithm must follow

Patty, professor of political science and QTM, has researched how bureaucratic agencies make decisions based on data. Some flaws seem obvious; for instance, the federal Department of Education collected data related to the No Child Left Behind policy but did not include the families that pulled their children out of public education.

But data justice can be far more nuanced. From Patty, students learned that justice as a theory contains contradictory notions. Algorithms can be too complicated to explain or interpret. In a lab assignment, students designed college admission rules to see how fairness and efficiency sometimes conflict. When there are competing goals, finding a winner requires choices and justifications.
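The lab assignment itself is not reproduced in the article. As a rough sketch of the kind of trade-off it describes, the example below compares two hypothetical admission rules on invented applicants. It assumes that "efficiency" can be proxied by the mean score of the admitted class and "fairness" by equal seats per group. Both are simplifications chosen for illustration, not the course's definitions.

```python
def admit_top_scores(applicants, seats):
    """Rule 1: admit the highest scorers, ignoring group membership."""
    ranked = sorted(applicants, key=lambda a: a["score"], reverse=True)
    return ranked[:seats]


def admit_equal_seats(applicants, seats, groups):
    """Rule 2: split seats evenly across groups, then take top scorers in each."""
    per_group = seats // len(groups)
    admitted = []
    for group in groups:
        members = [a for a in applicants if a["group"] == group]
        admitted += admit_top_scores(members, per_group)
    return admitted


def summarize(admitted):
    """Return (mean score, seats per group) for an admitted class."""
    mean_score = sum(a["score"] for a in admitted) / len(admitted)
    counts = {}
    for a in admitted:
        counts[a["group"]] = counts.get(a["group"], 0) + 1
    return mean_score, counts


# Invented applicants (assumption): scores and groups are illustrative only.
applicants = [
    {"group": "X", "score": 95}, {"group": "X", "score": 90},
    {"group": "X", "score": 85}, {"group": "X", "score": 80},
    {"group": "Y", "score": 88}, {"group": "Y", "score": 75},
    {"group": "Y", "score": 70}, {"group": "Y", "score": 65},
]

top_mean, top_counts = summarize(admit_top_scores(applicants, 4))
equal_mean, equal_counts = summarize(admit_equal_seats(applicants, 4, ["X", "Y"]))
print(top_mean, top_counts)
print(equal_mean, equal_counts)
```

On this toy data, the score-only rule yields a mean of 89.5 with a 3-to-1 split between groups, while the equal-seats rule yields a 2-to-2 split with a mean of 87.0. Neither rule is "the" right one; the point, as Patty puts it, is that the designer has to choose and then defend the choice.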

“It's time for you to find your voice to tell me what you think,” Patty says. “The whole secret of these impossibility theorems is there isn't one right answer. There are multiple right answers, and then you have to be willing to defend yours.”

Current headlines point to QTM 310’s relevance

The class is one way that Emory College is prioritizing the study of race and inequality, part of the college’s strategic plan. The four faculty members plan to keep teaching it.

“We have only begun this conversation about how our approaches inform each other and help each other,” Klein says. “There's so much possibility in this space, and it’s the kind of work that's so urgently needed. Pretty much every week, a new type of algorithmic bias or discriminatory system gets exposed. These problems require everyone in order to be resolved.”

