Artificial intelligence (AI) researcher Timnit Gebru joined Google in 2018 intent on changing the system from within, sounding an alarm about the ways technology could harm, and already is harming, underprivileged communities. After an abrupt, contentious, and highly publicized dismissal from the company, she has embraced a new role with the founding of the Distributed AI Research Institute (DAIR), a nonprofit that seeks not just to expose potential AI harms, but to support proactive research into technologies that benefit communities rather than work against them.
We spoke to Gebru and DAIR Director of Research Alex Hanna (like Gebru, a former member of Google's Ethical AI team) about DAIR's work and its goals for the future.
Let's start with the DAIR mission of creating "a space for independent, community-rooted AI research free from Big Tech's pervasive influence." Can you unpack what that means?
Hanna: Our goal with DAIR is to find an alternative way to go forward with AI, one that starts with people who have typically been excluded from the tech industry, including Black and indigenous people, other people of color, low-income people, people in the "Global South," disabled people, and gender and sexual minorities. How can we build technology in a way that benefits those communities, rather than treating them as an afterthought? We want to do research that starts with community well-being and proceeds in a logical, fair, and principled way.
How do you think about the output of your work, or is it too soon to say? You've spoken before about wanting to avoid the trap of publish-or-perish-driven research.
Hanna: I don't think it's too early to say that we're thinking about outcomes. We're doing fundamental research, but the end goal is to see some kind of improvement for a particular population. So, for example, we have a project that is focused on social media harms, and one of our goals is to minimize the abuse and harassment of people in a specific population.
Gebru: I think the point is that if your goal is to publish another paper, that's not really going to be conducive to ethical research. Publishing is not the end—it's a means to an end. But a lot of times, in our research communities, it's basically considered to be the end, and that's one of the things we're trying to challenge. Alex wrote a paper called "Against Scale" (https://bit.ly/3eNxZw7) to make the point that even if you want to change something "at scale," the way to do it is not by trying to do it everywhere all at once. Instead, you need to think about how you could do something meaningful in a specific context, and then it will reverberate. When you start by asking, "Does it generalize?" like we do in machine learning, you end up making something that doesn't work well for anybody.
The work that your Research Fellow Raesetje Sefala is tackling on the evolution of spatial apartheid in South Africa (https://bit.ly/3Sg54hK) seems like a great example of doing something meaningful in a specific context.
Gebru: It's also a good example of the kind of work that's very difficult to publish in machine learning or computer science.
That's because one of Sefala's biggest innovations was the methodology she created to construct a visual dataset, rather than an algorithm.
Hanna: We now have whole fields like data journalism, but we don't really have good ways of expressing all the labor and judgment that goes into the creation of data. Another one of our fellows, Milagros Miceli, has been researching data annotators in Argentina and Bulgaria (https://bit.ly/3MMqZMl). These annotators have lists of instructions that are hard to understand, and they're not translated very well. In Bulgaria, for instance, the annotators worked in English, but they were refugees from Syria. Some of them spoke English, but most of them spoke Levantine Arabic. And in the end, their work is understood through this very narrow lens of data quality.
What about funding? How do you deal with the fact that most philanthropic organizations are supported or even led by executives of the tech companies from whose influence you strive to be free?
Gebru: We're lucky to be funded by a number of foundations, and most of the grants that we've gotten have been unrestricted funds that are based on our vision. But we don't know how long that's going to last, so we're thinking about how to diversify our funding and build our own sources of revenue, such that even if we lose a specific source of funding, it's not a big deal. We also don't want to start catering to a specific group of funders. We're in a good place, but personally speaking, I want to get to a point where we are relatively independent.
You've said before that DAIR might get involved in algorithmic audits. Can you tell me a little more about your plans?
Gebru: We've been exploring a few different services we might offer, but there aren't any concrete plans. The next step is to analyze what's in line with our mission. We're a research institute, but being plugged into some of those things could help us see where people are having issues.
I suppose your proposal that engineers create datasheets for their datasets (https://bit.ly/3m6Nq2z)—documenting the composition, collection process, and recommended uses of the data they collect for machine learning models—is a step in the direction of seeding more ethical, effective workflows.
Gebru: It's a step, yes. But at some point, you also want to have regulation.
To that end, you've proposed that tech companies should have to prove that their products do not cause any harm, rather than placing the burden on individual citizens to show how they are harmed.
Gebru: I mean, everybody else has to do that. Why is the tech industry exempt? Food, toys, cars, medicine—they all have to warn people about potential harms. The European Union is about to pass the AI Act (https://bit.ly/3eLbJmC) to regulate high-risk applications, but that also has a number of loopholes. For instance, they make an exception for what they call general purpose models. What does that mean? That's quite worrisome.
Hanna: More broadly, it raises questions about what governance looks like in this space. Does it look like an impact assessment that's tied to an algorithm, which is what the Algorithmic Accountability Act (https://bit.ly/3tbVrHe) proposes? I think we need to go further. When you think about the impacts of creating the datasets that are used to train large machine learning models, you can outline them in terms of copyright laws and privacy protections. But there are also potential informational harms at the population level. So, for instance, if people consent to their data being used to train an algorithm that recognizes darker-skinned women, it can still be leveraged in racist or sexist ways. It can still result in due process violations.
Gebru: It's also not enough to have reactive laws like this, because they don't slow anyone down. And by the time companies do slow down, they have already moved on to the next thing that's not regulated. Meanwhile, the rest of us are too bogged down trying to show the harm these technologies are causing. We're not imagining what kind of technology we want to build. Ultimately, you have to do both—you have to slow them down and invest in alternative futures.
©2022 ACM 0001-0782/22/12
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from email@example.com or fax (212) 869-0481.