
Communications of the ACM

Research Highlights

Technical Perspective: The Impact of Auditing for Algorithmic Bias

[Illustration: sad emoji on a pie chart]

If you read news articles on the ethics of AI, you will repeatedly see the phrase "algorithmic bias." It refers to algorithms producing results that appear racist, sexist, or otherwise unfairly biased. For example, when Amazon built a machine-learning model to score job applications, trained on the company's historical hiring data, it discovered the system downgraded female applicants, as well as anyone who mentioned activities associated with women (coach of a women's soccer league, for example).a In a groundbreaking paper, researchers Joy Buolamwini and Timnit Gebrub documented substantial discrepancies in the reliability with which three commercial face classifiers could classify gender across skin types: the classifiers identified gender with high (99%) accuracy for white men, but accuracy for darker-skinned men and for all women was lower, with error rates as high as 35% for darker-skinned women.

The term "algorithmic bias" refers to these unwarranted or unfair differential results for different groups. Of course, the algorithm itself is not biased, in the sense that it is a mathematical object with no views about the world or about fairness. The bias is something we humans attribute to how the algorithm and its model function. Often, the unfair treatment is a consequence of training the model on biased data chosen or generated by humans. The data used to train the Amazon job-application classifier was drawn from the history of Amazon's hiring decisions, which apparently included a bias against women. The model trained on that data recognized a pattern in human behavior and reproduced, or perhaps even amplified, it.c What Buolamwini and Gebru's groundbreaking 2018 "Gender Shades" study drew attention to was the importance of representative training data for facial recognition classifiers. They demonstrated this by conducting an algorithmic audit: testing the classifiers' performance on a benchmark dataset they constructed (the Pilot Parliaments Benchmark) with explicit attention to representation across classes of skin color and gender.
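The core of such an audit is simple: rather than reporting one aggregate accuracy number, compute accuracy separately for each demographic group in the benchmark. The following is a minimal sketch of that disaggregation step; the data, group labels, and function name are illustrative, not the study's actual code or benchmark.

```python
# Minimal sketch of a disaggregated accuracy audit, in the spirit of
# Gender Shades: measure a classifier's accuracy per group rather than
# in aggregate. All names and data here are hypothetical.
from collections import defaultdict

def audit_by_group(predictions, labels, groups):
    """Return a dict mapping each group to the classifier's accuracy on it."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        if pred == label:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy example: identical aggregate data can hide very different
# per-group error rates, which only the disaggregated view reveals.
preds = ["M", "M", "F", "M", "F", "M", "F", "F"]
truth = ["M", "M", "F", "F", "M", "M", "F", "M"]
group = ["lighter-male", "lighter-male", "lighter-female", "darker-female",
         "darker-female", "darker-male", "darker-female", "darker-male"]
print(audit_by_group(preds, truth, group))
```

An aggregate accuracy over all eight toy examples would be 62.5%, masking the fact that the groups fare very differently; the per-group breakdown is what surfaced the disparities in the commercial classifiers.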

