
Communications of the ACM

Viewpoint

Trust Is Not Enough: Accuracy, Error, Randomness, and Accountability in an Algorithmic Society


die in a globe stand, illustration

Credit: Fran_Kie / Shutterstock

We now trust many computer systems (hardware running deterministic, hard-coded software programs) to be much more reliable than they once were. And societally, we now have more of a sense of how to hold those who develop and operate software accountable when their products fail. Nevertheless, we are currently amidst a new set of concerns about the trust we grant to algorithmic systems (computational routines that "learn" from real-world data) to accomplish important tasks in our daily lives. While developers of algorithmic systems rightfully strive to make their products more trustworthy, and face room for improvement,1,12 "trust" is an insufficient frame for the relationship between these technologies and the impacts they produce. Given the properties of machine learning and the social, political, and legal structures of accountability in which they are enmeshed, there are currently unresolvable uncertainties about how such systems produce their results, and who should be held accountable when those results produce harm to individuals, communities, or all of society.

This uncertainty about how, whether, and when algorithmic harms come to pass is not ever going away, at least not completely. So, we remain in need of mechanisms to address both whom to hold accountable and how to hold them accountable, rather than to rely on them to make themselves more trustworthy.

The harms for which accountability mechanisms are needed, and which no responsible actor should be entrusted to minimize on their own, include harms that have been clearly defined by courts (injury, property loss, workplace hazards) and harms for which redress has been difficult to pursue in court (privacy violations,13 manipulative practices, and many forms of discrimination).

One reason it is so difficult to develop mechanisms of accountability is that such mechanisms speak to one of two complementary aspects of accountability without considering the other. One of these aspects is the role inhabited by those who ought to be accountable. In this role, accountable actors (engineers, corporate officers, or licensing bodies) must conduct themselves with a willingness to accept blame and with the capacity to see themselves as responsible to others for doing so. They must make themselves trustworthy and accountable. The other aspect is the ability of others (harmed parties or those acting on their behalf) to hold responsible parties accountable, regardless of their willingness to accept blame. This second aspect requires social, institutional relations (between individuals, the organizations they might work for or in, and institutions that are themselves held accountable to the public) for accountability to exist, particularly when those who ought to be trustworthy fall short.

These two aspects work together, even if accountability frameworks all too often do not. Personal and institutional forms of accountability reinforce each other. But the properties of algorithmic systems present significant barriers to accountability that go beyond those associated with traditional computational software systems. In our recent work,4 A.F. Cooper, Benjamin Laufer, Helen Nissenbaum, and I identify several of these barriers, and what follows is a discussion of how they work together to prevent individual actors from seeing themselves as accountable, and to prevent our social institutions (regulators, courts, companies, and the public) from holding anyone accountable when algorithmic systems produce harm. This Viewpoint extends this work to suggest immediate steps technical researchers, social scientists, and policymakers can take to make the world safer when the trust we place in developers falls short.

These barriers to accountability are characterized by the tendency of those who build and use algorithmic systems to shift the blame for algorithmic harms onto external, uncontrollable factors in order to recast harms as something for which no one could be seen as blameworthy. The barrier is erected through scapegoating: attributing errors to fundamentally intractable statistical properties of the algorithmic system or to the fundamentally stochastic properties of the real world. In traditional software engineering, such errors are bugs, said to be "inevitable" and "endemic to programming." Predictable in their unpredictability, bugs arise from errors in how engineers design, model, and code their software programs. But algorithmic systems introduce additional types of bug-like behavior into computational systems: misclassifications, statistical error, and nondeterministic outputs that may cause harm. Examples of these "bugs" abound: Google Photos labeling a photo of a Black couple as containing "gorillas,"5 translation and text-generation tools associating stereotypically gendered pronouns with various professions (for example, "he is a doctor," "she is a nurse"), or grocery stores serving Muslim East African immigrant communities being flagged for fraud because their customers pooled their money to make large communal purchases.14 These errors are easily depicted in ways that deflect responsibility away from developers and onto the systems themselves, as the unavoidable and inevitable cost of rapid but incremental progress.




Several factors contribute to these errors. One is the vested interests powerful institutions hold in allowing these mistakes, which uphold patterns of white supremacy, to go uncorrected. Beyond this is the practical challenge of collecting enough data representative of the phenomena developers are interested in. Many computer vision systems have been trained on datasets that underrepresent minoritized community members, which results in higher error rates for individuals from those communities.2 Another challenge is mapping the available data appropriately onto the phenomena developers are interested in. Developers often must use proxies for the phenomena they are interested in, and so might use the cost of medical care as an approximation for the health needs of various communities without realizing the errors incurred by historical underinvestment in African American communities' healthcare.11 These challenges are often seen as ultimately resolvable with better data or better problem formulations. But a third challenge is fundamentally unresolvable: many phenomena are stochastic. Many processes are too random for individual instances to be predicted. This is certainly true of physical processes such as alpha particle decay,7 but social processes, too, exhibit stochastic properties. When algorithmic systems are built to predict unpredictable phenomena, the error rate cannot be reduced beyond certain necessary thresholds.
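The claim that error cannot be reduced below a certain threshold can be made concrete with a small calculation. The sketch below assumes a binary outcome and assumes we somehow knew, for each kind of input, the true probability that the outcome is positive. Under those assumptions, even a perfect predictor errs at a rate of min(p, 1 - p) on that input. The function names and the population numbers are hypothetical and chosen only for illustration.

```python
def irreducible_error(p_positive: float) -> float:
    """Lowest achievable misclassification rate for an input whose outcome
    is positive with probability p_positive (binary outcome assumed)."""
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be a probability in [0, 1]")
    # The best possible guess is the more likely outcome; it is wrong
    # whenever the less likely outcome occurs.
    return min(p_positive, 1.0 - p_positive)


def population_error_floor(cases):
    """Share-weighted average of the per-input floors.

    `cases` is a list of (share_of_population, p_positive) pairs.
    """
    total_share = sum(share for share, _ in cases)
    return sum(share * irreducible_error(p) for share, p in cases) / total_share


# Hypothetical population (made-up numbers): half of the inputs are nearly
# deterministic, a fifth are as unpredictable as a coin flip.
cases = [(0.5, 0.95), (0.3, 0.70), (0.2, 0.50)]
floor = population_error_floor(cases)
print(f"error floor: {floor:.3f}")
```

In this made-up population no amount of additional data or modeling effort brings the error rate below roughly one in five, because the remaining error comes from the phenomenon itself rather than from the model.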

To be sure, progress is being made, both rapidly and incrementally, to reduce the prevalence of such errors. But the impacts of imperfect systems already in use suggest parallel efforts are needed to address the inevitability of imperfection inherent in algorithmic systems. One of these efforts should strive to better evaluate whether a system is reliable enough to be used in the domain it is designed for. It stands to reason that in high-stakes domains where a great many people are expected to be affected, or a few people are expected to be affected in grave ways, the tolerance for error should be much lower than in trivial applications, and trusting the developer to draw the line is insufficient. But understanding what the stakes are in any given domain, and what the consequences of error are given those stakes, is no simple task.9 Nor is the prospect of providing reliable guarantees of robustness for systems' behavior in context.6 Qualitative methodologies developed in the social sciences can be adapted to understand the risks and equities of using algorithmic systems for specific purposes in specific places. These methods can support better decision making for those who must choose whether or not an application is "ready for prime time" before integrating it into people's lives and communities.15 Proposed regulation in the E.U. and U.S. acknowledges the importance of this effort, stratifying compliance requirements by risk profile and domain, and calling for the prior study of algorithmic systems before they are deployed, in ways that incorporate community input and multidisciplinary expertise.
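One way to picture a stakes-dependent tolerance for error is as a check that runs before deployment. The sketch below is only an illustration: the tier names and tolerance values are hypothetical placeholders, and choosing them is exactly the contextual judgment described above, which no code can supply. It compares a conservative upper bound on the error rate (a one-sided Hoeffding bound, which assumes independent and representative test cases) against the tolerance for the relevant tier.

```python
import math

# Hypothetical tolerances per stakes tier. Placeholders, not recommendations.
TOLERANCE = {"trivial": 0.10, "moderate": 0.02, "high": 0.001}


def error_upper_bound(errors: int, trials: int, delta: float = 0.05) -> float:
    """One-sided Hoeffding bound on the true error rate.

    With probability at least 1 - delta, the true error rate does not exceed
    the returned value, assuming independent, representative trials.
    """
    if trials <= 0 or not 0 <= errors <= trials:
        raise ValueError("need trials > 0 and 0 <= errors <= trials")
    observed = errors / trials
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * trials))
    return min(1.0, observed + margin)


def meets_tolerance(errors: int, trials: int, stakes: str) -> bool:
    """True only if the conservative bound is within the tier's tolerance."""
    return error_upper_bound(errors, trials) <= TOLERANCE[stakes]


# The same test results pass for a trivial use and fail for a high-stakes one.
print(meets_tolerance(10, 10_000, "trivial"))
print(meets_tolerance(10, 10_000, "high"))
```

Using an upper bound rather than the observed rate means that a small test set counts against deployment: with few trials the margin is wide, so a system cannot clear a strict tolerance on thin evidence.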

A second effort is required to make whole again those who have been harmed when a system errs. In this effort scapegoating constitutes the most serious barrier to accountability, as shifting blame for error toward an automated system shifts the responsibility to do something about that error away from those who are best positioned to offer a remedy. While requiring those who develop and operate algorithmic systems to document the expected impacts of their products goes some distance toward establishing who the responsible parties are for any harmful errors,10 these requirements do not resolve the problem on their own. Those who are harmed when systems err also require meaningful recourse, and we can trust that companies that develop algorithmic systems are investing heavily in ways of limiting their own liability.3 Harmed parties must know who to contact, how to contact them, and what the circumstances of an adverse effect were in order to ask for a reconsideration or reversal of an algorithmic outcome. They also need to know a system was in use at all, which is not always readily apparent to users. Additionally, a private right of action8 would allow a means for redress when avenues of access to system operators are forestalled. A private right of action gives those who have been harmed by algorithmic systems standing in courts to seek remedy for algorithmic injury and to enter into discovery about the properties of systems that led to their injury. The salutary effects of such a right are twofold: liability for harms is a powerful motivation for developers to resist deploying systems with unreasonably high likelihoods of erring in harmful ways, and it provides a compensatory avenue for those who have been hurt by harmful errors.

What the preceding suggests, most strongly, is that a new program of research is sorely needed. Developers are indeed making themselves more trustworthy.1 But while significant progress is being made on developing algorithmic systems that are more accurate, and that sacrifice less accuracy when they pursue social and legal goals like fairness, the inherent stochasticity of such systems calls for a better understanding of how irreducible errors are distributed across society. This means incorporating the above concerns (knowing when error rates are acceptable for a context of use and providing avenues to remedy for those who are harmed) into existing fields of computational research.
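A first step toward understanding how errors are distributed is to report them per group rather than as a single aggregate. The sketch below is a minimal illustration with made-up records; the group labels and the record layout (group, predicted label, true label) are assumptions for the example and do not describe any particular dataset or system.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: error rate} for (group, predicted, actual) records."""
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            wrong[group] += 1
    return {group: wrong[group] / totals[group] for group in totals}


# Hypothetical evaluation records: (group, predicted label, true label).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]

rates = error_rates_by_group(records)
overall = sum(pred != actual for _, pred, actual in records) / len(records)

print(f"overall: {overall:.3f}")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.3f}")
```

Here the aggregate error rate (3 of 8) hides the fact that one group bears twice the error rate of the other, which is the kind of distributional question an aggregate accuracy figure cannot answer.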




Doing so can be accomplished in three ways, each of which fills gaps created when trust fails.

  • Increased audit and assessment throughout the machine learning pipeline can help both to clarify the relationship between design choices and resultant harms and to identify approaches where error is irreducible and unsafe for use within specific contexts. Computational work on algorithmic robustness, in partnership with social scientists who understand the contexts of deployment, is crucial for this.
  • Rigorous standards of care can both motivate audit and assessment and provide a rationale through which individual practitioners can see themselves as responsible for intervening when designs or deployments go awry.
  • Lawmakers can enact legislation that enshrines a private right of action for those harmed by algorithmic systems. The technical expertise needed to demonstrate this harm can serve a dual purpose when it is applied prior to harm, making systems safer and limiting their use when not appropriate.

Current work on human-computer interaction can also be expanded to include an emphasis on how to provide recourse to those who interact with algorithmic systems, to know that such systems are in use, and to contest adverse decisions. Both efforts require better and more practical documentation of algorithmic systems, which would lay out for all to see what the expected behaviors of algorithmic systems are, what contexts they are intended to be used in, what consequences might occur if used beyond their specifications, and who the relevant actors are who can review decisions, grant recourse, and be held accountable by others. Finally, the rise of computational superbugs requires a measure of modesty on the part of developers, to avoid rushing systems to market or making grandiose claims about their systems' capabilities, in favor of building systems that—when they err—err on the side of caution.
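The documentation described above can be pictured as a structured record whose gaps are easy to spot. The sketch below is one hypothetical layout, not an established standard: the class name, field names, and example values are all assumptions made for illustration, with one field for each item the paragraph lists (expected behaviors, intended contexts, consequences of use beyond specification, and the actors who can review decisions and grant recourse).

```python
from dataclasses import dataclass, field, fields


@dataclass
class SystemDocumentation:
    """Hypothetical documentation record for an algorithmic system."""
    system_name: str
    expected_behaviors: list = field(default_factory=list)
    intended_contexts: list = field(default_factory=list)
    out_of_scope_consequences: list = field(default_factory=list)
    recourse_contacts: list = field(default_factory=list)

    def missing_sections(self):
        """Names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example values are invented for the sketch.
doc = SystemDocumentation(
    system_name="example-benefits-screener",
    expected_behaviors=["flags applications for human review"],
    intended_contexts=["initial triage only"],
)
missing = doc.missing_sections()
print(missing)
```

A check like `missing_sections` only shows that a section was filled in, not that what it says is accurate or that the named contacts actually respond; those remain matters for audit and for the institutions that hold operators accountable.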


References

1. Avin, S. et al. Filling gaps in trustworthy development of AI. Science 374, (2021), 1327–1329.

2. Buolamwini, J. and Gebru, T. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of Machine Learning Research 18 (2018).

3. Burton, S. et al. Mind the gaps: Assuring the safety of autonomous systems from an engineering, ethical, and legal perspective. Artificial Intelligence 279, 103201 (2020).

4. Cooper, A.F. et al. Accountability in an Algorithmic Society: Relationality, Responsibility, and Robustness in Machine Learning. (2022); https://bit.ly/3KZhdWY

5. Dougherty, C. Google photos mistakenly labels black people "gorillas." Bits Blog (2015); https://nyti.ms/2HuInr2

6. Hancox-Li, L. Robustness in machine learning explanations: Does it matter? In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (ACM, Barcelona, Spain, 2020); https://bit.ly/40ptWYx

7. Loveland, W.D. et al. Modern Nuclear Chemistry. John Wiley & Sons, Hoboken, NJ, 2006.

8. Mendelson, Stop Discrimination by Algorithms Act of 2021 (2021); https://bit.ly/3LlsEta

9. Metcalf, J. et al. Algorithmic impact assessments and accountability: The co-construction of impacts. In Proceedings of the ACM Conference on Fairness, Accountability and Transparency (ACM, Toronto, ON, 2021); https://bit.ly/40ocHqt

10. Metcalf, J. et al. A relationship and not a thing: A relational approach to algorithmic accountability and assessment documentation. arXiv preprint, 19 (2020).

11. Obermeyer, Z. et al. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, (2019), 447–453.

12. Shilton, K. et al. Excavating awareness and power in data science: A manifesto for trustworthy pervasive data research. Big Data & Society 8 (2021), 1–12.

13. Solove, D.J. A taxonomy of privacy. University of Pennsylvania Law Review 154 (2006), 477.

14. Sloane, M. et al. AI and Procurement: A Primer. New York University, New York, (2021); doi:10.17609/BXZF-DF18.

15. Sloane, M. and Moss, E. AI's social sciences deficit. Nat. Mach. Intell. 1, 330–331 (2019).


Author

Emanuel Moss (emanuel.moss@intel.com) is a Sociotechnical Systems Research Scientist at Intel Labs and former Postdoctoral Fellow at Cornell Tech and Data and Society Research Institute, New York, NY, USA.


Copyright held by author.
Request permission to (re)publish from the owner/author

The Digital Library is published by the Association for Computing Machinery. Copyright © 2023 ACM, Inc.


 
