Partnership on AI

Social Risks of Non-Use

When demographic data is discussed in the context of algorithmic decision-making, it is most often to make the case for why disaggregated data is necessary to make fairer decisions. In this section, “Social Risks of Non-Use,” we identify some key features of what the addition of demographic data does for efforts to detect and mitigate discrimination in institutional decision-making more broadly. In the next section, “Social Risks of Use,” we will delve into some of the less frequently considered risks of actually collecting and using this data.

Hidden Discrimination

As algorithmic decision-making systems (ADMS) become more widespread, there is a greater risk that these systems will reinforce historical inequalities and engender new forms of discrimination in ways that are difficult to assess. In most cases, when ADMS discriminate against protected groups, they do so indirectly. While it is certainly possible for machine learning systems to base decisions directly on features like race, more often the tools uncover trends and correlations that have the effect of discriminating across groups.

In order to understand how algorithms can discriminate, it is important to consider the different ways in which bias can enter the picture. The first point of entry is most obviously the data used to build the system. Biases in the data collection process and existing social inequalities will dictate the types of correlations that a machine learning system can exploit. If a group is underrepresented in the dataset or if the dataset embeds the results of historical discrimination and oppression in the form of biased features, it is to be expected that ADMS will perform worse for or undervalue certain groups (for a detailed discussion of the many kinds of data bias, see Ntoutsi et al., 2020 Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M., Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., Kompatsiaris, I., Kinder‐Kurlanda, K., Wagner, C., Karimi, F., Fernandez, M., Alani, H., Berendt, B., Kruegel, T., Heinze, C., … Staab, S. (2020). Bias in data‐driven artificial intelligence systems—An introductory survey. WIREs Data Mining and Knowledge Discovery, 10(3). https://doi.org/10.1002/widm.1356, and Olteanu et al., 2019 Olteanu, A., Castillo, C., Diaz, F., & Kıcıman, E. (2019). Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Frontiers in Big Data, 2, 13. https://doi.org/10.3389/fdata.2019.00013).

Using biased data, however, is not the only way that ADMS can have a discriminatory impact. How ADMS are designed, and toward what kinds of objectives, has a large bearing on how discriminatory their outcomes are. If a system optimizes for a goal that is poorly defined, or even discriminatorily defined, it is likely to reproduce historical inequity and discrimination, just under a guise of objectivity and disinterestedness. For example, the UK higher education admission algorithm attempted to define aptitude as a combination of predicted performance and secondary school quality, systematically biasing the outcomes against those coming from poorer or less-established secondary schools Rimfeld, K., & Malanchini, M. (2020, August 21). The A-Level and GCSE scandal shows teachers should be trusted over exams results. Inews.Co.Uk. https://inews.co.uk/opinion/a-level-gcse-results-trust-teachers-exams-592499. Similarly, ADMS that ignore contextual differences between groups in an attempt to treat everyone equally often lead to discriminatory outcomes, such as in the case of hate speech detection systems that do not consider the identities of the speaker Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language. Proceedings of the International AAAI Conference on Web and Social Media, 11(1), 512–515. Davidson, T., Bhattacharya, D., & Weber, I. (2019). Racial Bias in Hate Speech and Abusive Language Detection Datasets. ArXiv:1905.12516 (Cs). http://arxiv.org/abs/1905.12516.

Though the types of discrimination discussed here represent a small subset of the myriad ways that ADMS can discriminate, we are still confronted with a difficult question: how should practitioners assess all the potential discriminatory impacts of their systems? The nascent field of Algorithmic Fairness has contributed a number of strategies for identifying and even mitigating discrimination by ADMS, but almost all of the proposed methods require that the datasets in use include the demographic attributes along which discrimination might occur. Generally speaking, however, prior work has shown that demographic attributes are only collected once a narrow, enforceable definition of discrimination is codified into law or corporate standards Bogen, M., Rieke, A., & Ahmed, S. (2020). Awareness in Practice: Tensions in Access to Sensitive Attribute Data for Antidiscrimination. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 492–500. https://doi.org/10.1145/3351095.3372877. Furthermore, the issue of missing demographic data is often only confronted and explicitly addressed once assessment and/or enforcement efforts begin in earnest (see, for example, Exec. Order No. 13985 [2021] Executive Order On Advancing Racial Equity and Support for Underserved Communities Through the Federal Government. (2021, January 21). The White House. https://www.whitehouse.gov/briefing-room/presidential-actions/2021/01/20/executive-order-advancing-racial-equity-and-support-for-underserved-communities-through-the-federal-government/ and Exec. Order No. 14035 [2021] Executive Order on Diversity, Equity, Inclusion, and Accessibility in the Federal Workforce. (2021, June 25). The White House. https://www.whitehouse.gov/briefing-room/presidential-actions/2021/06/25/executive-order-on-diversity-equity-inclusion-and-accessibility-in-the-federal-workforce/). Even then, we see that anti-discrimination standards and practices vary widely across domains, and in many cases specific types of discrimination are legally sanctioned (e.g., “actuarial fairness” in insurance quotes and “legitimate aims” in employment law).
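To illustrate why most fairness assessments depend on demographic attributes, the following is a minimal sketch of one common check: comparing the rate of favorable decisions across groups. The function names, the toy decisions, and the group labels are all hypothetical and are not drawn from any system discussed here. The point is only that the computation cannot run at all without a group label for each decision.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Return the share of favorable decisions (1s) for each group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        favorable[group] += int(decision)
    return {group: favorable[group] / totals[group] for group in totals}


def parity_gap(decisions, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = approved, 0 = denied.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
# Hypothetical group labels; without this column the check is impossible.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(decisions, groups)
gap = parity_gap(decisions, groups)
print(rates, gap)
```

A gap in selection rates is only one of many possible definitions of disparity, and a small gap does not by itself establish that a system is non-discriminatory.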

As such, we frequently see a cycle of ADMS development and deployment, exposure of egregious discrimination through individual reports*, and then ad hoc system redesigns. Without access to demographic attributes, it’s difficult to assess these types of shortcomings before system deployment, and even after deployment it is likely that more insidious forms of discrimination remain hidden.

“Colorblind” Decision-Making

Just as an absence of demographic data can prevent practitioners from uncovering various forms of social or institutional discrimination, it can also prevent them from making systems that have the explicit goal of addressing historical discrimination. In fact, under a number of legal and policy frameworks, ignoring or omitting demographic attributes altogether is actually considered non-discriminatory. When ADMS use this approach, often called “fairness through unawareness” or (in cases involving race) “color-blindness,” the results have often been shown to be just as discriminatory as whatever came before algorithmic decision-making Kusner, M. J., Loftus, J., Russell, C., & Silva, R. (2017). Counterfactual Fairness. Advances in Neural Information Processing Systems, 30. https://papers.nips.cc/paper/2017/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html. Often, this is because the decision-making systems we build take in historical data and learn to reproduce historical biases embedded in that data. Sometimes this happens because the system explicitly learns to prioritize accuracy or performance for one group over another by using “proxies” for demographic attributes (e.g., pregnancy status is often a proxy for gender). By cobbling together attributes such as zip code, income, parental status, etc., machine learning systems can “reconstruct” demographic category membership, if doing so is beneficial to the prediction task at hand Harned, Z., & Wallach, H. (2019). Stretching human laws to apply to machines: The dangers of a ’Colorblind’ Computer. Florida State University Law Review, Forthcoming.. In other cases, discrimination stems from prioritizing certain attributes that exhibit disparities across groups as a result of historical oppression, such as wealth or educational attainment.
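The proxy effect described above can be sketched with a small synthetic example. Everything here is an assumption made for illustration: the zip codes, the group labels, the counts, and the `blind_rule` function are invented. The sketch shows two things: a single attribute such as zip code can be enough to guess group membership most of the time, and a rule that never sees the group label can still produce very different approval rates across groups.

```python
from collections import Counter, defaultdict

# Hypothetical records of (zip_code, group). The decision rule below
# never reads the group label.
records = (
    [("10001", "a")] * 9 + [("10001", "b")] * 1
    + [("20002", "a")] * 2 + [("20002", "b")] * 8
)


def blind_rule(zip_code):
    """A hypothetical 'unaware' rule that looks only at zip code."""
    return zip_code == "10001"


def proxy_accuracy(rows):
    """Share of people whose group matches the majority group of their zip code."""
    by_zip = defaultdict(Counter)
    for zip_code, group in rows:
        by_zip[zip_code][group] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_zip.values())
    return correct / len(rows)


def approval_rate_by_group(rows):
    """Approval rate of the blind rule, broken out by the held-out group label."""
    totals, approved = Counter(), Counter()
    for zip_code, group in rows:
        totals[group] += 1
        approved[group] += int(blind_rule(zip_code))
    return {group: approved[group] / totals[group] for group in totals}


accuracy = proxy_accuracy(records)
rates = approval_rate_by_group(records)
print(accuracy, rates)
```

Note that measuring the disparity in this sketch still requires the group label, even though the decision rule itself does not use it.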

Addressing these forms of discrimination, however, is not as easy as just introducing demographic data to the dataset. As seen in the debates around the COMPAS recidivism prediction algorithm, fairness or discrimination can be defined in many, often conflicting, ways Washington, A. L. (2018). How to Argue with an Algorithm: Lessons from the COMPAS-ProPublica Debate. Colorado Technology Law Journal, 17, 131.. This raises a second type of unawareness or color-blindness that is more insidious: the belief that if a decision is not made because of a demographic attribute or some proxy thereof, the decision cannot be discriminatory. For example, credit-scoring institutions now make use of data that is much more closely linked to race and other demographic categories than the concept of “credit-worthiness,” such as criminal history and how one communicates online Rodriguez, L. (2020). All Data Is Not Credit Data: Closing the Gap Between the Fair Housing Act and Algorithmic Decisionmaking in the Lending Industry. Columbia Law Review, 120(7), 1843–1884.. Looking specifically at criminal history, a social constructivist perspective on race would suggest that being subjected to discriminatory (if not outright predatory) policing is part and parcel of what it means to be categorized as Black in the United States Hu, L. (2021, February 22). Law, Liberation, and Causal Inference. LPE Project. https://lpeproject.org/blog/law-liberation-and-causal-inference/.

As such, when we treat demographic categories as standalone attributes and blind ourselves to the web of relationships that constitute a demographic category, we espouse a worldview that we should not consider systemically rooted differences across groups, individualizing the responsibility for historical disenfranchisement, oppression, and inequality. As has been thoroughly explored in other work and domains, attempting to ignore societal differences across demographic groups often works to reinforce or reproduce systems of oppression (see, for example, Bonilla-Silva [2010] Bonilla-Silva, E. (2010). Racism Without Racists: Color-blind Racism and the Persistence of Racial Inequality in the United States. Rowman & Littlefield. and Plaut et al. [2018] Plaut, V. C., Thomas, K. M., Hurd, K., & Romano, C. A. (2018). Do Color Blindness and Multiculturalism Remedy or Foster Discrimination and Racism? Current Directions in Psychological Science, 27(3), 200–206. https://doi.org/10.1177/0963721418766068). Within the algorithmic decision-making space specifically, Eubanks (2017) Eubanks, V. (2017). Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin’s Press has referred to this approach as reinforcing “feedback loops of injustice,” where systemic inequalities are reflected in data that is then used to make “objective” decisions that deepen the inequalities.

Thus, while there is potential benefit to collecting demographic data to enable more “attribute aware” decision-making, corporations and public institutions must be committed to addressing historical discrimination and oppression to realize this benefit. The Toolkit for Centering Racial Equity Throughout Data Integration covers in more detail than we can here what patterns of discrimination might be reproduced through the use of historical data and/or algorithmic decision-making.

Invisibility to Institutions of Importance

Beyond uncovering bias and discrimination, access to demographic data can help provide justification for the adequate representation and participation of various groups during the design and implementation of ADMS. Conversely, when data collection efforts omit certain demographic categories, or even demographics entirely, groups can be rendered invisible to the institutions relying on this data. The trajectory of COVID-19 data collection in the U.S. serves as a good example of this. Though the CDC requested that racial demographic data be collected on everyone who was treated for symptoms of COVID-19, racial demographics were frequently omitted from most local and state data collection efforts Banco, E., & Tahir, D. (2021, March 9). CDC under scrutiny after struggling to report Covid race, ethnicity data. POLITICO. https://www.politico.com/news/2021/03/09/hhs-cdc-covid-race-data-474554. As such, the unique vulnerabilities of Black, Indigenous, and Latinx individuals and communities to the virus were largely obscured until data collection and inference methods improved Banco, E., & Tahir, D. (2021, March 9). CDC under scrutiny after struggling to report Covid race, ethnicity data. POLITICO. https://www.politico.com/news/2021/03/09/hhs-cdc-covid-race-data-474554.

The risk of some groups being rendered invisible, however, can be further heightened as institutions turn to inferring demographic attributes instead of collecting them from data subjects directly. Common techniques used by public and private institutions, such as Bayesian Improved Surname Geocoding (BISG), which uses an individual’s name and zip code to predict their race Elliott, M. N., Morrison, P. A., Fremont, A., McCaffrey, D. F., Pantoja, P., & Lurie, N. (2009). Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities. Health Services and Outcomes Research Methodology, 9(2), 69., often rely on a very limited set of demographic categories that obscure subgroups that might need more specialized treatment. For example, there have been many efforts to distinguish between Asian American and Pacific Islander (AAPI) populations in health Shimkhada, R., Scheitler, A. J., & Ponce, N. A. (2021). Capturing Racial/Ethnic Diversity in Population-Based Surveys: Data Disaggregation of Health Data for Asian American, Native Hawaiian, and Pacific Islanders (AANHPIs). Population Research and Policy Review, 40(1), 81–102. https://doi.org/10.1007/s11113-020-09634-3 and education Poon, O. A., Dizon, J. P. M., & Squire, D. (2017). Count Me In!: Ethnic Data Disaggregation Advocacy, Racial Mattering, and Lessons for Racial Justice Coalitions. JCSCORE, 3(1), 91–124. https://doi.org/10.15763/issn.2642-2387.2017.3.1.91-124 due to fears that disenfranchised subgroups are made further invisible by being categorized under the broad umbrella of AAPI. Models like BISG, however, use U.S. census data and thus cannot go beyond the six census categories for race and ethnicity (White, Black, Hispanic, AAPI, American Indian/Alaskan Native, and Multiracial). Similarly, we have seen how inferring genders for the purposes of content recommendation and advertising can misinterpret or outright ignore individuals of minoritized gender identities Fosch-Villaronga, E., Poulsen, A., Søraa, R. A., & Custers, B. H. M. (2021). A little bird told me your gender: Gender inferences in social media. Information Processing & Management, 58(3), 102541. https://doi.org/10.1016/j.ipm.2021.102541. As such, when increasing group visibility is a salient reason for collecting demographic data, it is critical that such data be collected with the involvement and consent of members of that group.
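The core of a BISG-style inference can be sketched as a single Bayesian update that combines a surname-based probability with a geography-based one. This is a simplified illustration, not the published implementation: the function name, the category labels, and every probability below are invented for the example, and the update assumes that surname and geography are independent once the category is known. What the sketch makes visible is that the output can never contain a category that is absent from the input tables.

```python
def bisg_posterior(p_category_given_surname, p_geo_given_category):
    """Combine surname and geography evidence with Bayes' rule.

    P(category | surname, geo) is proportional to
    P(category | surname) * P(geo | category),
    assuming surname and geography are independent given the category.
    """
    unnormalized = {
        category: p_category_given_surname[category] * p_geo_given_category[category]
        for category in p_category_given_surname
    }
    total = sum(unnormalized.values())
    return {category: value / total for category, value in unnormalized.items()}


# Hypothetical surname table entry: P(category | surname).
surname_prior = {
    "white": 0.05, "black": 0.01, "hispanic": 0.90,
    "aapi": 0.02, "aian": 0.01, "multiracial": 0.01,
}
# Hypothetical share of each category's national population living in one area:
# P(geo | category).
geo_likelihood = {
    "white": 0.0002, "black": 0.0010, "hispanic": 0.0004,
    "aapi": 0.0001, "aian": 0.0001, "multiracial": 0.0003,
}

posterior = bisg_posterior(surname_prior, geo_likelihood)
most_likely = max(posterior, key=posterior.get)
print(most_likely, posterior)
```

Because the set of keys is fixed by the source tables, a subgroup that is folded into a broader category in those tables cannot be recovered by the inference step, no matter how the probabilities are combined.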

It is important to note, however, that disaggregated data is not the only way that groups facing discrimination or other forms of inequality can become more visible. Small-scale data collection and qualitative methodologies can also be used to identify treatment and outcome disparities. Furthermore, just because a group is made visible by disaggregated data, it does not follow that the institutions making use of the data are committed to better tailoring their systems to the needs of that group. As we have seen time and time again with the hyper-surveillance of Black and Brown communities in the United States by law enforcement and public service agencies, some initial visibility can be used to justify more and more invasive forms of visibility Browne, S. (2015). Dark Matters: On the Surveillance of Blackness. In Dark Matters. Duke University Press. https://doi.org/10.1515/9780822375302 Eubanks, 2017.

*For example, when the Google Photos app automatically tagged images of Black users as gorillas or when the Apple Card reportedly offered lower credit limits to women. Both of these issues were uncovered by users publicly sharing their experiences on social media, a relatively common way that algorithmic mishaps get exposed and end up on PAI’s AI Incidents Database.