Fairer Algorithmic Decision-Making and Its Consequences: Interrogating the Risks and Benefits of Demographic Data Collection, Use, and Non-Use
Introduction and Background
Algorithmic decision-making has been widely accepted as a novel approach to overcoming the purported cognitive and subjective limitations of human decision makers by providing “objective” data-driven recommendations. Yet, as organizations adopt algorithmic decision-making systems (ADMS), countless examples of algorithmic discrimination continue to emerge. Harmful biases have been found in algorithmic decision-making systems in contexts such as healthcare, hiring, criminal justice, and education, prompting increasing social concern regarding the impact these systems are having on the wellbeing and livelihood of individuals and groups across society. In response, algorithmic fairness strategies attempt to understand how ADMS treat certain individuals and groups, often with the explicit purpose of detecting and mitigating harmful biases.
Many current algorithmic fairness techniques require access to data on a “sensitive attribute” or “protected category” (such as race, gender, or sexuality) in order to make performance comparisons and standardizations across groups. These demographic-based algorithmic fairness techniques assume that discrimination and social inequality can be overcome with clever algorithms and collection of the requisite data, removing broader questions of governance and politics from the equation. This paper seeks to challenge this assumption, arguing instead that collecting more data in support of fairness is not always the answer and can actually exacerbate or introduce harm for marginalized individuals and groups. We believe more discussion is needed in the machine learning community around the consequences of “fairer” algorithmic decision-making. This involves acknowledging the value assumptions and trade-offs associated with the use and non-use of demographic data in algorithmic systems. To advance this discussion, this white paper provides a preliminary perspective on these trade-offs derived from workshops and conversations with experts in industry, academia, government, and advocacy organizations as well as literature across relevant domains. In doing so, we hope that readers will better understand the affordances and limitations of using demographic data to detect and mitigate discrimination in institutional decision-making more broadly
Demographic-based algorithmic fairness techniques presuppose the availability of data on sensitive attributes or protected categories. However, previous research has highlighted that data on demographic categories, such as race and sexuality, are often unavailable due to a range of organizational challenges, legal barriers, and practical concerns Andrus, M., Spitzer, E., Brown, J., & Xiang, A. (2021). “What We Can’t Measure, We Can’t Understand”: Challenges to Demographic Data Procurement in the Pursuit of Fairness. ArXiv:2011.02282 (Cs). http://arxiv.org/abs/2011.02282. Some privacy laws, such as the EU’s GDPR, not only require
data subjects to provide meaningful consent when their data is collected, but also prohibit the collection of sensitive data such as race, religion, and sexuality. Some corporate privacy policies and standards, such as Privacy By Design, call for organizations to be intentional with their data collection practices, only collecting data they require and can specify a use for. Given the uncertainty around whether or not it is acceptable to ask users and customers for their sensitive demographic information, most legal and policy teams urge their corporations to err on the side of caution and not collect these types of data unless legally required to do so. As a
result, concerns over privacy often take precedence over ensuring product fairness since the trade-offs between mitigating bias and ensuring individual or group privacy are unclear Andrus et al., 2021.
In cases where sensitive demographic data can be collected, organizations must navigate a number of practical challenges throughout its procurement. For many organizations, sensitive demographic data is collected through self-reporting mechanisms. However, self reported data is often incomplete, unreliable, and unrepresentative, due in part to a lack of incentives for individuals to provide accurate
and full information Andrus et al., 2021. In some cases, practitioners choose to infer protected categories of individuals based on proxy information, a method which is largely inaccurate. Organizations also face difficulty capturing unobserved characteristics, such as disability, sexuality, and religion, as these categories are frequently missing and often unmeasurable Tomasev, N., McKee, K. R., Kay, J., & Mohamed, S. (2021). Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities. ArXiv:2102.04257 (Cs). https://doi.org/10.1145/3461702.3462540. Overall, deciding on how to classify and categorize demographic data is an ongoing challenge, as demographic categories continue to shift and change over time and between contexts. Once demographic data is collected, antidiscrimination law and policies largely inhibit organizations from using this data since knowledge of sensitive categories opens the door to legal liability if discrimination is uncovered without a plan to successfully mitigate it Andrus et al., 2021.
In the face of these barriers, corporations looking to apply demographic-based algorithmic fairness techniques have called for guidance on how to responsibly collect and use demographic data. However, prescribing statistical definitions of fairness on algorithmic systems without accounting for the social, economic, and political systems in which they are embedded can fail to benefit marginalized
groups and undermine fairness efforts Bakalar, C., Barreto, R., Bogen, M., Corbett-Davies, S., Hall, M., Kloumann, I., Lam, M., Candela, J. Q., Raghavan, M., Simons, J., Tannen, J., Tong, E., Vredenburgh, K., & Zhao, J. (2021). Fairness On The Ground: Applying Algorithmic Fairness Approaches To Production Systems. 12.. Therefore, developing guidance requires a deeper understanding of the risks and trade-offs inherent to the use and non-use of demographic data. Efforts to detect and mitigate harms must account for the wider contexts and power structures that algorithmic systems, and the data that they draw on, are embedded in.
Finally, though this work is motivated by the documented unfairness of ADMS, it is critical to recognize that bias and discrimination are not the only possible harms stemming directly from ADMS. As recent papers and reports have forcefully argued, focusing on debiasing datasets and algorithms is (1) often misguided because proposed debiasing methods are only relevant for a subset of the kinds of bias ADMS introduce or reinforce, and (2) likely to draw attention away from other, possibly more salient harms Balayn, A., & Gürses, S. (2021). Beyond Debiasing. European Digital Rights. https://edri.org/wp-content/ uploads/2021/09/EDRi_Beyond-Debiasing-Report_Online.pdf. In the first case, harms from tools such as recommendation systems, content moderation systems, and computer vision systems might be characterized as a result of various forms of bias, but resolving bias in those systems generally involves adding in more context to better understand differences between groups, not just trying to treat groups more similarly. In the second case, there are many ADMS that are clearly susceptible to bias, yet the greater source of harm could arguably be the deployment of the system in the first place. Pre-trial detention risk scores provide one such example. Using statistical correlations to determine if someone should be held without bail, or, in other words, potentially punishing individuals for attributes outside of their control and past decisions unrelated to what they are currently being charged for, is itself a significant deviation from legal standards and norms, yet most of the debate has focused around how biased the predictions are. Attempting to collect demographic data in these cases will likely do more harm than good, as demographic data will
draw attention away from harms inherent to the system and towards seemingly resolvable issues around bias.