Prioritizing Equity in Algorithmic Systems through Inclusive Data Guidelines

Eliza McCullough

May 14, 2024

Algorithmic systems are increasingly pervasive in our everyday lives, from screening job applicants to recommending content tailored to our interests and finding the fastest route for our commutes. But with their widespread adoption comes the problem of algorithmic bias. A longstanding issue in AI development, algorithmic bias is a systematic distortion in the data or in system development that causes unjust outcomes for certain groups of people. It disproportionately impacts marginalized communities such as Black, Indigenous, and other people of color, LGBTQIA+ communities, and women. These biases not only lead to discriminatory outcomes for users of these systems but can also perpetuate existing structural inequities.

Consider a scenario where a company implements an algorithmic system to screen resumes, aiming to identify the strongest candidates. To detect any bias, the company decides to collect demographic data from job applicants and assess the system’s performance across different groups of people. If it discovers algorithmic bias in the system, the company could then use these results to inform a fairness intervention. This might include adjusting components of the system, retraining the system, further investigating the sources of bias, or choosing to discard the system altogether.
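To make the scenario concrete, here is a minimal sketch of what such a group-level assessment might look like. It assumes the company compares selection rates, meaning the share of applicants the screener advances in each group. That is only one of many possible fairness measures and is not a method prescribed by the Guidelines. The function names, group labels, and records are hypothetical.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of applicants advanced by the screener, per group.

    `records` is an iterable of (group_label, was_advanced) pairs.
    """
    totals = defaultdict(int)
    advanced = defaultdict(int)
    for group, was_advanced in records:
        totals[group] += 1
        if was_advanced:
            advanced[group] += 1
    return {group: advanced[group] / totals[group] for group in totals}


def selection_rate_ratio(rates):
    """Ratio of the lowest to the highest group selection rate.

    A value of 1.0 means every group is advanced at the same rate;
    values closer to 0 indicate a larger gap between groups.
    """
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group was advanced at all, so there is no gap
    return min(rates.values()) / highest


# Hypothetical screening outcomes; the labels are placeholders, not real data.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

rates = selection_rates(records)
ratio = selection_rate_ratio(rates)
print(rates, ratio)
```

A low ratio would be a signal to investigate further rather than proof of discrimination on its own, and rates for very small groups are noisy. Note also that the comparison can only surface disparities for groups the data collection actually represents.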

However, collecting sensitive demographic data from users can cause additional harm, particularly to groups already impacted by algorithmic discrimination. The company’s demographic data collection process could exclude trans or gender non-conforming applicants by only providing binary gender options, rendering invisible any biased outcomes these applicants might face. The company could also wrongfully repurpose the demographic dataset collected for the fairness assessment to inform job ad-targeting efforts, even though participants only consented to the use of their data for the defined assessment. Finally, the company could fail to securely store the dataset and expose sensitive information (such as data related to sexuality) to data leaks, putting applicants at risk of harm or violence. These harms highlight a fundamental tension in the development of these systems: the apparent need to collect demographic data to address algorithmic discrimination conflicts with the imperative to prevent the harms that can stem from this process.

AI public policy has increasingly pointed to the challenge of algorithmic discrimination and the need to build fair algorithmic systems. The European AI Act calls for organizations to demonstrate how they will safeguard against bias in their systems, while the US AI Executive Order states that the federal government will enforce existing consumer protection laws to prevent algorithmic discrimination. Yet how to achieve algorithmic fairness is not clear-cut. Given the lack of formal regulation in this space, organizations are often left to create their own internal fairness testing practices and policies. Over the last few years, Partnership on AI has explored how organizations assess and address discrimination in their systems. We found that teams primarily turn to quantitative bias testing using demographic data (as in the example above). But while this data is collected for a good reason (to prevent bias), it can end up creating additional harms, and those harms usually fall on the same communities most likely to experience algorithmic discrimination.

The Participatory & Inclusive Demographic Data Guidelines

We developed the Participatory & Inclusive Demographic Data Guidelines to address this challenge. The Guidelines aim to provide AI developers, teams within technology companies, and other data practitioners with guidance on how to collect and use demographic data for fairness assessments to advance the needs of data subjects and communities, particularly those most at risk of harm from algorithmic systems.

The Guidelines are organized around the demographic data lifecycle. For each stage, we identify the key risks faced by data subjects and communities (especially marginalized groups), baseline requirements and recommended practices that organizations should undertake to prevent these risks, and guiding questions that organizations can use to achieve the recommended practices. Alongside this resource, we also built an Implementation Workbook and Case Study for further guidance.

Central to the Guidelines is the concept of data justice, which asserts that people have a right to choose if, when, how, under what circumstances, and to what ends they are represented in a dataset. To uphold data justice, we outline four key principles:

Prioritize the Right to Self-Identification: Organizations should empower data subjects and communities to choose how their identities are represented during data collection.
Co-Define Fairness: Organizations should work with data subjects and communities to understand their expectations of ‘fairness’ when collecting demographic data.
Implement Affirmative, Informed, Accessible, & Ongoing Consent: Organizations must design consent processes that are clear, approachable, and accessible to data subjects, particularly those most at risk of harm by the algorithmic system.
Promote Equity-Based Analysis: Organizations should focus on the needs and risks of groups most at risk of harm by the algorithmic system throughout the demographic data lifecycle.
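As a rough illustration of how the self-identification and consent principles might show up in a data-collection schema, the sketch below stores identity as an optional free-text self-description rather than a fixed binary choice, and records the specific purposes each person has agreed to. Everything here (the class, field names, and purpose labels) is a hypothetical assumption for illustration and is not taken from the Guidelines.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical purpose labels; a real system would define its own.
FAIRNESS_ASSESSMENT = "fairness_assessment"
AD_TARGETING = "ad_targeting"


@dataclass
class DemographicResponse:
    applicant_id: str
    # Free-text self-description; None means the applicant declined to answer.
    gender_self_description: Optional[str] = None
    # Purposes the applicant has affirmatively agreed to.
    consented_purposes: Set[str] = field(default_factory=set)

    def withdraw_consent(self, purpose: str) -> None:
        """Consent is ongoing, so it can be withdrawn at any time."""
        self.consented_purposes.discard(purpose)


def usable_for(responses: List[DemographicResponse], purpose: str) -> List[DemographicResponse]:
    """Keep only responses whose owners consented to this specific purpose."""
    return [r for r in responses if purpose in r.consented_purposes]


responses = [
    DemographicResponse("a1", "non-binary", {FAIRNESS_ASSESSMENT}),
    DemographicResponse("a2", None, set()),  # declined to share, no consent given
]

for_assessment = usable_for(responses, FAIRNESS_ASSESSMENT)
for_ads = usable_for(responses, AD_TARGETING)

# Simulate an applicant later withdrawing consent.
responses[0].withdraw_consent(FAIRNESS_ASSESSMENT)
after_withdrawal = usable_for(responses, FAIRNESS_ASSESSMENT)
```

Filtering by purpose at the point of use is one way to guard against the kind of repurposing described earlier, where data collected for a fairness assessment ends up informing ad targeting. It does not replace the participatory work of co-defining fairness with the communities involved.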

Like all our work at PAI, we developed these Guidelines by consulting and collaborating with our multistakeholder community. A working group of 16 experts convened monthly, representing perspectives from the technology industry, academia, civil society, and government offices across six countries (US, UK, Canada, South Africa, the Netherlands, and Australia). The group co-drafted each component of the Guidelines, gathered feedback from attendees at several workshops, and commissioned seven equity experts who specialize in topics such as data justice, AI ethics in the Majority World, racial justice, LGBTQ+ justice, and disability rights.

We need your help

While the Guidelines include a range of perspectives, we are seeking additional input from the broader community. That’s why we are calling for public comment on the Participatory & Inclusive Demographic Data Guidelines.

Do you work at an organization that develops or uses algorithmic systems?
Are you a member of a community at particular risk of harm from algorithmic systems, or do you advocate on behalf of these communities?
Are you a public official interested in developing further government guidance on algorithmic fairness assessments?
Or are you someone who is especially passionate about data justice and algorithmic fairness?

Then we want to hear from you! Use this form to submit your comment.

We want to know whether you think the Guidelines are understandable, relevant, and applicable to organizations conducting algorithmic fairness assessments.

Do the Guidelines adequately address the needs of individuals and communities?
What considerations related to the use of demographic data in algorithmic fairness assessments are missing?
How can we make the Guidelines more useful for you and your work?

Help us make this resource stronger by submitting a comment. By working together, we can ensure AI developers and data practitioners have the necessary tools to promote the just application of algorithmic systems. For questions or concerns regarding the Guidelines or the Fairness, Transparency, and Accountability workstream, reach out to Eliza McCullough at eliza@partnershiponai.org.
