Responsible Sourcing Across the Data Supply Line

Overview

Data labelers, data cleaners, and others who contribute human judgment to artificial intelligence (AI) systems play a critical role in developing this technology. Drawing from a diverse range of perspectives, the Responsible Sourcing workstream aims to develop recommendations and actionable resources to improve the working conditions of these professionals.

Improving Conditions for Data Enrichment Workers

Explore PAI’s library of resources for AI practitioners to improve the well-being of data enrichment workers around the world.

Research

Research and Publications

AI, Labor, and the Economy/Responsible Sourcing Across the Data Supply Line

Implementing Responsible Data Enrichment Practices at an AI Developer: The Example of DeepMind

Sonam Jindal Nov 16, 2022

Research and Publications

Responsible Sourcing of Data Enrichment Services

PAI Staff Jun 16, 2021

Updates

Blog

AI and Human Rights: Protecting Data Workers

Sonam Jindal May 01, 2025

Blog

Protecting AI’s Essential Workers: Introducing our Vendor Engagement Guidance & Transparency Template

Sonam Jindal Aug 28, 2024

Blog

Protecting AI’s Essential Workers: A Pathway to Responsible Data Enrichment Practices

Sonam Jindal Jul 30, 2024

Research and Publications

Improving Conditions for Data Enrichment Workers

PAI Staff Nov 16, 2022

Blog

PAI and DeepMind Collaborate to Develop Tools for Sourcing Enriched Data Responsibly

Sonam Jindal, Hudson Hongo Nov 16, 2022

Blog

Responsible Sourcing of Data Enrichment Services

Sonam Jindal Jun 16, 2021

Blog

Developing Guidance for Responsible Data Enrichment Sourcing

B Cavello Dec 22, 2020

Testimonials

“Getting the most out of big data depends on recognizing how dependent we are on workers–often temporary, working offsite–who clean, structure, and manage datasets. The future of advancing AI hinges on investing in work conditions that enhance rather than undermine how data are handled. I’m glad to see PAI’s initiative calling on AI organizations to firmly commit to this vision of responsible tech in all their data pipeline decisions.”

Mary Gray, Senior Principal Researcher, Microsoft Research & Author of Ghost Work

“As AI becomes more mainstream, it is important to acknowledge the invisible work and workers that enable the technology. We hope this paper can help contribute to the dialogue around worker wellbeing in the AI supply chain.”

Elonnai Hickok

“Ethical AI is usually focused on the use of AI but the development of AI also involves significant decisions of responsibility. It takes a large human workforce to train, scale, and sustain AI and yet there are very few worker centric resources to help AI companies make these responsible decisions.”

Mark Sears, Founder & CEO, CloudFactory

“This whitepaper is a must-read for AI companies which want to practice responsible procurement. We hope its recommendations will be adopted widely across the AI industry, for the benefit of both clients and data enrichment workers.”

Iva Gumnishka, CEO, Humans in the Loop

“Artificial intelligence is driven by human intelligence, and we have an opportunity for AI to be a force for good in the most overlooked talent pools.”

Byran Dai, CEO, Daivergent

Background

With many businesses pursuing automation and personalization with their technology investments, AI applications are becoming an increasingly common feature of industry. Alongside this boom has been the expansion of data enrichment work.

Despite being an essential component of AI development, data enrichment work has for too long been both out of sight and out of mind for AI developers. Without knowledge of (and appreciation for) how it is produced, enriched data can be too easily treated as a simple commodity. This disconnect leads to a devaluing of data enrichment work, poor working conditions for data enrichment workers, and, often, worse outcomes for AI development itself.

In the fall of 2020, the Partnership on AI hosted a Workshop Series on Responsible Sourcing of Data Enrichment Services. To kick off the event, Mary L. Gray (Microsoft Research, Indiana University) led a conversation with Dean Jansen and Aleli Alcala (Amara, a project of the Participatory Culture Foundation) highlighting alternative models for employment in on-demand work that produce better outcomes for workers.

Increasingly, AI practitioners are recognizing the importance of data enrichment work and the people behind this critical enabling step in the AI development process. Unfortunately, too many AI developers still aren’t aware of the ways they are precipitating harmful and precarious working conditions, and those who are aware don’t know what they can do to help. From AI developers we’ve heard sentiments like “We feel we must care about the transparency of our supply chain. But there is no transparency in data labeling. Guidelines on how to navigate this would be very useful.” Similarly, data enrichment providers express that they “would love the buyers [of data labeling] to be more educated and have realistic expectations when they set the price and terms of tasks.”

The Responsible Sourcing workstream addresses these needs by working to provide actionable guidance for data scientists, AI engineers, and product managers, empowering these critical ecosystem players to do their part in ensuring healthy and fair working conditions across the data supply line.

What Is Data Enrichment?

The concepts of machine learning have been around for more than half a century, but most of the major advances have taken place in the last five to ten years. This is thanks to improvements in hardware performance and the affordability of computing power, which have made it possible to collect and analyze data at an unprecedented scale. As Ian Goodfellow, Yoshua Bengio, and Aaron Courville wrote in their 2016 book Deep Learning, “The most important new development is that today we can provide these algorithms with the resources they need to succeed.” Those resources are data.

But today’s AI systems cannot be built with just any data. They require enriched data. Data enrichment is a broadly defined term that encapsulates various types of data preparation and cleaning as well as human-review processes. Enriched data is essential for the training and validation of supervised learning models, the dominant form of applied AI. Examples of data enrichment work include:

Data preparation and cleaning:

Data annotation
Intent recognition
Sentiment analysis
Image recognition
Speech-to-text validation

Human-review/human-in-the-loop work, which may include:

Content moderation
Creating a continuous feedback loop
Validating algorithmic outputs and models

Events

At PAI, we have been working to highlight the precarious working conditions faced by a key group that makes AI possible: data enrichment professionals. Our recently published white paper Responsible Sourcing of Data Enrichment Services covers how data sourcing decisions impact workers and proposes avenues for AI practitioners to improve their working conditions. We were thrilled to see this issue explored in The Gig Is Up.

This panel discussion, moderated by Sonam Jindal, Program Lead for AI, Labor, and the Economy, unpacks why companies should prioritize responsible data enrichment practices, what this accomplishes, and what more we need to do. It does so through a real-world exploration of PAI Partner DeepMind’s process, challenges, and impact in putting the Guidelines into practice.