Developing Guidance for Responsible Data Enrichment Sourcing

B Cavello

December 22, 2020

In the fall of 2020, the Partnership on AI (PAI) convened a series of online workshops to refine guidelines for the responsible sourcing of data enrichment services. This Workshop Series was part of the larger Responsible Sourcing Across the Data Supply Line initiative, which seeks to develop actionable resources for artificial intelligence (AI) practitioners to ensure quality working conditions for the people who clean and label training data or otherwise contribute human judgment to AI systems. Image annotation, speech-to-text validation, and other types of data enrichment are critical components of developing machine learning systems, and AI practitioners have an important role to play in ensuring that the labor of data enrichment work is valued and respected.

The Workshop Series on Responsible Sourcing of Data Enrichment Services brought together over 30 professionals from different areas of the data enrichment ecosystem, including representatives of data enrichment providers, researchers and product managers at AI companies, and leaders of civil society and labor organizations. Over the course of the five-week Workshop Series, participants provided input on the forthcoming Responsible Sourcing whitepaper and developed key insights on how recommendations for AI practitioners can be made more actionable and impactful.

The Workshop Series highlighted the diversity of approaches currently used to source data enrichment services. In-house workforces, automated annotation software, managed services, and crowdsourcing platforms were just some of the solutions mentioned by practitioners. Most often, AI developers use a hybrid approach, relying on more than one of these models depending on the project constraints and resources available. To accommodate this variety, workshop participants developed stakeholder maps to identify key players and their specific responsibilities in the data enrichment supply line.

Through the discussions and activities of the Workshop Series, three roles at AI-developing organizations were identified as having the greatest capacity to ensure quality working conditions for data enrichment workers: data scientists, AI engineers, and AI product managers. Five responsibilities were further identified as the points in the data enrichment process where people in these roles have the most opportunity to influence conditions for data enrichment workers:

Defining the enrichment goals,
Writing the instructions,
Choosing the enrichment providers,
Rejecting or accepting the work, and
Defining technical requirements.

As we continue to develop these recommendations, we hope to connect with more data scientists, AI engineers, and product managers who are interested in providing feedback and potentially piloting the recommendations in their workflows. To learn more about the Responsible Sourcing work at PAI and stay up to date on future publications, please visit the project page.