Eyes Off My Data: Exploring Differentially Private Federated Statistics To Support Algorithmic Bias Assessments Across Demographic Groups

PAI Staff

Executive Summary

Designing and deploying algorithmic systems that work as expected, every time, for all people and situations remains both a challenge and a priority. Rigorous pre- and post-deployment fairness assessments are necessary to surface potential bias in algorithmic systems. Post-deployment fairness assessments, which observe whether an algorithm operates in ways that disadvantage any specific group of people, can pose additional challenges for organizations because they often involve collecting new user data, including sensitive demographic data. Collecting and using demographic data is difficult for organizations because it is entwined with highly contested social, regulatory, privacy, and economic considerations. Over the past several years, Partnership on AI (PAI) has investigated key risks and harms that individuals and communities face when companies collect and use demographic data. In addition to well-known data privacy and security risks, such harms can stem from having one’s social identity miscategorized or one’s data used beyond data subjects’ expectations, issues PAI has explored through our demographic data workstream. These risks and harms are particularly acute for socially marginalized groups, such as people of color, women, and LGBTQIA+ people.

Given these risks and concerns, organizations developing digital technology are invested in the responsible collection and use of demographic data to identify and address algorithmic bias. For example, in an effort to deploy algorithmically driven features responsibly, Apple introduced IDs in Apple Wallet with mechanisms in place to help Apple and its partner issuing state authorities (e.g., departments of motor vehicles) identify any potential biases users may experience when adding their IDs to their iPhones. (At the time of writing, IDs in Wallet, offered in partnership with state identification-issuing authorities, were available only in select US states.)

In addition to pre-deployment algorithmic fairness testing, Apple followed a post-deployment assessment strategy. As part of IDs in Wallet, Apple applied differentially private federated statistics as a way to protect users’ data, including their demographic data. The main benefit of differentially private federated statistics is the preservation of data privacy: it combines differential privacy (e.g., adding statistical noise to data to prevent re-identification) with federated statistics (e.g., analyzing user data on individual devices, rather than on a central server, to avoid creating and transferring datasets that can be hacked or otherwise misused). What is less clear is whether differentially private federated statistics can attend to some of the other risks and harms associated with the collection and analysis of demographic data. Answering that question requires a sociotechnical lens on the potential social impact of applying a technical approach.
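To make this combination concrete, the sketch below simulates both ingredients in miniature: each device perturbs its own answer locally using randomized response (a standard local differential privacy mechanism), and the server only aggregates the already-noisy reports to estimate a group-level rate. This is an illustrative sketch rather than a description of Apple’s implementation; the function names, the privacy budget of 1.0, and the 8 percent ground-truth rate are hypothetical.

```python
import numpy as np

def local_dp_response(value: bool, epsilon: float, rng: np.random.Generator) -> int:
    """Randomized response: the device flips its own answer with a probability
    set by the privacy budget epsilon, so the true per-user value never leaves
    the device."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1)  # probability of answering truthfully
    return int(value) if rng.random() < p_truth else 1 - int(value)

def federated_estimate(reports: list[int], epsilon: float) -> float:
    """Server-side aggregation: de-bias the pooled noisy reports to recover an
    estimate of the population rate without ever seeing raw responses."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed + p - 1) / (2 * p - 1)

# Hypothetical cohort: 8% of opted-in devices truly observed a failure when
# adding an ID, and the rate is estimated from privatized on-device reports.
rng = np.random.default_rng(seed=0)
true_values = rng.random(10_000) < 0.08
reports = [local_dp_response(bool(v), epsilon=1.0, rng=rng) for v in true_values]
print(f"estimated failure rate: {federated_estimate(reports, epsilon=1.0):.3f}")
```

With a cohort of this size, the de-biased estimate typically lands within about a percentage point of the true 8 percent rate, even though no individual’s true response ever leaves their device.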

This report is the result of two expert convenings independently organized and hosted by PAI. As a partner organization of PAI, Apple shared details about the use of differentially private federated statistics as part of their post-deployment algorithmic bias assessment for the release of this new feature.

During the convenings, responsible AI, algorithmic fairness, and social inequality experts discussed how algorithmic fairness assessments might be strengthened, challenged, or left unaffected by the use of differentially private federated statistics. While the IDs in Wallet use case is limited to the US context, the participants expanded the scope of their discussion to consider differentially private federated statistics in other contexts. Recognizing that data privacy and security are not the only concerns people have regarding the collection and use of their demographic data, participants were directed to consider whether differentially private federated statistics could also be leveraged to attend to some of the other social risks that can arise, particularly for marginalized demographic groups.

The multidisciplinary participant group repeatedly emphasized the importance of having both pre- and post-deployment algorithmic fairness assessments throughout the development and deployment of an AI-driven system or product/feature. Post-deployment assessments are especially important because they enable organizations to monitor algorithmic systems once they are deployed in real-life social, political, and economic contexts. Participants also recognized the importance of thoughtfully collecting key demographic data in order to help identify group-level algorithmic harms.

The expert participants, however, clearly stated that a secure and privacy-preserving way of collecting and analyzing sensitive user data is, on its own, insufficient to address the risks and harms of algorithmic bias, nor does it fully resolve the risks and harms of collecting demographic data in the first place. Instead, the convening participants identified key choice points facing AI-developing organizations that determine whether the use of differentially private federated statistics contributes to overall alignment with responsible AI principles and ethical demographic data collection and use.

This report provides an overview of differentially private federated statistics and the choice points facing AI-developing organizations when applying the technique within their overall algorithmic fairness assessment strategies. Recommendations for best practices are organized into two parts:

  1. General considerations that any AI-developing organization should factor into their post-deployment algorithmic fairness assessment
  2. Design choices specifically related to the use of differentially private federated statistics within a post-deployment algorithmic fairness strategy

The choice points identified by the expert participants emphasize the importance of carefully applying differentially private federated statistics in the context of algorithmic bias assessment. For example, several features of the technique can be configured in ways that undercut its privacy-preserving and security-enhancing properties. Apple’s approach to using differentially private federated statistics aligned with some of the practices suggested during the expert convenings: limiting the data retention period (90 days), allowing users to actively opt in to data sharing (rather than creating an opt-out model), clearly and simply communicating what data users will provide for the assessment, and maintaining organizational oversight of the query process and parameters.
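One concrete example of such a configuration choice is the privacy budget (commonly written as epsilon) that governs how much noise the differential privacy component adds: the larger the budget, the less noise and the weaker the formal protection. The snippet below is a hypothetical illustration using the Laplace mechanism for a simple counting query, not a description of Apple’s parameters, and shows how quickly the added noise shrinks as epsilon grows.

```python
import numpy as np

def noisy_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
    """Laplace mechanism for a counting query (sensitivity 1): the added noise
    has scale 1/epsilon, so a larger privacy budget means less noise."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Illustrative only: the true count of 500 and the epsilon values are made up.
rng = np.random.default_rng(seed=42)
for eps in (0.1, 1.0, 8.0):
    samples = [round(noisy_count(500, eps, rng), 1) for _ in range(5)]
    print(f"epsilon={eps}: {samples}")
```

At epsilon = 8.0 the reported counts are nearly exact, which is convenient for analysts but offers correspondingly weaker plausible deniability to any individual whose data is included.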

The second set of recommendations surfaced by the expert participants primarily focuses on the resources (e.g., financial, time allocation, and staffing) necessary to achieve alignment and clarity on the kind of “fairness” and “equity” AI-developing organizations are seeking for their AI-driven tools and products/features. While these considerations may seem tangential, expert participants emphasized the importance of establishing a robust foundation on which differentially private federated statistics can be effectively used. Differentially private federated statistics, in and of itself, does not mitigate all the potential risks and harms related to collecting and analyzing sensitive demographic data. It can, however, strengthen overall algorithmic fairness assessment strategies by supporting better data privacy and security throughout the assessment process.

Table of Contents

Executive Summary

Introduction

The Challenges of Algorithmic Fairness Assessments

Prioritization of Data Privacy: An Incomplete Approach for Demographic Data Collection?

Premise of the Project

A Sociotechnical Framework for Assessing Demographic Data Collection

Differentially Private Federated Statistics

Differential Privacy

Federated Statistics

Differentially Private Federated Statistics

A Sociotechnical Examination of Differentially Private Federated Statistics as an Algorithmic Fairness Technique

General Considerations for Algorithmic Fairness Assessment Strategies

Design Considerations for Differentially Private Federated Statistics

Conclusion

Acknowledgments

Funding Disclosure

Appendices

Appendix 1: Fairness, Transparency and Accountability Program Area at Partnership on AI

Appendix 2: Case Study Details

Appendix 3: Multistakeholder Convenings

Appendix 4: Glossary

Appendix 5: Detailed Summary of Challenges and Risks Associated with Demographic Data Collection and Analysis


Guidelines for AI and Shared Prosperity


Our economic future is too important to leave to chance.

AI has the potential to radically disrupt people’s economic lives in both positive and negative ways. It remains to be determined which of these we’ll see more of. In the best scenario, AI could widely enrich humanity, equitably equipping people with the time, resources, and tools to pursue the goals that matter most to them.

Our current moment serves as a profound opportunity — one that we will miss if we don’t act now. To achieve a better future with AI, we must put in the work today.

In medicine and other fields, new innovations are put through rigorous testing to ensure they are fit for purpose. The AI community, however, has no established practice for assessing the impact of AI systems on inequality or job quality. Without one, it remains difficult to ensure AI deployments are bringing us closer to the economic future we want to live in.

You can help guide AI’s impact on jobs

AI developers, AI users, policymakers, labor organizations, and workers can all help steer AI so its economic benefits are shared by all. Using Partnership on AI’s (PAI) Shared Prosperity Guidelines, these stakeholders can minimize the chance that individual AI systems worsen shared prosperity-relevant outcomes.

The Shared Prosperity Guidelines can be used by following a guided, three-step process.

 

Get Involved

Partnership on AI needs your help to refine, test, and drive adoption of the Guidelines for AI and Shared Prosperity.

Fill out the form below to share your feedback on the Guidelines, ask about collaboration opportunities, and receive updates about events and other future work by the AI and Shared Prosperity Initiative.

Table of Contents

Step 1: Learn About the Guidelines

The Need for the Guidelines

The Origin of the Guidelines

Design of the Guidelines

Key Principles for Using the Guidelines

Step 2: Apply the Job Impact Assessment Tool

Instructions for Performing a Job Impact Assessment

Signals of Opportunity to Advance Shared Prosperity

Signals of Risk to Shared Prosperity

STEP 3: Stakeholder-Specific Recommendations

For AI-Creating Organizations

For AI-Using Organizations

For Policymakers

For Labor Organizations and Workers

Get Involved

Endorsements

Acknowledgments

AI and Shared Prosperity Initiative’s Steering Committee

Sources Cited


Implementing Responsible Data Enrichment Practices at an AI Developer: The Example of DeepMind

Sonam Jindal

Executive Summary

As demand for AI services grows, so, too, does the need for the enriched data used to train and validate machine learning (ML) models. While these datasets can only be prepared by humans, the data enrichment workers who do so (performing tasks like data annotation, data cleaning, and human review of algorithmic outputs) are an often-overlooked part of the development lifecycle, frequently working in poor conditions continents away from AI-developing companies and their customers.

Workers

For the purposes of this white paper we refer to individuals completing data enrichment as “workers.” In doing so, we recognize the variety of employment statuses that can exist in the data enrichment industry, including independent contractors on self-service crowdsourcing platforms, subcontractors of data enrichment providers, and full-time employees.

Last year, the Partnership on AI (PAI) published “Responsible Sourcing of Data Enrichment Services,” a white paper exploring how the choices made by AI practitioners could improve the working conditions of these data enrichment professionals. This case study documents an effort to put that paper’s recommendations into practice at one AI developer: DeepMind, a PAI Partner.

In addition to creating guidance for responsible AI development and deployment, PAI’s Theory of Change includes collaborating with Partners and others to implement our recommendations in practice. From these collaborations, PAI collects findings which help us further develop our curriculum of responsible AI resources. This case study serves as one such resource, offering a detailed account of DeepMind’s process and learnings for other organizations interested in improving their data enrichment sourcing practices.

Sourcing enriched data
Sourcing data enrichment work is a process that requires a number of steps including, but not limited to, defining the enrichment goal, choosing the enrichment provider, defining the enrichment tools, defining the technical requirements, writing instructions, ensuring that instructions make sense, setting worker hours, determining time spent on a particular task, communicating with enrichment workers, rejecting or accepting work, defining a project budget, determining workers’ payment, checking work quality, and providing performance feedback.

After assessing DeepMind’s existing practices and identifying what was needed to consistently source enriched data responsibly, PAI and DeepMind worked together to prototype the necessary policies and resources. The Responsible Data Enrichment Implementation Team (which consisted of PAI and members of DeepMind’s Responsible Development and Innovation team, which we will refer to as “the implementation team” in this case study) then collected multiple rounds of feedback, testing the following outputs and changes with smaller teams before they were rolled out organization-wide:

A two-page document offering fundamental guidelines for responsible data enrichment sourcing
An updated ethics review process
A checklist detailing what constitutes “good instructions” for data enrichment workers
A table to easily compare the salient features of various data enrichment platforms and vendors
A spreadsheet listing the living wages in areas where data enrichment workers commonly live

Versions of these resources have been added to PAI’s responsible data enrichment sourcing library and are now available for any organization that wishes to improve its data enrichment sourcing practices.

Ultimately, DeepMind’s multidisciplinary AI research teams, including applied AI researchers (or “researchers” for the purposes of this case study, though this term might be defined differently elsewhere), said that these new processes felt efficient and helped them think more deeply about the impact of their work on data enrichment workers. They also expressed gratitude for centralized guidance developed through a rigorous process, which removed the burden of individually figuring out how to set up data enrichment projects.

Data Enrichment

Data enrichment is curation of data for the purposes of machine learning model development that requires human judgment and intelligence. This can include data preparation, cleaning, labeling, and human review of algorithmic outputs, sometimes performed in real time.

Examples of data enrichment work:

Data preparation, annotation, cleaning, and validation:
Intent recognition, Sentiment tagging, Image labeling

Human review (sometimes referred to as “human in the loop”):
Content moderation, Validating low confidence algorithmic predictions, Speech-to-text error correction

While organizations hoping to adopt these resources may want to similarly engage with their teams to make sure their unique use cases are accounted for, we hope these tested resources will provide a better starting point to incorporate responsible data enrichment practices into their own workflows. Furthermore, to identify where the implemented changes fall short of ideal, we plan to continue developing this work through engagement and convenings. To stay informed, sign up for updates on PAI’s Responsible Sourcing Across the Data Supply Line Workstream page.

This case study details the process by which DeepMind adopted responsible data enrichment sourcing recommendations as organization-wide practice, how challenges that arose during this process were addressed, and the impact on the organization of adopting these recommendations. By sharing this account of how DeepMind did it and why they chose to invest time to do so, we intend to inspire other organizations developing AI to undertake similar efforts. It is our hope that this case study and these resources will empower champions within AI organizations to create positive change.

Table of Contents

Executive Summary

Background

Importance of Data Enrichment Workers and Pathways to Improve Working Conditions

Case Study as a Method of Increasing Transparency and Sharing Actionable Guidance

Background on DeepMind’s Motivations

Process and Outcomes of the DeepMind and PAI Collaboration

Changes and Resources Introduced to Support Adoption of Recommendations

Two-Page Data Enrichment Sourcing Guidelines Document

Adapted Review Process

Good Instructions Checklist

Vendor and Platform Feature Comparison Table

Living Wages Spreadsheet

Addressing Practical Complexities That Arose While Finalizing Changes

Assessing Clarity of Guidelines and Rolling Out Changes Organization-Wide

Reactions, Impact, and Next Steps

Response from Research and Development Teams

Key Stakeholders/Leadership Reflections and Motivations

Continued Work for DeepMind

Limitations of Case Study Applicability

Conclusion

Acknowledgements

Appendix A: Initial Discovery Process and Getting Reactions to PAI Responsible Sourcing Recommendations

Sources Cited
