ABOUT ML Reference Document

Last Updated

September 7, 2021

Section 1: Project Overview

The ABOUT ML project objective is to work towards a new industry emphasis on transparent machine learning (ML) systems. By providing a guide for practitioners to start taking transparency seriously, this document serves as a first step. The goal of this document is to synthesize insights and recommendations from the existing body of literature to begin a public multistakeholder conversation about how to improve ML transparency.

1.1 Statement of Importance for ABOUT ML Project

1.1
Statement of Importance for ABOUT ML Project

As machine learning becomes central to many decision-making processes — including high-stakes decisions in criminal justice, healthcare, and banking — organizations using ML systems to aid or automate decisions face increased pressure for transparency on how these decisions are made. In a 2019 Harvard Business Review article, Eric Colson states that routine decisions based on structured data are best handled by artificial intelligence as AI is “less prone to human’s cognitive bias.” However, the author goes on to warn, developers and deployers of AI, specifically ML systems, should consider the inherent “risk of using biased data that may cause AI to find specious relationships that are unfair.” Annotation and Benchmarking on Understanding and Transparency of Machine learning Lifecycles (ABOUT ML) is a project of the Partnership on AI (PAI) working towards establishing new norms on transparency by identifying best practices for documenting and characterizing key components and phases throughout the ML system lifecycle from design to deployment, including annotations of data, algorithms, performance, and maintenance requirements.

Transparency

As noted in Jobin et al. (2019), the “interpretation, justification, domain of application, and mode of achievement” of AI transparency vary from one publication to another.
For this document, we adopt a meaning for transparency that includes all “efforts to increase explainability, interpretability or other acts of communication and disclosure.”

Presently, there is neither consensus on which documentation practices work best nor on what information needs to be disclosed and for which goals. Moreover, the definition of transparency itself is highly contextual. Because there is currently no standardized process across the industry, each team that wants to improve transparency in an ML system must address the entire suite of questions about what transparency means for their team, product, and organization within the context of their specific goals and constraints. Our goal is to provide a start to that process of exploration. We will offer a summary of recommendations and practices that is mindful of the variance in transparency expectations and outcomes. We hope to provide an adaptive resource to highlight common themes about transparency, rather than a rigid list of requirements. This should serve to guide teams to identify and address context-specific challenges.

ABOUT ML

The ABOUT ML initiative was presented at the Human-Centric Machine Learning workshop at the 2019 Neural Information Processing Systems conference. In this work, Deb Raji and Jingying Yang note that “transparency through documentation is a promising practical intervention that can integrate into existing workflows to provide clarity in decision making for users, external auditors, procurement departments, and other stakeholders alike.”

While substantial decentralized experimentation is currently taking placeHolstein, K., Vaughan, J.W., Daumé, H., Dudík, M., & Wallach, H.M. (2018). Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? CHI., the ABOUT ML project aims to accelerate progress by pooling insights more quickly, sharing resources, and reducing redundancy of highly similar efforts. In doing this together, the community can improve quality, reproducibility, rigor, and consistency of these efforts by gathering evaluation data for a variety of proposals. The Partnership on AI (PAI) aims to provide a gathering place for researchers, AI practitioners, civil society organizations, and especially those affected by AI products to discuss, debate, and ultimately decide on broadly applicable recommendations. ABOUT ML seeks to bring together representatives from a wide range of relevant stakeholder groups to improve public discussion and promulgate best practices into new industry norms that will reflect diverse interests and chart a path forward for greater transparency in ML. We encourage any organization undertaking transparency initiatives to share their practices and lessons learned to PAI for incorporation into future versions of this document and/or artifacts in the forthcoming PLAYBOOK.

This is an ongoing project with regular evaluation points to keep up with the rapidly evolving field of AI. PAI’s broad range of partner organizations, including corporate developers of AI, civil society organizations, and academic institutions, will be involved in the drafting and vetting of documentation themes recommended in this document. In addition, PAI engaged with the Tech Policy Lab at the University of Washington to run a Diverse VoicesYoung, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103. panel to gather opinions from stakeholders whose perspectives might not otherwise be captured. Through this process, PAI has gained deeper insights into the Diverse Voices process in order to inform the ABOUT ML recommendations on how to incorporate the perspectives of diverse stakeholders.

We began by highlighting recurrent themes in ML research about documentation, but our ambitious aim is to identify all practices that have sufficient positive data of efficacy to be deemed best practices in ML transparency. PAI has welcomed a public discussion of what it takes to have sufficient data to be deemed best practices alongside the design of ABOUT ML PILOTS. Now that the input from the Diverse Voices process has been incorporated in this current version of the document, PAI aims to continue investigating and refining best practices so they can be disseminated broadly into new norms to improve transparency in the AI industry. We will also continue to highlight promising but insufficiently well-supported practices that are especially deserving of further study.

1.1.0 Importance of Transparency: Why a Company Motivated by the Bottom Line Should Adopt ABOUT ML Recommendations

1.1.0
Importance of Transparency: Why a Company Motivated by the Bottom Line Should Adopt ABOUT ML Recommendations

Companies can showcase and implement their commitment to responsible AI by adopting the tenets set forth in this ABOUT ML Reference Document and any forthcoming components of the PLAYBOOK. This work is meant to empower that intention with scientifically supported recommendations and artifacts to support the “actioning” of transparency and accountability. As noted in Section 2.2: Documentation to Operationalize AI Ethics Goals, documentation provides important benefits even in contexts where full external sharing is not possible.

The ABOUT ML effort aims to encourage organizations to invest in and build the internal processes and infrastructure needed to implement and scale the creation of documentation artifacts. Internal documentation (for other teams inside the same organization, with more details) and external documentation (for broader consumption, with fewer sensitive details) are both valuable and should be undertaken together as they provide complementary incentives and benefits. Organizations will benefit from the alignment of internal and external incentives with the incentives behind proper documentation.

The ABOUT ML effort aims to serve the ML documentation stakeholder community by positioning itself as a convener of recommendations and templates. This is meant to support a centralized governance structure with near-consensus standardization for ML documentation processes and artifacts. A coordinated effort within the community could benefit users and impacted non-users of ML systems.

Section 0: How to Use this Document

Section 1: Project Overview

1.1 Statement of Importance for ABOUT ML Project

1.1.0 Importance of Transparency: Why a Company Motivated by the Bottom Line Should Adopt ABOUT ML Recommendations

1.1.1 About This Document and Version Numbering

1.1.2 ABOUT ML Goals and Plan

1.1.3 ABOUT ML Project Process and Timeline Overview

1.1.4 Who Is This Project For?

1.1.4.1 Audiences for the ABOUT ML Resources

1.1.4.2 Stakeholders That Should Be Consulted While Putting Together ABOUT ML Resources

1.1.4.3 Audiences for ABOUT ML Documentation Artifacts

1.1.4.4 Whose Voices Are Currently Reflected in ABOUT ML?

1.1.4.5 Origin Story

Section 2: Literature Review (Current Recommendations on Documentation for Transparency in the ML Lifecycle)

2.1 Demand for Transparency and AI Ethics in ML Systems

2.2 Documentation to Operationalize AI Ethics Goals

2.2.1 Documentation as a Process in the ML Lifecycle

2.2.2 Key Process Considerations for Documentation

2.3 Research Themes on Documentation for Transparency

2.3.1 System Design and Set Up

2.3.2 System Development

2.3.3 System Deployment

Section 3: Preliminary Synthesized Documentation Suggestions

3.4.1 Suggested Documentation Sections for Datasets

3.4.1.1 Data Specification

3.4.1.1.1 Motivation

3.4.1.2 Data Curation

3.4.1.2.1 Collection

3.4.1.2.2 Processing

3.4.1.2.3 Composition

3.4.1.2.4 Types and Sources of Judgement Calls

3.4.1.3 Data Integration

3.4.1.3.1 Use

3.4.1.3.2 Distribution

3.4.1.4 Maintenance

3.4.2 Suggested Documentation Sections for Models

3.4.2.1 Model Specifications

3.4.2.2 Model Training

3.4.2.3 Evaluation

3.4.2.4 Model Integration

3.4.2.5 Maintenance

Section 4: Current Challenges of Implementing Documentation

Section 5: Conclusions

Version 0

Version 1

Appendix A: Compiled List of Documentation Questions

Fact Sheets (Arnold et al. 2018)

Data Sheets (Gebru et al. 2018)

Model Cards (Mitchell et al. 2018)

A “Nutrition Label” for Privacy (Kelley et al. 2009)

The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards (Holland et al. 2019)

Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman 2018)

ABOUT ML Reference Document

Section 1: Project Overview

Section 1: Project Overview

1.1 Statement of Importance for ABOUT ML Project

1.1
Statement of Importance for ABOUT ML Project

1.1.0 Importance of Transparency: Why a Company Motivated by the Bottom Line Should Adopt ABOUT ML Recommendations

1.1.0
Importance of Transparency: Why a Company Motivated by the Bottom Line Should Adopt ABOUT ML Recommendations

ABOUT ML Reference Document

Section 0: How to Use this Document

Section 1: Project Overview

Section 2: Literature Review (Current Recommendations on Documentation for Transparency in the ML Lifecycle)

Section 3: Preliminary Synthesized Documentation Suggestions

Section 4: Current Challenges of Implementing Documentation

Section 5: Conclusions

Appendix A: Compiled List of Documentation Questions

Appendix B: Diverse Voices Process and Artifacts

Appendix C: Glossary

Sources Cited