ABOUT ML Reference Document

Last Updated

September 7, 2021

1.1.1 About This Document and Version Numbering

Given the growing influence of AI transparency research at a global scale, leveraging PAI’s position as a multistakeholder organization to amalgamate key contributions across separate partner initiatives will serve to organize and advocate for the underlying themes of current ML system documentation proposals. Initiatives like ABOUT ML that offer guidance on ML documentation will give companies a head start on a path of overseeing, auditing, and monitoring ML technologies, contributing to the key business goal of earning and keeping trust from consumers and policymakers.

In order to make the ABOUT ML resource more digestible, ABOUT ML has added a section giving stakeholders guidance on how to use the document. Additionally, there is now a distinction between the ABOUT ML Reference Document and the forthcoming PLAYBOOK with downloadable templates, specifications, guidance, and recommendations. ABOUT ML still plans to seek public comment, input from the Steering Committee, and feedback from at least one Diverse Voices panel as the reference evolves.

The original Version 0 (v0) draft of the ABOUT ML Reference Document focused on extracting major themes from recent research on recommendations for transparency documentation. This new version of the Reference Document builds on those themes and adds initial feedback from public comment, a working session at PAI’s All Partners Meeting on institutional challenges and enablers of success in implementing documentation, and the Diverse Voices process to create an initial resource. The primary target audience for the ABOUT ML Reference Document is individual champions at organizations who work in positions where they may be able to advocate for the adoption of documentation processes in their team and organization. Others may also find value in its contents.

Components of the forthcoming PLAYBOOK, and to a lesser extent future versions of this reference document, will merge existing practices with insights from research, formalize best practices through an investigation into attempts to implement recommendations, and set new industry norms for documentation in ML lifecycles. Future drafts may also include commentary on other enablers of ML transparency, such as mechanisms to adjust team and institutional settings, model interpretability tools, test suites and modified evaluation procedures, and more detail on necessary feedback loops for transparency. See Section 1.1.3 ABOUT ML Project Process and Timeline Overview below for further detail.

For transparency, here are the Sections of ABOUT ML released as of mid-2021 and which feedback mechanisms each one has received thus far:

Section		Latest version	Public Comment	Steering Committee	Diverse Voices
1.	Project Overview	v1.0	•	•	•
2.	Literature review	v1.0	•	•	•
3.	Preliminary Synthesized Documentation Suggestions	v1.0	•	•	•
4.	Challenges	v1.0	•	•	•
5.	Promising interventions to try	v1.0	—	•	•
6.	ML primer	v1.0	•	•	•
7.	Appendix	v1.0	•	•	•

1.1.2 ABOUT ML Goals and Plan

There are numerous goals and stakeholders for the ABOUT ML resources. To prioritize among them, the Steering Committee recommends that the ABOUT ML Reference Document be developed toward the goals in the following order. The order is intended to build momentum in the most practical sequence toward widespread adoption of a set of documentation questions that enable both internal and external auditability.

ABOUT ML Reference Document

The ABOUT ML Reference Document will, going forward, continue to evolve. A few guides, specifications, and other useful artifacts contained within the Reference Document will also be accessible as standalone resources.

Further work to operationalize practices noted in the ABOUT ML Reference Document will be showcased on the ABOUT ML website in the form of a PLAYBOOK. This PLAYBOOK will serve as an evolving repository of resources for stakeholders within the ML documentation community to use.

Documentation is important to consider as both an institutional process and an artifact (this idea will be expanded upon later in this document) because an artifact is only useful, with all of the necessary information, if many teams and individuals incorporate completing and updating it into their work. This means that ABOUT ML’s eventual end goal is not only to recommend what information should go into documentation for all ML systems but also to recommend how organizations can reshape their processes to enable the reliable completion and maintenance of documentation on an ongoing basis. Thus, when considering the subgoals of ABOUT ML and how to sequence them, it is important to think about which subgoals have outcomes that can enable subsequent subgoals.

Subgoal 1 is to create documentation for internal accountability because this motivates organizations to invest in and build the internal processes and infrastructure needed to implement and scale the creation of documentation artifacts. The call for internal accountability can come from top-down buy-in, public commitments to AI ethics principles, and a set of motivated individual champions who advocate for the need for robust internal oversight over ML systems that impact many communities and people. The role of the ABOUT ML Reference Document released in 2021 will be to empower individual champions at all levels and roles inside organizations that build ML systems who are interested in advocating for and implementing ML system documentation inside their organization. The internal changes needed to be able to create documentation for all ML systems will include building tooling that reduces friction for collecting and collating information from people at different parts of the ML system lifecycle (e.g., development, testing, deployment, and maintenance), change management to convince all of these people to incorporate these tools and steps into their workflow, alignment of executives to provide resources and mandates to complete all of this work, and coordination of all teams involved in this process, spanning product teams, legal, compliance, policy and more. All of this will be a significant investment of time and resources, but the end product will be infrastructure for internal oversight and review of ML systems which also yields artifacts that can disseminate documentation information of ML systems easily between internal teams.
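As an illustration only, the kind of tooling described above, which collects and collates information from people at different parts of the ML system lifecycle, might start from something as small as the following Python sketch. The Entry fields, the collate and missing_stages helpers, and the sample questions are all hypothetical and are not part of any ABOUT ML specification; only the four lifecycle stage names come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Lifecycle stages named in the paragraph above.
STAGES = ("development", "testing", "deployment", "maintenance")


@dataclass(frozen=True)
class Entry:
    """One contributed piece of documentation (hypothetical structure)."""
    stage: str     # one of STAGES
    author: str    # who supplied the answer
    question: str  # documentation question being answered
    answer: str


def collate(entries):
    """Group entries by lifecycle stage, keeping the order of STAGES."""
    grouped = defaultdict(list)
    for entry in entries:
        if entry.stage not in STAGES:
            raise ValueError(f"unknown lifecycle stage: {entry.stage!r}")
        grouped[entry.stage].append(entry)
    return {stage: grouped[stage] for stage in STAGES}


def missing_stages(collated):
    """List stages with no entries yet, so reviewers can see the gaps."""
    return [stage for stage, items in collated.items() if not items]


if __name__ == "__main__":
    sample = [
        Entry("development", "modeling team", "What is the intended use?", "Hypothetical answer."),
        Entry("deployment", "product team", "Who are the end users?", "Hypothetical answer."),
    ]
    print(missing_stages(collate(sample)))
```

A structure like this makes gaps visible early, which is one way such tooling could reduce friction; real implementations would also need the change management and executive alignment described above.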

Accountability

We follow the lexicon of algorithms research by Kohli et al. (2018) in defining accountability as “the answerability of actors for outcomes” and the tracing and verification of system action as well as those who take responsibility for those actions.

The infrastructure and documentation artifacts created by Subgoal 1 can then be modified to also enable Subgoal 2, external accountability. The main change would be to modify the sets of questions and information to be shared externally based on the constraints of what organizations are willing to share and what information external stakeholders need to consider the ML system sufficiently transparent. There should be a broad and public conversation between organizations that build ML systems and external stakeholders that should be consulted — including civil society organizations, policymakers, end users, and non-users impacted by ML systems — to determine what information would be necessary in documentation for external accountability. After some initial approximate agreement is reached, organizations can experiment with implementing and sharing this set of information and the external stakeholders can provide feedback on whether this information does, in practice, enable the level of accountability desired. Both sets of stakeholders can iterate together until they are satisfied with the results, with perhaps agreements to continue reevaluating at regular intervals. On the process side, because there will already be an internal process for review of the documentation, this can be extended to review what information will be released with external documentation.
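One way to picture the step from internal to external documentation is as a filter over an existing internal record. The Python sketch below assumes a hypothetical internal record stored as a dictionary and a hypothetical set of fields approved for release; the field names and both helper functions are invented for illustration, and which fields are actually shareable is exactly what the stakeholder conversation described above would have to decide.

```python
def external_view(record, shareable):
    """Return only the fields approved for external release."""
    return {key: value for key, value in record.items() if key in shareable}


def withheld_fields(record, shareable):
    """List fields kept internal, so the review process can revisit them."""
    return sorted(key for key in record if key not in shareable)


if __name__ == "__main__":
    # Hypothetical internal documentation record.
    internal_record = {
        "intended_use": "Hypothetical description of intended use.",
        "evaluation_summary": "Hypothetical evaluation summary.",
        "vendor_contracts": "Hypothetical internal-only detail.",
    }
    # Hypothetical outcome of the internal review of what may be shared.
    approved = {"intended_use", "evaluation_summary"}

    print(external_view(internal_record, approved))
    print(withheld_fields(internal_record, approved))
```

Keeping the external artifact as a derived view of the internal one means the two cannot drift apart, and the list of withheld fields gives the existing internal review process a concrete agenda.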

With Subgoal 2 complete, there will be a broadly agreed upon set of questions that stakeholders consider sufficient for external accountability. At this stage, ABOUT ML resources will include templates for documentation for external and internal consumption as well as information on how to implement documentation as a process from pilot to scale. Subgoal 3 is to scale adoption of ABOUT ML recommendations across the AI industry. PAI can provide assistance here as there are many Partner companies that can lead general adoption. The end result would truly be new industry norms on documentation practices which could enable many of the responsible and ethical AI development goals that companies and consortiums have put forth in recent years.

None of these steps would be easy and each involves a lot of investment and coordination with numerous sets of stakeholders. However, these are worthy goals to work towards because the end result would yield a lot of benefits for the responsible development and deployment of ML systems.

1.1.3 ABOUT ML Project Process and Timeline Overview

PAI launched the ABOUT ML iterative multistakeholder process with the initial v0 draft in order to initiate a broader community discussion. The draft has been updated into the current release in an effort to move towards best practices by going through the following phases:

Phase 0

Set up and maintain infrastructure for public input, including:
Diverse Voices panel,
Steering Committee,
Public comment.

Phase 1

Understand the latest research

Phase 2

Understand current practice

Current Phase

Phase 3

Combine research theory and results of current practice into testable pilots

Phase 4

Run pilot tests with PAI Partners and organizations

Phase 5

Collect data from pilot tests for transparency practices

Phase 6

Iterate on pilots with the latest research and practice

Phase 7

When there is a sufficient body of evidence for a certain practice, elevate it to a best practice

Phase 8

Promulgate effective practices to establish new industry norms for transparency

PAI recognizes that this effort can only succeed with input from as broad a set of stakeholders as possible, and will be seeking input not only from our many Partners, but also from stakeholders from academia, civil society organizations, companies designing and deploying ML technology, and the general public. We welcome your participation.

The process is modeled after iterative ongoing processes to design internet standards, such as the World Wide Web Consortium (W3C) process (outlined at https://www.w3.org/2019/Process-20190301/), the Internet Engineering Task Force (IETF) process (outlined at https://www.ietf.org/standards/process/), and the Web Hypertext Application Technology Working Group (WHATWG) process (outlined at https://whatwg.org/faq#process). It will include a revamped public forum for discussion and a place to submit any proposed changes. We will announce instructions for accessing this online community and will welcome you to join in the public discussion and to submit proposed changes as many times as desired.

Public comments were collected and batch evaluated by the ABOUT ML Steering Committee, which has included dozens of experts, researchers, and practitioners recruited from a diverse set of PAI Partner organizations. The Steering Committee guided the process of updating ABOUT ML drafts based on the public comments submitted and new developments in research and practice. The current Reference Document, which includes feedback from the Diverse Voices process, will be reviewed and voted on to approve new releases by “rough consensus” (Oever, N., Moriarty, K. The Tao of IETF: A novice’s guide to the Internet Engineering Task Force. https://www.ietf.org/about/participate/tao/), which is commonly used by other multistakeholder working groups.

The Steering Committee reconvened on April 13th, 2021 to review and refine the ABOUT ML Reference Document in preparation for release.

To ensure that diverse perspectives, especially those from communities historically excluded from technology decision-making, contribute to any ABOUT ML recommendations, PAI engaged with the Tech Policy Lab at the University of Washington, a Partner organization, to conduct Diverse Voices panels (Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103). This method was designed to gather feedback from stakeholders whose perspectives might not otherwise be consulted and to ensure that those perspectives are reflected in the released text. Thus, for any ABOUT ML releases that go through the Diverse Voices process, the panel feedback will be the last edits incorporated before a new release. This also means that each round of Diverse Voices panels will cause public comment on the document to be closed for several months, although the public forum will remain open for discussion during that time. Public comment on the document itself will re-open with the new release of the draft. The first round of Diverse Voices panels for ABOUT ML was held in late 2019.

Section 0: How to Use this Document

Section 1: Project Overview

1.1 Statement of Importance for ABOUT ML Project

1.1.0 Importance of Transparency: Why a Company Motivated by the Bottom Line Should Adopt ABOUT ML Recommendations

1.1.1 About This Document and Version Numbering

1.1.2 ABOUT ML Goals and Plan

1.1.3 ABOUT ML Project Process and Timeline Overview

1.1.4 Who Is This Project For?

1.1.4.1 Audiences for the ABOUT ML Resources

1.1.4.2 Stakeholders That Should Be Consulted While Putting Together ABOUT ML Resources

1.1.4.3 Audiences for ABOUT ML Documentation Artifacts

1.1.4.4 Whose Voices Are Currently Reflected in ABOUT ML?

1.1.4.5 Origin Story

Section 2: Literature Review (Current Recommendations on Documentation for Transparency in the ML Lifecycle)

2.1 Demand for Transparency and AI Ethics in ML Systems

2.2 Documentation to Operationalize AI Ethics Goals

2.2.1 Documentation as a Process in the ML Lifecycle

2.2.2 Key Process Considerations for Documentation

2.3 Research Themes on Documentation for Transparency

2.3.1 System Design and Set Up

2.3.2 System Development

2.3.3 System Deployment

Section 3: Preliminary Synthesized Documentation Suggestions

3.4.1 Suggested Documentation Sections for Datasets

3.4.1.1 Data Specification

3.4.1.1.1 Motivation

3.4.1.2 Data Curation

3.4.1.2.1 Collection

3.4.1.2.2 Processing

3.4.1.2.3 Composition

3.4.1.2.4 Types and Sources of Judgement Calls

3.4.1.3 Data Integration

3.4.1.3.1 Use

3.4.1.3.2 Distribution

3.4.1.4 Maintenance

3.4.2 Suggested Documentation Sections for Models

3.4.2.1 Model Specifications

3.4.2.2 Model Training

3.4.2.3 Evaluation

3.4.2.4 Model Integration

3.4.2.5 Maintenance

Section 4: Current Challenges of Implementing Documentation

Section 5: Conclusions

Version 0

Version 1

Appendix A: Compiled List of Documentation Questions

Fact Sheets (Arnold et al. 2018)

Data Sheets (Gebru et al. 2018)

Model Cards (Mitchell et al. 2018)

A “Nutrition Label” for Privacy (Kelley et al. 2009)

The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards (Holland et al. 2019)

Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman 2018)
