ABOUT ML Reference Document
Appendix C: Glossary
API:
An application programming interface defines the kinds of calls or requests that can be made to a piece of software, how to make them, the data formats that should be used, and the conventions to follow.
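For illustration, a minimal sketch of consuming a hypothetical REST API from Python; the endpoint, parameters, and response fields below are invented for the example and do not correspond to a real service.

```python
# Minimal sketch: calling a hypothetical JSON-over-HTTP API.
import requests

response = requests.get(
    "https://api.example.com/v1/models",     # hypothetical endpoint
    params={"limit": 10},                    # query parameters the API defines
    headers={"Accept": "application/json"},  # agreed-upon data format
)
response.raise_for_status()    # surface HTTP errors instead of ignoring them
for model in response.json():  # assumes the API returns a JSON list
    print(model)
```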
Autonomous vehicle:
A vehicle capable of sensing its environment and operating without human involvement.
Black box:
Any complex piece of equipment, typically a unit in an electronic system, with contents that are mysterious to the user.
Chatbot:
A computer program designed to simulate conversation with human users, especially over the internet.
Civil Society Organizations (CSOs):
Non-state, not-for-profit, voluntary entities formed by people in the social sphere that are separate from the State and the market. CSOs represent a wide range of interests and ties. They can include community-based organizations as well as non-governmental organizations (NGOs).
Disseminate:
To spread (something, especially information) widely.
Hyperparameter:
A parameter whose value is used to control the learning process.
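For illustration, a minimal sketch using scikit-learn (a library chosen for the example, not prescribed by this document) that distinguishes hyperparameters, which are set before training to control the learning process, from the parameters a model learns from data.

```python
# Minimal sketch: hyperparameters vs. learned parameters in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

# n_estimators and max_depth are hyperparameters: chosen before training
# to control the learning process, not learned from the data.
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
clf.fit(X, y)  # the split thresholds inside each tree are learned parameters
```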
Impacted non-users:
Persons or organizations impacted by an ML system even though they did not directly use it (e.g., a banking customer [non-user] rejected for a home loan by a third-party algorithm [ML system] owned by an organization external to the bank and possibly applied without the customer's knowledge).
Industry norm:
An authoritative standard.
Intellectual property:
Any work or invention that is the result of creativity, such as a manuscript or a design, to which one has rights and for which one may apply for a patent, copyright, trademark, etc.
Iterative process:
A series of steps that are repeated, tweaking and improving the product with each repetition.
Model card:
A type of documentation artifact accompanying a given model that details the model itself, its intended uses, potential limitations, training parameters, datasets used, experimental information, and model evaluation results. (Definition from Hugging Face)
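For illustration, a sketch of the kinds of sections a model card might capture, expressed here as a plain Python dictionary; the field names are illustrative rather than a prescribed schema (see Mitchell et al. 2019 and the Hugging Face template for canonical layouts).

```python
# Illustrative (not canonical) model card skeleton as a Python dictionary.
model_card = {
    "model_details": {"name": "example-classifier", "version": "1.0"},
    "intended_use": "Illustrative example only; not a deployed system.",
    "limitations": "Not evaluated outside its training distribution.",
    "training_data": "Hypothetical dataset; documented in its own datasheet.",
    "evaluation_results": {"accuracy": None},  # filled in after evaluation
}
```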
Multistakeholder process:
A process that aims to bring together primary stakeholders (businesses, civil society, governments, research institutions, and non-governmental organizations) to cooperate and participate in dialogue, decision-making, and the implementation of solutions to common problems or goals.
NLP:
Natural Language Processing (NLP) is broadly defined as the automatic manipulation of natural language, such as speech and text, by software.
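For illustration, a minimal sketch of one basic NLP operation, tokenization, using only the Python standard library; real pipelines typically rely on dedicated NLP libraries.

```python
# Minimal sketch: turning raw text into lowercase word tokens.
import re

text = "NLP is the automatic manipulation of natural language by software."
tokens = re.findall(r"[A-Za-z]+", text.lower())
print(tokens[:5])  # ['nlp', 'is', 'the', 'automatic', 'manipulation']
```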
Objective and Key Result (OKR):
A collaborative goal-setting tool used by teams and individuals to set challenging, ambitious goals with measurable results. OKRs are how teams track progress, create alignment, and encourage engagement around measurable goals.
Reidentification risk:
The risk that anonymized data (also known as de-identified data) can be matched with publicly available information, or auxiliary data, to discover the individual to whom the data belong. This is a concern because companies with privacy policies, health care providers, and financial institutions may release the data they collect after the data have gone through the de-identification process.
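For illustration, a minimal sketch of how joining "anonymous" records with auxiliary public data on quasi-identifiers (here, zip code and birth year) can re-identify individuals; all records below are fabricated.

```python
# Minimal sketch: re-identification by joining on quasi-identifiers.
import pandas as pd

deidentified = pd.DataFrame({
    "zip": ["98101", "98102"], "birth_year": [1980, 1975],
    "diagnosis": ["A", "B"],
})
public_roster = pd.DataFrame({
    "zip": ["98101", "98102"], "birth_year": [1980, 1975],
    "name": ["Alice", "Bob"],
})

# The merge links each "anonymous" diagnosis back to a named individual.
reidentified = deidentified.merge(public_roster, on=["zip", "birth_year"])
print(reidentified[["name", "diagnosis"]])
```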
Reproducibility:
The extent to which consistent results are obtained when an experiment is repeated.
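For illustration, one common (and only partial) reproducibility practice in ML code is fixing random seeds so that repeated runs produce identical results; seeds alone do not guarantee reproducibility across library versions or hardware.

```python
# Minimal sketch: fixing seeds so a repeated run gives the same numbers.
import random

import numpy as np

random.seed(42)
np.random.seed(42)

print(np.random.normal(size=3))  # identical on every run with this seed
```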
Trade secrets:
Defined by the United States Patent and Trademark Office as information that has either actual or potential independent economic value by virtue of not being generally known, has value to others who cannot legitimately obtain the information, and is subject to reasonable efforts to maintain its secrecy.
ABOUT ML Reference Document
Section 0: How to Use this Document
Recommended Reading Plan
Quick Guides
How We Define
Contact for Support
Section 1: Project Overview
1.1 Statement of Importance for ABOUT ML Project
1.1.0 Importance of Transparency: Why a Company Motivated by the Bottom Line Should Adopt ABOUT ML Recommendations
1.1.1 About This Document and Version Numbering
1.1.2 ABOUT ML Goals and Plan
1.1.3 ABOUT ML Project Process and Timeline Overview
1.1.4 Who Is This Project For?
1.1.4.1 Audiences for the ABOUT ML Resources
1.1.4.2 Stakeholders That Should Be Consulted While Putting Together ABOUT ML Resources
1.1.4.3 Audiences for ABOUT ML Documentation Artifacts
1.1.4.4 Whose Voices Are Currently Reflected in ABOUT ML?
1.1.4.5 Origin Story
Section 2: Literature Review (Current Recommendations on Documentation for Transparency in the ML Lifecycle)
2.1 Demand for Transparency and AI Ethics in ML Systems
2.2 Documentation to Operationalize AI Ethics Goals
2.2.1 Documentation as a Process in the ML Lifecycle
2.2.2 Key Process Considerations for Documentation
2.3 Research Themes on Documentation for Transparency
2.3.1 System Design and Set Up
2.3.2 System Development
2.3.3 System Deployment
Section 3: Preliminary Synthesized Documentation Suggestions
3.4.1 Suggested Documentation Sections for Datasets
3.4.1.1 Data Specification
3.4.1.1.1 Motivation
3.4.1.2 Data Curation
3.4.1.2.1 Collection
3.4.1.2.2 Processing
3.4.1.2.3 Composition
3.4.1.2.4 Types and Sources of Judgement Calls
3.4.1.3 Data Integration
3.4.1.3.1 Use
3.4.1.3.2 Distribution
3.4.1.4 Maintenance
3.4.2 Suggested Documentation Sections for Models
3.4.2.1 Model Specifications
3.4.2.2 Model Training
3.4.2.3 Evaluation
3.4.2.4 Model Integration
3.4.2.5 Maintenance
Section 4: Current Challenges of Implementing Documentation
Section 5: Conclusions
Version 0
Version 1
Appendix A: Compiled List of Documentation Questions
Fact Sheets (Arnold et al. 2018)
Data Sheets (Gebru et al. 2018)
Model Cards (Mitchell et al. 2018)
A “Nutrition Label” for Privacy (Kelley et al. 2009)
The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards (Holland et al. 2019)
Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman 2018)
Appendix B: Diverse Voices Process and Artifacts
Procurement Recruitment Email
Procurement Confirmation Email
Appendix C: Glossary
Sources Cited
- Holstein, K., Vaughan, J. W., Daumé, H., Dudík, M., & Wallach, H. M. (2018). Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? CHI.
- Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103.
- World Wide Web Consortium Process Document (W3C) process outlined here: https://www.w3.org/2019/Process-20190301/
- Internet Engineering Task Force (IETF) process outlined here: https://www.ietf.org/standards/process/
- The Web Hypertext Application Technology Working Group (WHATWG) process outlined here: https://whatwg.org/faq#process
- ten Oever, N., and Moriarty, K. The Tao of IETF: A novice's guide to the Internet Engineering Task Force. https://www.ietf.org/about/participate/tao/.
- Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103.
- Friedman, B., Kahn, P. H., and Borning, A. (2008). Value sensitive design and information systems. In Kenneth Einar Himma and Herman T. Tavani (Eds.), The Handbook of Information and Computer Ethics (pp. 70-100). John Wiley & Sons, Inc. http://jgustilo.pbworks.com/f/the-handbook-of-information-and-computer-ethics.pdf#page=104; Davis, J., and Nathan, L. P. (2015). Value sensitive design: applications, adaptations, and critiques. In Handbook of Ethics, Values, and Technological Design: Sources, Theory, Values and Application Domains (pp. 11-40). DOI: 10.1007/978-94-007-6970-0_3. https://www.researchgate.net/publication/283744306_Value_Sensitive_Design_Applications_Adaptations_and_Critiques; Borning, A., and Muller, M. (2012). Next steps for value sensitive design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12) (pp. 1125-1134). DOI: https://doi.org/10.1145/2207676.2208560 https://dl.acm.org/citation.cfm?id=2208560
- Pichai, S., (2018). AI at Google: our principles. The Keyword. https://www.blog.google/technology/ai/ai-principles/; IBM’s Principles for Trust and Transparency. IBM Policy. https://www.ibm.com/blogs/policy/trust-principles/; Microsoft AI principles. Microsoft. https://www.microsoft.com/en-us/ai/our-approach-to-ai; Ethically Aligned Design – Version II. IEEE. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead_v2.pdf
- Zeng, Y., Lu, E., and Huangfu, C. (2018) Linking artificial intelligence principles. CoRR https://arxiv.org/abs/1812.04814.
- Jessica Fjeld, Hannah Hilligoss, Nele Achten, Maia Levy Daniel, Sally Kagay, and Joshua Feldman (2018). Principled Artificial Intelligence: A Map of Ethical and Rights-Based Approaches. Berkman Klein Center for Internet & Society. https://ai-hr.cyber.harvard.edu/primp-viz.html
- Jobin, A., Ienca, M., & Vayena, E. (2019). Artificial Intelligence: the global landscape of ethics guidelines. arXiv preprint arXiv:1906.11668. https://arxiv.org/pdf/1906.11668.pdf
- Jobin, A., Ienca, M., & Vayena, E. (2019). Artificial Intelligence: the global landscape of ethics guidelines. arXiv preprint arXiv:1906.11668. https://arxiv.org/pdf/1906.11668.pdf
- Ananny, M., and Crawford, K. (2018). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media and Society, 20(3), 973-989.
- Whittlestone, J., Nyrup, R., Alexandrova, A., & Cave, S. (2019, January). The Role and Limits of Principles in AI Ethics: Towards a Focus on Tensions. In Proceedings of the AAAI/ACM Conference on AI Ethics and Society, Honolulu, HI, USA (pp. 27-28). http://www.aies-conference.com/wp-content/papers/main/AIES-19_paper_188.pdf; Mittelstadt, B. (2019). AI Ethics–Too Principled to Fail? https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3391293
- Greene, D., Hoffmann, A. L., & Stark, L. (2019, January). Better, nicer, clearer, fairer: A critical assessment of the movement for ethical artificial intelligence and machine learning. In Proceedings of the 52nd Hawaii International Conference on System Sciences. https://scholarspace.manoa.hawaii.edu/handle/10125/59651
- Raji, I. D., & Buolamwini, J. (2019). Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products. In AAAI/ACM Conf. on AI Ethics and Society (Vol. 1). https://www.media.mit.edu/publications/actionable-auditing-investigating-the-impact-of-publicly-naming-biased-performance-results-of-commercial-ai-products/
- Algorithmic Impact Assessment (2019) Government of Canada https://www.canada.ca/en/government/system/digital-government/modern-emerging-technologies/responsible-use-ai/algorithmic-impact-assessment.html
- Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License. arXiv preprint arXiv:1903.12262. https://arxiv.org/abs/1903.12262; Responsible AI Licenses v0.1. RAIL: Responsible AI Licenses. https://www.licenses.ai/ai-licenses
- See Citation 5
- Safe Face Pledge. https://www.safefacepledge.org/; Montreal Declaration on Responsible AI. Université de Montréal. https://www.montrealdeclaration-responsibleai.com/; The Toronto Declaration: Protecting the right to equality and non-discrimination in machine learning systems. (2018). Amnesty International and Access Now. https://www.accessnow.org/cms/assets/uploads/2018/08/The-Toronto-Declaration_ENG_08-2018.pdf; Dagstuhl Declaration on the application of machine learning and artificial intelligence for social good. https://www.dagstuhl.de/fileadmin/redaktion/Programm/Seminar/19082/Declaration/Declaration.pdf
- Dobbe, R., Dean, S., Gilbert, T., & Kohli, N. (2018). A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. https://arxiv.org/pdf/1807.00553.pdf
- Wagstaff, K. (2012). Machine learning that matters. https://arxiv.org/pdf/1206.4656.pdf; Friedman, B., Kahn, P. H., Borning, A., & Huldtgren, A. (2013). Value sensitive design and information systems. In Early engagement and new technologies: Opening up the laboratory (pp. 55-95). Springer, Dordrecht. https://vsdesign.org/publications/pdf/non-scan-vsd-and-information-systems.pdf
- Dobbe, R., Dean, S., Gilbert, T., & Kohli, N. (2018). A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. https://arxiv.org/pdf/1807.00553.pdf
- Safe Face Pledge. https://www.safefacepledge.org/
- Montreal Declaration on Responsible AI. Université de Montréal. https://www.montrealdeclaration-responsibleai.com/
- Diverse Voices How To Guide. Tech Policy Lab, University of Washington. https://techpolicylab.uw.edu/project/diverse-voices/
- Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
- Ethically Aligned Design – Version II. IEEE. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead_v2.pdf
- Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. https://arxiv.org/abs/1803.09010; Hazard Communication Standard: Safety Data Sheets. Occupational Safety and Health Administration, US Department of Labor. https://www.osha.gov/Publications/OSHA3514.html
- Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards. https://arxiv.org/abs/1805.03677; Kelley, P. G., Bresee, J., Cranor, L. F., & Reeder, R. W. (2009). A nutrition label for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security (p. 4). ACM. http://cups.cs.cmu.edu/soups/2009/proceedings/a4-kelley.pdf
- Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 220-229). ACM. https://arxiv.org/abs/1810.03993
- Hind, M., Mehta, S., Mojsilovic, A., Nair, R., Ramamurthy, K. N., Olteanu, A., & Varshney, K. R. (2018). Increasing Trust in AI Services through Supplier's Declarations of Conformity. https://arxiv.org/abs/1808.07261
- Veale, M., Van Kleek, M., & Binns, R. (2018). ‘Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making’ in Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI 2018. https://arxiv.org/abs/1802.01029
- Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License. https://arxiv.org/abs/1903.12262
- Cooper, D. M. (2013, April). A Licensing Approach to Regulation of Open Robotics. Paper presented at the We Robot: Getting Down to Business conference, Stanford Law School.
- Responsible AI Practices. Google AI. https://ai.google/education/responsible-ai-practices
- Everyday Ethics for Artificial Intelligence. (2019). IBM. https://www.ibm.com/watson/assets/duo/pdf/everydayethics.pdf
- Federal Trade Commission. (2012). Best Practices for Common Uses of Facial Recognition Technologies (Staff Report). Federal Trade Commission, 30. https://www.ftc.gov/sites/default/files/documents/reports/facing-facts-best-practices-common-uses-facial-recognition-technologies/121022facialtechrpt.pdf
- Microsoft (2018). Responsible bots: 10 guidelines for developers of conversational AI. https://www.microsoft.com/en-us/research/uploads/prod/2018/11/Bot_Guidelines_Nov_2018.pdf
- Tramèr, F., Atlidakis, V., Geambasu, R., Hsu, D., Hubaux, J. P., Humbert, M., ... & Lin, H. (2017, April). FairTest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P) (pp. 401-416). IEEE. https://github.com/columbia/fairtest, https://www.mhumbert.com/publications/eurosp17.pdf
- Kishore Durg (2018). Testing AI: Teach and Test to raise responsible AI. Accenture Technology Blog. https://www.accenture.com/us-en/insights/technology/testing-AI
- Kush R. Varshney (2018). Introducing AI Fairness 360. IBM Research Blog. https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
- Dave Gershgorn (2018). Facebook says it has a tool to detect bias in its artificial intelligence. Quartz. https://qz.com/1268520/facebook-says-it-has-a-tool-to-detect-bias-in-its-artificial-intelligence/
- James Wexler. (2018) The What-If Tool: Code-Free Probing of Machine Learning Models. Google AI Blog. https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html
- Miro Dudík, John Langford, Hanna Wallach, and Alekh Agarwal (2018). Machine Learning for fair decisions. Microsoft Research Blog. https://www.microsoft.com/en-us/research/blog/machine-learning-for-fair-decisions/
- Veale, M., Binns, R., & Edwards, L. (2018). Algorithms that Remember: Model Inversion Attacks and Data Protection Law. Phil. Trans. R. Soc. A, 376, 20180083. https://doi.org/10/gfc63m
- Floridi, L. (2010, February). Information: A Very Short Introduction. Oxford University Press.
- Data Information Specialists Committee UK, 2007. http://www.disc-uk.org/qanda.html.
- Harwell, Drew. “Federal Study Confirms Racial Bias of Many Facial-Recognition Systems, Casts Doubt on Their Expanding Use.” The Washington Post, WP Company, 21 Dec. 2019, www.washingtonpost.com/technology/2019/12/19/federal-study-confirms-racial-bias-many-facial-recognition-systems-casts-doubt-their-expanding-use/
- Hildebrandt, M. (2019) ‘Privacy as Protection of the Incomputable Self: From Agnostic to Agonistic Machine Learning’, Theoretical Inquiries in Law, 20(1) 83–121.
- D'Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., ... & Sculley, D. (2020). Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395.
- Selinger, E. (2019). ‘Why You Can’t Really Consent to Facebook’s Facial Recognition’, One Zero. https://onezero.medium.com/why-you-cant-really-consent-to-facebook-s-facial-recognition-6bb94ea1dc8f
- Lum, K., & Isaac, W. (2016). To predict and serve? Significance, 13(5), 14-19. https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x
- LabelInsight (2016). “Drive Long-Term Trust & Loyalty Through Transparency”. https://www.labelinsight.com/Transparency-ROI-Study
- Crawford, K., and Paglen, T. Excavating AI: The Politics of Images in Machine Learning Training Sets. https://www.excavating.ai/
- Geva, M., Goldberg, Y., & Berant, J. (2019). Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets. https://arxiv.org/pdf/1908.07898.pdf
- Bender, E. M., u0026amp; Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
- Desmond U. Patton et al. (2017).
- See Cynthia Dwork et al.
- Katta Spiel, Oliver L. Haimson, and Danielle Lottridge. (2019). How to do better with gender on surveys: a guide for HCI researchers. Interactions. 26, 4 (June 2019), 62-65. DOI: https://doi.org/10.1145/3338283
- A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration. Morgan Kaufmann, 2012
- Momin M. Malik. (2019). Can algorithms themselves be biased? Medium. https://medium.com/berkman-klein-center/can-algorithms-themselves-be-biased-cffecbf2302c
- Fire, Michael, and Carlos Guestrin (2019). “Over-Optimization of Academic Publishing Metrics: Observing Goodhart’s Law in Action.” GigaScience 8 (giz053). https://doi.org/10.1093/gigascience/giz053.
- Vogelsang, A., u0026amp; Borg, M. (2019, September). Requirements engineering for machine learning: Perspectives from data scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW) (pp. 245-251). IEEE
- Eckersley, P. (2018). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064.
- Partnership on AI. Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System, Requirement 5.
- Eckersley, P. (2018). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064. https://arxiv.org/abs/1901.00064
- If it is not, there is likely a bug in the code. Checking a predictive model's performance on the training set cannot distinguish irreducible error (which comes from intrinsic variance of the system) from error introduced by bias and variance in the estimator; this is universal, and has nothing to do with different settings or
- Selbst, Andrew D. and Boyd, Danah and Friedler, Sorelle and Venkatasubramanian, Suresh and Vertesi, Janet (2018). “Fairness and Abstraction in Sociotechnical Systems”, ACM Conference on Fairness, Accountability, and Transparency (FAT*). https://ssrn.com/abstract=3265913
- Tools that can be used to explore and audit predictive model fairness include FairML, LIME, IBM AI Fairness 360, SHAP, the Google What-If Tool, and many others.
- Wagstaff, K. (2012). Machine learning that matters. arXiv preprint arXiv:1206.4656. https://arxiv.org/abs/1206.4656