ABOUT ML Reference Document

3.4.1.2 Data Curation

3.4.1.2.1 Collection

Data Curation

We adopt Divya Singh’s definition from the article entitled “The Role of Data Curation in Big Data”:

“Curation is the end-to-end process of creating good data through the identification and formation of resources with long-term value. In information technology, it refers mainly to the management of data throughout its lifecycle, from creation and initial storage to the time when it is archived for future research and analysis, or becomes obsolete and is deleted.”

The process of data collection should be well-documented for end users of the system, purchasers of the system, and any collaborators contributing to the development of the overall ML system. Potential intellectual property issues about data provenance should be flagged at this stage, such as whether any third party could claim that their data was improperly included in this dataset at any point in its history.

When data is collected from human subjects, the documentation should include information about the consent and notification process, or alternatively why consent was not necessary for this use of the personal data. For example, are the subjects aware of all the data being collected about them, are they able to opt out of or opt in to the data collection, and have they been notified of the exact uses of that collected data? Decisions around what constitutes meaningful informed consent should consider what Nancy Kim refers to as “consentability.” Questions that ML practitioners should consider in their documentation include how aware the data subject is of what information will be collected about them, whether the data subject has clear alternatives they can exercise to not have their data collected, whether those choices can be exercised without fear of penalty, and whether a data subject can reasonably understand the downstream uses and effects of their data (Selinger 2019). In the highest risk use cases, teams should take pains to ensure data subjects are fully aware of the exact uses of their data.

Consent vs. Consentability

On page 7 of Kim (2019), the author notes that:

“Consent is distinct from consentability […] The first [meaning of consentability] involves possibility. An act which is consentable means it is possible for there to be consent given the nature of the proposed activity. The second meaning of consentability involves legality. An act which is consentable is (or under the right circumstances, can be) legal. The possibility of valid consent is essential to consentability but it is not sufficient.”

In addition, potential issues of sampling bias should be thoroughly evaluated and noted to the extent possible, recognizing that this is currently quite difficult in many domains. For example, studies have found that certain minority communities are disproportionately targeted by police for arrest (Lum and Isaac 2016), which means that they are in effect being over-sampled in the data. As a result, arrest data would over-represent these communities even if they have similar crime rates to other communities.
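
To make this over-sampling effect concrete, the following minimal sketch (with entirely hypothetical numbers) simulates two communities with identical underlying offense rates, one of which is patrolled twice as heavily:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: identical underlying offense rates, unequal patrol intensity.
true_offense_rate = 0.05
population = {"A": 100_000, "B": 100_000}
patrol_intensity = {"A": 1.0, "B": 2.0}  # community B is patrolled twice as heavily

recorded_arrest_rate = {}
for community, pop in population.items():
    offenses = rng.binomial(pop, true_offense_rate)
    # The chance an offense is *recorded* as an arrest scales with patrol intensity.
    detection_prob = min(1.0, 0.2 * patrol_intensity[community])
    arrests = rng.binomial(offenses, detection_prob)
    recorded_arrest_rate[community] = arrests / pop

# Arrest data alone makes community B look roughly twice as "high crime,"
# even though both communities have the same underlying offense rate.
print(recorded_arrest_rate)
```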

Pros/Cons

These disclosures help users of the dataset assess potential issues of biased sampling and representation in a dataset. In addition, greater transparency around the collection process and whether proper consent was obtained can give data subjects more assurance that their privacy is respected. These disclosures can also allow companies to indicate that they have complied with relevant data privacy laws. Companies that make this information available might see a reputational or competitive advantage, as consumers show a strong preference for transparency and might favor products built on models whose underlying data collection process is known (LabelInsight 2016). Documentation also provides a lever of accountability for internal teams, creating a paper trail that helps detect or identify misuse of models or data. For a company, more detailed documentation could offer protection from liability stemming from third-party misuse by clarifying the intended context of use. Finally, documenting the data collection process enhances replicability by demonstrating how a scientific conclusion was reached, a core step in the process of scientific advancement and ML research.

Potential negative effects of such disclosures, however, include possible legal, privacy, and intellectual property concerns, depending on the level of granularity of the disclosures and whether any questionable practices were used for data collection. For example, documentation of the decision-making process for a dataset could be discoverable in potential litigation. Further research is needed to understand the legal ramifications of information disclosure within ML documentation and whether existing or new policy can help ensure that companies have an incentive to share vital information that would benefit the public without incurring undue risk or harm. Explorations related to the following research questions could uncover insights into barriers to implementation along with mitigation strategies to overcome those barriers.

Sample Documentation Questions
  • What mechanisms or procedures were used to collect the data? What mechanisms or procedures were used to correct the data for sampling error? How were these mechanisms or procedures validated? (Gebru et al. 2018)
  • If the dataset is a sample from a larger set, what was the sampling strategy? (Gebru et al. 2018)
  • Who was involved in the data collection process, in what kind of working environment, and how were they compensated, if at all? (Gebru et al. 2018)
  • Over what timeframe was the data collected? (Gebru et al. 2018)
  • Who is the data being collected from? Would you be comfortable if that data was being collected from you and used for the intended purpose?
  • Does this data collection undermine individual autonomy or self-determination in any way?
  • How is the data being retained? How long will it be kept?
  • Both genre and topic influence the vocabulary and structural characteristics of texts (Biber 1995) and should be specified. Think of the nature of data sources and how that may affect whether or not the data is a suitable representation of the world. (Bender and Friedman 2018)
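
One way a team might capture answers to questions like those above in a consistent, machine-readable form is a simple structured record. The sketch below is illustrative only; the field names and values are hypothetical rather than a prescribed ABOUT ML schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CollectionRecord:
    """Hypothetical structure for documenting how a dataset was collected."""
    collection_mechanism: str
    sampling_strategy: str
    collectors_and_compensation: str
    collection_timeframe: str
    data_subjects: str
    consent_process: str
    retention_policy: str
    known_sampling_biases: List[str] = field(default_factory=list)

record = CollectionRecord(
    collection_mechanism="Web survey distributed through an opt-in e-mail panel",
    sampling_strategy="Stratified sample of panel members by region",
    collectors_and_compensation="Contracted vendor; panelists paid per completed response",
    collection_timeframe="2021-03 through 2021-06",
    data_subjects="Adult panel members who joined the survey panel",
    consent_process="Opt-in consent at enrollment, with per-survey notice and opt-out",
    retention_policy="Raw responses deleted after 24 months",
    known_sampling_biases=["Panel skews toward frequent internet users"],
)
print(record)
```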

Readers are encouraged to explore Section 3.4.1.3.2 of this document to incorporate ethical considerations of consent into their data collection processes, including important tenets of consent over time, derivative future use of consent, rescinding consent, and other topics.

3.4.1.2.2 Processing

Datasets

In a 2017 blog post titled “What is the Difference Between Test and Validation Datasets?,” Brownlee describes the distinctions between datasets as follows:

“Training Dataset: The sample of data used to fit the model.

Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.

Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.”
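
As a minimal sketch of how these three splits might be produced in practice (assuming scikit-learn and a synthetic stand-in dataset; the 60/20/20 proportions are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=1_000, random_state=0)

# Hold out a test set first, then split the remainder into training and
# validation sets used for fitting and hyperparameter tuning.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600, 200, 200
```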

Although data processing can seem like a straightforward task, there is significant potential for bias to enter the dataset at this stage. Datasets are “political interventions” that are not neutral or natural; the act of collecting, categorizing, and labeling data “is itself a form of politics filled with questions about who gets to decide what [data] mean” (Crawford and Paglen). Additionally, it is important to document what steps were taken to de-identify datasets pertaining to people and how that fits in with relevant policies.

There are many factors to document when human labeling is involved in the dataset creation process. Given that data processing often involves some degree of human labeling, it is important to be conscious of the assumptions or choices that must be made in labeling. Some of these choices will be influenced by cultural biases that ML practitioners and labelers may bring into their decision-making processes (Geva, Goldberg, and Berant 2019). Disclosures should include information about the number of labelers used and an analysis of the demographics of the labelers (e.g., languages spoken for NLP models) to help users gauge the potential for biased labeling or blind spots. There is both bias from the labelers themselves and bias from the choice of labels to include. For example, if sex is considered a binary variable, non-binary individuals are effectively unidentifiable in the data. Defining the taxonomy for the data is thus an important step in establishing the ground truth. In addition, ensuring inter-rater reliability is one important step toward addressing the potential for bias from human labelers. Lastly, making labels transparent and auditable is a necessary step to facilitate debugging.
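
As one concrete check on inter-rater reliability, a minimal sketch (assuming scikit-learn, with toy labels from two hypothetical annotators) computes Cohen's kappa:

```python
from sklearn.metrics import cohen_kappa_score

# Toy labels assigned to the same eight items by two hypothetical annotators.
annotator_a = ["spam", "ham", "spam", "ham", "spam", "ham", "ham", "spam"]
annotator_b = ["spam", "ham", "ham", "ham", "spam", "ham", "spam", "spam"]

# Cohen's kappa corrects raw agreement for agreement expected by chance;
# values near 1 indicate strong agreement, values near 0 indicate chance-level agreement.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```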

Other datasets collect labels and/or metadata in a more automatic manner. One method is linking other data sources to collect labels (e.g., age for individuals in a face recognition dataset collected by scraping Wikipedia). In this case, the source of the label should be disclosed. Other datasets use models to predict labels (e.g., gender for individuals in a face recognition dataset predicted by a face analysis model). In this case, a link to the documentation of the model should be provided. In addition, details of any audits assessing the model for bias in its predictions for intersectional groups should be noted.

Pros/Cons

Benefits of such disclosures include greater replicability and clearer visibility into potential biases. Model developers using the dataset can better understand what they can or cannot do with the data, and any leaps in logic become more apparent. The transparency created by these disclosures can also encourage data collectors to ensure that their data labeling practices align with the original purposes they envisioned for the data.

One potential downside is that there may be privacy concerns for the labelers, depending on how much information about them is disclosed. In addition, data cleaning and labeling can be a complex and multi-layered process, so accurately relaying that process can be difficult. Explorations related to the following research questions could uncover insights into barriers to implementation, examples of potential bias, and levels of stakeholder comfort regarding privacy.

Sample Documentation Questions
  • Was any preprocessing/cleaning/labeling of the data done? (Gebru et al. 2018)
  • Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated future uses)? (Gebru et al. 2018)
  • Which data instances were filtered out of the raw dataset and why? What proportion of the “raw” dataset was filtered out during cleaning?
  • What are the demographic characteristics of the annotators and annotation guidelines given to developers? (Gebru et al. 2018)
  • What labeling guidelines were used? What instructions were given to the labelers?

3.4.1.2.3 Composition

It is vital to make it clear to users what is in the dataset. This reflects the operationalization of the motivation section above. In addition to a list of the label annotations and metadata in the dataset, it is important to include information about how representative the dataset is and the potential for sampling bias or other forms of statistical bias. For example, in the context of natural language processing (NLP), it would be relevant to include information about the original language of the text (Bender and Friedman 2018), especially because label quality can vary greatly if annotators lack the context for making meaningful, accurate labels. In general, background on the demographics reflected in the data can be useful for users to assess potential bias issues.

Demographic Data

As noted in Bogen et al. (2019), “[m]any machine learning fairness practitioners rely on awareness of sensitive attributes — that is, access to labeled data about people’s race, ethnicity, sex, or similar demographic characteristics — to test the efficacy of debiasing techniques or directly implement fairness interventions.”

Fairness, Transparency, and Accountability research within PAI is focused on examining the challenges organizations face around the collection and use of demographic data to help address algorithmic bias.

A key area to document is how developers are using the training, validation, and test datasets. It is important to ensure there is no overlap between these subsets.
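
A minimal sketch of such a check, assuming each instance carries a stable unique identifier (the identifiers below are hypothetical), verifies that the splits' identifier sets are disjoint:

```python
# Hypothetical record identifiers for each split; in practice these would be
# the stable unique IDs attached to every instance in the dataset.
train_ids = {"r001", "r002", "r003"}
val_ids = {"r004", "r005"}
test_ids = {"r006", "r007"}

# Any non-empty intersection indicates leakage between splits.
assert not (train_ids & val_ids), "training/validation overlap"
assert not (train_ids & test_ids), "training/test overlap"
assert not (val_ids & test_ids), "validation/test overlap"
print("No overlap between the training, validation, and test splits")
```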

Pros/Cons

The benefits of making composition clear are that users can know what to expect from the dataset and how models trained on the data might perform in different domains. In disclosing composition, it is important for developers to refer back to their motivations for creating the dataset to ensure that the composition appropriately reflects those objectives.

Depending on the granularity of the description of composition, privacy could be an issue. As a general rule, developers should distinguish between what information is appropriate to share with whom and be very mindful of disclosing any metadata or labels that might make the dataset personally identifiable.

When including information on the demographic composition of the data, developers should keep in mind that demographic taxonomies are not well-defined. Developers can look to the fields of sociology and psychology for existing standards, but should be aware that some taxonomies might still be problematic in context. For example, a binary gender classification might not be appropriate (Spiel, Haimson, and Lottridge 2019). In addition, it might not make sense to apply the American racial construct in another region’s cultural context.

Finally, there are still open research questions around both the definition of “representativeness” in datasets and what sample sizes and data quality are needed for a dataset to be used for models that make decisions about subgroups. Representativeness depends on the context of the specific systems where the data is being used. Documentation should assist users with determining the appropriate contexts for use of the particular dataset. Explorations related to the following research questions could uncover insights into barriers to implementation along with mitigation strategies to overcome those barriers.

Sample Documentation Questions
  • What data does each instance consist of? (Gebru et al. 2018)
  • Is there a label or target associated with each instance? (Gebru et al. 2018)
  • Are there recommended data splits (e.g., training, development/validation, testing)? (Gebru et al. 2018)
  • Are there any errors, sources of noise, or redundancies in the dataset? (Gebru et al. 2018)
  • Detail source, author contact information, and version history. (Holland et al. 2019)
  • Ground truth correlations: linear correlations between a chosen variable in the dataset and variables from other datasets considered to be “ground truth.” (Holland et al. 2019)
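
As an illustration of the ground truth correlation item above, the following minimal sketch (assuming pandas; the column names and values are hypothetical) correlates a dataset variable with a variable from an external dataset treated as ground truth:

```python
import pandas as pd

# Hypothetical dataset variable and an external reference treated as ground truth.
dataset = pd.DataFrame({
    "zip": ["10001", "10002", "10003", "10004"],
    "median_income_estimate": [62_000, 48_000, 71_000, 55_000],
})
census = pd.DataFrame({
    "zip": ["10001", "10002", "10003", "10004"],
    "census_median_income": [65_000, 45_000, 70_000, 53_000],
})

# Linear (Pearson) correlation between the dataset variable and the reference.
merged = dataset.merge(census, on="zip")
corr = merged["median_income_estimate"].corr(merged["census_median_income"])
print(f"Correlation with ground truth: {corr:.2f}")
```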

3.4.1.2.4 Types and Sources of Judgement Calls

In deciding what kinds of data to collect, how to collect it, and how to store it, it is important for teams to document the major judgement calls they made. Judgement calls are often made when creating, correcting, annotating, striking out, weighing, or enriching data. The extent and nature of these human judgement calls, as well as the resources or faculties brought to bear in making them, should be made explicit. Judgement calls can include why the team chose to collect and process data in a particular way and the method for doing so. While it would be overly burdensome to document every decision reached, research and product teams should determine what constitutes a major judgement call by consulting the expected norms of their field and assessing how much their decisions deviate from those norms. A minimal example of a judgement-call log entry follows the list below.

Common points of judgement include:

  • Study Design
    • Question Choice
    • Language Used
  • Data Collection
    • Source of Data
    • Subject Selection
  • Data Processing
    • Filtering and Exclusion
    • Bucketing and Thresholds
  • Application and Usage
    • Expansion of Context
    • Usage as a Proxy for Another Feature
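
As referenced above, a minimal sketch of a single judgement-call log entry (the fields and values are hypothetical, not a prescribed format) might look like the following:

```python
# Hypothetical example of one judgement-call log entry.
judgement_call = {
    "stage": "Data Processing / Filtering and Exclusion",
    "decision": "Dropped records with more than 30% missing fields",
    "rationale": "Imputation was judged unreliable above this threshold",
    "deviation_from_field_norms": "Stricter than the 50% cutoff common in prior work",
    "decided_by": "Data curation team, 2021-05-12",
}

for key, value in judgement_call.items():
    print(f"{key}: {value}")
```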


Sources Cited

  1. Holstein, K., Vaughan, J.W., Daumé, H., Dudík, M., & Wallach, H.M. (2018). Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? CHI.
  2. Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103.
  3. World Wide Web Consortium Process Document (W3C) process outlined here: https://www.w3.org/2019/Process-20190301/
  4. Internet Engineering Task Force (IETF) process outlined here: https://www.ietf.org/standards/process/
  5. The Web Hypertext Application Technology Working Group (WHATWG) process outlined here: https://whatwg.org/faq#process
  6. Oever, N., Moriarty, K. The Tao of IETF: A novice's guide to the Internet Engineering Task Force. https://www.ietf.org/about/participate/tao/.
  7. Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103.
  8. Friedman, B, Kahn, Peter H., and Borning, A., (2008) Value sensitive design and information systems. In Kenneth Einar Himma and Herman T. Tavani (Eds.) The Handbook of Information and Computer Ethics. (pp. 70-100) John Wiley & Sons, Inc. http://jgustilo.pbworks.com/f/the-handbook-of-information-and-computer-ethics.pdf#page=104; Davis, J., and P. Nathan, L. (2015). Value sensitive design: applications, adaptations, and critiques. Handbook of Ethics, Values, and Technological Design: Sources, Theory, Values and Application Domains. (pp. 11-40) DOI: 10.1007/978-94-007-6970-0_3. https://www.researchgate.net/publication/283744306_Value_Sensitive_Design_Applications_Adaptations_and_Critiques; Borning, A. and Muller, M. (2012). Next steps for value sensitive design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12). (pp 1125-1134) DOI: https://doi.org/10.1145/2207676.2208560 https://dl.acm.org/citation.cfm?id=2208560
  9. Pichai, S., (2018). AI at Google: our principles. The Keyword. https://www.blog.google/technology/ai/ai-principles/; IBM’s Principles for Trust and Transparency. IBM Policy. https://www.ibm.com/blogs/policy/trust-principles/; Microsoft AI principles. Microsoft. https://www.microsoft.com/en-us/ai/our-approach-to-ai; Ethically Aligned Design – Version II. IEEE. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead_v2.pdf
  10. Zeng, Y., Lu, E., and Huangfu, C. (2018) Linking artificial intelligence principles. CoRR https://arxiv.org/abs/1812.04814.
  11. Jessica Fjeld, Hannah Hilligoss, Nele Achten, Maia Levy Daniel, Sally Kagay, and Joshua Feldman (2018). Principled artificial intelligence - a map of ethical and rights based approaches, Berkman Center for Internet and Society, https://ai-hr.cyber.harvard.edu/primp-viz.html
  12. Jobin, A., Ienca, M., & Vayena, E. (2019). Artificial Intelligence: the global landscape of ethics guidelines. arXiv preprint arXiv:1906.11668. https://arxiv.org/pdf/1906.11668.pdf
  13. Jobin, A., Ienca, M., & Vayena, E. (2019). Artificial Intelligence: the global landscape of ethics guidelines. arXiv preprint arXiv:1906.11668. https://arxiv.org/pdf/1906.11668.pdf
  14. Ananny, M., and Kate Crawford (2018). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media and Society 20 (3): 973-989.
  15. Whittlestone, J., Nyrup, R., Alexandrova, A., & Cave, S. (2019, January). The Role and Limits of Principles in AI Ethics: Towards a Focus on Tensions. In Proceedings of the AAAI/ACM Conference on AI Ethics and Society, Honolulu, HI, USA (pp. 27-28). http://www.aies-conference.com/wp-content/papers/main/AIES-19_paper_188.pdf; Mittelstadt, B. (2019). AI Ethics–Too Principled to Fail? https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3391293
  16. Greene, D., Hoffmann, A. L., & Stark, L. (2019, January). Better, nicer, clearer, fairer: A critical assessment of the movement for ethical artificial intelligence and machine learning. In Proceedings of the 52nd Hawaii International Conference on System Sciences. https://scholarspace.manoa.hawaii.edu/handle/10125/59651
  17. Raji, I. D., & Buolamwini, J. (2019). Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial ai products. In AAAI/ACM Conf. on AI Ethics and Society (Vol. 1). https://www.media.mit.edu/publications/actionable-auditing-investigating-the-impact-of-publicly-naming-biased-performance-results-of-commercial-ai-products/
  18. Algorithmic Impact Assessment (2019) Government of Canada https://www.canada.ca/en/government/system/digital-government/modern-emerging-technologies/responsible-use-ai/algorithmic-impact-assessment.html
  19. Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License. arXiv preprint arXiv:1903.12262. https://arxiv.org/abs/1903.12262; Responsible AI Licenses v0.1. RAIL: Responsible AI Licenses. https://www.licenses.ai/ai-licenses
  20. See Citation 5
  21. Safe Face Pledge. https://www.safefacepledge.org/; Montreal Declaration on Responsible AI. Universite de Montreal. https://www.montrealdeclaration-responsibleai.com/; The Toronto Declaration: Protecting the right to equality and non-discrimination in machine learning systems. (2018). Amnesty International and Access Now. https://www.accessnow.org/cms/assets/uploads/2018/08/The-Toronto-Declaration_ENG_08-2018.pdf; Dagstuhl Declaration on the application of machine learning and artificial intelligence for social good. https://www.dagstuhl.de/fileadmin/redaktion/Programm/Seminar/19082/Declaration/Declaration.pdf
  22. Dobbe, R., Dean, S., Gilbert, T., & Kohli, N. (2018). A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. https://arxiv.org/pdf/1807.00553.pdf
  23. Wagstaff, K. (2012). Machine learning that matters. https://arxiv.org/pdf/1206.4656.pdf; Friedman, B., Kahn, P. H., Borning, A., & Huldtgren, A. (2013). Value sensitive design and information systems. In Early engagement and new technologies: Opening up the laboratory (pp. 55-95). Springer, Dordrecht. https://vsdesign.org/publications/pdf/non-scan-vsd-and-information-systems.pdf
  24. Dobbe, R., Dean, S., Gilbert, T., & Kohli, N. (2018). A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. https://arxiv.org/pdf/1807.00553.pdf
  25. Safe Face Pledge. https://www.safefacepledge.org/
  26. Montreal Declaration on Responsible AI. Universite de Montreal. https://www.montrealdeclaration-responsibleai.com/
  27. Diverse Voices How To Guide. Tech Policy Lab, University of Washington. https://techpolicylab.uw.edu/project/diverse-voices/
  28. Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
  29. Ethically Aligned Design – Version II. IEEE. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead_v2.pdf
  30. Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. https://arxiv.org/abs/1803.09010; Hazard Communication Standard: Safety Data Sheets. Occupational Safety and Health Administration, US Department of Labor. https://www.osha.gov/Publications/OSHA3514.html
  31. Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards. https://arxiv.org/abs/1805.03677; Kelley, P. G., Bresee, J., Cranor, L. F., & Reeder, R. W. (2009). A nutrition label for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security (p. 4). ACM. http://cups.cs.cmu.edu/soups/2009/proceedings/a4-kelley.pdf
  32. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 220-229). ACM. https://arxiv.org/abs/1810.03993
  33. Hind, M., Mehta, S., Mojsilovic, A., Nair, R., Ramamurthy, K. N., Olteanu, A., & Varshney, K. R. (2018). Increasing Trust in AI Services through Supplier's Declarations of Conformity. https://arxiv.org/abs/1808.07261
  34. Veale M., Van Kleek M., & Binns R. (2018) ‘Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making’ in Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI 2018. https://arxiv.org/abs/1802.01029
  35. Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License. https://arxiv.org/abs/1903.12262
  36. Cooper, D. M. (2013, April). A Licensing Approach to Regulation of Open Robotics. In Paper for presentation for We Robot: Getting down to business conference, Stanford Law School.
  37. Responsible AI Practices. Google AI. https://ai.google/education/responsible-ai-practices
  38. Everyday Ethics for Artificial Intelligence. (2019). IBM. https://www.ibm.com/watson/assets/duo/pdf/everydayethics.pdf
  39. Federal Trade Commission. (2012). Best Practices for Common Uses of Facial Recognition Technologies (Staff Report). Federal Trade Commission, 30. https://www.ftc.gov/sites/default/files/documents/reports/facing-facts-best-practices-common-uses-facial-recognition-technologies/121022facialtechrpt.pdf
  40. Microsoft (2018). Responsible bots: 10 guidelines for developers of conversational AI. https://www.microsoft.com/en-us/research/uploads/prod/2018/11/Bot_Guidelines_Nov_2018.pdf
  41. Tramer, F., Atlidakis, V., Geambasu, R., Hsu, D., Hubaux, J. P., Humbert, M., ... & Lin, H. (2017, April). FairTest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P) (pp. 401-416). IEEE. https://github.com/columbia/fairtest, https://www.mhumbert.com/publications/eurosp17.pdf
  42. Kishore Durg (2018). Testing AI: Teach and Test to raise responsible AI. Accenture Technology Blog. https://www.accenture.com/us-en/insights/technology/testing-AI
  43. Kush R. Varshney (2018). Introducing AI Fairness 360. IBM Research Blog. https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
  44. Dave Gershgorn (2018). Facebook says it has a tool to detect bias in its artificial intelligence. Quartz. https://qz.com/1268520/facebook-says-it-has-a-tool-to-detect-bias-in-its-artificial-intelligence/
  45. James Wexler. (2018) The What-If Tool: Code-Free Probing of Machine Learning Models. Google AI Blog. https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html
  46. Miro Dudík, John Langford, Hanna Wallach, and Alekh Agarwal (2018). Machine Learning for fair decisions. Microsoft Research Blog. https://www.microsoft.com/en-us/research/blog/machine-learning-for-fair-decisions/
  47. Veale, M., Binns, R., & Edwards, L. (2018). Algorithms that Remember: Model Inversion Attacks and Data Protection Law. Phil. Trans. R. Soc. A, 376, 20180083. https://doi.org/10/gfc63m
  48. Floridi, L. (2010, February). Information: A Very Short Introduction.
  49. Data Information Specialists Committee UK, 2007. http://www.disc-uk.org/qanda.html.
  50. Harwell, Drew. “Federal Study Confirms Racial Bias of Many Facial-Recognition Systems, Casts Doubt on Their Expanding Use.” The Washington Post, WP Company, 21 Dec. 2019, www.washingtonpost.com/technology/2019/12/19/federal-study-confirms-racial-bias-many-facial-recognition-systems-casts-doubt-their-expanding-use/
  51. Hildebrandt, M. (2019) ‘Privacy as Protection of the Incomputable Self: From Agnostic to Agonistic Machine Learning’, Theoretical Inquiries in Law, 20(1) 83–121.
  52. D'Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., ... & Sculley, D. (2020). Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395.
  53. Selinger, E. (2019). ‘Why You Can’t Really Consent to Facebook’s Facial Recognition’, One Zero. https://onezero.medium.com/why-you-cant-really-consent-to-facebook-s-facial-recognition-6bb94ea1dc8f
  54. Lum, K., & Isaac, W. (2016). To predict and serve?. Significance, 13(5), 14-19. https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x
  55. LabelInsight (2016). “Drive Long-Term Trust u0026amp; Loyalty Through Transparency”. https://www.labelinsight.com/Transparency-ROI-Study
  56. Crawford and Paglen, https://www.excavating.ai/
  57. Geva, M., Goldberg, Y., & Berant, J. (2019). Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets. https://arxiv.org/pdf/1908.07898.pdf
  58. Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
  59. Desmond U. Patton et al (2017).
  60. See Cynthia Dwork et al.,
  61. Katta Spiel, Oliver L. Haimson, and Danielle Lottridge. (2019). How to do better with gender on surveys: a guide for HCI researchers. Interactions. 26, 4 (June 2019), 62-65. DOI: https://doi.org/10.1145/3338283
  62. A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration. Morgan Kaufmann, 2012
  63. Momin M. Malik. (2019). Can algorithms themselves be biased? Medium. https://medium.com/berkman-klein-center/can-algorithms-themselves-be-biased-cffecbf2302c
  64. Fire, Michael, and Carlos Guestrin (2019). “Over-Optimization of Academic Publishing Metrics: Observing Goodhart’s Law in Action.” GigaScience 8 (giz053). https://doi.org/10.1093/gigascience/giz053.
  65. Vogelsang, A., & Borg, M. (2019, September). Requirements engineering for machine learning: Perspectives from data scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW) (pp. 245-251). IEEE
  66. Eckersley, P. (2018). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064.
  67. Partnership on AI. Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System, Requirement 5.
  68. Eckersley, P. (2018). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064.https://arxiv.org/abs/1901.00064
  69. If it is not, there is likely a bug in the code. Checking a predictive model's performance on the training set cannot distinguish irreducible error (which comes from intrinsic variance of the system) from error introduced by bias and variance in the estimator; this is universal, and has nothing to do with different settings or
  70. Selbst, Andrew D. and Boyd, Danah and Friedler, Sorelle and Venkatasubramanian, Suresh and Vertesi, Janet (2018). “Fairness and Abstraction in Sociotechnical Systems”, ACM Conference on Fairness, Accountability, and Transparency (FAT*). https://ssrn.com/abstract=3265913
  71. Tools that can be used to explore and audit the predictive model fairness include FairML, Lime, IBM AI Fairness 360, SHAP, Google What-If Tool, and many others
  72. Wagstaff, K. (2012). Machine learning that matters. arXiv preprint arXiv:1206.4656. https://arxiv.org/abs/1206.4656