ABOUT ML Reference Document

3.4.2 Suggested Documentation Sections for Models

Machine learning models use statistical techniques to make predictions based on known inputs (Malik 2019). They are incorporated into many real-world systems and business processes where prediction and estimation are valuable.

Model transparency is important because ML models are used to make decisions, and a society can only be accountable and fair if the decision-making within it is understandable, accountable, and fair. Clarifying the basis of a recommendation helps achieve these objectives. When people know that the models they are designing will be held understandable and accountable, they have strong reasons to aim for fairness.

Model documentation becomes even more important as machine learning gets incorporated into systems making high-stakes decisions. For example, some states in the US are implementing ML-based risk assessment tools in the criminal justice system. From a societal perspective, it is important that any products with so much potential impact on individual well-being are accountable to the people they impact, so it is particularly untenable for these products to remain wholly “black boxes.” Other high-stakes applications of machine learning include models that determine the distribution of public benefits, models used in the healthcare industry that impact consumer premiums under risk-based payment models, or facial recognition models used by law enforcement.

The documentation steps outlined in this section apply to models built on static data, which does not change after being recorded, using various methods including supervised learning, unsupervised learning, and reinforcement learning. Models that use streaming data, such as online learning models whose datasets or metrics change dynamically, are also relevant, but at this time the guidelines below are less applicable to them.

It is very important to tailor the documentation to meet the specific goal of disclosing model-related information, including considering the most relevant audiences for achieving that goal. If the key audience is end users of a consumer-facing product, the level of disclosure should be less technical to avoid overwhelming the users. In particular, companies should avoid making disclosures so complicated that they reach a status similar to Terms of Service (ToS) documents, which unfortunately can be so cumbersome that they serve only to protect institutions rather than inform or help users. Policymakers and advocacy groups can play a role in ensuring that transparency disclosures do not evolve in that direction. In contrast, if the largest audience for a set of ML documentation is other developers at the same company, the disclosures can be much more technical and detailed. Of course, various details differ depending on the audience and context of use; one of the goals of later establishing best practices is to outline the requirements and expectations for transparent documentation in various common scenarios. For example, a non-technical one-pager may be suitable for the average consumer but is insufficient as an auditable document for policymakers and advocacy groups in high-stakes contexts.

Internal disclosures can be helpful to allow developers from the same organization to learn from each other’s work. That said, internal disclosures should be careful to avoid legitimizing or spreading bad practices. The company should work independently to set and enforce high standards for models by making sure to provide enough human and capital resources to support the integration of transparency practices.

A common theme throughout this section is the importance of ensuring that the model disclosures do not create security or IP risks. Depending on what information about the model is disclosed and whether the documentation is for internal vs. external consumption, there might be concerns that malicious actors might use this information to attack the system more effectively or that the company’s trade secret protections might be compromised.


Finally, developers should be wary of Goodhart’s Law when making model-related disclosures. Goodhart’s Law suggests that once a measurement becomes a target, it is no longer a good measurement. In this context, the worry is that disclosing the details of the model might incentivize individuals to game the system by adjusting their actions to achieve their desired outcome. For example, Goodhart’s Law has been observed in current academic publishing practices, with researchers gaming metrics intended to measure academic publishing success by increasing the number of self-citations, slicing studies into the smallest quantum acceptable for publication, and indexing false papers (Fire and Guestrin 2019).

Another problematic unintended consequence would be companies hiding key information by disclosing a high volume of less crucial information. This highlights the importance of viewing ML documentation as a process to follow, one that aims to prompt deep reflection about the impact of products that include ML models and treats documentation artifacts as a byproduct, rather than as documentation for the sake of being able to claim that documentation was created.

3.4.2.1 Model Specifications

Specifications

We borrow from Vogelsang and Borg (2019) and note that model specifications can include information about the following (a minimal illustrative record appears after this list):

  • Quantitative targets
  • Data requirements
  • Explainability
  • Freedom from discrimination
  • Legal and regulatory constraints
  • Quality requirements
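To make this concrete, the specification items above could be captured in a structured record stored alongside the model. The following Python sketch is purely illustrative: the field names, values, and file layout are assumptions for this example rather than a prescribed ABOUT ML schema.

```python
# Minimal illustrative record of a model specification; all fields and values are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelSpecification:
    quantitative_targets: List[str] = field(default_factory=list)        # e.g., accuracy or AUC targets
    data_requirements: List[str] = field(default_factory=list)           # required sources, coverage, freshness
    explainability: str = ""                                             # how predictions will be explained
    freedom_from_discrimination: str = ""                                # fairness constraints and affected groups
    legal_and_regulatory_constraints: List[str] = field(default_factory=list)
    quality_requirements: List[str] = field(default_factory=list)        # latency, robustness, monitoring

spec = ModelSpecification(
    quantitative_targets=["AUC of at least 0.85 on the held-out evaluation set"],
    data_requirements=["Twelve months of transaction records, refreshed monthly"],
    explainability="Per-prediction feature attributions available to reviewers",
    freedom_from_discrimination="Demographic parity gap below 0.05 across documented subgroups",
    legal_and_regulatory_constraints=["Legal review required before automated decisions"],
    quality_requirements=["95th-percentile inference latency under 100 ms"],
)
print(spec)
```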

This section assumes that the intention for building the model has been documented earlier in the process, including task and system specification. There are three subjects to consider in specification:

  1. specifications about building models,
  2. specifications about evaluating models, and
  3. additional specifications for models used in high-stakes or high-risk scenarios (Vogelsang and Borg 2019).

Within building models, key questions to document include the choice of structure (e.g., features, architecture, pretrained embeddings, and other complex inputs), the choice of output structure, the choice of loss function and regularization, where random seeds come from and where they are saved, hyperparameters, the optimization algorithm, and generalizability, measured by how much difference between training and test performance the developers expect to see.
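As a purely illustrative sketch of how these choices might be recorded at training time, the snippet below writes them to a small JSON file next to the run artifacts. Every key, value, and path here is an invented example, not a required format.

```python
# Hypothetical record of model-building choices, saved alongside the training run.
import json
import random

seed = 20240101                     # document where the seed comes from...
random.seed(seed)                   # ...and apply it before any stochastic step

build_record = {
    "input_structure": {"features": ["age", "income"], "pretrained_embeddings": None},
    "output_structure": "binary probability",
    "loss_function": "binary cross-entropy",
    "regularization": {"l2": 1e-4, "dropout": 0.2},
    "random_seed": {"value": seed, "stored_at": "runs/2024-01-01/seed.txt"},
    "hyperparameters": {"learning_rate": 1e-3, "batch_size": 64, "epochs": 20},
    "optimization_algorithm": "Adam",
    "expected_generalization_gap": "about 2-3 points between training and test accuracy",
}

with open("model_build_record.json", "w") as f:
    json.dump(build_record, f, indent=2)
```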

Generalizability

Generalization usually refers to the ability of an algorithm to be effective across a range of inputs and applications. It is related to repeatability in that we expect a consistent outcome based on the inputs.

To create good predictive models in machine learning that are capable of generalizing, one needs to know when to stop training the model so that it doesn’t overfit.
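One common way to decide when to stop, illustrated in the hedged sketch below, is early stopping: halt training once performance on held-out validation data stops improving. The `train_one_epoch` and `validation_loss` callables are hypothetical stand-ins for an actual training loop.

```python
# Minimal early-stopping sketch; the two callables are placeholders for a real training loop.
def train_with_early_stopping(train_one_epoch, validation_loss, max_epochs=100, patience=5):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        current = validation_loss()
        if current < best_loss:
            best_loss = current
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping at epoch {epoch}: no validation improvement for {patience} epochs")
            break
    return best_loss
```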

For evaluating models, it is key to discuss what kinds of tests the model developer runs on the output, how the developer plans to identify and mitigate sampling bias (e.g., using a second source of truth to mitigate selection bias via reweighting), and how model performance on real-world data will be evaluated relative to the test set (what threshold of performance is acceptable, and what kinds of use cases should be disallowed based on the results).
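To illustrate the reweighting idea mentioned above, the sketch below recomputes accuracy after weighting each test example so that subgroup proportions match a trusted reference distribution. The subgroup labels, proportions, and toy data are all invented for this example.

```python
# Illustrative reweighted accuracy: correct for a test set whose group mix differs from a reference.
from collections import Counter

def reweighted_accuracy(y_true, y_pred, groups, target_proportions):
    observed = Counter(groups)
    n = len(groups)
    # Weight each example by how under- or over-represented its group is in the test set.
    weights = [target_proportions[g] / (observed[g] / n) for g in groups]
    correct_weight = sum(w for w, t, p in zip(weights, y_true, y_pred) if t == p)
    return correct_weight / sum(weights)

# Toy example in which group "b" is under-sampled relative to the reference distribution.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b"]
print(reweighted_accuracy(y_true, y_pred, groups, {"a": 0.5, "b": 0.5}))  # 0.875
```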

If the use case involves high stakes for affected parties, it is essential to ensure and document that the choice of output structure and loss function appropriately encodes and conveys uncertainty both about predictions and across possible system goals (Eckersley 2018).
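One lightweight way to convey such uncertainty, sketched below under purely hypothetical names and thresholds, is to expose the model's probability estimates and route low-confidence cases to human review rather than emitting a bare label.

```python
# Hypothetical decision wrapper that reports probabilities and abstains when confidence is low.
def decide_with_uncertainty(class_probabilities, abstain_below=0.8):
    label, prob = max(class_probabilities.items(), key=lambda kv: kv[1])
    if prob < abstain_below:
        return {"decision": "refer to human review", "probabilities": class_probabilities}
    return {"decision": label, "probability": prob}

print(decide_with_uncertainty({"approve": 0.55, "deny": 0.45}))  # low confidence: human review
print(decide_with_uncertainty({"approve": 0.93, "deny": 0.07}))  # confident: automated decision
```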

Pros/Cons

The benefits of documenting these details of model specification include reproducibility, spotting potential failure modes, and helping people choose between models for different use cases. There are potential security risks with revealing certain types of information. Proactively communicating the severity of risk across the spectrum of documentation and sharing the risk mitigation plan may help to alleviate these concerns. The risk of revealing “trade secrets” applies more to black box models, as disclosing some of these specifications may make it easier for others to reverse-engineer the model and thus obtain information that a company considers a trade secret. Explorations related to the following documentation questions could uncover insights into barriers to implementation along with mitigation strategies to overcome those barriers.

Sample Documentation Questions
  • What is the intended use of the service (model) output? (Arnold et al. 2018)
      • Primary intended uses
      • Primary intended users
      • Out-of-scope and under-represented use cases
  • What algorithms or techniques does this service implement? (Arnold et al. 2018)
  • Model Details. Basic information about the model. (Mitchell et al. 2018)
      • Person or organization developing model and contact information
      • Model date
      • Model version
      • Model type
      • Information about training algorithms, parameters, fairness constraints or other applied approaches, and features
      • Paper or other resource for more information
      • Citation details
      • License

3.4.2.2 Model Training

The focus of this stage in the ML lifecycle is on sharing how the model was architected and trained and the process that was used for debugging.

Choices of ML model architecture have numerous consequences that are relevant to downstream users, so it is essential to document both the choices and the rationales behind them. Did the designers choose a random forest, recurrent network, or convolutional network, and why? What was the capacity of the model, how does it line up with the dataset size, and what are the risks of overfitting? What was being optimized for, and what regularization terms and methods were used?
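As one rough, informal signal of overfitting risk worth noting in documentation (not a formal rule, and not drawn from the ABOUT ML text itself), developers can compare the number of trainable parameters to the number of training examples. The function and numbers below are invented for illustration.

```python
# Hypothetical capacity note comparing parameter count to dataset size.
def capacity_note(num_parameters, num_training_examples):
    ratio = num_parameters / num_training_examples
    if ratio > 1.0:
        return (f"{ratio:.1f} parameters per training example: high capacity relative to the data; "
                "document the regularization and validation strategy")
    return f"{ratio:.2f} parameters per training example: capacity appears modest relative to the data"

print(capacity_note(num_parameters=2_500_000, num_training_examples=50_000))
```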

Some particular considerations may apply to architectures for models that will be used for high-stakes purposes: the wrong choice of optimization function or prediction objective can create significant risks of unintended consequences in deployment. In general, sufficiently high-stakes ML systems should produce outputs that are explicitly uncertain both about predictions (Partnership on AI, Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System, Requirement 5) and across different competing specifications of the system’s goals (Eckersley 2018).

A separate datasheet should be attached to all datasets used in this process, likely including the training data and the validation data used while adjusting the model. If federated learning or other cryptographic privacy techniques are used in the model, the datasheet may need to be adapted accordingly. Key questions for the validation data include how closely the data match real-world distributions, whether relevant subpopulations are sufficiently represented in the data, and whether the validation set was a simple hold-out set or whether an effort was made to make it more representative of the real-world data distribution. Additionally, documentation should note any preprocessing steps taken, such as calibration corrections.
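A simple way to support the representativeness question, sketched below with invented subgroup labels and proportions, is to compare subgroup shares in the validation data against a trusted reference distribution and flag gaps above a tolerance.

```python
# Illustrative representativeness check for the validation data; groups and numbers are invented.
from collections import Counter

def representation_gaps(validation_groups, reference_proportions, tolerance=0.05):
    counts = Counter(validation_groups)
    n = len(validation_groups)
    gaps = {}
    for group, expected in reference_proportions.items():
        observed = counts.get(group, 0) / n
        if abs(observed - expected) > tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps

validation_groups = ["18-34"] * 70 + ["35-64"] * 25 + ["65+"] * 5
print(representation_gaps(validation_groups, {"18-34": 0.40, "35-64": 0.45, "65+": 0.15}))
```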

Another option is to add a link to the source code, which is again more likely for academic and open source models than for industry or commercial models. It is important to document the versions of all libraries used, GitHub links, machine types, and hyperparameters involved in training. This increases reproducibility and helps future users of the model debug in case of difficulty. For very large datasets, sharing information about the compute platform and the rationale behind hardware choices also helps future researchers and model developers to contextualize the model. Lastly, it is highly valuable to disclose how long the model took to train and with what magnitude of compute resources, as this allows future researchers to understand what level of resourcing a similar model would require to build.
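The sketch below shows one possible way to capture this provenance information automatically at the end of a training run, using only the Python standard library. The package list, commit placeholder, and output path are assumptions for the example.

```python
# Illustrative capture of reproducibility metadata for a training run.
import json
import platform
import time
from importlib.metadata import version, PackageNotFoundError

def library_versions(packages):
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            versions[pkg] = "not installed"
    return versions

start = time.time()
# ... training would happen here ...
run_metadata = {
    "libraries": library_versions(["numpy", "scikit-learn"]),
    "python_version": platform.python_version(),
    "machine": platform.platform(),
    "source_commit": "<record the Git commit hash of the training code here>",
    "training_wall_clock_seconds": round(time.time() - start, 1),
}
with open("training_run_metadata.json", "w") as f:
    json.dump(run_metadata, f, indent=2)
```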

Pros/Cons

As mentioned above, much of the documentation in this section is for the purpose of allowing other parties to build similar models, increasing reproducibility. Information on compute and hardware resources used also gives researchers the ability to judge how accessible the model is.

Debugging is the other large benefit of such robust documentation. For example, if a model has 94% accuracy in training but 87% in test, knowing the original settings allows evaluators to identify whether this difference in performance comes from different settings or from other factors. Any evaluation of performance, though, needs to keep in mind that test performance will always be worse than the training performance. Combining model documentation with datasheets for the training data gives evaluators information to rule out performance changes due to changes in data or parameters. The evaluators can be internal stakeholders from testing teams or external stakeholders such as customers who purchase the model for deployment in their business processes.
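As a small illustration of how documented settings support this kind of debugging, the sketch below compares a re-run against the originally documented metrics and hyperparameters; all keys and numbers are invented.

```python
# Hypothetical comparison of a re-run against the documented original run.
documented_run = {"train_accuracy": 0.94, "test_accuracy": 0.87, "learning_rate": 1e-3, "seed": 20240101}
current_run = {"train_accuracy": 0.94, "test_accuracy": 0.83, "learning_rate": 1e-2, "seed": 20240101}

for key in documented_run:
    if documented_run[key] != current_run[key]:
        print(f"{key}: documented {documented_run[key]} vs. current {current_run[key]}")
# This would flag both the drop in test accuracy and the changed learning rate,
# pointing the evaluator toward a settings difference before blaming the data.
```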

Finally, this information builds trust between research labs, the general public, and policymakers, as each party gains insight into how otherwise “black box” models were constructed. It also informs and educates the public on typical practices which can be important for later reputational considerations, ex ante regulation, or common law concepts of reasonableness.

Documentation for models can be both highly technical and lengthy, which creates readability risks. It is important to present the information in a reader-friendly manner to ensure the hard work of documentation yields the benefits outlined above and to avoid creating burdensome documentation that pushes the work unnecessarily onto users and consumers of the model. Explorations related to the following documentation questions could uncover insights into barriers to implementation, along with mitigation strategies to overcome those barriers, and produce checkpoints for testing impacts on certain demographics.

Sample Documentation Questions
  • What training data is used? It may not be possible to provide this in practice. When possible, this section should mirror the evaluation data. If such detail is not possible, minimal allowable information should be provided here, such as details of the distribution over various factors in the training datasets (a sketch of such a summary follows this list). (Mitchell et al. 2018)
  • What type of algorithm is used to train the model? What are the details of the algorithm’s architecture (e.g., a ResNet neural net)? Include a diagram if possible.
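When the training data itself cannot be shared, a distribution summary of the kind mentioned above can still be published. The sketch below, with invented column names and rows, shows one minimal way to produce such a summary.

```python
# Illustrative summary of the distribution over a few factors in the training data.
from collections import Counter

training_rows = [
    {"region": "north", "label": 1},
    {"region": "north", "label": 0},
    {"region": "south", "label": 1},
    {"region": "south", "label": 1},
]

for factor in ("region", "label"):
    counts = Counter(row[factor] for row in training_rows)
    total = sum(counts.values())
    print(factor, {value: round(count / total, 2) for value, count in counts.items()})
```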

Sources Cited

  1. Holstein, K., Vaughan, J.W., Daumé, H., Dudík, M., & Wallach, H.M. (2018). Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? CHI.
  2. Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103.
  3. World Wide Web Consortium Process Document (W3C) process outlined here: https://www.w3.org/2019/Process-20190301/
  4. Internet Engineering Task Force (IETF) process outlined here: https://www.ietf.org/standards/process/
  5. The Web Hypertext Application Technology Working Group (WHATWG) process outlined here: https://whatwg.org/faq#process
  6. Oever, N., Moriarty, K. The Tao of IETF: A novice's guide to the Internet Engineering Task Force. https://www.ietf.org/about/participate/tao/.
  7. Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103.
  8. Friedman, B., Kahn, P. H., and Borning, A. (2008). Value sensitive design and information systems. In Kenneth Einar Himma and Herman T. Tavani (Eds.) The Handbook of Information and Computer Ethics (pp. 70-100). John Wiley & Sons, Inc. http://jgustilo.pbworks.com/f/the-handbook-of-information-and-computer-ethics.pdf#page=104; Davis, J., and P. Nathan, L. (2015). Value sensitive design: applications, adaptations, and critiques. Handbook of Ethics, Values, and Technological Design: Sources, Theory, Values and Application Domains. (pp. 11-40) DOI: 10.1007/978-94-007-6970-0_3. https://www.researchgate.net/publication/283744306_Value_Sensitive_Design_Applications_Adaptations_and_Critiques; Borning, A. and Muller, M. (2012). Next steps for value sensitive design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12). (pp 1125-1134) DOI: https://doi.org/10.1145/2207676.2208560 https://dl.acm.org/citation.cfm?id=2208560
  9. Pichai, S., (2018). AI at Google: our principles. The Keyword. https://www.blog.google/technology/ai/ai-principles/; IBM’s Principles for Trust and Transparency. IBM Policy. https://www.ibm.com/blogs/policy/trust-principles/; Microsoft AI principles. Microsoft. https://www.microsoft.com/en-us/ai/our-approach-to-ai; Ethically Aligned Design – Version II. IEEE. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead_v2.pdf
  10. Zeng, Y., Lu, E., and Huangfu, C. (2018) Linking artificial intelligence principles. CoRR https://arxiv.org/abs/1812.04814.
  11. Jessica Fjeld, Hannah Hilligoss, Nele Achten, Maia Levy Daniel, Sally Kagay, and Joshua Feldman (2018). Principled artificial intelligence - a map of ethical and rights based approaches. Berkman Center for Internet and Society. https://ai-hr.cyber.harvard.edu/primp-viz.html
  12. Jobin, A., Ienca, M., & Vayena, E. (2019). Artificial Intelligence: the global landscape of ethics guidelines. arXiv preprint arXiv:1906.11668. https://arxiv.org/pdf/1906.11668.pdf
  13. Jobin, A., Ienca, M., & Vayena, E. (2019). Artificial Intelligence: the global landscape of ethics guidelines. arXiv preprint arXiv:1906.11668. https://arxiv.org/pdf/1906.11668.pdf
  14. Ananny, M., and Kate Crawford (2018). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media and Society 20 (3): 973-989.
  15. Whittlestone, J., Nyrup, R., Alexandrova, A., & Cave, S. (2019, January). The Role and Limits of Principles in AI Ethics: Towards a Focus on Tensions. In Proceedings of the AAAI/ACM Conference on AI Ethics and Society, Honolulu, HI, USA (pp. 27-28). http://www.aies-conference.com/wp-content/papers/main/AIES-19_paper_188.pdf; Mittelstadt, B. (2019). AI Ethics–Too Principled to Fail? https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3391293
  16. Greene, D., Hoffmann, A. L., & Stark, L. (2019, January). Better, nicer, clearer, fairer: A critical assessment of the movement for ethical artificial intelligence and machine learning. In Proceedings of the 52nd Hawaii International Conference on System Sciences. https://scholarspace.manoa.hawaii.edu/handle/10125/59651
  17. Raji, I. D., & Buolamwini, J. (2019). Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products. In AAAI/ACM Conf. on AI Ethics and Society (Vol. 1). https://www.media.mit.edu/publications/actionable-auditing-investigating-the-impact-of-publicly-naming-biased-performance-results-of-commercial-ai-products/
  18. Algorithmic Impact Assessment (2019) Government of Canada https://www.canada.ca/en/government/system/digital-government/modern-emerging-technologies/responsible-use-ai/algorithmic-impact-assessment.html
  19. Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License. arXiv preprint arXiv:1903.12262. https://arxiv.org/abs/1903.12262; Responsible AI Licenses v0.1. RAIL: Responsible AI Licenses. https://www.licenses.ai/ai-licenses
  20. See Citation 5
  21. Safe Face Pledge. https://www.safefacepledge.org/; Montreal Declaration on Responsible AI. Universite de Montreal. https://www.montrealdeclaration-responsibleai.com/; The Toronto Declaration: Protecting the right to equality and non-discrimination in machine learning systems. (2018). Amnesty International and Access Now. https://www.accessnow.org/cms/assets/uploads/2018/08/The-Toronto-Declaration_ENG_08-2018.pdf; Dagstuhl Declaration on the application of machine learning and artificial intelligence for social good. https://www.dagstuhl.de/fileadmin/redaktion/Programm/Seminar/19082/Declaration/Declaration.pdf
  22. Dobbe, R., Dean, S., Gilbert, T., & Kohli, N. (2018). A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. https://arxiv.org/pdf/1807.00553.pdf
  23. Wagstaff, K. (2012). Machine learning that matters. https://arxiv.org/pdf/1206.4656.pdf; Friedman, B., Kahn, P. H., Borning, A., & Huldtgren, A. (2013). Value sensitive design and information systems. In Early engagement and new technologies: Opening up the laboratory (pp. 55-95). Springer, Dordrecht. https://vsdesign.org/publications/pdf/non-scan-vsd-and-information-systems.pdf
  24. Dobbe, R., Dean, S., Gilbert, T., & Kohli, N. (2018). A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. https://arxiv.org/pdf/1807.00553.pdf
  25. Safe Face Pledge. https://www.safefacepledge.org/
  26. Montreal Declaration on Responsible AI. Universite de Montreal. https://www.montrealdeclaration-responsibleai.com/
  27. Diverse Voices How To Guide. Tech Policy Lab, University of Washington. https://techpolicylab.uw.edu/project/diverse-voices/
  28. Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
  29. Ethically Aligned Design – Version II. IEEE. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead_v2.pdf
  30. Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. https://arxiv.org/abs/1803.09010; Hazard Communication Standard: Safety Data Sheets. Occupational Safety and Health Administration, US Department of Labor. https://www.osha.gov/Publications/OSHA3514.html
  31. Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards. https://arxiv.org/abs/1805.03677; Kelley, P. G., Bresee, J., Cranor, L. F., & Reeder, R. W. (2009). A nutrition label for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security (p. 4). ACM. http://cups.cs.cmu.edu/soups/2009/proceedings/a4-kelley.pdf
  32. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 220-229). ACM. https://arxiv.org/abs/1810.03993
  33. Hind, M., Mehta, S., Mojsilovic, A., Nair, R., Ramamurthy, K. N., Olteanu, A., & Varshney, K. R. (2018). Increasing Trust in AI Services through Supplier's Declarations of Conformity. https://arxiv.org/abs/1808.07261
  34. Veale, M., Van Kleek, M., & Binns, R. (2018). Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making. In Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI 2018. https://arxiv.org/abs/1802.01029
  35. Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License. https://arxiv.org/abs/1903.12262
  36. Cooper, D. M. (2013, April). A Licensing Approach to Regulation of Open Robotics. In Paper for presentation for We Robot: Getting down to business conference, Stanford Law School.
  37. Responsible AI Practices. Google AI. https://ai.google/education/responsible-ai-practices
  38. Everyday Ethics for Artificial Intelligence. (2019). IBM. https://www.ibm.com/watson/assets/duo/pdf/everydayethics.pdf
  39. Federal Trade Commission. (2012). Best Practices for Common Uses of Facial Recognition Technologies (Staff Report). Federal Trade Commission, 30. https://www.ftc.gov/sites/default/files/documents/reports/facing-facts-best-practices-common-uses-facial-recognition-technologies/121022facialtechrpt.pdf
  40. Microsoft (2018). Responsible bots: 10 guidelines for developers of conversational AI. https://www.microsoft.com/en-us/research/uploads/prod/2018/11/Bot_Guidelines_Nov_2018.pdf
  41. Tramer, F., Atlidakis, V., Geambasu, R., Hsu, D., Hubaux, J. P., Humbert, M., ... & Lin, H. (2017, April). FairTest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P) (pp. 401-416). IEEE. https://github.com/columbia/fairtest, https://www.mhumbert.com/publications/eurosp17.pdf
  42. Kishore Durg (2018). Testing AI: Teach and Test to raise responsible AI. Accenture Technology Blog. https://www.accenture.com/us-en/insights/technology/testing-AI
  43. Kush R. Varshney (2018). Introducing AI Fairness 360. IBM Research Blog. https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
  44. Dave Gershgorn (2018). Facebook says it has a tool to detect bias in its artificial intelligence. Quartz. https://qz.com/1268520/facebook-says-it-has-a-tool-to-detect-bias-in-its-artificial-intelligence/
  45. James Wexler. (2018) The What-If Tool: Code-Free Probing of Machine Learning Models. Google AI Blog. https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html
  46. Miro Dudík, John Langford, Hanna Wallach, and Alekh Agarwal (2018). Machine Learning for fair decisions. Microsoft Research Blog. https://www.microsoft.com/en-us/research/blog/machine-learning-for-fair-decisions/
  47. Veale, M., Binns, R., & Edwards, L. (2018). Algorithms that Remember: Model Inversion Attacks and Data Protection Law. Phil. Trans. R. Soc. A, 376, 20180083. https://doi.org/10/gfc63m
  48. Floridi, L. (2010, February). Information: A Very Short Introduction.
  49. Data Information Specialists Committee UK, 2007. http://www.disc-uk.org/qanda.html.
  50. Harwell, Drew. “Federal Study Confirms Racial Bias of Many Facial-Recognition Systems, Casts Doubt on Their Expanding Use.” The Washington Post, WP Company, 21 Dec. 2019, www.washingtonpost.com/technology/2019/12/19/federal-study-confirms-racial-bias-many-facial-recognition-systems-casts-doubt-their-expanding-use/
  51. Hildebrandt, M. (2019) ‘Privacy as Protection of the Incomputable Self: From Agnostic to Agonistic Machine Learning’, Theoretical Inquiries in Law, 20(1) 83–121.
  52. D'Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., ... & Sculley, D. (2020). Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395.
  53. Selinger, E. (2019). ‘Why You Can’t Really Consent to Facebook’s Facial Recognition’, One Zero. https://onezero.medium.com/why-you-cant-really-consent-to-facebook-s-facial-recognition-6bb94ea1dc8f
  54. Lum, K., & Isaac, W. (2016). To predict and serve? Significance, 13(5), 14-19. https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x
  55. LabelInsight (2016). “Drive Long-Term Trust & Loyalty Through Transparency”. https://www.labelinsight.com/Transparency-ROI-Study
  56. Crawford and Paglen, https://www.excavating.ai/
  57. Geva, M., Goldberg, Y., & Berant, J. (2019). Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets. https://arxiv.org/pdf/1908.07898.pdf
  58. Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
  59. Desmond U. Patton et al (2017).
  60. See Cynthia Dwork et al.,
  61. Katta Spiel, Oliver L. Haimson, and Danielle Lottridge. (2019). How to do better with gender on surveys: a guide for HCI researchers. Interactions. 26, 4 (June 2019), 62-65. DOI: https://doi.org/10.1145/3338283
  62. A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration. Morgan Kaufmann, 2012
  63. Momin M. Malik. (2019). Can algorithms themselves be biased? Medium. https://medium.com/berkman-klein-center/can-algorithms-themselves-be-biased-cffecbf2302c
  64. Fire, Michael, and Carlos Guestrin (2019). “Over-Optimization of Academic Publishing Metrics: Observing Goodhart’s Law in Action.” GigaScience 8 (giz053). https://doi.org/10.1093/gigascience/giz053.
  65. Vogelsang, A., u0026amp; Borg, M. (2019, September). Requirements engineering for machine learning: Perspectives from data scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW) (pp. 245-251). IEEE
  66. Eckersley, P. (2018). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064.
  67. Partnership on AI. Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System, Requirement 5.
  68. Eckersley, P. (2018). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064. https://arxiv.org/abs/1901.00064
  69. If it is not, there is likely a bug in the code. Checking a predictive model's performance on the training set cannot distinguish irreducible error (which comes from intrinsic variance of the system) from error introduced by bias and variance in the estimator; this is universal, and has nothing to do with different settings or
  70. Selbst, Andrew D. and Boyd, Danah and Friedler, Sorelle and Venkatasubramanian, Suresh and Vertesi, Janet (2018). “Fairness and Abstraction in Sociotechnical Systems”, ACM Conference on Fairness, Accountability, and Transparency (FAT*). https://ssrn.com/abstract=3265913
  71. Tools that can be used to explore and audit the predictive model fairness include FairML, Lime, IBM AI Fairness 360, SHAP, Google What-If Tool, and many others
  72. Wagstaff, K. (2012). Machine learning that matters. arXiv preprint arXiv:1206.4656. https://arxiv.org/abs/1206.4656