ABOUT ML Reference Document

2.3 Research Themes on Documentation for Transparency

There is substantial existing research on documentation for each of the steps outlined above. The following section provides a brief review of key insights from the current literature on three of the steps: System Design and Set Up, System Development, and System Deployment.

2.3.1 System Design and Set Up

Minimizing harm resulting from ML systems is a major theme in recent transparency research. Adverse impacts can result from model fragility or from intended or unintended misuse, such as applying an ML system in a context it was not designed for or using it for a purpose it was not built for (among other possibilities). Transparent documentation, especially at the system design and set up phase, about how and why an ML system was built, as well as about inappropriate use contexts, can reduce misuse by giving builders, users, activists, policymakers, and other stakeholders the information necessary to call out intended and unintended misuse. Positive progress is happening through efforts such as the “Safe Face Pledge” [25] and the “Montreal Declaration on Responsible AI” [26], which improve the design and set up of an ML system by outlining dangerous use cases for the deployment of AI services in sensitive contexts and gaining public commitment against AI misuse from corporations through a signed pledge. Documentation also allows more people to spot potential blind spots, contributing to more robust models that are less likely to create unintended harm.

Documenting system feedback mechanisms from the outset is also essential for minimizing harm to intended users and impacted non-users, since explicit documentation can help surface when existing feedback mechanisms might not be sufficient (for example, if feedback systems do not formalize the inclusion of the perspectives of those most affected by the ML system, especially people from underrepresented communities or communities with limited socio-political power). Documenting feedback loops is a way to commit to the feedback process. The Diverse Voices method [27] from the Tech Policy Lab at the University of Washington is one way organizations can address this issue. The process involves identifying communities that will be highly impacted by the technology being considered, prioritizing the communities that are least likely to be consulted by the developers of the technology, convening a panel of experiential experts from that community, asking for their feedback in a structured panel, incorporating that feedback into the design documents, and finally confirming with the panelists that their perspectives have been accurately reflected. This feedback loop should also be designed to surface and disseminate issues that may arise after initial deployment, which is when problems are often noticed.

2.3.2 System Development

A central theme of promoting greater transparency in system development is detailed reporting on the defining characteristics and intended uses of the system. There are well-researched sets of documentation questions meant to prompt thoughtful reflection prior to building datasets as well as models, including for different types of applications like NLP [28], autonomous vehicles [29], and other domains. These documentation templates are often modeled on those used in other industries, such as safety data sheets from the electronics industry [30] or nutrition labels from the food industry [31]. The suggested templates vary widely in length and appearance, ranging from a single concise page of succinct statements, symbols, and visualizations to upwards of 10 pages of detailed prose and graphs. Whether the documentation is meant for internal or external consumption also affects length and contents, as internal documentation can be more detailed and thus longer. Since teams explicitly declare intended goals for the project within all of these templates, the templates can create greater internal accountability as the ML project proceeds: the team can refer back to its initial goals to ensure ongoing consistency with its declared intentions.

A common focus across data-related templates is on clarifying why the dataset is being created and explicitly stating its intended use and limitations. Documentation questions across papers also consistently address the risks that arise at various stages of data creation and distribution, with the goal of encouraging practitioners to reflect on ethical concerns at every stage preceding data use and release. Some templates focus more on addressing specific risks like privacy.
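As a rough illustration of the common elements described above (motivation, intended use, limitations, and risks at each stage), a dataset documentation record could be sketched in Python as follows. The class name `DatasetDocumentation`, its field names, and the example dataset are hypothetical and are not taken from any of the cited templates.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetDocumentation:
    """Hypothetical record; field names are illustrative assumptions."""
    name: str
    motivation: str = ""       # why the dataset is being created
    intended_use: str = ""     # uses the creators had in mind
    limitations: str = ""      # known gaps or unsuitable contexts
    risks_by_stage: dict = field(default_factory=dict)  # e.g. {"collection": "..."}

    def missing_sections(self) -> list:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Made-up example: two sections have not been filled in yet.
doc = DatasetDocumentation(
    name="example-reviews",
    motivation="Study sentiment in product reviews.",
    intended_use="Research on English-language sentiment classification.",
)
print(doc.missing_sections())
```

A check like `missing_sections` is only one possible design choice; it makes unanswered questions visible before a dataset is released, but it cannot judge whether an answer is adequate.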

Interpretability

According to Lipton (2017), interpretability has no agreed-upon meaning. However, we see the benefit of interpreting “opaque models after-the-fact” and are comfortable using the post-hoc interpretation approach, which includes “natural language explanations, visualization of learned representations or models, and explanations by example.”

Model- and system-level documentation efforts have since emerged from this earlier work on data documentation, introducing questions more specific to overall model objectives. This includes commentary on design decisions, such as model architecture, and reporting on fair performance metrics [32], as well as general “purpose, performance, safety, security, and provenance information to be completed by AI service providers for examination by consumers” [33]. Determining organizationally acceptable rates of performance in advance of development can help guide trade-offs later on, such as those concerning interpretability of models or the inclusion of optional variables [34].
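One way to make performance targets that are set in advance concrete is to record them before development and compare reported metrics against them later. The sketch below is a minimal illustration of that idea; the function name `unmet_targets`, the metric names, the group labels, and all numbers are made up for the example and do not come from the cited papers.

```python
def unmet_targets(targets, reported):
    """Return (group, metric, value, minimum) for each value below its target.

    targets:  {metric: minimum acceptable value}, declared before development
    reported: {group: {metric: measured value}}, filled in after evaluation
    """
    failures = []
    for group, metrics in reported.items():
        for metric, minimum in targets.items():
            value = metrics.get(metric)
            # A missing metric is treated as unmet so that gaps stay visible.
            if value is None or value < minimum:
                failures.append((group, metric, value, minimum))
    return failures


# Hypothetical targets and per-group evaluation results.
targets = {"accuracy": 0.90, "recall": 0.80}
reported = {
    "group_a": {"accuracy": 0.93, "recall": 0.85},
    "group_b": {"accuracy": 0.91, "recall": 0.72},
}
for failure in unmet_targets(targets, reported):
    print(failure)
```

Reporting per group rather than in aggregate is an assumption of this sketch; it reflects the emphasis on fair performance metrics but is not the only way to structure such a check.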

In addition to reporting for collaborative knowledge and potential auditing, recent work has also suggested that extending the role of documentation towards a legally binding contract similar to open software licenses may be appropriate for certain applications [35]. This type of documentation could be onerous for research or during highly iterative development cycles, so any recommended implementation needs to be designed with these limitations in mind. Documentation could become a mechanism for restricting use, particularly in high-risk or high-impact scenarios outside the dataset’s suitable context. Although initial steps have begun to study potential regulation of models and automation software [36], most existing efforts focus on the promotion of best practices for model development rather than legally binding documentation. These efforts offer broad recommendations on responsible machine learning [37] and ethics [38] to guide ML practitioners on ethical considerations as they prepare the model for training and deployment. These guidelines also include procedural guidance and suggestions specific to particular use cases of concern, specifically facial recognition [39] and chatbots [40].

2.3.3 System Deployment

The goal of documentation for system deployment is to write down the societally salient aspects of performance, including fairness, robustness, explicability, and other topics. Relevant and difficult-to-answer questions include what tests, monitoring, and evaluation have been done, and how monitoring relates to social outcomes. This section of the documentation considers the ML system in the context where it will be used, so being explicit about the intended effects and plans to minimize side effects is important.

For example, if fairness is one of the stated objectives of the model, then a team can document how the model performs on one or more of the many different fairness tests developed by academia (such as FairTest [41]) or by various companies such as Accenture [42], IBM [43], Facebook [44], Google [45], and Microsoft [46] that have released open source toolkits. Although each of these toolkits remains grounded in statistical fairness definitions, some toolkits also emphasize the need for qualitative documentation of the model’s performance. For instance, the What-If Tool from Google heavily emphasizes enabling data visualizations to guide the practitioner’s judgment on data diversity, and the Accenture toolkit involves a survey of high-level as well as detailed questions to consider before model deployment.
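To illustrate what a statistical fairness definition can look like in practice, the sketch below computes per-group selection rates and their largest gap (often called the demographic parity difference). It is a plain-Python illustration, not the API of any toolkit named above; the function names and the toy data are assumptions made for the example.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Fraction of positive (1) predictions within each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Toy data: binary predictions and a group label for each example.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(preds, groups))                 # {'a': 0.75, 'b': 0.25}
print(demographic_parity_difference(preds, groups))   # 0.5
```

Demographic parity is only one of several statistical definitions, and a single number like this would normally sit alongside the qualitative documentation described above rather than replace it.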

Furthermore, it is important to highlight ways in which it would be unwise to deploy particular models. For example, if a model is at risk of revealing personal data when its weights can be publicly examined or when it can be targeted through repeated querying, deploying it could create privacy and data protection risks with associated legal or ethical consequences [47]. Highlighting concerns such as these helps provide an institutional memory of potential failure modes, which future users can choose either to take at face value or to focus due diligence efforts around.

Sources Cited

  1. Holstein, K., Vaughan, J.W., Daumé, H., Dudík, M., & Wallach, H.M. (2018). Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? CHI.
  2. Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103.
  3. World Wide Web Consortium Process Document (W3C) process outlined here: https://www.w3.org/2019/Process-20190301/
  4. Internet Engineering Task Force (IETF) process outlined here: https://www.ietf.org/standards/process/
  5. The Web Hypertext Application Technology Working Group (WHATWG) process outlined here: https://whatwg.org/faq#process
  6. Oever, N., Moriarty, K. The Tao of IETF: A novice's guide to the Internet Engineering Task Force. https://www.ietf.org/about/participate/tao/.
  7. Young, M., Magassa, L. and Friedman, B. (2019) Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21(2), 89-103.
  8. Friedman, B, Kahn, Peter H., and Borning, A., (2008) Value sensitive design and information systems. In Kenneth Einar Himma and Herman T. Tavani (Eds.) The Handbook of Information and Computer Ethics., (pp. 70-100) John Wiley & Sons, Inc. http://jgustilo.pbworks.com/f/the-handbook-of-information-and-computer-ethics.pdf#page=104; Davis, J., and P. Nathan, L. (2015). Value sensitive design: applications, adaptations, and critiques. Handbook of Ethics, Values, and Technological Design: Sources, Theory, Values and Application Domains. (pp. 11-40) DOI: 10.1007/978-94-007-6970-0_3. https://www.researchgate.net/publication/283744306_Value_Sensitive_Design_Applications_Adaptations_and_Critiques; Borning, A. and Muller, M. (2012). Next steps for value sensitive design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12). (pp 1125-1134) DOI: https://doi.org/10.1145/2207676.2208560 https://dl.acm.org/citation.cfm?id=2208560
  9. Pichai, S., (2018). AI at Google: our principles. The Keyword. https://www.blog.google/technology/ai/ai-principles/; IBM’s Principles for Trust and Transparency. IBM Policy. https://www.ibm.com/blogs/policy/trust-principles/; Microsoft AI principles. Microsoft. https://www.microsoft.com/en-us/ai/our-approach-to-ai; Ethically Aligned Design – Version II. IEEE. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead_v2.pdf
  10. Zeng, Y., Lu, E., and Huangfu, C. (2018) Linking artificial intelligence principles. CoRR https://arxiv.org/abs/1812.04814.
  11. Jessica Fjeld, Hannah Hilligoss, Nele Achten, Maia Levy Daniel, Sally Kagay, and Joshua Feldman, (2018). Principled artificial intelligence - a map of ethical and rights based approaches, Berkman Center for Internet and Society, https://ai-hr.cyber.harvard.edu/primp-viz.html
  12. Jobin, A., Ienca, M., & Vayena, E. (2019). Artificial Intelligence: the global landscape of ethics guidelines. arXiv preprint arXiv:1906.11668. https://arxiv.org/pdf/1906.11668.pdf
  13. Jobin, A., Ienca, M., & Vayena, E. (2019). Artificial Intelligence: the global landscape of ethics guidelines. arXiv preprint arXiv:1906.11668. https://arxiv.org/pdf/1906.11668.pdf
  14. Ananny, M., and Kate Crawford (2018). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media and Society 20 (3): 973-989.
  15. Whittlestone, J., Nyrup, R., Alexandrova, A., & Cave, S. (2019, January). The Role and Limits of Principles in AI Ethics: Towards a Focus on Tensions. In Proceedings of the AAAI/ACM Conference on AI Ethics and Society, Honolulu, HI, USA (pp. 27-28). http://www.aies-conference.com/wp-content/papers/main/AIES-19_paper_188.pdf; Mittelstadt, B. (2019). AI Ethics–Too Principled to Fail? https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3391293
  16. Greene, D., Hoffmann, A. L., & Stark, L. (2019, January). Better, nicer, clearer, fairer: A critical assessment of the movement for ethical artificial intelligence and machine learning. In Proceedings of the 52nd Hawaii International Conference on System Sciences. https://scholarspace.manoa.hawaii.edu/handle/10125/59651
  17. Raji, I. D., & Buolamwini, J. (2019). Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial ai products. In AAAI/ACM Conf. on AI Ethics and Society (Vol. 1). https://www.media.mit.edu/publications/actionable-auditing-investigating-the-impact-of-publicly-naming-biased-performance-results-of-commercial-ai-products/
  18. Algorithmic Impact Assessment (2019) Government of Canada https://www.canada.ca/en/government/system/digital-government/modern-emerging-technologies/responsible-use-ai/algorithmic-impact-assessment.html
  19. Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License. arXiv preprint arXiv:1903.12262. https://arxiv.org/abs/1903.12262; Responsible AI Licenses v0.1. RAIL: Responsible AI Licenses. https://www.licenses.ai/ai-licenses
  20. See Citation 5
  21. Safe Face Pledge. https://www.safefacepledge.org/; Montreal Declaration on Responsible AI. Universite de Montreal. https://www.montrealdeclaration-responsibleai.com/; The Toronto Declaration: Protecting the right to equality and non-discrimination in machine learning systems. (2018). Amnesty International and Access Now. https://www.accessnow.org/cms/assets/uploads/2018/08/The-Toronto-Declaration_ENG_08-2018.pdf; Dagstuhl Declaration on the application of machine learning and artificial intelligence for social good. https://www.dagstuhl.de/fileadmin/redaktion/Programm/Seminar/19082/Declaration/Declaration.pdf
  22. Dobbe, R., Dean, S., Gilbert, T., & Kohli, N. (2018). A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. https://arxiv.org/pdf/1807.00553.pdf
  23. Wagstaff, K. (2012). Machine learning that matters. https://arxiv.org/pdf/1206.4656.pdf ; Friedman, B., Kahn, P. H., Borning, A., & Huldtgren, A. (2013). Value sensitive design and information systems. In Early engagement and new technologies: Opening up the laboratory (pp. 55-95). Springer, Dordrecht. https://vsdesign.org/publications/pdf/non-scan-vsd-and-information-systems.pdf
  24. Dobbe, R., Dean, S., Gilbert, T., & Kohli, N. (2018). A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics. https://arxiv.org/pdf/1807.00553.pdf
  25. Safe Face Pledge. https://www.safefacepledge.org/
  26. Montreal Declaration on Responsible AI. Universite de Montreal. https://www.montrealdeclaration-responsibleai.com/
  27. Diverse Voices How To Guide. Tech Policy Lab, University of Washington. https://techpolicylab.uw.edu/project/diverse-voices/
  28. Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
  29. Ethically Aligned Design – Version II. IEEE. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead_v2.pdf
  30. Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. https://arxiv.org/abs/1803.09010; Hazard Communication Standard: Safety Data Sheets. Occupational Safety and Health Administration, US Department of Labor. https://www.osha.gov/Publications/OSHA3514.html
  31. Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards. https://arxiv.org/abs/1805.03677; Kelley, P. G., Bresee, J., Cranor, L. F., & Reeder, R. W. (2009). A nutrition label for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security (p. 4). ACM. http://cups.cs.cmu.edu/soups/2009/proceedings/a4-kelley.pdf
  32. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 220-229). ACM. https://arxiv.org/abs/1810.03993
  33. Hind, M., Mehta, S., Mojsilovic, A., Nair, R., Ramamurthy, K. N., Olteanu, A., & Varshney, K. R. (2018). Increasing Trust in AI Services through Supplier's Declarations of Conformity. https://arxiv.org/abs/1808.07261
  34. Veale M., Van Kleek M., & Binns R. (2018) ‘Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making’ in Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI 2018. https://arxiv.org/abs/1802.01029.
  35. Benjamin, M., Gagnon, P., Rostamzadeh, N., Pal, C., Bengio, Y., & Shee, A. (2019). Towards Standardization of Data Licenses: The Montreal Data License. https://arxiv.org/abs/1903.12262
  36. Cooper, D. M. (2013, April). A Licensing Approach to Regulation of Open Robotics. In Paper for presentation for We Robot: Getting down to business conference, Stanford Law School.
  37. Responsible AI Practices. Google AI. https://ai.google/education/responsible-ai-practices
  38. Everyday Ethics for Artificial Intelligence. (2019). IBM. https://www.ibm.com/watson/assets/duo/pdf/everydayethics.pdf
  39. Federal Trade Commission. (2012). Best Practices for Common Uses of Facial Recognition Technologies (Staff Report). Federal Trade Commission, 30. https://www.ftc.gov/sites/default/files/documents/reports/facing-facts-best-practices-common-uses-facial-recognition-technologies/121022facialtechrpt.pdf
  40. Microsoft (2018). Responsible bots: 10 guidelines for developers of conversational AI. https://www.microsoft.com/en-us/research/uploads/prod/2018/11/Bot_Guidelines_Nov_2018.pdf
  41. Tramer, F., Atlidakis, V., Geambasu, R., Hsu, D., Hubaux, J. P., Humbert, M., ... & Lin, H. (2017, April). FairTest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P) (pp. 401-416). IEEE. https://github.com/columbia/fairtest, https://www.mhumbert.com/publications/eurosp17.pdf
  42. Kishore Durg (2018). Testing AI: Teach and Test to raise responsible AI. Accenture Technology Blog. https://www.accenture.com/us-en/insights/technology/testing-AI
  43. Kush R. Varshney (2018). Introducing AI Fairness 360. IBM Research Blog. https://www.ibm.com/blogs/research/2018/09/ai-fairness-360/
  44. Dave Gershgorn (2018). Facebook says it has a tool to detect bias in its artificial intelligence. Quartz. https://qz.com/1268520/facebook-says-it-has-a-tool-to-detect-bias-in-its-artificial-intelligence/
  45. James Wexler. (2018) The What-If Tool: Code-Free Probing of Machine Learning Models. Google AI Blog. https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html
  46. Miro Dudík, John Langford, Hanna Wallach, and Alekh Agarwal (2018). Machine Learning for fair decisions. Microsoft Research Blog. https://www.microsoft.com/en-us/research/blog/machine-learning-for-fair-decisions/
  47. Veale, M., Binns, R., & Edwards, L. (2018). Algorithms that Remember: Model Inversion Attacks and Data Protection Law. Phil. Trans. R. Soc. A, 376, 20180083. https://doi.org/10/gfc63m
  48. Floridi, L. (2010, February). Information: A Very Short Introduction.
  49. Data Information Specialists Committee UK, 2007. http://www.disc-uk.org/qanda.html.
  50. Harwell, Drew. “Federal Study Confirms Racial Bias of Many Facial-Recognition Systems, Casts Doubt on Their Expanding Use.” The Washington Post, WP Company, 21 Dec. 2019, www.washingtonpost.com/technology/2019/12/19/federal-study-confirms-racial-bias-many-facial-recognition-systems-casts-doubt-their-expanding-use/
  51. Hildebrandt, M. (2019) ‘Privacy as Protection of the Incomputable Self: From Agnostic to Agonistic Machine Learning’, Theoretical Inquiries in Law, 20(1) 83–121.
  52. D'Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., ... & Sculley, D. (2020). Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395.
  53. Selinger, E. (2019). ‘Why You Can’t Really Consent to Facebook’s Facial Recognition’, One Zero. https://onezero.medium.com/why-you-cant-really-consent-to-facebook-s-facial-recognition-6bb94ea1dc8f
  54. Lum, K., & Isaac, W. (2016). To predict and serve?. Significance, 13(5), 14-19. https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x
  55. LabelInsight (2016). “Drive Long-Term Trust & Loyalty Through Transparency”. https://www.labelinsight.com/Transparency-ROI-Study
  56. Crawford and Paglen, https://www.excavating.ai/
  57. Geva, Mor & Goldberg, Yoav & Berant, Jonathan. (2019). Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets. https://arxiv.org/pdf/1908.07898.pdf
  58. Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
  59. Desmond U. Patton et al (2017).
  60. See Cynthia Dwork et al.,
  61. Katta Spiel, Oliver L. Haimson, and Danielle Lottridge. (2019). How to do better with gender on surveys: a guide for HCI researchers. Interactions. 26, 4 (June 2019), 62-65. DOI: https://doi.org/10.1145/3338283
  62. A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration. Morgan Kaufmann, 2012
  63. Momin M. Malik. (2019). Can algorithms themselves be biased? Medium. https://medium.com/berkman-klein-center/can-algorithms-themselves-be-biased-cffecbf2302c
  64. Fire, Michael, and Carlos Guestrin (2019). “Over-Optimization of Academic Publishing Metrics: Observing Goodhart’s Law in Action.” GigaScience 8 (giz053). https://doi.org/10.1093/gigascience/giz053.
  65. Vogelsang, A., & Borg, M. (2019, September). Requirements engineering for machine learning: Perspectives from data scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW) (pp. 245-251). IEEE
  66. Eckersley, P. (2018). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064.
  67. Partnership on AI. Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System, Requirement 5.
  68. Eckersley, P. (2018). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064. https://arxiv.org/abs/1901.00064
  69. If it is not, there is likely a bug in the code. Checking a predictive model's performance on the training set cannot distinguish irreducible error (which comes from intrinsic variance of the system) from error introduced by bias and variance in the estimator; this is universal, and has nothing to do with different settings or
  70. Selbst, Andrew D. and Boyd, Danah and Friedler, Sorelle and Venkatasubramanian, Suresh and Vertesi, Janet (2018). “Fairness and Abstraction in Sociotechnical Systems”, ACM Conference on Fairness, Accountability, and Transparency (FAT*). https://ssrn.com/abstract=3265913
  71. Tools that can be used to explore and audit the predictive model fairness include FairML, Lime, IBM AI Fairness 360, SHAP, Google What-If Tool, and many others
  72. Wagstaff, K. (2012). Machine learning that matters. arXiv preprint arXiv:1206.4656. https://arxiv.org/abs/1206.4656