Appendices
Appendix 1: Fairness, Transparency and Accountability Program Area at Partnership on AI
The Fairness, Transparency, and Accountability program area at Partnership on AI encompasses PAI’s large body of research and programming around issues related to discriminatory harms of algorithmic systems. Since 2020, the team has sought to understand the types of demographic data collection practices and governance frameworks required to ensure that fairness assessments of algorithmic systems are conducted in the public interest. The team has explored data collection and algorithmic fairness practices and processes from both an organizational process perspective and an equity and inclusion perspective. The program area aims to demonstrate the importance of categorization and datafication practices to organizational efforts to: 1) make algorithmic decision-making more “fair”; 2) develop guidelines for how organizations can include participatory, inclusive practices around data collection to achieve “fairness” or “non-discrimination”; and 3) assess the contextual feasibility of existing and emerging fairness techniques.
Appendix 2: Case Study Details
In 2022, Apple released a new feature in their Apple Wallet app, IDs in Wallet, which allows users to store a digital copy of their state-issued identification card or driver’s license to be used in lieu of their physical card. Users are required to undergo several identity verification checks to help ensure that the person adding the identity card to Wallet is the same person to whom the identity card belongs. The state is responsible for verifying and approving the user’s request to add their driver’s license or state ID to Wallet.
To help Apple and the state issuing authority ensure fairness in the identity verification process, Apple asks users to share select demographic data (such as age range or sex) via an opt-in screen at the end of the IDs in Wallet setup flow. This analysis helps determine whether outcomes during the setup and approval process differ across groups of users.
Sharing this information is optional, and if users agree to share, the information is collected in a way that helps preserve user privacy. Federated statistics uses differential privacy to allow analysis of aggregated information without Apple, or the ID-issuing state, learning individual-level information. No personally identifiable information is collected, stored, or used by Apple or the state issuing authority as part of this process. Users can opt out of sharing this data at any time.
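As a rough, purely illustrative sketch of how local differential privacy lets an aggregator estimate group-level rates without ever seeing an individual’s true response, the Python snippet below uses randomized response; the group labels, epsilon value, records, and function names are all hypothetical and are not drawn from Apple’s implementation.

```python
import math
import random
from collections import defaultdict

def randomize(value: bool, epsilon: float) -> bool:
    """Randomized response: keep the true value with probability
    e^eps / (e^eps + 1); otherwise flip it."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return value if random.random() < p_true else not value

def estimate_rate(reports, epsilon: float) -> float:
    """Debias the noisy reports to estimate the true rate of `True` outcomes."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed + p - 1) / (2 * p - 1)

# Hypothetical on-device records: (self-reported age range, setup succeeded?)
devices = [("18-24", True), ("18-24", False), ("65+", False),
           ("65+", True), ("65+", False)] * 400
epsilon = 1.0

noisy_reports = defaultdict(list)
for group, succeeded in devices:
    # Each device perturbs its own outcome before anything leaves the device.
    noisy_reports[group].append(randomize(succeeded, epsilon))

# The aggregator sees only noisy values, yet can recover group-level rates.
for group, reports in noisy_reports.items():
    print(group, round(estimate_rate(reports, epsilon), 3))
```

Real deployments typically combine on-device noise of this kind with secure aggregation and a managed privacy budget, but the debiasing step above shows why group-level rates remain estimable even though every individual report is randomized.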
Apple is using differentially private federated statistics as part of a larger fairness assessment strategy for IDs in Wallet that also includes rigorous user testing, inclusivity roundtable discussions with state issuing authorities and third-party vendors, and the integration of existing inclusivity strategies from other Apple teams, among other efforts.
Appendix 3: Multistakeholder Convenings
PAI led a series of convenings designed to engage a diverse set of experiences and expertise to explore the questions posed in this project through both social and technical lenses. PAI has had success with facilitating multistakeholder, multi-disciplinary discussions about pressing ethical issues in the field of AI, leveraging the diversity of participants to capture the necessary nuance to address those issues.
Extended semi-structured small group discussions, moderated by facilitators from PAI, allowed expert participants to uncover questions and considerations around the use of differentially private federated statistics for algorithmic fairness assessments that may not have been considered otherwise.
PAI actively encouraged participants to pose additional questions or considerations outside of the initial interview protocol. By grounding these convenings in social scientific research methodologies, PAI designed the discussions to yield more general insights into how organizations can use differentially private federated statistics to approach bias and fairness assessments ethically and responsibly, or to more rigorously examine and improve their own approaches and strategies.
APPENDIX TABLE 1: Convening Discussion Questions
Convening Topic: Methods for Inclusivity in Data Collection
- When beginning a data collection effort to support algorithmic fairness, what processes should be adopted to support the identification of the appropriate and most accurate measurement of demographic categories to include in a fairness assessment?
- What considerations should be taken into account when deciding on the desired demographic data?
- Who should be included in this process?
- What impact might these considerations have on different demographic groups?
- How can we design participatory and inclusive demographic data collection methods that preserve privacy and advance fairness?
- What are the benefits and risks of various methods (perhaps those we discussed previously), particularly for marginalized groups?
- What role does consent play in this process?
Convening Topic: Advancing Privacy and Fairness
- Does this technique as a whole lend itself to both privacy and accuracy for users from all demographic groups?
- Which components of differentially private federated statistics are most vulnerable to breakdowns that could harm marginalized groups?
- How should the interacting components of differentially private federated statistics be determined when striving toward privacy and accuracy for all demographic groups?
- Under what circumstances would the privacy budget, sets of queries, and lifespan of data retention be adjusted?
- Could these be adjusted in order to better serve different demographic groups?
Appendix 4: Glossary
Misclassification: Refers to instances when an individual is incorrectly classified in a dataset, despite the existence of an accurate (representative) data category.
For example, an algorithmic system that uses racial proxy analysis, assigning a racial category based on an analysis of skin tone in an image, may classify someone who racially identifies as “Asian” as “White” due to their perceived “light” skin tone.
Algorithmic system: Refers to a system composed of one or more algorithms (automated procedures used to perform a computation), and includes systems using machine learning or following a pre-programmed set of rules.
For example, search engines, traffic signals, and facial recognition software all rely on algorithmic systems to function.
Analytical accuracy: Refers to the closeness (accuracy) between the representation of a value or data point and the true value or data point.
For example, high analytical accuracy is achieved when the distribution of gender categories (% of a population in each gender category) in a dataset matches the gender distribution of the measured population.
Data breach: Refers to an incident where sensitive data or confidential information is stolen or otherwise accessed without the authorization of the system’s owner.
For example, the physical theft of a hard drive containing users’ personal information or a ransomware cyberattack that prevents a company from accessing its own customer data unless a ransom is paid are both considered data breaches.
Datafication: Refers to the conversion of various aspects of human life into quantitative data, which then allows for quantitative analysis of social and individual behaviors.
For example, people’s conversations, images, and speech are converted into data points by messaging platforms (e.g., text from conversations), social media sites (e.g., text- and image-based posts), and digital assistants (e.g., audio recordings of voice commands given to the device), respectively.
Misrepresentation: Refers to instances when the demographic categories applied in a dataset do not adequately or accurately represent the identity of the individual being counted.
For example, if a survey only provides the options to select “man” or “woman,” an individual who identifies as gender non-binary (as neither a woman nor a man) will be represented inaccurately in the dataset.
Re-identification: Refers to situations where the identity of a person or organization is discoverable even though the individual’s or organization’s name is not available or has been purposely removed, typically by matching the anonymized dataset with publicly available or auxiliary data.
For example, an anonymized dataset containing private health information can be re-identified if the identification number used to distinguish individuals from one another is their social security number and a separate list pairing individuals’ names with their social security numbers is available.
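As a purely hypothetical illustration of this kind of linkage, the sketch below joins an “anonymized” health dataset keyed by social security number against an auxiliary roster that maps the same numbers to names; every record is invented.

```python
# Hypothetical "anonymized" records: names removed, but keyed by SSN.
anonymized_health = [
    {"ssn": "123-45-6789", "diagnosis": "asthma"},
    {"ssn": "987-65-4321", "diagnosis": "diabetes"},
]

# Auxiliary data pairing the same identifier with names.
auxiliary_roster = {
    "123-45-6789": "A. Rivera",
    "987-65-4321": "B. Chen",
}

# A simple join on the shared identifier re-identifies every record.
for record in anonymized_health:
    name = auxiliary_roster.get(record["ssn"], "<unmatched>")
    print(name, "->", record["diagnosis"])
```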
Proxy analysis: Refers to a set of techniques used by data analysts to fill in, as accurately as possible, missing or unidentified demographic traits by analyzing other available data.
For example, an individual’s name can be used to guess their gender identity by analyzing how often that name is associated with individuals who identify as a woman versus how often it is associated with individuals who identify as a man.
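The toy sketch below illustrates the flavor of such name-based imputation; the reference counts are invented, real systems rely on much larger name and demographic tables, and, as the challenges summarized in Appendix 5 note, such inferences carry real risks of misclassification.

```python
# Invented reference counts: how often each first name was recorded for
# people identifying as a woman vs. as a man.
name_counts = {
    "maria": {"woman": 950, "man": 50},
    "alex": {"woman": 480, "man": 520},
}

def impute_gender(name: str) -> str:
    """Guess a gender label from a first name's historical frequencies."""
    counts = name_counts.get(name.lower())
    if counts is None:
        return "unknown"
    return max(counts, key=counts.get)

print(impute_gender("Maria"))  # -> woman
print(impute_gender("Alex"))   # -> man
```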
Disparate impact: Refers to the often-used legal interpretation of “fairness” in which the emphasis is on determining whether one group experiences different (unfair) outcomes or treatment than another, including when such differences emerge unintentionally.
Adverse impact specifically refers to instances of disparate impact when a group disproportionately (with greater frequency or intensity) experiences a negative outcome or treatment.
The “80% rule” refers to one specific, and simple, way to “test” for instances of adverse impact, originally designed by the State of California Fair Employment Practice Commission: it checks whether the selection rate for the group with the lowest selection rate (typically a minority group) is less than 80% of the rate for the group with the highest selection rate (typically the majority group).
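A worked, hypothetical example of the calculation: the adverse impact ratio is the lowest group selection rate divided by the highest, and the 80% rule flags ratios below 0.8.

```python
def adverse_impact_ratio(selection_rates: dict) -> float:
    """Lowest group selection rate divided by the highest."""
    return min(selection_rates.values()) / max(selection_rates.values())

# Hypothetical selection rates for two groups in a hiring process.
rates = {"group_a": 0.60, "group_b": 0.45}

ratio = adverse_impact_ratio(rates)  # 0.45 / 0.60 = 0.75
print(round(ratio, 2), "fails the 80% rule" if ratio < 0.8 else "passes the 80% rule")
```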
Knowledgeable consent: Refers to a form of consent (permission for something to happen) where individuals fully understand the scope and implications of their participation (including disclosure of their information) and have the ability to refuse or withdraw their participation at any time.
For example, a person can provide knowledgeable consent for the use of their blood sample in a scientific experiment when they understand the various instances in which their sample may be used, and what may happen as a result of that use, and are given frequent opportunities to withdraw their sample for as long as it remains in the possession of the scientists conducting the experiment.
Sociotechnical: Refers to an approach in which social structures and technical systems are understood to co-inform one another. Assessing only the technical components of a system obscures the human components embedded within it, thereby misrepresenting the consequences and impacts of the system.
For example, a sociotechnical analysis of an algorithmic system would include assessment of the various social components influencing the design, production, and deployment of the system as well as the social impacts of the system.
Statistical minority: Refers to a group within a society that is smaller in size (has fewer people) than another group. In the case of demographic groups, this may overlap with social minorities, which are defined as groups that experience systematic discrimination, prejudice, and harm on the basis of a demographic trait.
For example, people who identify as transgender (those whose gender identity differs from the one typically associated with the sex assigned at birth) are a social minority in the US due to the discrimination and harm they experience, and a statistical minority due to the relatively smaller size of the population compared to people who identify as cisgender (those whose gender identity corresponds with the sex assigned at birth). On the other hand, women are considered a social minority due to the systematic discrimination the group experiences but are not currently a statistical minority.
Statistical parity: Refers to a commonly used definition of fairness in machine learning, related to the legal doctrine of “disparate impact,” in which a model is considered to be operating fairly if each group has the same expected probability of experiencing the positive, favorable outcome.
For example, statistical parity in a machine learning model used to recommend promotions at a workplace would require that men and women in the dataset have the same likelihood of receiving a recommendation for promotion.
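A minimal, hypothetical check of this condition (the data and the tolerance threshold are both invented for illustration):

```python
from collections import defaultdict

def positive_rates(records):
    """Rate of positive outcomes (1 = recommended for promotion) per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome
    return {group: positives[group] / totals[group] for group in totals}

# Hypothetical records: (group, recommended for promotion?)
data = [("women", 1), ("women", 0), ("women", 1),
        ("men", 1), ("men", 1), ("men", 0)]

rates = positive_rates(data)
print(rates)  # women and men both at roughly 0.67
# Statistical parity holds (within tolerance) when the rates match.
print(max(rates.values()) - min(rates.values()) <= 0.05)
```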
Surveillance infrastructure: Refers to data-driven tools embedded in the built and/or digital environment that allow for the monitoring of individual behaviors and actions.
Surveillance infrastructure can include tools like traffic light cameras or wearable devices. Due to systematic inequalities like racism and xenophobia, which cast specific groups of people as more dangerous or socially deviant (and therefore in need of constant monitoring), marginalized communities are often disproportionately subject to surveillance infrastructure.
Appendix 5: Detailed Summary of Challenges and Risks Associated with Demographic Data Collection and Analysis
APPENDIX TABLE 2: Challenges and Risks Associated With Demographic Data Collection and Analysis
Challenge or Risk | Definition | Example |
ORGANIZATIONAL CONCERNS | ||
Organizational priorities | Fairness analyses and interventions often do not support, or may even conflict with, key performance indicators used to evaluate employee performance | Adequate (or additional) data collection may be considered too costly to be financially justifiable for a private-sector company whose primary concern is to reduce development and production costs as much as possible to maximize overall profitability |
Public relations risk | Efforts to collect demographic data could lead to public suspicion and distrust | Due to increasing public scrutiny of data misuse by organizations, the public is skeptical of any justification given for collecting additional user data and looks for any indication (whether real or imagined) of misuse |
Discomfort (or lack of expertise) with identifying appropriate demographic groups | The lack of standardized approaches to choosing salient demographic categories and subcategories leads to inaction | Companies are hesitant to define demographic categories in collection efforts for fear of public criticism, so they often turn to outdated governmental standards, like binary gender categories |
LEGAL BARRIERS | ||
Anti-discrimination laws | In key protected domains, such as finance and healthcare, collection of demographic data may conflict with anti-discrimination laws | Companies selling credit-based products, for example, are barred from collecting demographic data in most instances but are still held to anti-discrimination standards, making robust fairness assessments difficult |
Privacy policies | Increasingly protective privacy regulation has empowered privacy and legal teams to err on the side of caution when it comes to data sensitivity | The GDPR designation of race as a “special” demographic category, which requires companies to meet a high set of standards in order to justify collection, may dissuade companies from gathering this information |
SOCIAL RISKS TO INDIVIDUALS | ||
Unique privacy risks associated with the sharing of sensitive attributes likely to be the target of fairness analysis | As attributes such as race, ethnicity, country of birth, gender, and sexuality are usually consequential aspects of one’s identity, the collection and use of this data presents key privacy risks | Use of demographic information can allow for harmful consequences such as political ad targeting of marginalized groups, leading to racial inequities in information access |
Possible harms stemming from miscategorizing and misrepresenting individuals in the data collection process | Individual misrepresentation can lead to discrimination and disparate impacts | Algorithmically inferring racial categories can further entrench pseudoscientific practices, such as physiognomy, that assume invisible aspects of one’s identity from visible characteristics |
Use of sensitive data beyond data subjects’ expectations | Use of demographic information beyond the initial intent not only breaks the consent of data subjects but can also lead to unintended harmful consequences | The US government developed the Prisoner Assessment Tool Targeting Estimated Risk and Needs to provide guidance on recidivism reduction programming, but the tool was then repurposed to inform inmate transfers, leading to racially disparate outcomes |
SOCIAL RISKS TO COMMUNITIES | ||
Expansion of surveillance infrastructure in the name of fairness | Marginalized communities are often subjected to invasive, cumbersome, and experimental data collection methods, which can be further exacerbated by fairness assessments. Expanded surveillance can constrain agency and result in exploitation for these groups. | Data collected from marginalized communities is often used against them, such as in predictive policing technology and other law enforcement surveillance tactics |
Misrepresenting and mischaracterizing what it means to be part of a demographic group or to hold a certain identity | Incorrectly assigned demographic categories can reinforce harmful stereotypes, naturalize schemas of categorization, and cause other forms of “administrative violence” | This can occur because the range of demographic categories is too narrow, such as leaving out options for “non-binary” or “gender-fluid” in the case of gender, leading to undercounting of gender non-conforming individuals |
Data subjects ceding the ability to define for themselves what constitutes biased or unfair treatment | When companies leading the data collection effort alone define unfairness, with no input from marginalized groups, key instances of discrimination can be missed and the status quo can be reinforced | Strictly formalized definitions of fairness measurement can lead to ineffective and even harmful fairness interventions because they ignore the socio-historical conditions that lead to inequities |