Requirement 8: Tool designs, architectures, and training data must be open to research, review and criticism
Risk assessment tools embody important public policy decisions made by governments, and must be as open and transparent as any law, regulation, or rule of court. Thus, governments must not deploy any proprietary risk assessment tools that rely on claims of trade secrets to prevent transparency. For further discussion of the social justice concerns related to using trade secret law to prevent the disclosure of the data and algorithms behind risk assessment tools, see Taylor R. Moore, Trade Secrets and Algorithms as Barriers to Social Justice, Center for Democracy and Technology (August 2017), https://cdt.org/files/2017/08/2017-07-31-Trade-Secret-Algorithms-as-Barriers-to-Social-Justice.pdf.
In particular, the training datasets, architectures, algorithms, and models of all tools under consideration for deployment must be made broadly available to all interested research communities, including those from statistics, computer science, social science, public policy, law, and criminology, so that they can evaluate the tools before and after deployment. Several countries already publish the details of their risk assessment models. See, e.g., Nikolaj Tollenaar et al., StatRec: Performance, validation and preservability of a static risk prediction instrument, Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 129.1 (2016): 25-44 (in relation to the Netherlands); A Compendium of Research and Analysis on the Offender Assessment System (OASys) (Robin Moore ed., Ministry of Justice Analytical Series, 2015) (in relation to the United Kingdom). Recent legislation also attempts to mandate transparency safeguards; see Idaho Legislature, House Bill No. 118 (2019).
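To make the value of such disclosure concrete, the following is a minimal, hypothetical sketch of what publishing a model's details enables: when the coefficients of a static logistic-regression risk instrument of the kind cited above are public, independent researchers can re-implement the score, check it against reported results, and probe how it responds to changes in its inputs. The feature names and coefficient values below are invented placeholders, not the actual StatRec or OASys specification.

```python
# Hypothetical published specification of a static logistic-regression risk
# instrument: an intercept plus a weight for each input. All names and values
# here are placeholders for illustration only.
import math

PUBLISHED_MODEL = {
    "intercept": -1.2,
    "weights": {
        "prior_convictions": 0.35,
        "age_at_first_offense": -0.04,
        "current_offense_severity": 0.22,
    },
}

def risk_score(features: dict) -> float:
    """Return the predicted probability under the published specification."""
    z = PUBLISHED_MODEL["intercept"]
    for name, weight in PUBLISHED_MODEL["weights"].items():
        z += weight * features[name]
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

# An independent researcher can now score public case records, compare the
# output against the tool's reported scores, or test how the score shifts as
# individual inputs change.
example = {"prior_convictions": 3, "age_at_first_offense": 19, "current_offense_severity": 2}
print(f"predicted risk: {risk_score(example):.3f}")
```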
We note that much of the technical research literature on fairness that has appeared in the past two years resulted from ProPublica’s pioneering work in publishing a single dataset related to the Northpointe COMPAS risk assessment tool, which was obtained via public records requests in Broward County, Florida. See, e.g., Jeff Larson et al., How We Analyzed the COMPAS Recidivism Algorithm, ProPublica (May 23, 2016), https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm. For a sample of the research that became possible as a result of ProPublica’s data, see https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=propublica+fairness+broward. Data provided by Kentucky’s Administrative Office of the Courts has also enabled scholars to examine the impact of the implementation of the PSA tool in that state. See Megan Stevenson, Assessing Risk Assessment in Action (June 14, 2018), Minn. L. Rev. 103 (forthcoming), available at https://ssrn.com/abstract=3016088. Publishing such datasets enables the independent research and public discourse required to evaluate their effectiveness.
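As an illustration of the kind of audit a published dataset makes possible, the sketch below compares false positive rates across groups in a released scores file, loosely in the spirit of the ProPublica analysis though not a reproduction of it. The file name, column names, and score threshold are assumptions for illustration, not the actual COMPAS schema.

```python
# Hypothetical audit of a released risk-score dataset: among people who did
# not reoffend, how often was each group labeled high risk? Column names,
# file name, and the decile threshold are assumed for illustration.
import pandas as pd

df = pd.read_csv("published_risk_scores.csv")  # hypothetical released dataset
df["predicted_high_risk"] = df["risk_score"] >= 7  # assumed "high risk" cutoff

# Restrict to people who did not reoffend, then compute the false positive
# rate within each group.
no_reoffense = df[df["reoffended"] == 0]
fpr_by_group = (
    no_reoffense.groupby("group")["predicted_high_risk"]
    .mean()
    .rename("false_positive_rate")
)
print(fpr_by_group)
```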
However, it is important to note that when such datasets are shared, appropriate de-identification techniques should be used to ensure that non-public personal information cannot be derived from the datasets. For an example of how a data analysis competition dealt with privacy concerns when releasing a dataset with highly sensitive information about individuals, see Ian Lundberg et al., Privacy, ethics, and data access: A case study of the Fragile Families Challenge (Sept. 1, 2018), https://arxiv.org/pdf/1809.00103.pdf. Given increasingly sophisticated information triangulation and re-identification techniques (see Arvind Narayanan et al., A Precautionary Approach to Big Data Privacy (Mar. 19, 2015), http://randomwalker.info/publications/precautionary.pdf), additional measures might be necessary, such as contractual conditions that the recipients use the data only for specific purposes, and that once those purposes are accomplished, they delete their copy of the dataset. See id. at 20-21 (describing how some sensitive datasets are only shared after the recipient completes a data use course, provides information about the recipient, and physically signs a data use agreement).
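The sketch below illustrates, in rough outline only, the sort of basic de-identification steps that might precede such a release: dropping direct identifiers, generalizing quasi-identifiers, and applying a simple k-anonymity check. The column names and the choice of k are hypothetical, and a real release would require far more careful, context-specific review than this.

```python
# Hypothetical de-identification pass over an internal court dataset before
# public release. All column names and the value of k are placeholders; this
# is one coarse safeguard, not a complete privacy protocol.
import pandas as pd

df = pd.read_csv("court_records.csv")  # hypothetical internal dataset

# 1. Drop direct identifiers outright.
df = df.drop(columns=["name", "case_number", "ssn"])

# 2. Generalize quasi-identifiers that could support re-identification when
#    combined with outside data (age -> 5-year band, ZIP -> first 3 digits).
df["age_band"] = (df["age"] // 5) * 5
df["zip3"] = df["zip_code"].astype(str).str[:3]
df = df.drop(columns=["age", "zip_code"])

# 3. Simple k-anonymity check on the remaining quasi-identifiers: suppress any
#    record whose combination of quasi-identifiers describes fewer than K people.
K = 5
quasi = ["age_band", "zip3", "group"]
group_sizes = df.groupby(quasi)["age_band"].transform("size")
released = df[group_sizes >= K]
released.to_csv("release_candidate.csv", index=False)
```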