Requirement 3: Tools must not conflate multiple distinct predictions

Risk assessment tools must not produce composite scores that combine predictions of different outcomes for which different interventions are appropriate. In other words, a tool should predict the specific risk it is intended to measure and produce a separate score for each type of risk, rather than a single score reflecting the risk of multiple outcomes. For instance, risk assessment tools should not conflate a defendant’s risk of failure to appear for a scheduled court date with the risk of rearrest. Many existing pretrial risk assessment tools, however, do exactly this: they produce a single risk score representing the risk that either failure to appear or rearrest will occur. Sandra G. Mayson, Dangerous Defendants, 127 Yale L.J. 490, 509-510 (2018). In some cases this may violate local law; many jurisdictions permit only one such cause as a basis for pretrial detention. And regardless of the legal situation, a hybrid prediction is inappropriate on statistical grounds.

Different causal mechanisms drive each of the phenomena that are combined in hybrid risk scores. Id. at 510 (“The two risks are different in kind, are best predicted by different variables, and are most effectively managed in different ways.”). The reasons a person fails to appear in court, is rearrested, or is convicted of a future crime are all quite distinct, so a high hybrid score is not readily interpretable: it groups people likely to have a less dangerous outcome (not appearing in court) together with those likely to have a more dangerous outcome (being convicted of a future crime). For instance, needing childcare increases the risk of failure to appear (see Brian H. Bornstein, Alan J. Tomkins & Elizabeth M. Neeley, Reducing Courts’ Failure to Appear Rate: A Procedural Justice Approach, U.S. DOJ report 234370, available at https://www.ncjrs.gov/pdffiles1/nij/grants/234370.pdf) but is less likely to increase the risk of recidivism. In addition, as a matter of statistical validity, past convictions for non-violent crimes that have since been decriminalized (e.g., marijuana possession in many states) arguably should be treated differently from other kinds of convictions if the goal is to predict future crime or public safety risk.
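To make the statistical point concrete, the sketch below shows, in Python with the scikit-learn library and entirely synthetic data, the difference between training on a hybrid “failure to appear or rearrest” label and training a separate model for each outcome. The feature names and data are hypothetical and chosen only for illustration; this is a minimal sketch of the modeling choice described above, not the methodology of any deployed tool.

    # Illustrative sketch only: synthetic data and hypothetical feature names.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000

    # Hypothetical features: childcare_need, prior_arrests, age (standardized).
    X = rng.normal(size=(n, 3))
    # Failure to appear (FTA) is driven mainly by the first feature,
    # rearrest mainly by the second (different causal mechanisms).
    y_fta = (0.9 * X[:, 0] + rng.normal(size=n) > 1).astype(int)
    y_rearrest = (0.9 * X[:, 1] + rng.normal(size=n) > 1).astype(int)

    # Problematic design: a single composite label ("FTA or rearrest")
    # hides which risk is actually present for a given defendant.
    y_hybrid = np.maximum(y_fta, y_rearrest)
    hybrid_model = LogisticRegression().fit(X, y_hybrid)

    # Preferred design: one model, and one score, per distinct outcome.
    fta_model = LogisticRegression().fit(X, y_fta)
    rearrest_model = LogisticRegression().fit(X, y_rearrest)

    defendant = X[:1]
    print("hybrid score:  ", hybrid_model.predict_proba(defendant)[0, 1])
    print("FTA score:     ", fta_model.predict_proba(defendant)[0, 1])
    print("rearrest score:", rearrest_model.predict_proba(defendant)[0, 1])

Reporting the two scores separately lets a decision-maker match each risk to the intervention appropriate for it, which is exactly the information a single composite score obscures.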

Moreover, different types of intervention (both as a policy and a legal matter) are appropriate for each of these different phenomena. For example, if the goal of a risk assessment tool is to advance the twin public policy goals of reducing incarceration and ensuring defendants appear for their court dates, then the tool should not conflate a defendant’s risk of knowingly fleeing justice with their risk of unintentionally failing to appear, since the latter can be mitigated by interventions other than incarceration (e.g., giving the defendant the opportunity to sign up for phone calls or SMS-based reminders about their court date, or ensuring the defendant has transportation to court on the day they are to appear).

Risk assessment tools should only be deployed in the specific context for which they were intended, including at the specific stage of a criminal proceeding and for the specific population whose risk they were meant to predict. For example, the potential risk of failing to appear for a court date at the pretrial stage should have no bearing on a sentencing hearing. Notably, part of the holding in Loomis mandated a disclosure in any Presentence Investigation Report that COMPAS risk assessment information “was not developed for use at sentencing, but was intended for use by the Department of Corrections in making determinations regarding treatment, supervision, and parole.” Wisconsin v. Loomis (881 N.W.2d 749). Likewise, predicting risks for certain segments of the population, such as juveniles, is distinct from predicting risks for the general population.

Risk assessment tools must be clear about which of these many distinct predictions they are making, and steps should be taken to safeguard against conflating different predictions and using risk scores in inappropriate contexts.

Human-Computer Interface Issues

While risk assessment tools provide input and recommendations to decision-making processes, the ultimate decision-making authority still resides in the hands of humans. Judges, court clerks, pretrial services officers, probation officers, and prosecutors all may use risk assessment scores to guide their judgments. Thus, critical human-computer interface issues must also be addressed when considering the use of risk assessment tools.

One of the key challenges of statistical decision-making tools is the phenomenon of automation bias, where information presented by a machine is viewed as inherently trustworthy and above skepticism. M.L. Cummings, Automation Bias in Intelligent Time Critical Decision Support Systems, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.2634&rep=rep1&type=pdf. This can lead humans to over-rely on the accuracy or correctness of automated systems.

It is important to note, however, that there is also evidence of the opposite phenomenon, whereby users simply ignore a risk assessment tool’s predictions. In her ethnography of risk assessment users, Christin notes that professionals often “buffer” their professional judgment from the influence of automated tools. She quotes a former prosecutor as saying of risk assessment, “When I was a prosecutor I didn’t put much stock in it, I’d prefer to look at actual behaviors. I just didn’t know how these tests were administered, in which circumstances, with what kind of data.” Christin, A., Algorithms in practice: Comparing web journalism and criminal justice, Big Data & Society, 4(2) (2017).

The holding in Wisconsin v. Loomis (881 N.W.2d 749) indirectly addressed the issue of automation bias by requiring that any Presentence Investigation Report containing a COMPAS risk assessment be accompanied by a written disclaimer that the scores may be inaccurate and have been shown to classify offenders disparately. “Specifically, any PSI containing a COMPAS risk assessment must inform the sentencing court about the following cautions regarding a COMPAS risk assessment’s accuracy: (1) the proprietary nature of COMPAS has been invoked to prevent disclosure of information relating to how factors are weighed or how risk scores are to be determined; (2) risk assessment compares defendants to a national sample, but no cross-validation study for a Wisconsin population has yet been completed; (3) some studies of COMPAS risk assessment scores have raised questions about whether they disproportionately classify minority offenders as having a higher risk of recidivism; and (4) risk assessment tools must be constantly monitored and re-normed for accuracy due to changing populations and subpopulations.” Wisconsin v. Loomis (881 N.W.2d 749).

While disclosure regarding the limitations of risk assessment tools is an important first step, it is still insufficient. Over time, there is a risk that judges become inured to lengthy disclosure language repeated at the beginning of each report. Moreover, the disclosures do not make clear how, if at all, judges should interpret or understand the practical limits of risk assessments.

This section illustrates how to safeguard against automation bias and other critical human-computer interface issues by ensuring that (i) risk assessment tools are easily interpretable by human users, (ii) users of risk assessment tools receive information about the uncertainty behind the tools’ predictions, and (iii) adequate resources are dedicated to proper training in the use of these tools.
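As a concrete and deliberately simplified illustration of point (ii), the Python sketch below attaches a rough confidence interval to a single predicted risk score by refitting a model on bootstrap resamples of synthetic data. The data, features, and interval method are assumptions made for illustration only; an actual tool would need a validated approach to quantifying and communicating uncertainty.

    # Illustrative sketch only: synthetic data; a bootstrap interval is one simple option.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 1000
    X = rng.normal(size=(n, 3))                     # hypothetical features
    y = (0.8 * X[:, 0] + rng.normal(size=n) > 1).astype(int)

    defendant = X[:1]
    scores = []
    for _ in range(200):                            # refit on bootstrap resamples
        idx = rng.integers(0, n, size=n)
        model = LogisticRegression().fit(X[idx], y[idx])
        scores.append(model.predict_proba(defendant)[0, 1])

    low, high = np.percentile(scores, [2.5, 97.5])
    print(f"estimated risk: {np.mean(scores):.2f} "
          f"(95% interval: {low:.2f} to {high:.2f})")

Presenting a range alongside the point estimate gives judges and pretrial officers a tangible signal of how much confidence the tool itself warrants, complementing the written disclosures discussed above.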