Requirement 4: Predictions and how they are made must be easily interpretable

While advocates have focused on the issues of bias in risk prediction scores discussed above, one often overlooked aspect of fairness is the way risk scores are translated for human users. Developers and jurisdictions deploying risk assessment tools must ensure that tools convey their predictions in a way that is straightforward to human users and that illustrates how those predictions are made. This means ensuring that interfaces presented to judges, clerks, lawyers, and defendants are clear, easily understandable, and not misleading. Computer interfaces, even for simple tasks, can be highly confusing to users. For example, one study found that users failed to notice anomalies on a screen designed to show them choices they had previously selected for confirmation over 50% of the time, even after the confirmation screen had been carefully redesigned to maximize the visibility of anomalies. See Campbell, B. A., & Byrne, M. D. (2009). Now do voters notice review screen anomalies? A look at voting system usability. Proceedings of the 2009 Electronic Voting Technology Workshop/Workshop on Trustworthy Elections (EVT/WOTE ’09).

Interpretability involves giving users an understanding of the relationship between input features and output predictions. We caution that this need not mean restricting the model to an “interpretable” but less accurate mathematical form; it can instead mean pairing a more complex model with techniques that provide separate interpretations of its predictions. The right choice depends in part on the number of input variables used for prediction. For a model with a large number of features (such as COMPAS), it might be appropriate to use a method like gradient-boosted decision trees or random forests and then provide the interpretation through an approximation. See Zach Lipton, The Mythos of Model Interpretability, Proc. ICML 2016, available at https://arxiv.org/pdf/1606.03490.pdf, §4.1. For examples of methods for explaining complex models, see, e.g., Gilles Louppe et al., Understanding Variable Importances in Forests of Randomized Trees, Proc. NIPS 2013, available at https://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized-trees.pdf; Marco Ribeiro et al., “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, Proc. KDD 2016, available at https://arxiv.org/abs/1602.04938 (introducing LIME, Local Interpretable Model-agnostic Explanations).
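
To make this concrete, the following is a minimal sketch, using scikit-learn and entirely hypothetical data and feature names (not those of any deployed tool), of how a complex model can be paired with post-hoc interpretation: a permutation-based measure of how much each variable drives predictions, plus a shallow surrogate tree that approximates the model’s decision logic in a readable form.

```python
# Sketch only: hypothetical data and feature names, not any real risk tool.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["age", "prior_arrests", "prior_fta", "charge_severity", "employment"]
X, y = make_classification(n_samples=2000, n_features=5, n_informative=4,
                           n_redundant=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A complex, relatively accurate model whose internals are hard to read directly.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Global interpretation: how much does each input variable drive predictions?
imp = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for name, mean in sorted(zip(feature_names, imp.importances_mean),
                         key=lambda t: -t[1]):
    print(f"{name:16s} importance {mean:+.3f}")

# Approximate interpretation: a shallow surrogate tree fit to the model's own
# predictions, giving users a readable summary of its decision logic.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, model.predict(X_train))
print(export_text(surrogate, feature_names=feature_names))
```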

Providing interpretations for predictions can help users understand how each variable contributes to a prediction and how sensitive the model is to particular variables. This is crucial for ensuring that decision-makers are consistent in their understanding of how models work and of the predictions they produce, and that misinterpretation of scores by individual judges does not result in the disparate application of justice. Because interpretability is a property of tools as used by people, it must be evaluated in the context in which risk assessments are deployed, and it depends on how effectively their human users can employ them.
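
As an illustration of the kind of per-prediction sensitivity information that could be surfaced to users, the sketch below (again using scikit-learn with hypothetical data and feature names) perturbs each input for a single synthetic individual and reports how the predicted risk changes.

```python
# Sketch only: a simple sensitivity check on hypothetical data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

feature_names = ["age", "prior_arrests", "prior_fta", "charge_severity", "employment"]
X, y = make_classification(n_samples=2000, n_features=5, n_informative=4,
                           n_redundant=1, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0].copy()                          # one synthetic individual's features
base = model.predict_proba([x])[0, 1]    # baseline predicted risk

# Sensitivity: nudge each (standardized) feature by +1 and report the change
# in predicted risk, holding all other inputs fixed.
for i, name in enumerate(feature_names):
    x_perturbed = x.copy()
    x_perturbed[i] += 1.0
    delta = model.predict_proba([x_perturbed])[0, 1] - base
    print(f"{name:16s} change in predicted risk: {delta:+.3f}")
```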

At the same time, developers of models should ensure that the intuitive interpretation of a score is not at odds with the intended risk prediction. For instance, judges or other users might intuitively assume that ordered categories are of similar size, represent absolute levels of risk rather than relative assessments, and cover the full spectrum of approximate risk levels. See Laurel Eckhouse et al., Layers of Bias: A Unified Approach for Understanding Problems With Risk Assessment, 46(2) Criminal Justice and Behavior 185–209 (2018), https://doi.org/10.1177/0093854818811379. Thus, on a 5-point scale, a natural interpretation would be that a score of one implies a 0% to 20% risk of reoffending (or another outcome of interest), a score of two a 21% to 40% risk, and so on. However, this is not the case for many risk assessment tools.
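
The gap between these two readings can be made concrete with a small sketch. The per-category rates below are invented purely for illustration; the point is only the comparison between the naive equal-width reading of a 5-point scale and the rate actually observed in each category.

```python
# Sketch only: invented category rates, for illustrating the intuition gap.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=5000)                 # assigned risk category 1..5
# Hypothetical true rearrest rates per category, well below the intuitive bands.
true_rate = {1: 0.05, 2: 0.08, 3: 0.12, 4: 0.18, 5: 0.35}
outcomes = rng.random(5000) < np.vectorize(true_rate.get)(scores)

for c in range(1, 6):
    observed = outcomes[scores == c].mean()
    lo, hi = (c - 1) * 0.20, c * 0.20                  # naive equal-width reading
    print(f"category {c}: intuitive {lo:.0%}-{hi:.0%}, observed {observed:.0%}")
```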

One study compared the Pretrial Risk Assessment Tool (PTRA), which converts risk scores into a 5-point risk scale, with the actual likelihood of the outcome (in this case, rearrest, violent rearrest, failure to appear, and/or bail revocation). See id. Only 35% of defendants classified at the highest risk level failed to appear for trial or were rearrested before trial, and the probabilities of failure to appear and of rearrest, considered separately, fell within the interval users would intuitively associate with the lowest risk level for every risk category. See id. Similarly, there are substantial gaps between the intuitive and the correct interpretations of risk categories in Colorado’s Pretrial Assessment Tool (CPAT). The lowest risk category for the CPAT included scores 0–17, while the highest risk category covered a much broader range of scores, 51–82, and corresponded to a Public Safety Rate of 58% and a Court Appearance Rate of 51%. See Pretrial Justice Institute, Colorado Pretrial Assessment Tool (CPAT): Administration, Scoring, and Reporting Manual, Version 1 (2013), retrieved from http://capscolorado.org/yahoo_site_admin/assets/docs/CPAT_Manual_v1_-_PJI_2013.279135658.pdf

To mitigate these shortcomings, jurisdictions would need to collect data and conduct further research on user interface choices, information display, and users’ psychological responses to information about prediction uncertainty. User and usability studies, such as those from the human-computer interaction field, can be employed to study how much deference judges give to pretrial or pre-sentencing investigations. For example, a study could examine how error bands affect judges’ inclination to follow predictions or, when they have other instincts, to overrule them.
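
As one sketch of what such research might put in front of judges, the example below computes, for hypothetical (invented) per-category counts, the observed rate of failure to appear or rearrest together with a 95% Wilson confidence interval, the kind of error band whose effect on judicial deference a user study could test.

```python
# Sketch only: invented counts, illustrating per-category rates with error bands.
import math

Z = 1.96  # approximately a 95% confidence level

def wilson_interval(k, n):
    """Wilson score interval for an observed proportion k/n."""
    p = k / n
    denom = 1 + Z**2 / n
    center = (p + Z**2 / (2 * n)) / denom
    half = Z * math.sqrt(p * (1 - p) / n + Z**2 / (4 * n**2)) / denom
    return center - half, center + half

# category: (number who failed to appear or were rearrested, number assessed)
counts = {1: (40, 900), 2: (70, 800), 3: (90, 700), 4: (110, 600), 5: (175, 500)}

for category, (k, n) in counts.items():
    lo, hi = wilson_interval(k, n)
    print(f"risk category {category}: {k / n:.0%} "
          f"(95% CI {lo:.0%}-{hi:.0%}, n={n})")
```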