Requirement 1: Training datasets must measure the intended variables

Datasets pose profound and unresolved challenges to the validity of statistical risk assessments. In almost all cases, errors and bias in measurement and sampling prevent readily available criminal justice datasets from reflecting what they were intended to measure. Building valid risk assessment tools would require (a) a methodology to reweight and debias training data using second sources of truth, and (b) a way to tell whether that process was valid and successful. To our knowledge, no risk assessment tools are presently built with such methods. Some of the experts within the Partnership oppose the use of risk assessment tools specifically because of their pessimism that sufficient data exists or could practically be collected to meet purposes (a) and (b).

Statistical validation of recidivism prediction in particular suffers from a fundamental problem: the ground truth of whether an individual committed a crime is generally unavailable and can only be estimated via imperfect proxies such as crime reports or arrests. Since the true target for prediction (having actually committed a crime) is unobservable, it is tempting to change the goal of the tool to predicting arrest rather than crime. If, however, the goal of these tools is to predict a defendant's risk to public safety, as it is for most risk assessment tools, the objective must be whether a defendant is likely to commit an offense that justifies pretrial detention, not whether the defendant is likely to be arrested for or convicted of any offense in the future.

Moreover, defining recidivism is difficult in the pretrial context. Recidivism variables are usually defined over a fixed time period, e.g., whether someone is arrested within 1 year of their initial arrest or within 3 years of their release from prison. In the pretrial context, recidivism is instead defined as whether the individual is arrested in the period after their arrest (or pretrial detention) and before their trial. That period, however, can vary significantly from case to case, so it is necessary to ensure that each risk assessment tool predicts an appropriately defined measure of recidivism or public safety risk.
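The variable-length pretrial window can be made concrete. The sketch below is a hypothetical illustration, not drawn from any deployed tool: it defines a pretrial rearrest label together with the case's exposure time, since raw labels from windows of different lengths are not directly comparable.

```python
from datetime import date
from typing import Optional

def pretrial_rearrest_label(arrest_date: date,
                            trial_date: date,
                            rearrest_date: Optional[date]) -> bool:
    """Label = rearrested during the pretrial window [arrest, trial).

    The window length varies case to case, so this label alone is not
    comparable across cases without accounting for exposure time.
    """
    if rearrest_date is None:
        return False
    return arrest_date <= rearrest_date < trial_date

def exposure_days(arrest_date: date, trial_date: date) -> int:
    """At-risk time in days, needed to compare cases with different windows."""
    return (trial_date - arrest_date).days
```

A tool validated on 6-month pretrial windows, for instance, says little about a jurisdiction where cases typically reach trial in 6 weeks; recording exposure alongside the label is what makes that mismatch visible.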

One problem with using such imperfect proxies is that different demographic groups are stopped, searched, arrested, charged, and wrongfully convicted at very different rates in the current US criminal justice system. See, e.g., The War on Marijuana in Black and White, ACLU (2013); ACLU Submission to the Inter-American Commission on Human Rights, Hearing on Reports of Racism in the Justice System of the United States (Oct. 2017); Samuel Gross, Maurice Possley & Klara Stephens, Race and Wrongful Convictions in the United States, National Registry of Exonerations; but see Jennifer L. Skeem & Christopher Lowenkamp, Risk, Race & Recidivism: Predictive Bias and Disparate Impact, Criminology 54 (2016), 690 (for some categories of crime in some jurisdictions, victimization and self-reporting surveys imply crime rates comparable to arrest rates across demographic groups; an explicit and transparent reweighting process is procedurally appropriate even where the resulting correction is small). Further, different types of crimes are reported and recorded at different rates, and the rate of reporting may depend on the demographics of the perpetrator and victim. See David Robinson & John Logan Koepke, Stuck in a Pattern: Early Evidence on 'Predictive Policing' and Civil Rights (Aug. 2016) ("Criminologists have long emphasized that crime reports, and other statistics gathered by the police, are not an accurate record of the crime that happens in a community. In short, the numbers are greatly influenced by what crimes citizens choose to report, the places police are sent on patrol, and how police decide to respond to the situations they encounter. The National Crime Victimization Survey (conducted by the Department of Justice) found that from 2006-2010, 52 percent of violent crime victimizations went unreported to police and 60 percent of household property crime victimizations went unreported. Historically, the National Crime Victimization Survey 'has shown that police are not notified of about half of all rapes, robberies and aggravated assaults.'"); see also Kristian Lum & William Isaac, To Predict and Serve? (2016), 14-19.

For example, it is likely that all (or very nearly all) bank robberies are reported to police. Carl B. Klockars, Some Really Cheap Ways of Measuring What Really Matters, in Measuring What Matters: Proceedings From the Policing Research Meetings, 195, 195-201 (1999) ("If I had to select a single type of crime for which its true level—the level at which it is reported—and the police statistics that record it were virtually identical, it would be bank robbery. Those figures are likely to be identical because banks are geared in all sorts of ways…to aid in the reporting and recording of robberies and the identification of robbers. And, because mostly everyone takes bank robbery seriously, both Federal and local police are highly motivated to record such events."). On the other hand, marijuana possession arrests are notoriously biased, with black Americans much more likely to be arrested than whites, despite similar use rates. ACLU, The War on Marijuana in Black and White: Billions of Dollars Wasted on Racially Biased Arrests (2013). Thus, "arrest, conviction, and incarceration data are most appropriately viewed as measures of official response to criminal behavior," impacting certain groups disproportionately. See Lisa Stolzenberg & Stewart J. D'Alessio, Sex Differences in the Likelihood of Arrest, J. Crim. Justice 32(5) (2004), 443-454; Lisa Stolzenberg, David Eitle & Stewart J. D'Alessio, Race and the Probability of Arrest, Social Forces 81(4) (2003), 1381-1387; Tia Stevens & Merry Morash, Racial/Ethnic Disparities in Boys' Probability of Arrest and Court Actions in 1980 and 2000: The Disproportionate Impact of "Getting Tough" on Crime, Youth and Juvenile Justice 13(1) (2014).

Estimating such biases can be difficult, although in some cases it may be possible using secondary sources of data collected separately from law enforcement or government agencies. Delbert S. Elliott, Lies, Damn Lies, and Arrest Statistics (1995), 11. For example, arrest or conviction data could be reweighted using the National Crime Victimization Survey, which provides a second method of estimating the demographic characteristics of perpetrators for types of crimes where there is a victim who is able to see the perpetrator, or using surveys that collect self-reported data about crime perpetration and arrest, such as the National Longitudinal Surveys of Youth. Performing such reweighting would be a subtle statistical task that could easily be done incorrectly, so a second essential ingredient would be a method, accepted by the machine learning and statistical research communities, for determining whether the reweighting had produced valid results that accurately reflect the world.
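As a minimal sketch of what such reweighting might look like, assuming hypothetical arrest counts and a hypothetical survey-based estimate of each group's share of offending (the real statistical task, including variance estimation and the validity checks discussed above, is far more involved):

```python
# Sketch: reweight arrest records so the group composition of the training
# data matches an external estimate (e.g., one derived from victimization
# surveys). All numbers here are hypothetical illustrations.

arrest_counts = {"group_a": 800, "group_b": 200}    # observed arrests
survey_shares = {"group_a": 0.65, "group_b": 0.35}  # external offending-share estimate

total_arrests = sum(arrest_counts.values())
weights = {
    g: (survey_shares[g] * total_arrests) / arrest_counts[g]
    for g in arrest_counts
}
# Each record of group g receives weight weights[g] during training, so the
# weighted group totals match the survey-based shares (here 650 vs. 350
# instead of the observed 800 vs. 200).
```

Even this toy version shows why validation is essential: the corrected weights are only as good as the external estimate, and a flawed survey-based share would simply replace one bias with another.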

Beyond the difficulty of measuring certain outcomes, data is also needed to properly distinguish between different causes of the same outcome. For instance, looking only at an outcome of failure to appear in court obscures the fact that there are many different possible reasons for such an outcome. There are legitimate reasons for failing to appear that do not suggest an individual poses a danger to society (e.g., a family emergency or limited transportation options); indeed, simply reminding people to appear improves appearance rates. Pretrial Justice Center for Courts, Use of Court Date Reminder Notices to Improve Court Appearance Rates (Sept. 2017). Grouping together all individuals who fail to appear for court would therefore unfairly increase the probability that individuals who tend to have more legitimate reasons for failing to appear (e.g., people with dependents or limited transportation options) would be detained. Thus, if the goal of a risk assessment tool is to predict whether a defendant will flee justice, data would need to be collected that distinguishes between individuals who intentionally versus unintentionally fail to appear for court dates. Risk assessment toolmakers have identified a number of obstacles to better predictions on this front. First, there is a lack of consistent data and definitions to help disentangle willful flight from justice from failures to appear that are either unintentional or not indicative of public safety risk. Policymakers may need to take the lead in defining and collecting data on these reasons, as well as identifying interventions besides incarceration that may be most appropriate for responding to them.
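One way to make this distinction operational is an explicit reason taxonomy in the data schema, so that only willful flight feeds the flight-risk outcome while other reasons route to non-carceral responses. The categories below are hypothetical illustrations, not an established standard; as the text notes, defining the real categories would fall to policymakers.

```python
from enum import Enum

class FTAReason(Enum):
    """Hypothetical reasons a defendant might fail to appear (FTA)."""
    WILLFUL_FLIGHT = "willful_flight"        # deliberate evasion of the court
    TRANSPORTATION = "transportation"        # no way to reach the courthouse
    FAMILY_EMERGENCY = "family_emergency"    # caregiving or medical emergency
    NO_NOTICE = "no_notice"                  # never received or understood the date
    UNKNOWN = "unknown"

# Only willful flight is the outcome a flight-risk tool should predict;
# the other categories call for reminders, transport assistance, etc.
FLIGHT_RISK_REASONS = {FTAReason.WILLFUL_FLIGHT}

def is_flight_outcome(reason: FTAReason) -> bool:
    """True only for FTAs that should count as the flight-risk label."""
    return reason in FLIGHT_RISK_REASONS
```

Without some such labeling at data-collection time, every FTA collapses into one outcome variable, and the resulting model cannot help but treat a missed bus like an escape attempt.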

Because a tool's validity often depends on local context, the data discussed above should, where possible, be collected on a jurisdiction-by-jurisdiction basis in order to capture significant differences in geography, transportation, and local procedure that affect these outcomes.