Although more defendants were granted release without bail, the change mostly helped white people. “On average white defendants benefited more than black defendants,” Stevenson says. The pattern held after Kentucky adopted a more complex risk-scoring algorithm in 2013.
One explanation supported by Kentucky data, she says, is that judges responded to risk scores differently in different parts of the state. In rural counties, where most defendants were white, judges granted release without bond to significantly more people. Judges in urban counties, where the defendant pool was more mixed, changed their habits less.
A separate study using Kentucky data, presented at a conference this summer, suggests a more troubling effect was also at work. It found that judges were more likely to overrule the default recommendation to waive a financial bond for moderate-risk defendants if the defendants were black.
Harvard researcher Alex Albright, who authored that study, says it shows that more attention must be paid to how humans interpret algorithms' predictions. "We should put as much effort into how we train people to use predictions as we do into the predictions," she says.
Michael Thacker, risk-assessment coordinator with Kentucky pretrial services, says his agency tries to mitigate potential bias in risk-assessment tools and talks with judges about the potential for "implicit bias" in how they interpret the risk scores.
An experiment testing how judges respond to hypothetical risk scores when deciding sentences also found evidence that algorithmic advice can cause unexpected problems. The study, which is pending publication, asked 340 judges to decide sentences for made-up drug cases. Half of the judges saw "cases" with risk scores estimating that the defendant had a medium-to-high risk of rearrest; the other half saw no scores.
When they weren’t given a risk score, judges were tougher on more-affluent defendants than poor ones. Adding the algorithm reversed the trend: Richer defendants had a 44 percent chance of doing time but poorer ones a 61 percent chance. The pattern held after controlling for the sex, race, political orientation, and jurisdiction of the judge.
“I thought that risk assessment probably wouldn’t have much effect on sentencing,” says Jennifer Skeem, a UC Berkeley professor who worked on the study with colleagues from UC Irvine and the University of Virginia. “Now we understand that risk assessment can interact with judges to make disparities worse.”
There is reason to think that if risk scores were implemented carefully, they could help make the criminal justice system fairer. The common practice of requiring cash bail is widely acknowledged to exacerbate inequality by penalizing people of limited means. A National Bureau of Economic Research study from 2017 used past New York City records to project that an algorithm predicting whether someone will skip a court date could cut the jail population by 42 percent and shrink the proportion of black and Hispanic inmates, without increasing crime.
Unfortunately, the way risk-scoring algorithms have been rolled out across the US is much messier than in the hypothetical world of such studies.
Criminal justice algorithms are generally relatively simple and produce scores from a small number of inputs, such as age, offense, and prior convictions. But their developers have sometimes barred the government agencies that use their tools from releasing information about the tools' design and performance. Jurisdictions also haven't given outsiders access to the data needed to check how the tools perform.
"These tools were deployed out of reasonable desire for evidence-based decisionmaking, but it was not done with sufficient caution," says Peter Eckersley, director of research at Partnership on AI, a nonprofit founded by major tech companies to examine how the technology affects society. PAI released a report in April that detailed problems with risk-assessment algorithms and recommended that agencies appoint outside bodies to audit their systems and the systems' effects.