
cs152project

Risk Assessment Algorithms in the Criminal Justice System - An Ethical Investigation

Maddy

Project updates with description of proposed code in update #1.

1. Introduction

Risk assessment instruments (RAIs) are tools used by the courts, usually judges and prosecutors, that employ various machine learning techniques, including neural networks, to predict the probability of a defendant engaging in criminal activity based on a selection of individual attributes (demographics, criminal history, etc.). Although these algorithms are an attempt to minimize personal and institutional bias in the judicial system, they present a number of ethical concerns regarding their inputs and their proper role in society. This paper evaluates the effectiveness of, and the ethical questions raised by, real-world risk assessment algorithms developed for the criminal justice space. The evidence suggests that while RAIs may reduce measures like pretrial incarceration rates, there is little evidence to support the claim that they reduce bias in the criminal justice system.

2. Literature Review

Predicting Enemies by Ashley S. Deeks and Understanding Risk Assessment Instruments in Criminal Justice by Alex Chohlas-Wood provide detailed analyses of the different ways predictive algorithms are used in the criminal justice system. Chohlas-Wood details which tools are currently in use, which companies are creating them, and, importantly, how these algorithms could be improved. Deeks' goal in Predicting Enemies, by contrast, is to warn other sectors away from adopting these kinds of algorithms because of the problems already evident in their use in the criminal justice system. In a similar vein, the Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System, published by the nonprofit Partnership on AI, documents the serious shortcomings of risk assessment tools in the criminal justice system through consultations with experts and a review of the previous literature on these tools.

Work has also been done on how the discretionary use of risk assessment algorithms, rather than their strict application under policy, can add bias to the very proceedings these algorithms were meant to neutralize. In Predictive Algorithms and Criminal Sentencing, Angèle Christin analyzes how the criminal justice system relies on algorithms to determine sentences and criminality, and examines in depth the ways judges and prosecutors use these algorithms to suit their interests in court, invoking the results when they want and disregarding them when they do not. Because of this discretionary use, risk assessment instruments cannot be touted as eliminators of human bias, even before considering the bias built into the tools themselves.

3. Purpose of Risk Assessment Algorithms

Risk assessment algorithms are intended to use data to make decisions about a defendant's risk factors and thereby standardize those decisions across the courts. The most widely used category of RAIs is those that produce a risk score judges use to set a defendant's pretrial bail amount and conditions. Historically these decisions have been made by judges themselves, based on guidelines provided by experts in the field and the written standard of law. Because judges set a bail price in addition to deciding whether to allow bail at all, they often consider factors beyond the defendant's risk, such as the defendant's financial ability to pay. How wealth disparity interacts with bail decisions and judicial proceedings is an issue often ignored in the analysis of RAIs.

In their idealized form, "algorithmic RAIs have the potential to bring consistency, accuracy, and transparency to judicial decisions" (Brookings, 2020). Such achievements would be an important step toward long-sought criminal justice reforms in the United States. Advocates draw an analogy with the transformation seen in professional baseball, arguing that the United States needs to start "moneyballing" justice (Christin, 2019). If successfully applied, these algorithms could reduce bias and mitigate unnecessary incarceration. However, RAIs have drawn sharp criticism from criminal justice advocates, developers, and the public alike, both for their flawed implementation and for the reasoning behind their creation. Critics argue that, rather than helping to solve mass incarceration, high recidivism rates, and the criminalization of minorities in the United States, employing RAIs will entrench these problems.

The current state of the art in risk assessment algorithms deployed in the policing and criminal justice sectors includes the Public Safety Assessment (PSA) system, which judges use to determine pretrial release risk. A defendant's biographical information (such as age, race, and gender) and criminal history are used to assign three risk scores: that they will be convicted of a new crime, that they will be convicted of a new violent crime, and that they will fail to appear in court. These scores are then used by a judge or prosecutor to determine whether a defendant awaits trial at home or in custody. A defendant's score does not directly determine whether they are found guilty or the sentence they ultimately receive, but there are indirect channels through which a PSA score can affect the outcome. A high risk score may color how the judge perceives the defendant before the trial even begins, which can affect sentencing. Likewise, awaiting trial in custody because of a high risk score is sure to impact how a defendant prepares their defense and how they present themselves in court before the jury.
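To make the mechanics of such a tool concrete, the sketch below shows one way a PSA-style instrument could combine a defendant's attributes into three separate risk bands. The features, weights, and cutoffs are invented for illustration and do not reproduce the actual PSA formula.

```python
# Hypothetical sketch of a PSA-style scoring function.
# The features, weights, and cutoffs are invented for illustration;
# they do NOT reproduce the actual Public Safety Assessment formula.

from dataclasses import dataclass

@dataclass
class Defendant:
    age: int
    prior_convictions: int
    prior_violent_convictions: int
    prior_failures_to_appear: int

def band(points: int, cutoffs=(2, 5)) -> int:
    """Map raw points onto a 1-3 risk band (low, medium, high)."""
    if points <= cutoffs[0]:
        return 1
    if points <= cutoffs[1]:
        return 2
    return 3

def assess(d: Defendant) -> dict:
    """Return three separate risk bands, mirroring the PSA's structure."""
    new_crime = (d.age < 23) + 2 * min(d.prior_convictions, 3)
    new_violent_crime = 3 * min(d.prior_violent_convictions, 2)
    fail_to_appear = 2 * min(d.prior_failures_to_appear, 3)
    return {
        "new_criminal_activity": band(new_crime),
        "new_violent_criminal_activity": band(new_violent_crime),
        "failure_to_appear": band(fail_to_appear),
    }

if __name__ == "__main__":
    print(assess(Defendant(age=21, prior_convictions=2,
                           prior_violent_convictions=0,
                           prior_failures_to_appear=1)))
```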

4. Methods and Inputs

Throughout the country, dozens of algorithms are used by dozens of jurisdictions, "including Arizona, Kentucky, New Jersey, Charlotte, Chicago, and Phoenix" (Deeks, 2018). Each of these RAIs uses slightly different metrics to gauge an individual's recidivism risk. Although the internal designs of these algorithms are kept confidential, the public is privy to some of the inputs factored into the recommendations the RAIs make. In Chicago, the Strategic Subject List takes into account the age of the defendant, whether they have been the victim of an assault and battery or shooting, and the defendant's arrest and conviction records (Deeks, 2018). The Level of Service Inventory-Revised algorithm uses a defendant's answers to questions concerning their "criminal history, education, employment, financial problems, family or marital situation, housing, hobbies, friends, alcohol and drug use, emotional or mental health issues, and attitudes about crime and supervision" (Deeks, 2018). In contrast, the popular HunchLab algorithm "primarily surveys past crimes, but also digs into dozens of other factors like population density; census data; the locations of bars, churches, schools, and transportation hubs; schedules for home games—even moon phases" (Deeks, 2018). At face value many of these inputs seem intrusive and some even irrelevant. Digging deeper into the societal determinants of these factors reveals how little control many defendants have over the descriptors used to determine the course of their lives.
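For readers unfamiliar with how such disparate inputs become something an algorithm can consume, the short sketch below shows a common encoding step: turning questionnaire-style answers into a numeric feature vector. The field names and categories are hypothetical and are not drawn from any of the tools named above; the example assumes scikit-learn is available.

```python
# Illustrative only: how heterogeneous RAI-style inputs (categorical answers,
# counts, yes/no flags) might be turned into a numeric feature vector.
# Field names and categories are hypothetical.

from sklearn.feature_extraction import DictVectorizer

raw_records = [
    {"age": 34, "prior_arrests": 2, "employed": "yes",
     "housing": "stable", "substance_use": "no"},
    {"age": 19, "prior_arrests": 0, "employed": "no",
     "housing": "unstable", "substance_use": "yes"},
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(raw_records)   # one-hot encodes strings, passes numbers through
print(vec.get_feature_names_out())
print(X)
```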

5. Results

Results on how effectively RAIs predict a defendant's risk vary. A 2012-2014 study in Virginia showed simultaneous reductions in both pretrial incarceration rates and pretrial misconduct, which would be the optimal result: fewer defendants in jail and fewer misconduct incidents among those who were released (Danner, VanNostrand, & Spruance, 2016). A North Carolina study on the implementation of the PSA system found only a reduction in pretrial incarceration rates, while pretrial misconduct rates went unchanged (Redcross, Henderson, Miratrix, & Valentine, 2019). However, a third study, in Kentucky, found little reduction in incarceration rates in the 2009-2016 period after RAIs were put into use (Stevenson, 2018). Interestingly, this same study found that "a judge's use of an RAI did not unevenly impact outcomes across race groups" (Stevenson, 2018), meaning that racial disparities in bail decisions were not exacerbated after the state mandated the use of RAIs, but the disparity in the bail amounts received by minority defendants did not decrease either. Important context for these results is the largely white and rural demographic makeup of Kentucky and the fact that Kentucky judges are often informed of the details of a case over the phone and so may not know the race of the defendant. So while some of these results indicate that RAIs can decrease both pretrial misconduct and the rate of pretrial incarceration, they provide no evidence for the assertion that machine learning algorithms reduce the structural biases that plague the criminal justice system and lead to mass incarceration.

6. Ethical Concerns

While some of these algorithms appear to be fairly accurate predictors of recidivism in the real world, strong predictive capability does not by itself indicate whether using an algorithm to decide real-world protocols is ethically justified. The Brookings Institution identifies four main areas of concern that critics have used to argue against RAIs, "their lack of individualization, absence of transparency under trade-secret claims, possibility of bias, and questions of their true impact" (Brookings, 2020), all of which are examined in this paper. An ethical framework for evaluating RAI algorithms is presented by the Partnership on AI in the form of four baseline questions: "Do risk assessment tools achieve absolute fairness? Are risk assessment tools as fair as they can possibly be based on available datasets? Are risk assessment tools an improvement over current processes and human decision-makers? Are risk assessment tools an improvement over other possible reforms to the criminal justice system?" (PAI). On the whole, the evidence in this paper indicates that the appropriate answer to each of these questions is no, but posing them helps situate the comparative help and harm RAI algorithms can have on the criminal justice system in the United States.

6a. Individuality

The lack of individualization inherent in an algorithmic decision was the main argument in the most influential legal case regarding the use of risk assessment algorithms, the 2016 case Loomis v. Wisconsin. Loomis contended that the use of the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) tool meant his sentencing was not individualized, but rather "informed by historical group tendencies for misconduct, as assessed by COMPAS" (Brookings, 2020). The COMPAS tool does not actually use machine learning, instead relying on a hard-coded algorithm built on behavioral and psychological constructs, but its inputs and usage are similar to other RAIs that do use machine learning techniques. Although Loomis lost his case, the Brookings Institution makes an important note that "both humans and algorithms learn from historical behavior […] a risk prediction for a given individual—whether from a judge or an RAI—is, as a result, anchored in the historical behavior of similar individuals" (Brookings, 2020). This assertion raises the question of whether it is ethical and allowable to make legal decisions for an individual based on the behavior of others in their group. The Partnership on AI notes that "making predictions about individuals from group-level data is known as the ecological fallacy […] although risk assessment tools use data about an individual as inputs, the relationship between these inputs and the predicted outcome is determined by patterns in training data about other people's behavior" (PAI).
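A minimal synthetic sketch of this point: in the toy model below, the mapping from a defendant's inputs to a risk score is learned entirely from records of other people's past outcomes, so two defendants with identical inputs receive identical scores no matter what actually distinguishes them as individuals. The data-generating process and model choice are assumptions made purely for illustration.

```python
# Synthetic illustration of the ecological-fallacy concern: the score given
# to a new defendant is entirely a function of patterns learned from OTHER
# people's historical outcomes, not from anything specific to that person.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Fake historical records: [age, prior_arrests] and whether re-arrest was recorded.
n = 5000
age = rng.integers(18, 70, n)
priors = rng.poisson(1.5, n)
p_rearrest = 1 / (1 + np.exp(-(-1.0 + 0.4 * priors - 0.02 * (age - 18))))
rearrested = rng.random(n) < p_rearrest

model = LogisticRegression().fit(np.column_stack([age, priors]), rearrested)

# Two different people with the same two inputs receive the same score,
# because the model has no access to anything that distinguishes them.
defendant_a = [[25, 2]]
defendant_b = [[25, 2]]
print(model.predict_proba(defendant_a)[0, 1], model.predict_proba(defendant_b)[0, 1])
```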

While not a risk assessment tool used by the courts, the Beware software used in policing illustrates the potential danger of this fallacy. Based on the idea that "negative social networks" encourage criminal activity, Beware assigns threat scores and levels to any person, area, or address that a police officer enters (Deeks, 2018). It is deeply controversial because "it focuses police attention on individuals who may not have actually committed an offense" (Deeks, 2018), or even been accused of one, based on historical data about their group.

6b. Transparency

In Predictive Algorithms and Criminal Sentencing, Christin attributes the lack of transparency to three main sources. First, the risk assessment algorithms used by courts and police departments are proprietary products created by private companies such as Palantir and Northpointe (now Equivant). To protect their products from imitation, these companies do not reveal specifications of the design, input types, or training data. Concealing this vital information not only from the defendant but also from the judge using the output is an obvious impediment to a fair court proceeding. The second source Christin identifies is the barrier to understanding the code for the layperson. Even if the source code for an algorithm were publicly available, how that code arrives at a decision would not be easily accessible to most defendants. Furthermore, Christin contends that the objectivity of the decision-maker is compromised by this lack of understanding of how the algorithm operates, as they tend to take the result as hard fact. An important distinction is that these results are predictions, not statistics. Finally, because many of these algorithms are neural networks rather than hard-coded conditions, even the programmers of the algorithms cannot determine exactly why an algorithm made the decision it did. This is known as black boxing, where "important social, political, and ethical questions about sentencing decisions are not asked, because no one knows how the algorithm works" (Christin, 2019).
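The black-boxing point can be made concrete with a small, purely illustrative example: even with full access to a trained neural network's parameters, the "explanation" for a particular score is a stack of weight matrices rather than a rule a defendant or judge could read. The data and model below are synthetic and assume scikit-learn is available.

```python
# Illustration of the "black box" point: inspecting a trained network's raw
# weights does not explain why it scored a particular person the way it did.
# Data and model are synthetic and purely illustrative.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))   # six anonymous input features
y = (X[:, 0] + 0.5 * X[:, 2] - X[:, 4] + rng.normal(scale=0.5, size=2000)) > 0

net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)

person = X[:1]
print("risk score:", net.predict_proba(person)[0, 1])

# The "explanation" available from the model itself is just matrices of weights:
for i, w in enumerate(net.coefs_):
    print(f"layer {i} weight matrix shape: {w.shape}")
```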

6c. Bias

Many of these algorithms rely on historical recidivism data to inform their decisions. The problem with historical data, particularly as it pertains to demographic groups, is that it has been shaped by unequal institutional systems. When the inputs to an algorithm are biased, the outputs are sure to display the same biases regardless of the intent of the designers or users. Unlike in other applications of AI, these issues cannot simply be ascribed to a poorly chosen training dataset, because the inputs to these algorithms will always be biased. Many critics therefore rightly question whether these algorithms should be developed in the first place given this inherent shortcoming. Although designed with the primary goal of reducing human bias, "algorithms tend to reinforce social and racial inequalities instead of reducing them" (Christin, 2019).
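A small synthetic simulation can illustrate the mechanism: if one group has historically been arrested more often for the same underlying behavior, a model trained on those arrest records will assign that group higher risk scores even when group membership is never an input, because correlated features act as proxies. All of the rates and features below are invented for illustration.

```python
# Synthetic demonstration that biased historical data produces biased scores,
# even when the protected attribute is excluded from the inputs.
# All rates and features here are invented for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 20000

group = rng.random(n) < 0.5        # protected attribute (never a model input)
behavior = rng.random(n) < 0.3     # identical underlying behavior in both groups

# Historical enforcement is uneven: group A is arrested more often for the same behavior.
arrest_rate = np.where(group, 0.8, 0.4)
arrested = behavior & (rng.random(n) < arrest_rate)   # the "label" the model learns from

# A proxy feature (e.g., neighborhood) is correlated with group membership.
neighborhood = (rng.random(n) < np.where(group, 0.9, 0.2)).astype(float)
X = neighborhood.reshape(-1, 1)

model = LogisticRegression().fit(X, arrested)
scores = model.predict_proba(X)[:, 1]

print("mean predicted risk, group A:", scores[group].mean())
print("mean predicted risk, group B:", scores[~group].mean())
```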

Not much data has been collected on how RAIs affect bias in the judicial system. The most notable study is an incendiary 2016 investigation by ProPublica comparing COMPAS predictions in Florida with actual recidivism data collected two years later. The article found that COMPAS showed racial bias because it produced more false positives for Black individuals than for white individuals. It should be noted that this result did not account for the "differences in underlying offense rates for each race" (Brookings, 2020). When examining "whether individuals with the same risk score re-offend at the same rate, regardless of race—evidence of racial discrimination disappears" (Brookings, 2020). Even if COMPAS and other RAIs do not increase discriminatory practices in the criminal justice system, there is no evidence to suggest they decrease discrimination and bias as they are supposedly designed to do.
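The disagreement here comes down to two different fairness metrics: ProPublica compared false positive rates across races, while the rebuttal cited by Brookings examines calibration, that is, whether people with the same score re-offend at the same rate regardless of race. The sketch below only shows how each metric is computed; the scores and outcomes are synthetic placeholders, not the COMPAS data.

```python
# How the two competing fairness metrics in the COMPAS debate are computed.
# Scores and outcomes below are synthetic placeholders, not the COMPAS data.

import numpy as np

def false_positive_rate(pred_high_risk, reoffended):
    """Share of people who did NOT re-offend but were labeled high risk."""
    no_reoffense = ~reoffended
    return (pred_high_risk & no_reoffense).sum() / no_reoffense.sum()

def calibration_at_score(scores, reoffended, score_value):
    """Re-offense rate among everyone who received a given score."""
    mask = scores == score_value
    return reoffended[mask].mean()

rng = np.random.default_rng(7)
n = 10000
group = rng.random(n) < 0.5
scores = rng.integers(1, 11, n)           # decile-style risk scores
reoffended = rng.random(n) < scores / 12  # outcome loosely tied to score
high_risk = scores >= 7

for g, name in [(group, "group A"), (~group, "group B")]:
    print(name,
          "FPR:", round(false_positive_rate(high_risk[g], reoffended[g]), 3),
          "re-offense rate at score 7:",
          round(calibration_at_score(scores[g], reoffended[g], 7), 3))
```

In general, when underlying re-offense rates differ across groups, a score cannot equalize both error rates and calibration at once, which helps explain how the two analyses reached opposite conclusions from similar data.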

6d. Human Oversight

The need for human oversight of these algorithms is of primary concern when looking for bias in the real-world judicial system. Most observers concede that "policymakers should preserve human oversight and careful discretion when implementing machine learning algorithms […] it is always possible that unusual factors could affect an individual's likelihood of misconduct [so] a judge must retain the ability to overrule an RAI's recommendations" (Brookings, 2020). While this helps rectify the lack of individualization covered earlier in this paper, it introduces a large avenue for the judge's or prosecutor's personal bias. If it is at the judge's discretion whether to follow an RAI's recommendation, it can no longer be claimed that human biases do not impact the result for the defendant.

7. Conclusion

This paper presented a number of ethical critiques of the use of risk assessment algorithms in the criminal justice system. Most of these concerns lead to unresolved questions that society still needs to tackle. Is it possible to create neural networks that resist the unjust biases of modern society? Given the evidence cited in this paper, it seems highly unlikely that data-driven algorithms will ever be unencumbered by societal bias, because all of their training data is biased. Algorithms not informed by data are likewise subject to their creators' biases about which factors contribute the most risk. These arguments show that an algorithm will always carry bias; the best that can be hoped for is a democratically determined bias arrived at through some form of voting (which has its own shortcomings of disproportionate representation via voter turnout and restrictive legislation). This conclusion raises the question of whether algorithms have any place in determining justice, or whether these decisions are best left to human minds capable of empathy and compassion despite their biases.

Sources

Chohlas-Wood, Alex. “Understanding Risk Assessment Instruments in Criminal Justice.” Brookings Institution, 2020.

Christin, Angèle. “Predictive Algorithms and Criminal Sentencing.” The Decisionist Imagination, Berghahn Books, 2019, pp. 272–94.

Danner, M. J., VanNostrand, M., & Spruance, L. M. (2016). Race and gender neutral pretrial risk assessment, release recommendations, and supervision: VPRAI and PRAXIS revised. Luminosity.

Deeks, Ashley. “Predicting Enemies.” Virginia Law Review, vol. 104, no. 8, 2018.

Loomis v. Wisconsin, 881 N.W.2d 749 (Wis. 2016), cert. denied, 137 S.Ct. 2290 (2017)

Partnership on AI. “Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System.” Accessed May 6, 2021.

Redcross, C., Henderson, B., Miratrix, L., & Valentine, E. (2019). Evaluation of pretrial justice system reforms that use the Public Safety Assessment. MDRC Center for Criminal Justice Research.

Stevenson, M. (2018). Assessing risk assessment in action. Minn. L. Rev. 103, 303.