Research Articles

Judges on the Benchmark: Developing a Sentencing Feedback System

Pages 1-37 | Received 07 Feb 2022, Accepted 25 Feb 2023, Published online: 07 Sep 2023
 

Abstract

Judges receive limited information on how their sentencing practices contribute to inter-judge sentencing disparities, which can undermine equity and perceptions of legitimacy. We use doubly robust internal benchmarking to measure the effect of each judge on sentencing outcomes relative to a set of cases that were handled by the judge’s peers and that are statistically similar on all observable case features. With these benchmarks, we can flag judges with extreme sentencing habits and link those habits to their discretionary decisions. Judges with the highest propensity for using custodial sentences were 22 percentage points more likely to impose an incarceration sentence and 5 percentage points more likely to impose a prison sentence than their peers were when handling similar cases. States can adopt this approach to provide feedback throughout a judge’s tenure, encouraging the judges who contribute most to disparities to bring their sentencing practices closer to those of their peers.

Acknowledgments

We thank the Pennsylvania Commission on Sentencing and Mark Bergstrom for sharing the data used in this project and providing additional context on the data.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 Pennsylvania’s criminal procedure laws eliminate “the absolute prohibition against any judicial involvement in plea discussions in order to align the rule with the realities of current practice” (Pennsylvania Code, 2022). In addition, defendants can enter an open plea (i.e., the defendant enters a plea and the judge determines the sentence) when the prosecutor’s plea offer is more punitive than the judge’s sentencing preference. Together, these two features suggest that judges can influence sentencing decisions.

2 Life sentences do not have a specified range, so we use 360 months for them. This value is the midpoint between the two upper-boundary sentences for attempted murder (20 years for attempted murder without serious bodily injury and 40 years for attempted murder resulting in serious bodily injury).
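The 360-month figure is simply the average of the two statutory maximums once both are converted to months:

```latex
\frac{(20 \times 12) + (40 \times 12)}{2} = \frac{240 + 480}{2} = 360 \text{ months}
```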

3 Individuals interested in applying this benchmarking process can use the R package “fastDR”: https://github.com/gregridgeway/fastDR. The user specifies the dependent variables, the covariates, and the treatment variable. For benchmarking, the treatment variable is a (0/1) indicator for the index judge or other actor of interest, and the doubly robust estimate is the difference between that actor and his or her benchmark. The user then repeats the process for every actor. From an implementation standpoint, benchmarking is equivalent to running propensity score weighting and estimating the treatment (or actor) effect once for each actor of interest.
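The per-actor loop described in this note can be sketched in plain Python. This is a minimal illustration under simplifying assumptions, not the fastDR implementation: all case features are collapsed into a hypothetical discrete `stratum`, the propensity model is saturated (the share of each stratum's cases handled by the index judge), and peer cases receive odds weights p/(1 - p) so that the weighted peer pool mirrors the index judge's case mix. The names `benchmark_judge`, `benchmark_all`, and `EXAMPLE_CASES` are hypothetical.

```python
from collections import defaultdict

# Hypothetical example data: (judge_id, stratum, outcome), where outcome is 1
# if the case received, e.g., an incarceration sentence.
EXAMPLE_CASES = [
    ("A", "felony", 1), ("A", "felony", 1), ("A", "misd", 0),
    ("B", "felony", 1), ("B", "felony", 0), ("B", "misd", 0), ("B", "misd", 0),
    ("C", "felony", 0), ("C", "misd", 1),
]


def benchmark_judge(cases, judge):
    """Compare one index judge with a reweighted pool of peer cases.

    Returns (index_mean, benchmark_mean, difference); the benchmark and the
    difference are None when no peer case shares a stratum with the judge.
    """
    n_index = defaultdict(int)  # index judge's caseload per stratum
    n_peer = defaultdict(int)   # peers' caseload per stratum
    for j, s, _ in cases:
        if j == judge:
            n_index[s] += 1
        else:
            n_peer[s] += 1

    index_outcomes = [y for j, _, y in cases if j == judge]
    index_mean = sum(index_outcomes) / len(index_outcomes)

    # Saturated propensity model: p(s) = n_index[s] / (n_index[s] + n_peer[s]).
    # A peer case gets the odds p(s) / (1 - p(s)) = n_index[s] / n_peer[s].
    num = den = 0.0
    for j, s, y in cases:
        if j != judge and n_index[s] > 0:
            w = n_index[s] / n_peer[s]
            num += w * y
            den += w
    if den == 0:
        return index_mean, None, None
    # Index cases in strata with no peer cases remain in index_mean but have
    # no counterpart in the benchmark (a lack of overlap).
    benchmark = num / den
    return index_mean, benchmark, index_mean - benchmark


def benchmark_all(cases):
    """Repeat the comparison once per judge, treating each in turn as the treated actor."""
    judges = sorted({j for j, _, _ in cases})
    return {j: benchmark_judge(cases, j) for j in judges}


if __name__ == "__main__":
    for judge, (own, bench, diff) in benchmark_all(EXAMPLE_CASES).items():
        print(judge, round(own, 3), round(bench, 3), round(diff, 3))
```

In this toy example, judge A imposes the outcome in two of three cases, while the reweighted peer cases reach it in one of three, so A sits about 33 percentage points above its benchmark.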

4 For categorical case features, we compare the percentage of cases with that feature for the index judge and for the comparison cases and compute the difference. For continuous case features, we use the analogous Kolmogorov-Smirnov (KS) statistic, which is the largest percentage point difference between the cumulative distribution functions of the index judge’s cases and the comparison cases. We do not benchmark the 54 judges whose percentage point differences or KS statistics exceeded 0.05. These judges account for 11,259 cases (2.5%) in the analysis.
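The two balance diagnostics in this note can be sketched as follows, assuming the comparison cases already carry propensity-score weights and that each feature is coded numerically (0/1 for each category). Differences are returned as proportions, so 0.05 corresponds to 5 percentage points. The function names and the `is_benchmarkable` rule are hypothetical helpers that mirror the 0.05 cutoff described in the note.

```python
# Hypothetical balance threshold mirroring the note: diagnostics above 0.05 fail.
THRESHOLD = 0.05


def weighted_share(values, weights):
    """Weighted proportion of cases with a binary (0/1) feature."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def share_difference(index_vals, peer_vals, peer_w):
    """Absolute difference in the share of cases with a categorical feature
    between the index judge (unweighted) and the weighted comparison cases."""
    index_share = sum(index_vals) / len(index_vals)
    return abs(index_share - weighted_share(peer_vals, peer_w))


def weighted_ks(index_vals, peer_vals, peer_w):
    """KS statistic: the largest gap between the index judge's empirical CDF
    and the weighted empirical CDF of the comparison cases."""
    total_w = sum(peer_w)
    n = len(index_vals)
    gap = 0.0
    # Both ECDFs are step functions that jump only at observed values, so
    # checking every observed value finds the maximum gap (O(n^2) for clarity).
    for t in sorted(set(index_vals) | set(peer_vals)):
        f_index = sum(1 for v in index_vals if v <= t) / n
        f_peer = sum(w for v, w in zip(peer_vals, peer_w) if v <= t) / total_w
        gap = max(gap, abs(f_index - f_peer))
    return gap


def is_benchmarkable(diagnostics, threshold=THRESHOLD):
    """Keep a judge only if every balance diagnostic is at or below the threshold."""
    return all(d <= threshold for d in diagnostics)
```

For instance, `weighted_ks([1, 2], [1, 2], [3, 1])` is 0.25: the weighted comparison cases put 75% of their mass at 1 versus 50% for the index judge, so a judge with this imbalance would not be benchmarked.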

5 Doubly robust estimates use the propensity score weights as sampling weights in a regression model that includes the potential confounders in order to estimate the judge effect. This approach protects against model misspecification: it provides a consistent estimate of the treatment effect if either the propensity score model or the outcome regression model is correctly specified (Bang & Robins, 2005, 2008; Ho et al., 2007). These regression models produce doubly robust z-scores measuring how much the index judge deviates from their benchmark on the outcome.
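A minimal sketch of this regression step in plain Python, assuming that the index judge's cases carry weight 1 and peer cases carry their propensity-score weights. The coefficient on the index-judge indicator is the deviation from the benchmark, and dividing it by its standard error gives a z-score. The helper names, the example data, and the simple model-based standard error are illustrative assumptions; fastDR and survey-weighting software may compute standard errors differently.

```python
import math

# Hypothetical data: 4 index-judge cases and 4 peer cases with one covariate
# (e.g., a prior-record score); peer weights of 0.5 stand in for propensity weights.
IS_INDEX = [1, 1, 1, 1, 0, 0, 0, 0]
PRIOR = [[0], [1], [2], [3], [0], [1], [2], [3]]
WEIGHTS = [1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5]
NOISE = [0.01, -0.01, -0.01, 0.01] * 2
Y = [2 + 3 * d + 1.5 * c[0] + e for d, c, e in zip(IS_INDEX, PRIOR, NOISE)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dr_judge_effect(y, is_index, covariates, weights):
    """Weighted least squares of y on [intercept, index-judge indicator, covariates].

    Returns (judge effect, z-score). The standard error is the simple
    model-based sigma^2 * (X'WX)^-1, not a survey or sandwich estimator.
    """
    X = [[1.0, float(d)] + [float(c) for c in cov] for d, cov in zip(is_index, covariates)]
    p = len(X[0])
    xtwx = [[sum(w * row[a] * row[b] for row, w in zip(X, weights)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(w * row[a] * yi for row, w, yi in zip(X, weights, y)) for a in range(p)]
    inv = invert(xtwx)
    beta = [sum(inv[a][b] * xtwy[b] for b in range(p)) for a in range(p)]
    resid = [yi - sum(bk * xk for bk, xk in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(w * r * r for w, r in zip(weights, resid)) / (len(y) - p)
    se = math.sqrt(sigma2 * inv[1][1])
    return beta[1], beta[1] / se


if __name__ == "__main__":
    effect, z = dr_judge_effect(Y, IS_INDEX, PRIOR, WEIGHTS)
    print(f"judge effect = {effect:.3f}, z = {z:.1f}")
```

Because the covariate enters the regression directly while the weights also rebalance the peer cases, an error in either component alone need not bias the judge effect, which is the double-robustness property the note describes.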

6 The robust regression uses an M-estimator with the default Huber psi function, which downweights outliers so that the coefficients are not driven by outlier judges.
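The note does not spell out the model, so the sketch below is a hypothetical one-predictor example of Huber M-estimation fit by iteratively reweighted least squares (IRLS). It assumes the tuning constant k = 1.345 (the default in R's `MASS::rlm`, which "default Huber psi" plausibly refers to) and re-estimates the scale from the median absolute residual at each step. The example data and function names are invented for illustration.

```python
import statistics

# Hypothetical example: nine well-behaved points on y = 1 + 2x (with +/-0.1
# noise) plus one extreme point at x = 9, standing in for an outlier judge.
X_EXAMPLE = list(range(10))
Y_EXAMPLE = [1 + 2 * x + (0.1 if x % 2 == 0 else -0.1) for x in X_EXAMPLE]
Y_EXAMPLE[9] += 50


def huber_weight(u, k=1.345):
    """IRLS weight psi(u)/u for Huber's psi: 1 when |u| <= k, otherwise k/|u|."""
    return 1.0 if abs(u) <= k else k / abs(u)


def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(x, y, k=1.345, max_iter=50, tol=1e-10):
    """Huber M-estimate of (a, b) by IRLS; the first pass is ordinary least squares."""
    w = [1.0] * len(x)
    a = b = float("nan")
    for _ in range(max_iter):
        a_new, b_new = weighted_line(x, y, w)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual rescaled to match a normal SD.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break
        w = [huber_weight(r / scale, k) for r in resid]
    return a, b


if __name__ == "__main__":
    print("OLS slope:  ", huber_line(X_EXAMPLE, Y_EXAMPLE, max_iter=1)[1])
    print("Huber slope:", huber_line(X_EXAMPLE, Y_EXAMPLE)[1])
```

Setting `max_iter=1` returns the ordinary least squares fit, which makes it easy to see how far a single extreme point pulls the unweighted slope (to about 4.72 here) compared with the Huber fit, which stays close to the underlying slope of 2.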

7 We also run the same analysis using the fine amount in place of a (0/1) indicator for the use of fines; the results are unchanged.

8 The Pennsylvania Bar Association evaluates any individual being considered for election, retention, or appointment on ten factors: sufficient legal ability; enough trial or comparable experience to ensure knowledge of the rules; a record and reputation for excellent character and integrity; financial responsibility; judicial temperament; mental and physical capacity; a record of community involvement; administrative ability; devotion to improving the quality of justice; and the demonstration of sound judgement in one’s professional life.
