Predicting Arrest Release Outcomes: A Comparative Analysis of Machine Learning Models

Section: Article
Published: Oct 1, 2025
Pages: 62-73

Abstract

This comparative study evaluates machine learning models for predicting arrest release outcomes using 5,226 marijuana possession cases from the Toronto Police Service (1997-2002). The dataset exhibited significant class imbalance, with only 17.1% detention outcomes versus 82.9% releases. After preprocessing to handle missing values and convert categorical variables, we implemented two modeling approaches: a 500-tree Random Forest classifier with feature importance measurement and a binomial Logistic Regression model. Both algorithms demonstrated strong predictive capability for release cases, achieving comparable overall accuracy (83.2-83.4%) and excellent sensitivity (>98%), though they struggled with the critical minority class as evidenced by poor specificity (<7%). The models showed similar discriminative power, with Logistic Regression achieving a marginally higher AUC-ROC (0.733 vs 0.726). Feature importance analysis identified employment status and prior police background checks as the strongest predictors, while demographic factors, including race, also contributed significantly to predictions. These results highlight both the technical challenges of imbalanced classification in justice system data and the ethical considerations surrounding potential algorithmic bias, particularly given the high false positive rate for detention predictions that could exacerbate existing disparities. The study underscores the need for careful model evaluation and responsible implementation when applying predictive analytics to sensitive criminal justice decisions, balancing statistical performance with considerations of fairness and social impact.
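The evaluation design summarized above — two classifiers compared on an imbalanced binary outcome using accuracy, sensitivity, specificity, and AUC-ROC — can be sketched as follows. This is an illustrative analogue only, not the authors' code: the reference list suggests the original analysis was done in R (randomForest, pROC), and the synthetic data here merely mirrors the reported ~17% detention rate.

```python
# Minimal sketch of the paper's comparison setup, on synthetic data.
# Class 1 = detained (minority, ~17%); class 0 = released (majority, ~83%).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.83], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "Random Forest (500 trees)": RandomForestClassifier(n_estimators=500,
                                                        random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]   # P(detained)
    pred = (prob >= 0.5).astype(int)         # default 0.5 threshold
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    results[name] = {
        "accuracy": (tn + tp) / len(y_te),
        "sensitivity": tn / (tn + fp),  # recall on releases (majority class)
        "specificity": tp / (tp + fn),  # recall on detentions (minority class)
        "auc": roc_auc_score(y_te, prob),
    }
    print(name, results[name])
```

With imbalance this severe, overall accuracy and even ROC curves can look flattering while the minority class goes largely undetected; the precision-recall view (Saito & Rehmsmeier, 2015, in the references below) is often the more informative diagnostic for the detention class.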

References

  1. Allison, P. D. (2001). Missing data. Sage Publications.
  2. Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  3. Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2021). Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1), 3-44. https://doi.org/10.1177/0049124118782533
  4. Berk, R., & Bleich, J. (2021). Statistical procedures for forecasting criminal behavior: A comparative assessment. Criminology & Public Policy, 20(2), 345-370.
  5. Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
  6. Brantingham, P. J., Valasik, M., & Mohler, G. O. (2023). Machine learning for criminology and crime research. Annual Review of Criminology, 6, 281-310. https://doi.org/10.1146/annurev-criminol-030421-041615
  7. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324
  8. Breiman, L. (2023). Random Forest revisited (Posthumous reprint). Journal of Computational Criminology.
  9. Bureau of Justice Statistics (BJS). (2022). National Jail Data Dashboard. U.S. Department of Justice.
  10. Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153-163. https://doi.org/10.1089/big.2016.0047
  11. European Union. (2024). Regulation (EU) 2024/... of the European Parliament and of the Council on artificial intelligence (AI Act). Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/.
  12. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874. https://doi.org/10.1016/j.patrec.2005.10.010
  13. Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). Sage. https://us.sagepub.com/en-us/nam/an-r-companion-to-applied-regression/book246125
  14. Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29, 3315-3323. https://proceedings.neurips.cc/paper/2016/hash/9d2682367c3935defcb1f9e247a97c0d-Abstract.html
  15. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
  16. He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263-1284. https://doi.org/10.1109/TKDE.2008.239
  17. Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). Wiley. https://doi.org/10.1002/9781118548387
  18. Kuhn, M., & Johnson, K. (2023). Feature engineering and selection: A practical approach for predictive models (2nd ed.). Chapman & Hall/CRC. https://www.routledge.com/Feature-Engineering-and-Selection-A-Practical-Approach-for-Predictive-Models/Kuhn-Johnson/p/book/9781138079229
  19. Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer.
  20. Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.
  21. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
  22. R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
  23. Richardson, R., Schultz, J. M., & Crawford, K. (2021). Dirty data, bad predictions: How civil rights violations impact police data, predictive policing systems, and justice. New York University Law Review, 94(1), 192-233. https://www.nyulawreview.org/issues/volume-94-number-1/
  24. Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), 77. https://doi.org/10.1186/1471-2105-12-77
  25. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. https://doi.org/10.1038/s42256-019-0048-x
  26. Rudin, C., Wang, C., & Coker, B. (2022). The age of secrecy and unfairness in recidivism prediction. Harvard Data Science Review, 4(1). https://doi.org/10.1162/99608f92.6ed64b30
  27. Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE, 10(3), e0118432. https://doi.org/10.1371/journal.pone.0118432
  28. Skeem, J. L., & Lowenkamp, C. T. (2016). Risk, race, and recidivism: Predictive bias and disparate impact. Criminology, 54(4), 680-712. https://doi.org/10.1111/1745-9125.12123
  29. Stone, M., Lichtenstein, S., & Fischhoff, B. (2014). How to make better forecasts and decisions: Avoid face-to-face meetings. Foresight: The International Journal of Applied Forecasting, 35, 5-9.
  30. Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Graphics Press.
  31. van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Chapman & Hall/CRC. https://doi.org/10.1201/9780429492259
  32. Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer.
  33. Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., ... Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
  34. Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Chapman & Hall/CRC.
  35. Zeng, J., Ustun, B., & Rudin, C. (2017). Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A, 180(3), 689-722.


How to Cite

O. P. Adebayo, A. Ibrahim, and K. Oyeleke, “Predicting Arrest Release Outcomes: A Comparative Analysis of Machine Learning Models”, JES, vol. 34, no. 4, pp. 62–73, Oct. 2025.