Predicting Arrest Release Outcomes: A Comparative Analysis of Machine Learning Models

Section: Article
Published: Oct 1, 2025
Pages: 62-73

Abstract

This comparative study evaluates machine learning models for predicting arrest release outcomes using 5,226 marijuana possession cases from the Toronto Police Service (1997-2002). The dataset exhibited significant class imbalance, with only 17.1% detention outcomes versus 82.9% releases. After preprocessing to handle missing values and convert categorical variables, we implemented two modeling approaches: a 500-tree Random Forest classifier with feature importance measurement and a binomial Logistic Regression model. Both algorithms demonstrated strong predictive capability for release cases, achieving comparable overall accuracy (83.2-83.4%) and excellent sensitivity (>98%), though they struggled with the critical minority class as evidenced by poor specificity (<7%). The models showed similar discriminative power, with Logistic Regression achieving a marginally higher AUC-ROC (0.733 vs 0.726). Feature importance analysis identified employment status and prior police background checks as the strongest predictors, while demographic factors, including race, also contributed significantly to predictions. These results highlight both the technical challenges of imbalanced classification in justice system data and the ethical considerations surrounding potential algorithmic bias, particularly given the high false positive rate for detention predictions that could exacerbate existing disparities. The study underscores the need for careful model evaluation and responsible implementation when applying predictive analytics to sensitive criminal justice decisions, balancing statistical performance with considerations of fairness and social impact.
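The evaluation design summarized above — two classifiers compared on an imbalanced binary outcome using accuracy, sensitivity, specificity, and AUC-ROC — can be sketched as follows. This is an illustrative analogue only, not the authors' code: the reference list suggests the original analysis was done in R (randomForest, pROC), and the synthetic data here merely mirrors the reported ~17% detention rate.

```python
# Minimal sketch of the paper's comparison setup, on synthetic data.
# Class 1 = detained (minority, ~17%); class 0 = released (majority, ~83%).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.83], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "Random Forest (500 trees)": RandomForestClassifier(n_estimators=500,
                                                        random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]   # P(detained)
    pred = (prob >= 0.5).astype(int)         # default 0.5 threshold
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    results[name] = {
        "accuracy": (tn + tp) / len(y_te),
        "sensitivity": tn / (tn + fp),  # recall on releases (majority class)
        "specificity": tp / (tp + fn),  # recall on detentions (minority class)
        "auc": roc_auc_score(y_te, prob),
    }
    print(name, results[name])
```

With imbalance this severe, overall accuracy and even ROC curves can look flattering while the minority class goes largely undetected; the precision-recall view (Saito & Rehmsmeier, 2015, in the references below) is often the more informative diagnostic for the detention class.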

References

  1. Allison, P. D. (2001). Missing data. Sage Publications.
  2. Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  3. Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2021). Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1), 3-44. https://doi.org/10.1177/0049124118782533
  4. Berk, R., & Bleich, J. (2021). Statistical procedures for forecasting criminal behavior: A comparative assessment. Criminology & Public Policy, 20(2), 345-370.
  5. Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
  6. Brantingham, P. J., Valasik, M., & Mohler, G. O. (2023). Machine learning for criminology and crime research. Annual Review of Criminology, 6, 281-310. https://doi.org/10.1146/annurev-criminol-030421-041615
  7. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324
  8. Breiman, L. (2023). Random Forest revisited (Posthumous reprint). Journal of Computational Criminology.
  9. Bureau of Justice Statistics (BJS). (2022). National Jail Data Dashboard. U.S. Department of Justice.
  10. Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153-163. https://doi.org/10.1089/big.2016.0047
  11. European Union. (2024). Regulation (EU) 2024/... of the European Parliament and of the Council on artificial intelligence (AI Act). Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/.
  12. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874. https://doi.org/10.1016/j.patrec.2005.10.010
  13. Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). Sage. https://us.sagepub.com/en-us/nam/an-r-companion-to-applied-regression/book246125
  14. Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29, 3315-3323. https://proceedings.neurips.cc/paper/2016/hash/9d2682367c3935defcb1f9e247a97c0d-Abstract.html
  15. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
  16. He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263-1284. https://doi.org/10.1109/TKDE.2008.239
  17. Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). Wiley. https://doi.org/10.1002/9781118548387
  18. Kuhn, M., & Johnson, K. (2023). Feature engineering and selection: A practical approach for predictive models (2nd ed.). Chapman & Hall/CRC. https://www.routledge.com/Feature-Engineering-and-Selection-A-Practical-Approach-for-Predictive-Models/Kuhn-Johnson/p/book/9781138079229
  19. Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer.
  20. Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.
  21. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
  22. R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
  23. Richardson, R., Schultz, J. M., & Crawford, K. (2021). Dirty data, bad predictions: How civil rights violations impact police data, predictive policing systems, and justice. New York University Law Review, 94(1), 192-233. https://www.nyulawreview.org/issues/volume-94-number-1/
  24. Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., & Müller, M. (2011). pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), 77. https://doi.org/10.1186/1471-2105-12-77
  25. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. https://doi.org/10.1038/s42256-019-0048-x
  26. Rudin, C., Wang, C., & Coker, B. (2022). The age of secrecy and unfairness in recidivism prediction. Harvard Data Science Review, 4(1). https://doi.org/10.1162/99608f92.6ed64b30
  27. Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE, 10(3), e0118432. https://doi.org/10.1371/journal.pone.0118432
  28. Skeem, J. L., & Lowenkamp, C. T. (2016). Risk, race, and recidivism: Predictive bias and disparate impact. Criminology, 54(4), 680-712. https://doi.org/10.1111/1745-9125.12123
  29. Stone, M., Lichtenstein, S., & Fischhoff, B. (2014). How to make better forecasts and decisions: Avoid face-to-face meetings. Foresight: The International Journal of Applied Forecasting, 35, 5-9.
  30. Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Graphics Press.
  31. van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Chapman & Hall/CRC. https://doi.org/10.1201/9780429492259
  32. Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer.
  33. Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., ... Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
  34. Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Chapman & Hall/CRC.
  35. Zeng, J., Ustun, B., & Rudin, C. (2017). Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A, 180(3), 689-722.


How to Cite

O. P. Adebayo, A. Ibrahim, and K. Oyeleke, “Predicting Arrest Release Outcomes: A Comparative Analysis of Machine Learning Models”, JES, vol. 34, no. 4, pp. 62–73, Oct. 2025.