Machine Learning in Global Development: Applying k-Means Clustering to Identify Country Groupings by Economic and Health Performance
Abstract
This study applies k-means clustering to group countries by key economic and health indicators: Gross National Income per capita (GNIP), health expenditure, life expectancy, birth and death rates, and urbanization. The elbow method identified k = 3 as the optimal number of clusters, based on a sharp drop in the within-cluster sum of squares (from 5000 to 2000). The results reveal three distinct development groupings. A small cluster of 13 high-performing countries stands out, with strong economic (GNIP = 0.658) and health outcomes (life expectancy = 0.856) and low birth (0.116) and death rates (0.113). This group also shows strong internal similarity (silhouette width = 0.58). The remaining countries fall into two broader clusters. The first includes 320 countries with moderate development, higher urbanization (UrbanP = 0.712), and relatively high health spending (healthE = 0.219), but low GNIP (0.066). The second, comprising 277 countries, faces greater challenges, marked by low life expectancy (0.414), high birth rates (0.670), and weak economic indicators (GNIP = 0.067). Both larger clusters show weaker cohesion (silhouette widths of 0.29 and 0.32, respectively). These findings highlight the stratified, multidimensional nature of global development and offer a data-driven framework for informing policy decisions and tailoring interventions to the characteristics of each cluster.
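The workflow summarized above (k-means, an elbow check on the within-cluster sum of squares, and per-cluster average silhouette widths) can be sketched in plain Python. This is a minimal, illustrative sketch, not the authors' code: the k-means++ seeding, the five restarts, the function names (`kmeans`, `silhouette_widths`, `make_synthetic`), and the synthetic three-group data are all assumptions chosen for demonstration, and the data are not the study's indicators.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding (an assumed choice): each new centre is drawn with
    probability proportional to its squared distance from the nearest centre."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids


def lloyd(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps until
    assignments stop changing. Returns (labels, centroids, wcss)."""
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


def kmeans(points, k, n_init=5, seed=0):
    """Run seeded Lloyd n_init times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    runs = [lloyd(points, kmeans_pp_init(points, k, rng)) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


def silhouette_widths(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b), where a is the mean
    distance to the point's own cluster and b the mean distance to the
    nearest other cluster."""
    k = max(labels) + 1
    widths = []
    for i, p in enumerate(points):
        by_cluster = [[] for _ in range(k)]
        for j, q in enumerate(points):
            if j != i:
                by_cluster[labels[j]].append(math.dist(p, q))
        own = by_cluster[labels[i]]
        if not own:  # singleton cluster: s(i) is defined as 0
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in enumerate(by_cluster)
                if c != labels[i] and d)
        widths.append((b - a) / max(a, b))
    return widths


def make_synthetic(n_per_group=50, seed=1):
    """Hypothetical data: three groups of 3-feature rows on roughly a [0, 1]
    scale. Purely illustrative; not the study's data."""
    rng = random.Random(seed)
    centres = [(0.7, 0.85, 0.1), (0.1, 0.6, 0.3), (0.1, 0.4, 0.7)]
    return [[m + rng.gauss(0, 0.05) for m in centre]
            for centre in centres for _ in range(n_per_group)]


if __name__ == "__main__":
    data = make_synthetic()
    # Elbow method: WCSS for k = 1..6; the "elbow" is where gains flatten out.
    for k in range(1, 7):
        print(f"k={k}: WCSS={kmeans(data, k)[2]:.3f}")
    labels, _, _ = kmeans(data, 3)
    widths = silhouette_widths(data, labels)
    # Average silhouette width per cluster.
    for c in sorted(set(labels)):
        w = [s for s, lab in zip(widths, labels) if lab == c]
        print(f"cluster {c}: n={len(w)}, mean silhouette={sum(w) / len(w):.2f}")
```

Because k-means relies on Euclidean distance, real indicators measured in different units would first need to be put on a common scale; the values reported in the abstract appear to be scaled in this way, though the exact transformation is not stated here.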

This work is licensed under a Creative Commons Attribution 4.0 International License.





