The 17 UN Sustainable Development Goals: Classification of Research Topics Using BERT and Logistic Regression
Informatics, Faculty of Informatics and Engineering, Multimedia Nusantara University, Indonesia
*Author to whom correspondence should be addressed:
E-mail: eunike.endariahna@umn.ac.id (EES)
Received: June 16, 2025 | Revised: January 19, 2026 | Accepted: February 17, 2026 | Published: March 2026
Abstract
An academic institution with over 200 lecturers produced more than 3,000 research articles between 2018 and 2023. Accurately classifying these research outputs according to the 17 United Nations Sustainable Development Goals (UN SDGs), a global agenda addressing issues such as poverty, education, gender equality, clean energy, and climate action, is vital for demonstrating institutional contributions to sustainability and supporting faculty accreditation processes. Traditionally, the Research and Community Service Institutes of private universities have performed this classification manually, which is inefficient and time-consuming. To address this challenge, two machine learning-based text classification systems were developed and evaluated. The models were trained on a dataset of 76,958 records. The first approach implements a Bidirectional Encoder Representations from Transformers (BERT) model, a state-of-the-art deep learning framework in Natural Language Processing. Preprocessing was performed using NLTK, and the model was fine-tuned over 4 epochs with a learning rate of 2e-5 and a batch size of 32, using a 70/30 train-test split. This model achieved an accuracy of 90.68%, a precision of 0.99, a recall of 0.82, and an F1-score of 0.87. The second approach uses a Logistic Regression model with TF-IDF (Term Frequency-Inverse Document Frequency) for text vectorization. This model employs the L1 penalty and the SAGA solver; it was trained on 80% of the dataset and tested on the remaining 20%, without additional data cleaning. It achieved an accuracy of 90.01%, a precision of 0.86, a recall of 0.82, and an F1-score of 0.84. Both models demonstrated strong classification performance, with the BERT-based model providing higher precision and better overall classification quality.
These systems have been presented to the university for potential adoption, offering a more efficient and consistent approach to aligning institutional research with the 17 UN SDGs.
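As an illustration of the TF-IDF vectorization step used by the second approach, the following minimal sketch computes term weights in plain Python. It assumes the textbook weighting tf × log(N / df) and whitespace tokenization; the abstract does not state which TF-IDF variant or tokenizer was used, and library implementations often apply smoothing and normalization on top of this. The function name `tfidf` and the three sample texts are hypothetical and are not drawn from the paper's dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Assumed variant: relative term frequency and unsmoothed IDF.
    """
    # Naive lowercase/whitespace tokenization (an assumption for this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical abstracts, for illustration only.
docs = [
    "research on solar energy access",
    "research on teacher training quality",
    "research on solar panel recycling",
]
vectors = tfidf(docs)
```

Terms that occur in every document, such as "research" here, receive a weight of zero, while terms confined to a single document receive the largest weights. These weighted vectors are the kind of input a linear classifier such as Logistic Regression would then be trained on.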
Keywords
17 UN SDG; BERT; Logistic Regression; Research Topic; Text Classification
Creative Commons Attribution 4.0 International
