A Hybrid Weight based Feature Selection Algorithm for Predicting Students’ Academic Advancement by Employing Data Science Approaches

Authors: Ujwal U.J., Saleem Malik

Journal: International Journal of Education and Management Engineering

Published in: Issue 5, Vol. 13, 2023.

Free access

PerformanceX is a proposed system that applies Educational Data Mining (EDM) techniques to improve student performance and reduce dropout rates. It uses a hybrid feature selection approach to identify the most significant attributes in student academic datasets and to eliminate features that are not needed for predicting performance. The selectX algorithm, a core component of PerformanceX, selects a small number of high-performing features to improve learning effectiveness and prediction accuracy. The system applies several machine learning classifiers, including a fusion Voting Classifier, to different feature subsets and determines the best combination. The study reports an accuracy of 99.41%; the selectX approach using 10 features with a random forest (RF) classifier gave the highest accuracy. These findings underscore the value of classifying student performance from a concise yet meaningful set of features, which the authors link to improved student quality and career progression. The research value of PerformanceX lies in a performance forecasting system that discards irrelevant information and predicts student performance precisely, which makes it a useful tool for educators and educational institutions. By helping students select courses that suit them and advance their careers, PerformanceX aims to reduce dropout rates and support positive student outcomes.
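The abstract does not spell out how selectX weights features, so the following is only a minimal sketch of one possible hybrid weight-based selection, not the authors' algorithm. It assumes a binary pass/fail label and blends a filter score (absolute Pearson correlation) with a wrapper-style score (accuracy of a one-feature threshold rule). The function names, the `alpha` mixing weight, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def abs_correlation(xs, ys):
    """Filter score: absolute Pearson correlation between a feature and the label."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def stump_accuracy(xs, ys):
    """Wrapper-style score: accuracy of a single threshold placed midway
    between the two class means (labels are assumed to be 0 or 1)."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:
        return 0.0
    threshold = (mean(pos) + mean(neg)) / 2
    high_is_pos = mean(pos) >= mean(neg)
    correct = sum(
        ((x > threshold) == high_is_pos) == (y == 1) for x, y in zip(xs, ys)
    )
    return correct / len(xs)


def select_top_k(columns, labels, k, alpha=0.5):
    """Rank features by a weighted blend of the two scores and keep the top k.
    alpha is a hypothetical mixing weight, not a value from the paper."""
    weights = {
        name: alpha * abs_correlation(col, labels)
        + (1 - alpha) * stump_accuracy(col, labels)
        for name, col in columns.items()
    }
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k], weights


# Toy data invented for illustration only (1 = pass, 0 = fail).
columns = {
    "attendance": [0.9, 0.8, 0.95, 0.4, 0.5, 0.3],
    "midterm": [78, 85, 90, 45, 52, 40],
    "shoe_size": [42, 38, 40, 41, 39, 40],
}
labels = [1, 1, 1, 0, 0, 0]

selected, weights = select_top_k(columns, labels, k=2)
print(selected)
```

In a fuller pipeline, the selected columns would then be passed to the classifiers the abstract mentions (for example a random forest or a voting ensemble), and the feature count `k` would be tuned by comparing accuracy across subsets.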


Keywords: Educational Data Mining, Feature selection, Data Science

Short URL: https://sciup.org/15018673

IDR: 15018673   |   DOI: 10.5815/ijeme.2023.05.01

Article type: research article