Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 20

Communication Papers of the 2019 Federated Conference on Computer Science and Information Systems

Comparative Analysis of Data Mining Algorithms Applied to the Context of School Dropout


DOI: http://dx.doi.org/10.15439/2019F265

Citation: Communication Papers of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 20, pages 310 ()


Abstract. Student dropout is one of the major problems afflicting educational institutions; the losses caused by student abandonment are social, academic, and economic. The search for its causes has been the subject of educational research around the world, and many organizations seek strategic decisions to control the dropout rate. The goal of this work is to evaluate the effectiveness of the data mining algorithms most used in the education area. An "in vivo" controlled experiment was planned and performed to compare the efficacy of the selected classifiers. The Random Forest and SVM algorithms stood out in this context, with statistically similar averages for accuracy (80.36%, 81.18%), precision (80.79%, 80.25%), recall (76.50%, 77.51%), and F-measure (78.86%, 78.81%), respectively. The results showed evidence of significant differences among the algorithms and also showed that, although SVM had the best accuracy and recall, its results were statistically similar to those of Random Forest.
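
The four metrics reported in the abstract follow from the counts in a binary confusion matrix. The sketch below is a minimal illustration of the standard definitions, not the authors' evaluation code: the function name `classification_metrics` and the example counts are hypothetical, and "dropout" is assumed to be the positive class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from
    confusion-matrix counts (positive class assumed to be 'dropout')."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only; these are not data from the paper.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that an F-measure averaged over several runs generally differs from the F-measure computed from the averaged precision and recall, so the averages reported above need not satisfy the formula exactly.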
