Language Independent Models for COVID-19 Fake News Detection: Black Box versus White Box Models


W. K. Wong
Filbert H. Juwono
Ing Ming Chew
Basil Andy Lease


Fake news, black box model, white box model, machine learning, COVID-19


In an era where large volumes of information spread easily through social media, fake news detection is increasingly used to limit misinformation, especially fake news about COVID-19. Databases have been built, and machine learning algorithms have been used to identify patterns in news content and filter out false information. This paper provides a brief overview of the field, covering public domain datasets, feature extraction methods, and the machine learning models that have been deployed. As a case study, a mixed-language dataset is presented. The dataset consists of tweets about COVID-19 that have been labelled as fake or real news. To perform the detection task, a classification model is implemented using language-independent features. In particular, the features are numerical inputs that are invariant to the language of the text; thus, they are suitable for investigation, as many regions in the world have similar linguistic structures. Furthermore, the classification task can be performed with either black box or white box models, each with its own advantages and disadvantages. In this paper, we compare the performance of the two approaches. Simulation results show that the performance difference between black box models and white box models is not significant.
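
As a rough illustration of what language-independent features and a white box model could look like, the sketch below maps a tweet to numeric features that do not rely on any vocabulary and fits a one-rule decision stump on them. The abstract does not list the paper's actual features or models, so the feature set, the function names (`extract_features`, `fit_stump`, `predict`), and the toy tweets are all assumptions made for illustration only.

```python
import re

# Hypothetical pattern for links; the paper's real preprocessing is not given.
URL_RE = re.compile(r"https?://\S+")


def extract_features(text):
    """Map a tweet to numeric features that need no language-specific vocabulary.

    The feature set is an assumption for illustration, not the paper's list.
    """
    tokens = text.split()
    n_chars = len(text)
    letters = [c for c in text if c.isalpha()]
    return {
        "n_chars": float(n_chars),
        "n_tokens": float(len(tokens)),
        "n_urls": float(len(URL_RE.findall(text))),
        "n_hashtags": float(sum(t.startswith("#") for t in tokens)),
        "n_mentions": float(sum(t.startswith("@") for t in tokens)),
        "n_exclaim": float(text.count("!")),
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars if n_chars else 0.0,
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
    }


def fit_stump(rows, labels):
    """Fit a one-rule white box model by exhaustive search.

    The rule reads: predict `polarity` when row[feature] > threshold,
    otherwise predict 1 - polarity. The rule with the fewest training
    errors is returned as (feature, threshold, polarity).
    """
    best = None
    for name in rows[0]:
        for thr in sorted({r[name] for r in rows}):
            for polarity in (1, 0):
                preds = [polarity if r[name] > thr else 1 - polarity for r in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, name, thr, polarity)
    return best[1:]


def predict(rule, row):
    """Apply a fitted stump to one feature dictionary."""
    name, thr, polarity = rule
    return polarity if row[name] > thr else 1 - polarity


# Invented mixed-language toy tweets (1 = fake, 0 = real); not from the dataset.
tweets = [
    ("BREAKING!!! Garlic CURES covid, share NOW!!!", 1),
    ("Health ministry reports 120 new cases today https://example.org/report", 0),
    ("Vaksin mengandung microchip!!! SEBARKAN!!!", 1),
    ("Kementerian kesihatan: 85 kes baharu hari ini https://example.org/kes", 0),
]
rows = [extract_features(text) for text, _ in tweets]
labels = [label for _, label in tweets]
rule = fit_stump(rows, labels)
print("learned rule (feature, threshold, polarity):", rule)
```

The learned rule can be read directly as a threshold on one feature, which is the sense in which such a model is a white box. A black box model, such as a neural network, would take the same numeric vectors as input but would not expose a rule of this kind.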



