LongT5Rank: A Novel Integrated Hybrid Approach for Text Summarisation

Agatha Jin Jin Lau; Chi Wee Tan

doi:10.18080/jtde.v12n3.977

Agatha Jin Jin Lau

Tunku Abdul Rahman University of Management and Technology, Malaysia

https://orcid.org/0009-0007-6906-8475
Chi Wee Tan

Tunku Abdul Rahman University of Management and Technology, Malaysia

https://orcid.org/0000-0001-6828-4896

Keywords

Hybrid approach, LongT5 model, Semantic Textual Similarity, TextRank, Text Summarisation

Abstract

Text summarisation reduces text length while retaining important information, helping individuals, especially students, manage information overload during research or assignments. However, existing text summarisation methods often lose important details, generate irrelevant or redundant sentences, or produce incoherent summaries. This study introduces a hybrid approach, LongT5Rank (a name coined in this study), which combines TextRank, an extractive summarisation algorithm, with LongT5, an abstractive summarisation model, to automate the summarisation process. TextRank uses GloVe, a pre-trained word embedding model, and PageRank, a graph-based ranking algorithm, to select representative sentences. LongT5, an encoder-decoder transformer designed for long-range sequence-to-sequence tasks with input sequences of up to 16,384 tokens, then condenses the extracted sentences into a concise and coherent summary. LongT5Rank achieves a compression rate of at least 60%, a semantic textual similarity score of at least 0.6, and a higher F-measure than TextRank alone. It also received positive feedback in a Human Level Performance (HLP) evaluation, underlining the view that the performance of a summarisation system should be assessed directly by its human users. By combining extractive and abstractive methods, LongT5Rank excels at generating accurate and coherent summaries.
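The extractive stage described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the GloVe vectors are replaced by a tiny hypothetical embedding table (`TOY_EMBEDDINGS`), sentence vectors are assumed to be averages of word vectors, edge weights are cosine similarities, and PageRank is computed by power iteration with an assumed damping factor of 0.85. The `compression_rate` and `sts_score` helpers use assumed definitions (fraction of words removed; cosine of averaged vectors), because the abstract does not define either metric. The LongT5 abstractive stage is omitted; in the full pipeline the extracted sentences would be passed to a pre-trained LongT5 model.

```python
import math
import re

# Hypothetical 3-d word vectors standing in for pre-trained GloVe embeddings.
TOY_EMBEDDINGS = {
    "cats": [1.0, 0.0, 0.0],
    "dogs": [0.8, 0.2, 0.0],
    "mice": [0.6, 0.4, 0.0],
    "chase": [0.5, 0.5, 0.0],
    "pets": [0.9, 0.1, 0.0],
    "stocks": [0.0, 0.0, 1.0],
    "fell": [0.0, 0.0, 1.0],
}


def sentence_vector(text, embeddings):
    """Average the vectors of known words; unknown words are skipped."""
    dim = len(next(iter(embeddings.values())))
    words = re.findall(r"[a-z]+", text.lower())
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pagerank(weights, damping=0.85, max_iter=100, tol=1e-10):
    """Weighted PageRank by power iteration; dangling nodes spread rank evenly."""
    n = len(weights)
    if n == 0:
        return []
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(scores[j] for j in range(n) if out_weight[j] == 0)
        new_scores = [
            (1 - damping) / n
            + damping * (dangling / n + sum(
                scores[j] * weights[j][i] / out_weight[j]
                for j in range(n) if out_weight[j] > 0))
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def textrank_extract(sentences, embeddings, k):
    """Return the k highest-ranked sentences, kept in their original order."""
    vectors = [sentence_vector(s, embeddings) for s in sentences]
    n = len(sentences)
    # Similarity graph: no self-loops, negative cosines clipped to zero.
    weights = [
        [0.0 if i == j else max(cosine(vectors[i], vectors[j]), 0.0) for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def compression_rate(source, summary):
    """Assumed definition: fraction of source words removed by the summary."""
    return 1 - len(summary.split()) / len(source.split())


def sts_score(text_a, text_b, embeddings):
    """Assumed STS proxy: cosine of the averaged word vectors of two texts."""
    return cosine(sentence_vector(text_a, embeddings), sentence_vector(text_b, embeddings))


sentences = [
    "Cats chase mice.",
    "Cats and dogs are pets.",
    "Dogs chase cats.",
    "Stocks fell sharply today.",
]
extract = textrank_extract(sentences, TOY_EMBEDDINGS, k=2)
source, summary = " ".join(sentences), " ".join(extract)
print(extract)
print(round(compression_rate(source, summary), 2),
      round(sts_score(source, summary, TOY_EMBEDDINGS), 2))
```

The off-topic "Stocks" sentence shares no similarity with the others, so it ends up with the lowest PageRank score and is dropped. Restoring the selected sentences to their original order keeps the extract readable before it is handed to the abstractive model; raising `k` trades compression for coverage.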




Published

2024-09-30


How to Cite

Lau, A. J. J., & Tan, C. W. (2024). LongT5Rank: A Novel Integrated Hybrid Approach for Text Summarisation. Journal of Telecommunications and the Digital Economy, 12(3), 73–96. https://doi.org/10.18080/jtde.v12n3.977

Issue

Vol. 12 No. 3 (2024)

Section

Special Issue: Perspectives on Machine Learning

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright Telecommunications Association Inc.

Author Biographies

Agatha Jin Jin Lau, Tunku Abdul Rahman University of Management and Technology, Malaysia


Department of Mathematical and Data Science, Tunku Abdul Rahman University of Management and Technology, Jalan Genting Kelang, Setapak, 53300, Malaysia

Chi Wee Tan, Tunku Abdul Rahman University of Management and Technology, Malaysia


Department of Computer Science and Embedded Systems, Tunku Abdul Rahman University of Management and Technology, Jalan Genting Kelang, Setapak, 53300, Malaysia
