LongT5Rank: A Novel Integrated Hybrid Approach for Text Summarisation
Keywords
Hybrid approach, LongT5 model, Semantic Textual Similarity, TextRank, Text Summarisation
Abstract
Text summarisation reduces the length of a text while retaining its important information. It helps individuals, especially students, manage information overload during research or assignments. However, existing summarisation methods often lose important details, include irrelevant or redundant sentences, or produce incoherent summaries. This study introduces LongT5Rank (a name coined in this study), a hybrid approach that automates summarisation by combining TextRank, an extractive summarisation algorithm, with LongT5, an abstractive summarisation model. TextRank uses GloVe, a pre-trained word embedding model, together with PageRank, a graph-based ranking algorithm, to select representative sentences. LongT5 is an encoder-decoder transformer designed for long-range sequence-to-sequence tasks and accepts input sequences of up to 16,384 tokens. It then condenses the extracted sentences into a concise and coherent summary. LongT5Rank achieved a compression rate of at least 60%, a semantic textual similarity score of at least 0.6, and a higher F-measure than TextRank alone. It also received positive feedback in a Human Level Performance (HLP) evaluation, reflecting the view that a summarisation system should ultimately be judged by its human users. By combining extractive and abstractive methods, LongT5Rank generates accurate and coherent summaries.
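The extractive stage described in the abstract can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation. The small `TOY_EMBEDDINGS` table is a made-up stand-in for pre-trained GloVe vectors. Each sentence vector is assumed to be the mean of its known word vectors, and graph edges are weighted by cosine similarity. PageRank is run by plain power iteration. `compression_rate` assumes that compression means the fraction of words removed, which the abstract does not define. All helper names are hypothetical.

```python
import math
import re

# Hypothetical stand-in for pre-trained GloVe vectors (assumption: a real
# pipeline would load e.g. 100-d GloVe vectors from file; these are made up).
TOY_EMBEDDINGS = {
    "cats": [1.0, 0.1, 0.0],
    "dogs": [0.9, 0.2, 0.0],
    "pets": [0.95, 0.15, 0.05],
    "stocks": [0.0, 1.0, 0.1],
    "markets": [0.05, 0.95, 0.1],
    "rain": [0.0, 0.1, 1.0],
}


def sentence_vector(sentence, embeddings, dim=3):
    """Average the vectors of known words; zero vector if none are known."""
    words = re.findall(r"[a-z]+", sentence.lower())
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pagerank(weights, damping=0.85, iters=100, tol=1e-8):
    """Weighted PageRank by power iteration over a similarity matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iters):
        new = []
        for i in range(n):
            # Each neighbour j passes on score in proportion to edge weight.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new.append((1 - damping) / n + damping * incoming)
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def textrank_extract(sentences, embeddings, k=2):
    """Return the k highest-ranked sentences in their original order."""
    vectors = [sentence_vector(s, embeddings) for s in sentences]
    n = len(sentences)
    # Similarity graph without self-loops; negative cosines clipped to 0.
    weights = [
        [0.0 if i == j else max(cosine(vectors[i], vectors[j]), 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def compression_rate(source, summary):
    """Assumed definition: share of words removed, 1 - |summary| / |source|."""
    return 1 - len(summary.split()) / len(source.split())


if __name__ == "__main__":
    sentences = [
        "Cats are popular pets.",
        "Dogs are loyal pets.",
        "Stocks fell as markets closed.",
        "Many pets are cats and dogs.",
    ]
    summary = textrank_extract(sentences, TOY_EMBEDDINGS, k=2)
    print(summary)
    print(round(compression_rate(" ".join(sentences), " ".join(summary)), 2))
```

In a LongT5Rank-style pipeline, the returned sentences would then be concatenated and passed to a pre-trained LongT5 checkpoint for abstractive rewriting. That stage requires downloading a model, so it is omitted from this sketch.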