ROUGE Metric

The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) (Lin, 2004) is a metric commonly used to measure the quality of text summarization models. Summarization is one of the hardest tasks to evaluate because there is no single straightforward way to do it, and ROUGE is one of the most widely used metrics for it. ROUGE compares a generated summary against a set of human-written reference summaries; its recall reflects how much of the reference summary is captured by the generated summary.

ROUGE actually refers to a family of metrics. The variants most likely to be used are ROUGE-N, ROUGE-L, and ROUGE-Lsum. ROUGE-N measures how many n-grams match between the model-generated text and the reference text. In simple terms, n-grams are groups of consecutive words or tokens: a unigram (1-gram) contains a single word, and a bigram (2-gram) contains two consecutive words. So in ROUGE-1 we measure the match rate of unigrams between the model output and the reference, while ROUGE-2 does the same for bigrams. ROUGE-L instead computes the longest common subsequence (LCS) between the output and the reference.
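To make these two building blocks concrete, here is a minimal Python sketch. The helper names (ngrams, lcs_length) are illustrative, not taken from any particular library; the LCS routine is the classic dynamic-programming algorithm that ROUGE-L relies on.

def ngrams(tokens, n):
    # Build the list of n-grams (tuples of n consecutive tokens).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lcs_length(a, b):
    # Longest common subsequence via dynamic programming, as used by ROUGE-L.
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[len(a)][len(b)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 1))  # unigrams: ('the',), ('cat',), ('sat',), ...
print(ngrams(tokens, 2))  # bigrams: ('the', 'cat'), ('cat', 'sat'), ...
print(lcs_length(tokens, "the cat lay on the mat".split()))  # 5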
A ROUGE value can be reported in three ways: recall, precision, and F1 score.
Recall: The recall variant counts the n-grams that overlap between the model output and the reference, then divides this count by the total number of n-grams in the reference.
Recall = Number of overlapping n-grams / Total number of n-grams in the reference summary
Precision: ROUGE uses this measure to determine how relevant the generated summary is. Instead of dividing by the reference n-gram count, precision divides the overlap count by the output n-gram count.
Precision = Number of overlapping n-grams / Total number of n-grams in the generated summary
F1 score: The F1 score combines the recall and precision values into a single reliable measure of the generated output's quality.
F1 score = 2 * (precision * recall) / (precision + recall)
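Putting the three formulas together, the following self-contained sketch computes ROUGE-N recall, precision, and F1 for a candidate/reference pair. The rouge_n helper is hypothetical and for illustration only; in practice a maintained package such as rouge-score is typically used, since it also handles stemming and the ROUGE-L/ROUGE-Lsum variants.

from collections import Counter

def rouge_n(candidate, reference, n=1):
    # Count the n-grams in each text.
    def counts(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = counts(candidate.split()), counts(reference.split())
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1

reference = "the cat sat on the mat"
candidate = "the cat is on the mat"
print(rouge_n(candidate, reference, n=1))
# 5 of the 6 reference unigrams overlap, and the candidate also has 6 unigrams,
# so recall = precision = F1 = 5/6 ≈ 0.833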
Reference:
Lin, Chin-Yew. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
