
International Journal of Mechatronics, Electrical and Computer Technology
Vol. 4(11), Apr. 2014, pp. 589-601, ISSN: 2305-0543
Available online at: http://www.aeuso.org
© Austrian E-Journals of Universal Scientific Organization
--------------------------------------------------------------------
Improved MMR Technique for Extractive Text Summarization
Maryam Kiabod*, Mohammad Naderi Dehkordi, Sayed Mehran Sharafi
Department of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
*Corresponding author's Email: [email protected]
Abstract
Text summarization is a process that reduces the length of an original text and saves
the reader's time by conveying its main concepts. The two kinds of text summarization
approaches are extractive and abstractive techniques. Our technique is based on the
extractive approach, in which scores are assigned to the sentences of the text and the
most important sentences are those with the highest scores. One of the methods used in
the extractive approach is the MMR technique. In this paper, we use the MMR technique
in combination with graph theory and a genetic algorithm to extract the most important
sentences of a text. The MMR technique is improved and its equation is changed. A
genetic algorithm is also applied to obtain better weights for the graph edges. The
results show that this technique performs better than the standard MMR technique.
Keywords: MMR technique, Genetic Algorithm, Graph Theory, Word Weight,
Sentence Weight
1. INTRODUCTION
As the amount of information grows rapidly, text summarization is becoming more
important, since it gives readers faster access to the main concepts of a text. There are
two kinds of text summarization techniques: extractive and abstractive. An extractive
technique selects the most important sentences of the original text and puts them together
to form the summary. An abstractive method uses Natural Language Processing to derive
the main concept of the text and produce the summary. In the extractive method, two
characteristics should be considered: finding the sentences of the text that are more
important than others, and the coherence of the summary. One approach is to assign
some numerical measure to each sentence (called sentence score) and then choose
sentences with the highest scores to form the summary. For this purpose, statistical
criteria and semantic relations among words are used to weight each word of the
original text, and the sentence scores are then calculated from the word weights. Some
studies applied TF/IDF (Term Frequency-Inverse Document Frequency) [1], the number
of words occurring in the title [2], and the number of numerical data items [3] as
statistical criteria for weighting the words of the text. Using these criteria alone does not
produce a human-quality summary. As a result, NLP and lexical cohesion [4] are used to
guarantee the cohesion of the summary. To apply lexical cohesion, chains of related
words are formed that represent the cohesive structure of the text.
Halliday and Hasan [5] classified lexical cohesion into two categories: the reiteration
category and the collocation category. The reiteration category covers repetition,
synonyms, and hyponyms, while the collocation category deals with the co-occurrence of
words in the original text. One of the newer methods for determining the most important
sentences of a text is the MMR technique. Its benefit is that it considers both maximal
relevance novelty and minimal redundancy, and it has been improved in several ways.
Another method used in extractive text summarization is graph theory. This method
assigns a node to each sentence of the text, and the edge weights in the graph are the
similarities between pairs of sentences. The score of each sentence is calculated from the
edge weights and node weights. We use this method to improve the MMR technique and
calculate the importance of each sentence in the text. Some methods, such as Genetic
Algorithms, are applied to problems to obtain better results; we use a Genetic Algorithm
to constitute the sentence vectors of the text. The rest of this paper is organized as
follows. Section 2 reviews previous work on text summarization systems. Section 3
presents our technique. Section 4 describes the experimental results and evaluation.
Finally, we conclude and suggest future work in Section 5.
2. Text Summarization Approaches
Automatic text summarization dates back to the fifties. In 1958, Luhn [6] created a text
summarization system based on weighting the sentences of a text, using word frequency
to identify the topic of the text. Several methods rely on statistical criteria. Edmundson
[7] used the cue method (i.e. words such as "introduction", "conclusion", and "result"),
the title method, and the location method for determining sentence weights. Statistical
methods suffer from not considering the cohesion of the text. Kupiec, Pedersen, and
Chen [8] suggested a trainable method for summarizing the original text, in which the
number of votes collected by a sentence determines the probability of its being included
in the summary. Another method is the graph approach proposed by Jaruskulchai and
Kruengkrai [9] for determining the text title and producing a summary. Their approach
takes advantage of both the local and global properties of sentences: they used clusters
of significant words within each sentence to calculate a sentence's local property, and
the relations of all sentences in the text to determine the global property.
Besides statistical methods, there are approaches that consider semantic relations among
words; these methods need linguistic knowledge. Chen, Wang, and Guan [10] proposed
an automated text summarization system based on lexical chains. A lexical chain is a
series of interrelated words in a text; WordNet is a lexical database that includes
relations among words such as synonymy, hyponymy, meronymy, and others. Svore,
Vanderwende, and Burges [11] used a machine learning algorithm to summarize text.
Eslami, Khosravyan D., Kyoomarsi, and Khosravi [12] proposed an approach based on
Fuzzy Logic, although Fuzzy Logic does not guarantee the cohesion of the summary.
Halavati, Qazvinian, and Sharif H. applied a Genetic Algorithm in a text summarization
system [13]. A Genetic Algorithm, built on statistical tools, was also used to improve
content selection in automatic text summarization by Khosraiyan, Kumarci, and
Khosravi [14]. Latent Semantic Analysis [15] is another approach used in text
summarization systems. Abdel Fattah and Ren [16] proposed a technique based on
regression to estimate the weights of text features. In a regression model, a mathematical
function relates the output to the input variables. Feature parameters were considered as input variables
and the training phase identifies the corresponding outputs. There are some methods that
combine algorithms, such as Fuzzy Logic and PSO [17]. Salim, Binwahlan, and
Suanmali [18] proposed a technique based on fuzzy logic, in which text features (such as
similarity to the title, sentence length, similarity to keywords, etc.) were given to the
fuzzy logic system as input parameters. Carbonell and Goldstein [19] presented MMR
(Maximal Marginal Relevance) as a text summarization technique. In this approach, a
greedy algorithm selects the sentences of the text most relevant to the user query, while
also minimizing redundancy with the sentences already included in the summary; a
linear combination of these two criteria is then used to choose the best sentences for the
summary. Carbonell and Goldstein [19] used cosine similarity to calculate both
properties. Xie and Liu [20] used a centroid score for the first property and cosine
similarity for the second. Different measures of novelty were used to adapt this
technique by Murray, Renals, and Carletta [21]. McDonald [22] used an optimization
algorithm to solve a new formulation of the summarization task.
3. Proposed Technique
One of the methods in text summarization is the extractive method. The goal in
extractive text summarization is to assign numerical scores to sentences and select the
sentences with the highest scores to form the summary. There are many techniques for
calculating the importance of text sentences; one of the most recent is the MMR
technique, proposed by Carbonell and Goldstein in 1998 to consider both minimal
redundancy and maximal relevance novelty. Several techniques have been proposed to
improve it. In this article, we change the equation of the MMR technique to improve its
performance. The method proposed in this article is recursive and composed of the
following four steps.
1. The input document is transformed into a labeled graph, where each node stands
for a sentence and the edges represent the similarity relationships between the
sentences.
2. The MMR technique is applied to the graph in order to identify the sentences best
suited to be extracted and used in the summary.
3. The sentences are sorted according to the score returned by the MMR technique
and the one with the largest score is selected and inserted into the summary node
of the graph. The node containing that sentence is then removed from the graph.
4. The number of sentences in the summary node of the graph determines whether
the algorithm terminates. If the number of sentences in the summary node
reaches a threshold, the algorithm terminates; otherwise it returns to the
second step.
In the following, we provide further details of the method, discuss the features we
used as the labels of the graph, and describe the structure used to implement this
technique.
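The four steps above can be sketched as a greedy loop. In the sketch below, `score_fn` is a hypothetical placeholder for the graph-based MMR scoring developed in the following sections, and the toy scorer exists only to make the sketch runnable:

```python
def summarize(sentences, title, summary_size, score_fn):
    """Greedy extractive loop over the graph: repeatedly move the
    highest-scoring sentence into the summary node until the size
    threshold is reached (steps 2-4)."""
    summary = [title]                 # the summary node is seeded with the title
    remaining = list(sentences)
    while remaining and len(summary) - 1 < summary_size:
        # score every remaining node, pick the best, and remove its node
        best = max(remaining, key=lambda s: score_fn(s, remaining, summary))
        summary.append(best)
        remaining.remove(best)
    return summary[1:]                # drop the seeded title

# toy scorer standing in for MMR: prefer longer sentences
toy_score = lambda s, remaining, summary: len(s)
```

The loop mirrors the recursion of the method: each cycle re-scores the remaining nodes against the current summary node before extracting the next sentence.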
4. Pre-processing
In this step, we prepare the original text to be analyzed in the following stages. First,
we use sentence segmentation to identify the sentences of the text. After that, sentence
tokenization is used to identify the words of each sentence. Part-of-speech tagging is
then used to determine the type of each word in the sentence. Words tagged as nouns
are crucial in any text material, so we work on the nouns of the text: we retain the
nouns and remove every other type of word from the sentences. The final step in this
stage is pruning the text words to reduce the number of words processed in the next
stages, which leads to a faster analysis process. The criterion we use for pruning is the
term frequency of words: if the term frequency of a word is less than a threshold, we
remove that word. We obtained the best results with a threshold of one quarter of the
sum of the term frequencies of all text words, so we use equation (1) to calculate this
threshold.
Threshold = (1/4) · Σi TF(wi)    (1)

where TF(wi) is the term frequency of the i-th word of the text.
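As a minimal sketch of this pre-processing stage (assuming a caller-supplied `is_noun` predicate in place of a real part-of-speech tagger, and a simple regex tokenizer in place of a full sentence segmenter):

```python
import re
from collections import Counter

def preprocess(text, is_noun):
    """Segment into sentences, tokenize, keep only nouns, then prune
    nouns whose term frequency is below a quarter of the total term
    frequency of all retained words (equation (1))."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    tokens = [[w.lower() for w in re.findall(r"[A-Za-z]+", s)]
              for s in sentences]
    # a real system would use a POS tagger here instead of is_noun
    nouns = [[w for w in sent if is_noun(w)] for sent in tokens]
    tf = Counter(w for sent in nouns for w in sent)
    threshold = sum(tf.values()) / 4.0          # equation (1)
    return [[w for w in sent if tf[w] >= threshold] for sent in nouns]
```

The output is one noun list per sentence, ready for the weighting step that follows.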
5. Calculating Weights of Words
In this step, we assign weights to the text nouns, to be used in the next step. There are
statistical and linguistic criteria for computing this score; we use three criteria, two
statistical and one linguistic. The two statistical criteria are the term frequency of a
noun and the normalized number of sentences containing it. For the linguistic criterion,
we use WordNet as a lexical database to determine the semantic relations between
words. We compute the word weights using equation (2):

Weight(w) = α · TF(w) + β · SenCount(w) + θ · sem-sim(w)    (2)

where α, β, and θ are feature weights that determine the importance of the features, TF
is the term frequency of the word, SenCount is the number of sentences containing the
word normalized by the total number of sentences of the text, and sem-sim is the
maximum semantic similarity (computed with WordNet) between the word and the
title. These parameters lie in the range [0, 1] and sum to 1.0. To obtain them, we train a
two-layer backpropagation neural network with nine neurons in the hidden layer, a
maximum error of 0.001, and a learning rate of 0.2. The connection weights of this
network are initialized in the range [0, 1], and we use the sigmoid function as the
transfer function. The importance of each parameter is given by the average of the
connection weights attached to the input neuron that represents that parameter [23].
After training the neural network on the training dataset, we use its weights to calculate
the weights of the parameters.
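Once the three feature weights are known, equation (2) is straightforward to compute. In the sketch below, the values of alpha, beta, and theta are illustrative stand-ins for the network-trained weights, `title_sim` stands in for the WordNet-based semantic similarity to the title, and normalizing TF by the total token count is an assumption made here to keep the features comparable:

```python
def word_weight(word, sentences, title_sim, alpha=0.4, beta=0.3, theta=0.3):
    """Equation (2): Weight(w) = alpha*TF + beta*SenCount + theta*sem-sim.
    sentences is a list of token lists; alpha/beta/theta are illustrative
    (the paper trains them with a backpropagation network)."""
    total = sum(len(s) for s in sentences)
    tf = sum(s.count(word) for s in sentences) / total
    sen_count = sum(1 for s in sentences if word in s) / len(sentences)
    return alpha * tf + beta * sen_count + theta * title_sim(word)
```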
6. The Representation of Text by Graph
In a natural-language text, the sentences are related to each other, and in graph theory
this relatedness is represented by the labels of the graph edges. We add to the graph a
node containing the sentences of the summary, and initialize it with the title of the text.
In each cycle of the algorithm, the most important sentence (the node with the highest
score) is added to the summary node, and the node containing that sentence is removed
from the graph. Finally, the summary node contains the output of the algorithm. In this
work, we consider undirected weighted links. The nodes of the graph contain the
sentences of the text. The relatedness of two sentences is measured by the similarity
between them. There are several techniques for computing the similarity between two
sentences; in our case, we use cosine similarity to calculate the similarity value between
two sentences, but we change the way the vector for each sentence of the text is
constituted. In this article, we propose a Genetic Algorithm to create the vectors of the
text sentences. Figure 1 shows the graph model of the text.
Figure 1: The representation of text by graph
A Genetic Algorithm [24] is a technique for locating optimal solutions by working on
a population of potential solutions. Each population is a collection of chromosomes, and
a chromosome is formed by a sequence of genes. First, genetic operators are applied
to the current population; then a new population is generated. The Genetic Algorithm
uses a fitness function and a selection mechanism to determine which chromosomes
survive and produce the children for the next generation. After several generations, the
Genetic Algorithm converges toward an optimal solution. We use these advantages of
the Genetic Algorithm to find the similarity between the sentences of the text. The
Genetic Algorithm fitness
function that we used is shown in equation (3). The parameter setting for the Genetic
Algorithm is shown in Table 1.

[Equation (3): the Genetic Algorithm fitness function]
Table 1: Parameter setting for GA

Setting Type       Value
Encoding Scheme    Floating-Point
Population Size    100
Generations        2000
Selection          Roulette Wheel
Crossover          One Point
Crossover rate     0.8
Mutation rate      0.06
Elitism            Yes
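A skeleton of a Genetic Algorithm with the Table 1 settings (floating-point encoding, roulette-wheel selection, one-point crossover at rate 0.8, mutation rate 0.06, and elitism) might look as follows; the fitness function passed in below is a toy stand-in, since the paper uses its own fitness given in equation (3):

```python
import random

def genetic_algorithm(fitness, chrom_len, pop_size=100, generations=2000,
                      crossover_rate=0.8, mutation_rate=0.06):
    """GA skeleton following Table 1. `fitness` must return a positive
    score (required by roulette-wheel selection)."""
    pop = [[random.random() for _ in range(chrom_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        elite = pop[scores.index(max(scores))][:]      # elitism: keep the best
        total = sum(scores)

        def roulette():
            # fitness-proportional selection
            r, acc = random.uniform(0, total), 0.0
            for c, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return c
            return pop[-1]

        nxt = [elite]
        while len(nxt) < pop_size:
            child = roulette()[:]
            if random.random() < crossover_rate:       # one-point crossover
                mate = roulette()
                point = random.randrange(1, chrom_len)
                child = child[:point] + mate[point:]
            child = [g if random.random() >= mutation_rate else random.random()
                     for g in child]                   # per-gene mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# toy fitness: maximize the mean of the genes
best = genetic_algorithm(lambda c: sum(c) / len(c), chrom_len=5,
                         pop_size=20, generations=50)
```

Because the elite chromosome is carried over unchanged each generation, the best fitness in the population never decreases.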
7. MMR Technique
MMR considers the most relevant sentences and the minimal redundancy of summary
sentences at the same time. The final score of a given sentence is calculated as follows:

MMR(Si) = λ · sim1(Si, D) − (1 − λ) · sim2(Si, Summ)    (4)

where D is the average of the document vectors, Summ is the average of the sentence
vectors already extracted into the summary, and λ adjusts the combined score to
emphasize relevance or to avoid redundancy. The two similarity functions, sim1 and
sim2, represent the similarity of a sentence to the entire document and to the selected
summary, respectively. The sentences with the highest MMR scores are iteratively
chosen for the summary until the summary reaches a predefined size. In our technique,
we change the MMR formula to adapt it to the graph representation. In our work, the
MMR score of each node of the graph is calculated by equation (5):
596
International Journal of Mechatronics, Electrical and Computer Technology
Vol. 4(11), Apr. 2014, pp. 589-601, ISSN: 2305-0543
Available online at: http://www.aeuso.org
© Austrian E-Journals of Universal Scientific Organization
--------------------------------------------------------------------
MMR(Si) = λ · (1/n) · Σj sim(Si, Sj) − (1 − λ) · sim(Si, Summ)    (5)
where n denotes the number of edges connected to the node, excluding the edge to the
summary node. In this way, the relatedness of a sentence to the remaining sentences of
the text is calculated as the average of the similarities between that sentence and those
remaining sentences. As mentioned before, we use cosine similarity to compute the
similarity between two sentences, and we improve it by using a Genetic Algorithm to
create the vector for each sentence of the text. The proposed formula, however,
introduces a new problem: a sentence that is very similar to some sentences of the text
and dissimilar to the others receives the same score as a sentence with almost the same
similarity to all of the text sentences. To solve this problem, we use the variance
measure to distinguish between these two kinds of sentences. So, the MMR equation is
changed according to equation (6):
MMR(Si) = λ · [(1/n) · Σj sim(Si, Sj) − var(Si)] − (1 − λ) · sim(Si, Summ)    (6)
where n is the number of text sentences and the variance is calculated by equation (7).
var(Si) = (1/n) · Σj (sim(Si, Sj) − avg(Si))²    (7)

where sim(Si, Sj) is the similarity between sentences Si and Sj, and avg(Si) is the
average similarity between the sentence Si and the other sentences of the text. In our
opinion, the more
significant words of each sentence play a more important role in determining the
similarity between two sentences. As a result, we rank the words of each sentence
according to their importance probability (calculated by the Genetic Algorithm in the
previous section) in
decreasing order. Then, we compute the cosine similarity of the vectors of two sentences
to calculate the similarity score in the MMR technique. The cosine similarity is
calculated by equation (8):

cos(A, B) = Σi Ai·Bi / (√(Σi Ai²) · √(Σi Bi²))    (8)

where A and B are the attribute vectors of the two sentences. These vectors are usually
the term frequencies of the words in the text; in our technique, they instead hold the
probability of each word of the sentence being included in the significant-words list,
sorted in decreasing order. As mentioned before, we used a Genetic Algorithm to
determine this probability. In this way, the importance probabilities of the sentence
words make up the vector of each sentence of the text.
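Equations (5)-(8) can be sketched as follows, assuming each sentence has already been reduced to its vector of word-importance probabilities sorted in decreasing order; the value λ = 0.7 is purely illustrative, as the paper does not fix one:

```python
import math

def cosine_similarity(a, b):
    """Equation (8): cos(A, B) = sum(Ai*Bi) / (||A|| * ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_score(i, vectors, summary_vec, lam=0.7):
    """Improved MMR (equations (5)-(7)): average similarity to the
    other sentences, penalized by the variance of those similarities,
    minus the weighted similarity to the summary node."""
    sims = [cosine_similarity(vectors[i], v)
            for j, v in enumerate(vectors) if j != i]
    avg = sum(sims) / len(sims)
    var = sum((s - avg) ** 2 for s in sims) / len(sims)      # equation (7)
    redundancy = cosine_similarity(vectors[i], summary_vec)
    return lam * (avg - var) - (1 - lam) * redundancy        # equation (6)
```

A sentence with uneven similarities to the others accumulates variance and is penalized, which is exactly the distinction the variance term was introduced to make.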
8. Evaluation
Evaluating text summarization is a complicated task. The pyramid method is a recent
technique for evaluating the effectiveness of text summarization systems [25]. Using
this method to evaluate our system has some benefits: it accounts for the variation in
content selection across human summaries, and it makes the results less dependent on
the model used for evaluation. The main unit in this method is the Summarization
Content Unit (SCU), which links different surface realizations with the same meaning.
In an optimal summary, an SCU from a lower tier is not included unless all SCUs from
higher tiers are included as well. The number of human summarizers who expressed an
SCU in their summaries determines the weight of that SCU. Since this annotation is a
very labor-intensive task, a special annotation tool (DUCView) has been developed to
facilitate the process. In this method, the score assigned to a summary is calculated by
dividing the total SCU weight by the optimal content score. The total SCU weight D is
determined by equation (9):

D = Σ_{i=1..n} i · Di    (9)

where every SCU in tier Ti has weight i, and Di is the number of SCUs in the summary
that appear in Ti.
The optimal content score is calculated by equation (10).
Max = Σ_{i=j+1..n} i · |Ti| + j · (X − Σ_{i=j+1..n} |Ti|)    (10)

where X denotes the summary size (in SCUs) and j is the first tier, counting top-down,
such that the sum of its cardinality and the cardinalities of the tiers above it is greater
than or equal to X. We use
the DUC2002 dataset (www.nlpir.nist.gov) as input data to train the neural network and
test our technique. We compare our results with those of the MMR technique proposed
by Carbonell and Goldstein, in which the sentence vector consists of the term
frequencies of the sentence words. The results are shown in Figure 2.
Figure 2: The comparison of the pyramid score with and without the proposed technique
The results show that using the proposed equation in the MMR technique yields better
performance in selecting the most important sentences of the text.
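Under the tier conventions described above (tier 1 lowest, tier n highest), the pyramid score of equations (9) and (10) can be sketched as follows; the list-based input layout is an assumption of this sketch:

```python
def pyramid_score(d_counts, tier_sizes, x):
    """Pyramid evaluation. d_counts[i] is the number of SCUs in the
    summary drawn from tier T_{i+1}; tier_sizes[i] is |T_{i+1}|.
    Returns the total SCU weight (equation (9)) divided by the optimal
    content score (equation (10)) for a summary of x SCUs."""
    total = sum((i + 1) * d for i, d in enumerate(d_counts))   # equation (9)
    # equation (10): greedily fill the summary from the highest tier down
    optimal, remaining = 0, x
    for i in range(len(tier_sizes) - 1, -1, -1):
        take = min(tier_sizes[i], remaining)
        optimal += (i + 1) * take
        remaining -= take
        if remaining == 0:
            break
    return total / optimal if optimal else 0.0
```

The greedy top-down fill is equivalent to equation (10): every SCU slot of the ideal summary is taken from the highest-weight tier that still has SCUs available.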
Conclusion
In this paper, we improved the MMR technique. We applied a Genetic Algorithm to
identify the importance of the words of each sentence. Then, we investigated the use of
graph theory in the MMR technique. Finally, we changed the MMR formula, using the
variance measure to distinguish the sentences with nearly uniform similarity to all of
the text sentences. The evaluation results show that using the proposed equation in the MMR
technique results in better performance. In future work, we intend to change the fitness
function and to use other similarity factors in MMR technique.
References
[1] M. Wasson, "Using Leading Text for News Summaries: Evaluation results and implications for
commercial summarization applications", In Proceedings of the 17th International Conference on
Computational Linguistics and 36th Annual Meeting of the ACL, 1998; 1364-1368.
[2] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval", Information
Processing and Management, 1988; 24:513-523. Reprinted in: Sparck-Jones, K.; Willett, P. (eds.),
Readings in Information Retrieval, Morgan Kaufmann, 1997; 323-328.
[3] C.Y. Lin, "Training a selection function for extraction", In Proceedings of the Eighth International
Conference on Information and Knowledge Management, Kansas City, Missouri, United States, 1999; 55-62.
[4] M. Hoey, "Patterns of Lexis in Text", Oxford: Oxford University Press, 1991.
[5] M. Halliday and R. Hasan, "Cohesion in English", London: Longman, 1976.
[6] H.P. Luhn, "The Automatic Creation of Literature Abstracts", IBM Journal of Research and
Development, 1958; 159-165.
[7] H.P. Edmundson, "New Methods in Automatic Extracting", Journal of the ACM, 1969; 264-285.
[8] J. Kupiec, J. Pedersen and F. Chen, "A trainable document summarizer", In Proceedings of the 18th
ACM SIGIR Conference, 1995; 68-73.
[9] C. Jaruskulchai and C. Kruengkrai, "Text summarization using local and global properties of
sentences", IEEE/WIC International Conference on Web Intelligence, 2003; 13-16.
[10] Y. Chen, X. Wang and Y. Guan, "Automatic Text Summarization Based on Lexical Chains", in
Advances in Natural Computation, 2005; 947-951.
[11] K. Svore, L. Vanderwende and C. Burges, "Enhancing single-document summarization by combining
RankNet and third-party sources", In Proceedings of EMNLP-CoNLL, 2007.
[12] F. Kyoomarsi, H. Khosravi, E. Eslami and P.K. Dehkordy, "Optimizing Text Summarization Based on
Fuzzy Logic", In Proceedings of the Seventh IEEE/ACIS International Conference on Computer and
Information Science, IEEE, Shahid Bahonar University of Kerman, 2008; 347-352.
[13] V. Qazvinian, L. Sharif Hassanabadi and R. Halavati, "Summarizing Text with a Genetic
Algorithm-Based Sentence Extraction", International Journal of Knowledge Management Studies (IJKMS),
2008; 4: 426-444.
[14] P.K. Dehkordi, F. Kumarci and H. Khosravi, "Text Summarization Based on Genetic Programming",
International Journal of Computing and ICT Research, 2009; 3.
[15] S. Hariharan, "Multi document summarization by combinational approach", International Journal of
Computational Cognition, 2010; 8: 68-74.
[16] M.A. Fattah and F. Ren, "Automatic Text Summarization", Proceedings of World Academy of Science,
Engineering and Technology, 2008; 27: 192-195.
[17] L. Suanmali, M.S. Binwahlan and N. Salim, "Sentence Features Fusion for Text Summarization Using
Fuzzy Logic", IEEE, 2009; 142-145.
[18] N. Salim, L. Suanmali and M.S. Binwahlan, "Fuzzy Swarm Based Text Summarization", Journal of
Computer Science, 2009; 338-346.
[19] J. Carbonell and J. Goldstein, "The use of MMR, diversity-based reranking for reordering documents
and producing summaries", in Proceedings of the Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval, 1998; 335-336.
[20] S. Xie and Y. Liu, "Using corpus and knowledge-based similarity measure in Maximum Marginal
Relevance for meeting summarization", in Proceedings of the IEEE International Conference on Acoustics,
Speech and Signal Processing, 2008; 4985-4988.
[21] G. Murray, S. Renals and J. Carletta, "Extractive summarization of meeting recordings", in Proceedings
of the 9th European Conference on Speech Communication and Technology, 2005; 593-596.
[22] R. McDonald, "A study of global inference algorithms in multi-document summarization", in
Proceedings of the European Conference on IR Research, 2007; 557-564.
[23] S. Zadeh and N. Sharif, "Evaluation of effective parameters and their effect on summarization systems
using neural network (in Persian)", Fifth Annual International Conference of the Computer Society of Iran, 2008.
[24] D.E. Goldberg, "Genetic Algorithms in Search, Optimization, and Machine Learning", Addison-Wesley
Professional, 1989.
[25] A. Nenkova, R. Passonneau and K. McKeown, "The Pyramid method: Incorporating human content
selection variation in summarization evaluation", ACM Transactions on Speech and Language Processing,
2007; 4(2).