人工知能学会研究会資料 SIG-FPAI-B502-10

テキストから生成されるタプルを用いた大学入試センター試験日本史問題の自動解答
Answering Center Test Questions on Japanese History by Comparing Tuples Generated from Texts

尾納 宗仁 (Munehito Binou)¹∗  吉仲 亮 (Ryo Yoshinaka)¹  山本 章博 (Akihiro Yamamoto)¹
¹ 京都大学 情報学研究科 / Graduate School of Informatics, Kyoto University

Abstract: This paper proposes a method for answering questions on Japanese History from the National Center Test for University Admissions (Center test). Many of these questions ask the answerer to choose one of several given sentences. Each sentence is either correct or wrong, and the correct ones should be implied by history textbooks. We assume that a description of a historical event can be represented with five attributes, "time", "person", "place", "topic" and "others", each of which takes a set of keywords as its value. Our method evaluates the correctness of a sentence with respect to these attributes, referring to the subject discussed in the question and to snippets of a textbook. Snippets can be sections, subsections, paragraphs or sentences. In our experiments, our method outperforms an existing method on the Center test questions on Japanese History from the last 24 years, except when sections are used as snippets.

1 Introduction

The purpose of this study is to develop a new method for answering questions on Japanese History from the National Center Test for University Admissions (Center test, for short). The Center test is an entrance examination used by many universities in Japan. Every year it consists of tests on several subjects, such as Mathematics, Science, World History and Japanese History. On the History subjects, all questions are multiple choice. In particular, most questions on Japanese History are answered by choosing one of the given sentences. These sentences are correct or wrong, and the correct ones should be implied by knowledge sources such as history textbooks.
Therefore, answerers are expected to judge whether each sentence is implied by such a knowledge source. Developing a method for answering such questions is a challenging task in artificial intelligence and a kind of question answering. The Center test on Japanese History consists of six parts. Each part gives a direction, followed by several questions and some context shared by a set of questions. Each question has an instruction and more than two choices. For example, Fig. 1 shows such a part of the Center test. The part begins with a direction and a context, followed by several questions. In Fig. 1 we show only Question 2. The underlined text ⓐ in context A is referred to by the instruction. Question 2 requires answerers to choose the correct sentence among the four given sentences.

Some methods have been proposed previously to answer this type of question. Tian and Miyao [6] proposed an approach that answers by using a semantic representation suited for natural language inference. Kano [2] proposed an approach that answers Center test questions on History by keyword distribution. Kano's method obtained the best result in the Mock Center Test challenge held by the Todai Robot Project [4] and the highest score on the NTCIR-10 RITE-2 Exam Search Task [7]. The method we propose here does not need a large collection of historical documents as knowledge sources, but only a history textbook used in Japanese high schools. This is because the Center test questions on History are basically designed so that the right answer to every question is entailed by any history textbook. Our method is summarized as follows.

∗ Contact: 京都大学 情報学研究科 〒606-8501 京都市左京区吉田本町 総合研究7号館 3階 324・327号室 E-mail: [email protected]
We make it clear what subject is discussed in the question and what historical event each sentence describes. In order to represent the subject and the historical event, we take a tuple-based approach [3] [5] under the following assumption: a description of a historical event can be represented with the five attributes "time", "person", "place", "topic" and "others", each of which takes a set of keywords as its value. We extract keywords from a text and generate a tuple, a data structure that consists of the five attributes above. Next, we find out whether or not a history textbook includes a description of such a historical event related to the subject, and assign a score to each sentence. The score indicates how well the sentence fits the content of the textbook. We compare these scores and assign a score to each choice. By doing so, we can generate our answer by comparing all choices quantitatively. Our method uses the scoring function and the score-based answering scheme proposed by Kano [2].

In Section 2, we define an event tuple and explain how to generate it. In Section 3, we explain our method. Section 4 shows experimental results and discussion. Finally, we give our conclusion and future plans in Section 5.

第3問 中世の文化・政治・社会に関する次の文章 A・B を読み，下の問い（問1～6）に答えよ。（配点 18）
A 平安末期のあいつぐ戦乱や社会の変化を体験した人々は，心の支えを求めていた。そのようななかで仏教界でも，武士や庶民など広い層を救済の対象にする動きが起こった。ⓐ鎌倉新仏教の開祖のなかで最初に登場した法然は，旧来のような難しい修行をしなくとも往生できると説いた。以後，…
問2 下線部ⓐに関連して，法然が活躍した時期の政治状況について述べた文として正しいものを次の①～④のうちから一つ選べ。
① 白河天皇が，堀河天皇に譲位し，自らは上皇となって院政を開始した。
② 源義仲が，北陸道から入京し，平氏一門を西国へ追った。
③ 北条時頼が，御家人の訴訟を専門に扱う機関として引付を設置した。
④ 北条時宗が，元の襲来に備えて九州北部などを御家人に防備させた。

Fig. 1: Question 2, Part 3, 2010 Academic Year Main Examination: Japanese History B

2 Preliminaries

2.1 Event Tuple

We call the following data structure T, consisting of five sets of keywords, an event tuple:

    T = (T_time, T_person, T_place, T_topic, T_others),

where T_time, T_person, T_place and T_topic are the sets of keywords for "time", "person", "place" and "topic" respectively, and T_others is the set of the other keywords. An event tuple is generated from a natural language text. For example, Table 1 shows the event tuple generated from the sentence of choice ② in Fig. 1.

Table 1: The event tuple generated from the sentence of choice ② in Fig. 1

    time:   ∅
    person: { 源義仲, 平氏 }
    place:  { 北陸道, 西国 }
    topic:  ∅
    others: { 京, 一門 }

Definition 1. Let T be an event tuple. If T_a = ∅ for all a ∈ {time, person, place, topic, others}, we call T the empty event tuple and denote it by T = ⊘.

Definition 2. Let S and T be event tuples. The union of S and T, denoted by S ⊔ T, is defined by (S ⊔ T)_a = S_a ∪ T_a for all a ∈ {time, person, place, topic, others}.

The operation ⊔ is used for joining the subject described in a question and the historical event described in a sentence of the question.

Definition 3. Let S and T be event tuples. We write S ⊗ T if, for all b ∈ {time, person, place, topic}, S_b = ∅ ∨ T_b = ∅ ∨ S_b ∩ T_b ≠ ∅.

The relation ⊗ is used for filtering the snippets of the textbook when searching for a description of the historical event described in a sentence of a question and related to the subject of the question.

In this study, when comparing two event tuples, we regard synonyms as the same word. We treat a word and its redirect target in Wikipedia as synonyms. We also prepare a synonym relation dictionary and make use of it.

2.2 How to Generate an Event Tuple from a Text

First, we extract a set of keywords from a natural language text. After extracting the morphemes that match the headings of entries in Japanese Wikipedia, we let them be the keywords of the text.
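The event-tuple operations of Section 2.1 can be sketched in Python. The representation as a dict mapping each attribute to a set of keywords, and all function names, are our own illustration, not part of the paper:

```python
# A minimal sketch of event tuples (Sec. 2.1). A tuple is modeled as a
# dict from attribute name to a set of keyword strings; the dict-of-sets
# representation and the function names are our own choices.

ATTRS = ("time", "person", "place", "topic", "others")

def empty_tuple():
    """The empty event tuple (Definition 1): every attribute is the empty set."""
    return {a: set() for a in ATTRS}

def is_empty(t):
    return all(not t[a] for a in ATTRS)

def union(s, t):
    """S ⊔ T (Definition 2): attribute-wise set union."""
    return {a: s[a] | t[a] for a in ATTRS}

def compatible(s, t):
    """S ⊗ T (Definition 3): for each of time/person/place/topic,
    at least one side is empty or the two keyword sets intersect."""
    return all(not s[b] or not t[b] or bool(s[b] & t[b])
               for b in ("time", "person", "place", "topic"))

# The tuple of Table 1 (choice 2 in Fig. 1):
cs2 = empty_tuple()
cs2["person"] = {"源義仲", "平氏"}
cs2["place"] = {"北陸道", "西国"}
cs2["others"] = {"京", "一門"}
```

Note that `compatible` treats an empty attribute as a wildcard, so a sparsely filled tuple never fails the test on the attributes it lacks.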
For morphological analysis, we use JUMAN, a Japanese morphological analyzer [1], after registering the headings of the entries in Japanese Wikipedia in a JUMAN dictionary in advance. For example, we obtain the following set of keywords from the sentence of choice ② in Fig. 1:

    { 源義仲, 北陸道, 京, 平氏, 一門, 西国 }

After extracting the keywords from a text, we generate an event tuple from them. We classify the extracted keywords into the three attributes "time", "person" and "place". This classification is conducted by checking whether the keyword is included in a word dictionary that we prepare in advance. For the "time" attribute in particular, we add numerical keywords by pattern matching on the original text, because the JUMAN system does not extract such keywords well. Also, if hyponyms of "政治", "経済", "文化", "戦争", "外交", "社会", "教育" or "産業" exist among the extracted keywords, we add the corresponding topic words to the "topic" attribute. These topic words are compiled by using past Center tests as reference. The keywords that fall under none of the above four attributes are assigned to "others". For example, the keywords "源義仲", "北陸道", "京", "平氏", "一門" and "西国" are assigned the attributes "person", "place", "others", "person", "others" and "place", respectively.

Moreover, if a keyword in the "time" or "topic" attribute has hypernyms, we add them to the attribute to account for notation variability. With this operation, when comparing two event tuples by the relation ⊗, we can recognize that their original texts may describe the same content as to "time" or "topic" even when one sentence uses a more general keyword than the other in the same semantic field. For example, applying this operation to two event tuples S and T with S_time = { 中世 } and T_time = { 鎌倉時代, 12世紀 }, the word "中世" is added to T_time because it is a hypernym of "鎌倉時代".
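As an illustration of the attribute assignment described in Section 2.2, the following sketch classifies keywords with toy dictionaries. The paper builds its word dictionaries from Wikipedia and past Center tests; the tiny dictionaries, the year/century regex and the function name here are all placeholders of our own:

```python
import re

# Toy stand-ins for the prepared word dictionaries (Sec. 2.2); the real
# dictionaries are built from Wikipedia and past Center tests.
PERSON_DICT = {"源義仲", "平氏", "法然", "白河天皇"}
PLACE_DICT = {"北陸道", "西国", "九州"}
TIME_DICT = {"中世", "鎌倉時代"}
TOPIC_WORDS = {"政治", "経済", "文化", "戦争", "外交", "社会", "教育", "産業"}
HYPERNYMS = {"鎌倉時代": {"中世"}}  # toy is-a dictionary for time/topic

def classify(keywords):
    """Assign each keyword to one of the five attributes, then add
    hypernyms of the time/topic keywords (notation variability)."""
    t = {a: set() for a in ("time", "person", "place", "topic", "others")}
    for k in keywords:
        if k in TIME_DICT or re.fullmatch(r"\d+(世紀|年)", k):
            t["time"].add(k)  # numeric dates come from pattern matching
        elif k in PERSON_DICT:
            t["person"].add(k)
        elif k in PLACE_DICT:
            t["place"].add(k)
        elif k in TOPIC_WORDS:
            t["topic"].add(k)
        else:
            t["others"].add(k)
    for a in ("time", "topic"):
        for k in list(t[a]):
            t[a] |= HYPERNYMS.get(k, set())
    return t
```

With the keywords of choice ② this reproduces Table 1: 源義仲 and 平氏 go to "person", 北陸道 and 西国 to "place", and 京 and 一門 to "others"; classifying { 鎌倉時代, 12世紀 } adds the hypernym 中世 to "time".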
Hence we can recognize that the two event tuples S and T may describe the same content as to "time".

3 Proposed Method

In this section, we explain the proposed method step by step.

3.1 Generating an Event Tuple from a Question

We generate an event tuple Q from a question. A question basically includes three texts: a direction, an underlined text and an instruction. First, we generate three event tuples D, UT and I from the direction, the underlined text and the instruction, respectively. Next, we select one of them as Q according to their importance to the question. We assume the following order of importance: instruction, underlined text, direction. For example, in the question in Fig. 1, the direction and the underlined text are less important to the question than the instruction. This selection is formalized as follows:

    Q = I,   if I ≠ ⊘;
    Q = UT,  if I = ⊘ and UT ≠ ⊘;
    Q = D,   otherwise.

In the case of the question in Fig. 1, Q = I because I ≠ ⊘. Moreover, we append D_time to Q_time and D_topic to Q_topic, because the Center test questions tend to place the keywords for "time" and "topic" in the direction. For example, the event tuple Q generated from the question in Fig. 1 is shown in Table 2.

Table 2: The event tuple Q generated from the question in Fig. 1

    time:   { 中世 }
    person: { 法然 }
    place:  ∅
    topic:  { 文化, 政治, 社会 }
    others: { 状況 }

3.2 Generating an Event Tuple for Each Choice Sentence

We generate an event tuple from each choice sentence. First, we generate an event tuple CS_i from the i-th choice sentence. Then we unite the two event tuples Q and CS_i:

    C_i = Q ⊔ CS_i.

For example, Table 3 shows the union C_2 of the event tuple Q generated from the question in Fig. 1 and the event tuple CS_2 generated from the sentence of choice ②.
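The selection of Q in Section 3.1 is a three-way cascade plus a merge of the direction's time/topic keywords. A sketch, again modeling an event tuple as a dict from attribute to keyword set (the function name is ours):

```python
# Sketch of building the question tuple Q (Sec. 3.1).

ATTRS = ("time", "person", "place", "topic", "others")

def is_empty(t):
    return all(not t[a] for a in ATTRS)

def question_tuple(instruction, underlined, direction):
    """Q = I if I is non-empty, else UT if UT is non-empty, else D;
    D's time/topic keywords are always merged into Q afterwards."""
    if not is_empty(instruction):
        chosen = instruction
    elif not is_empty(underlined):
        chosen = underlined
    else:
        chosen = direction
    q = {a: set(chosen[a]) for a in ATTRS}
    q["time"] |= direction["time"]
    q["topic"] |= direction["topic"]
    return q
```

For Fig. 1, I is non-empty, so Q starts from I and then receives 中世 ("time") and 文化/政治/社会 ("topic") from the direction, giving the tuple of Table 2.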
Table 3: The union C_2 of the two event tuples in Tables 1 and 2

    time:   { 中世 }
    person: { 源義仲, 平氏, 法然 }
    place:  { 北陸道, 西国 }
    topic:  { 文化, 政治, 社会 }
    others: { 京, 一門, 状況 }

3.3 Searching for a Snippet from a Textbook

After obtaining an event tuple C_i, we search the snippets of the textbook for a description related to the content of the event tuple. Snippets can be sections, subsections, paragraphs or sentences. We append the chapter, section and subsection titles related to each snippet to it, and generate an event tuple R_j from the j-th snippet in the same way as described above. From all the snippets in the textbook, we extract the snippets that can be relevant to the content of C_i. This extraction is conducted with the relation ⊗ defined in Section 2. The set of extracted snippets ES_i is

    ES_i = { R_j | R_j ⊗ C_i }.

After filtering the snippets, if ES_i ≠ ∅, we select the snippet most suitable for the content of the choice sentence and use it to score the choice sentence. We assign a score to each extracted snippet based on how well it matches the content of the event tuple C_i, and select the snippet with the highest score. To do so, we apply the method in [2]. In the following, let T_words = T_person ∪ T_place ∪ T_others for an event tuple T. First, we assign a weight w_l to each keyword l in C_i,words by the weight function

    w_l = 1 / (c_l z),

where c_l is the frequency of the keyword l in the textbook and z is the normalization constant z = Σ_l 1/c_l. Each weight reflects how important a keyword is to the content of the textbook. We then assign a score S_ij to each snippet by the score function

    S_ij = Σ_{l ∈ C_i,words ∩ R_j,words} w_l − Σ_{m ∈ C_i,words \ R_j,words} w_m,  if R_j ∈ ES_i;
    S_ij = −1,  otherwise.

Obviously −1 ≤ S_ij ≤ 1 holds. This function assumes that, for a correct sentence, its keywords should be densely included in some snippet of the textbook, whereas for a wrong sentence, its keywords should be dispersed over many snippets.

3.4 Generating Our Answer

The score of each choice sentence is the maximum S_ij over all snippets. The snippet with the highest score is most likely the description of the content of the event tuple C_i. Hence we consider that the higher the score, the better the choice sentence matches the content of the textbook, and we generate our answer accordingly. If two or more choices have the same score, we pick one of them randomly.

4 Experiment and Discussion

We conduct experiments on the questions of the types choosing the correct sentence (COR), choosing the wrong sentence (WRO) and choosing the combination of correct sentences (COM) in the 1992-2015 Academic Year Main Examinations: Japanese History B. The total numbers of questions are 206 for COR, 169 for WRO and 42 for COM. We use "Nihonshi B" published by Tokyo Shoseki as the textbook. The total numbers of snippets are 109 sections, 404 subsections, 1,408 paragraphs and 5,300 sentences.

The expected values of the numbers of correct answers are shown in Table 4. The results of the method proposed by Kano [2] are shown in Table 5. Kano's method is a state-of-the-art method for answering Center test questions on Japanese History.

Table 4: The expected value of the number of correct answers with our method

                  COR     WRO     COM     total
    section      63.42   58.50   15.17   137.09
    subsection   78.83   58.75   19.00   156.58
    paragraph    79.00   62.00   15.50   156.50
    sentence     70.50   63.00   17.25   150.75

Table 5: The expected value of the number of correct answers with Kano's method

                  COR     WRO     COM     total
    section      71.33   56.50   14.25   142.08
    subsection   78.17   51.25   17.25   146.67
    paragraph    75.92   59.00   15.75   150.67
    sentence     75.00   59.00   14.75   148.75
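The snippet filtering, weighting and scoring of Sections 3.3 and 3.4 can be summarized in code before turning to the comparison with Kano's method. The function names are ours, and the fallback weight 0 for a keyword absent from the textbook is our own assumption (the paper only defines weights for keywords that occur in it):

```python
# Sketch of snippet filtering and scoring (Secs. 3.3-3.4). Event tuples
# are dicts from attribute name to a set of keyword strings.

def words(t):
    """T_words = T_person ∪ T_place ∪ T_others."""
    return t["person"] | t["place"] | t["others"]

def compatible(s, t):
    """The relation ⊗ over time/person/place/topic (Definition 3)."""
    return all(not s[b] or not t[b] or bool(s[b] & t[b])
               for b in ("time", "person", "place", "topic"))

def make_weights(counts):
    """w_l = 1/(c_l z) with z = sum over l of 1/c_l, where c_l is the
    textbook frequency of keyword l; the weights sum to 1."""
    z = sum(1.0 / c for c in counts.values())
    return {l: 1.0 / (c * z) for l, c in counts.items()}

def score(c, snippet, w):
    """S_ij: reward keywords of C_i covered by the snippet, penalize the
    missed ones; -1 for snippets filtered out by ⊗. Keywords not in the
    textbook get weight 0 here (an assumption of this sketch)."""
    if not compatible(snippet, c):
        return -1.0
    cw, sw = words(c), words(snippet)
    return (sum(w.get(l, 0.0) for l in cw & sw)
            - sum(w.get(m, 0.0) for m in cw - sw))

def answer(choices, snippets, w):
    """Score each choice by its best snippet and return the index of the
    highest-scoring choice (ties broken by first occurrence here, rather
    than randomly as in the paper)."""
    best = [max(score(c, r, w) for r in snippets) for c in choices]
    return best.index(max(best))
```

Because a rare keyword has a small c_l, it receives a large weight, which is exactly the behavior discussed below for the word 日米行政協定.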
Kano's method calculates the score of each choice not with event tuples but with sets of keywords obtained from the question and from the snippets of a textbook. Our method uses its scoring function and its score-based answering scheme. Kano's method also treats a word and its Wikipedia redirect target as synonyms, but it does not use any synonym relation dictionary, nor any is-a relation between words.

Although the expected value of the number of correct answers varies with the snippet type and the question type, in most cases our method is better than Kano's. On the whole, the difference between the average score of correct sentences and that of wrong sentences tends to be bigger with our method than with Kano's. Tables 6 and 7 show the average scores of correct and wrong sentences with our method and with Kano's method.

Table 6: The average score of correct/wrong sentences with our method

                  correct sentence   wrong sentence   difference
    section           0.626              0.495           0.131
    subsection        0.509              0.315           0.194
    paragraph         0.389              0.206           0.182
    sentence          0.205              0.022           0.182

Table 7: The average score of correct/wrong sentences with Kano's method

                  correct sentence   wrong sentence   difference
    section           0.699              0.603           0.095
    subsection        0.587              0.484           0.103
    paragraph         0.490              0.390           0.100
    sentence          0.388              0.300           0.088

With Kano's method, the score of a choice strongly depends on highly weighted keywords. If a highly weighted keyword is extracted from a choice sentence, the score of the choice becomes very high even if the sentence is wrong. For example, in the question in Fig. 2, the weight of the word "日米行政協定" is high because it appears only once in the textbook. Therefore the score S_ij of the subsection including it is high, and the score of the choice becomes high even though it is a wrong sentence.

問2 下線部ⓔ{アジア太平洋戦争末期の沖縄戦や，それに続く米軍統治の歴史}に関して述べた文として誤っているものを次の①～④のうちから一つ選べ。
② 日米行政協定により，沖縄は GHQ の施政権下におかれることが確定した。

Fig. 2: Question 6, Part 1, 2013 Academic Year Main Examination: Japanese History B

On the other hand, our method filters snippets on the four attributes "time", "person", "place" and "topic" before calculating the score of a choice sentence. In the question above, the snippet including the word "日米行政協定" does not include the word "沖縄" in its "place" attribute. Therefore our method calculates the score of the choice without being affected so much by a highly weighted keyword. As a result, our method answers the question in Fig. 2 correctly, but Kano's method does not.

However, the filtering sometimes makes our method miss the snippet in the textbook that fits the description of the choice sentence. For example, the choice sentence in Fig. 3 is correct and has a suitable snippet when subsections are used as snippets. Our method discards that snippet by filtering, because the event tuple of the snippet has the word "欧米" in its "place" attribute while that of the choice sentence does not. To improve our method, we need to prepare a suitable thesaurus, since we obtain the word "西欧" from the question.

問5 下線部ⓒ{西欧の知識や学問を吸収して，歴史研究を行ったり，新たに自国認識を深めようとしたりした知識人・文化人が多く現れた}に関連して述べた文として正しいものを次の①～④のうちから一つ選べ。
① 田口卯吉は，文明史論を叙述する立場から『日本開化小史』を著した。

Fig. 3: Question 5, Part 1, 2002 Academic Year Main Examination: Japanese History B

Also, our method does not give a high score to a correct sentence containing a word that does not appear in the textbook. For example, even though the choice sentence in Fig. 4 is correct, its score becomes low because the word "残留孤児" does not appear in the textbook. Such sentences are often observed in Center test questions on Japanese History.

Moreover, the number of questions that our method answers randomly tends to be larger than with Kano's method. Tables 8 and 9 show the numbers of questions that our method and Kano's method answer randomly.
問5 下線部ⓖ{「尋ね人」の放送や，朝鮮特需を論じた新聞記事}に関連して，占領期の社会状況について述べた文として正しいものを次の①～④のうちから一つ選べ。
① 敗戦による混乱で中国大陸から帰国できず，残留孤児となる人もいた。

Fig. 4: Question 8, Part 6, 2015 Academic Year Main Examination: Japanese History B

Table 8: The number of questions our method answers randomly

                  COR   WRO   COM
    section        75    14    11
    subsection     36     3     5
    paragraph      17     0     4
    sentence        6     2     3

Table 9: The number of questions Kano's method answers randomly

                  COR   WRO   COM
    section        63    10     9
    subsection     21     1     4
    paragraph       8     0     2
    sentence        2     0     2

These numbers show the tendency that the smaller the snippet type, the smaller the number of such questions.

5 Conclusion and Future Work

This paper proposed a method of answering Center test questions on Japanese History by using event tuples generated from texts. Our method focuses on the attributes "time", "person", "place", "topic" and "others" of a natural language text and assumes that these attributes represent the historical event described in it. Our method tends to calculate scores so that the difference between the average score of correct sentences and that of wrong sentences is bigger than with Kano's method. This does not affect the results much, but it may be related to the superiority over Kano's method.

As future work, we will attempt to develop a method for answering the other types of questions, such as chronological questions. The synonym and hypernym relation dictionaries should be further improved. Additionally, we will analyze the results in more detail and improve the way the score of a choice sentence is calculated.

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number 26280085.

References

[1] JUMAN (a User-Extensible Morphological Analyzer for Japanese) Ver. 7.0. http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN.

[2] Kano, Y.: Solving History Problems of the National Center Test for University Admissions [in Japanese], The 28th Annual Conference of the Japanese Society for Artificial Intelligence (2014).

[3] Kitano, T. and Yamamoto, A.: Evaluating Documents in Historical Events by Comparing Tuple Sets Generated from Predicate-Argument Structure [in Japanese], IEICE Technical Report, Vol. 111, No. 474 (2012).

[4] Arai, N.: ロボットは東大に入れるか? - 国立情報学研究所「人工頭脳」プロジェクト [in Japanese], Transactions of the Japanese Society for Artificial Intelligence (2012).

[5] Shibata, T., Kurohashi, S., Kohama, S. and Yamamoto, A.: Predicate-argument Structure based Textual Entailment Recognition System of KYOTO Team for NTCIR-10 RITE-2, Proceedings of the 10th NTCIR Workshop Meeting on Evaluation of Information Access Technologies (NTCIR-10) (2013).

[6] Tian, R. and Miyao, Y.: Answering Center-exam Questions on History by Textual Inference [in Japanese], The 28th Annual Conference of the Japanese Society for Artificial Intelligence, 2A1-4 (2014).

[7] Watanabe, Y., Miyao, Y., Mizuno, J., Shibata, T., Kanayama, H., Lee, C.-W., Lin, C.-J., Shi, S., Mitamura, T., Kando, N., Shima, H. and Takeda, K.: Overview of the Recognizing Inference in Text (RITE-2) at NTCIR-10, The 10th Conference of NII Testbeds and Community for Information access Research (NTCIR-10), pp. 385-404 (2013).