人工知能学会研究会資料
SIG-FPAI-B502-10

テキストから生成されるタプルを用いた大学入試センター試験日本史問題の自動解答
Answering Center Test Questions on Japanese History by Comparing Tuples Generated from Texts

尾納 宗仁 1∗ (Munehito Binou 1), 吉仲 亮 1 (Ryo Yoshinaka 1), 山本 章博 1 (Akihiro Yamamoto 1)

1 京都大学 情報学研究科 (Graduate School of Informatics, Kyoto University)
Abstract: This paper proposes a method of answering questions on Japanese History in the National Center Test for University Admissions (Center test). Many of these questions ask the answerer to choose one of several given sentences. These sentences are either correct or wrong, and the correct ones should be implied by history textbooks. We assume that a description of a historical event can be represented with five attributes, “time”, “person”, “place”, “topic” and “others”, each of which has a set of keywords as its value. Our method evaluates the correctness of a sentence with respect to these attributes, referring to the subject discussed in the question and to snippets of a textbook. Snippets can be sections, subsections, paragraphs or sentences. In our experiments, our method outperforms an existing method on the Japanese History questions of the Center tests of the last 24 years, except when sections are used as snippets.
1 Introduction
The purpose of this study is to develop a new method for answering questions on Japanese History in the National Center Test for University Admissions (for short, Center test). The Center test is an entrance examination used by many universities in Japan. Every year the Center test consists of tests on several subjects, such as Mathematics, Science, World History and Japanese History. In the History subjects, all questions are multiple-choice. In particular, most questions on Japanese History are meant to be answered by choosing one of the sentences given in them. These sentences are either correct or wrong, and the correct ones should be implied by knowledge sources such as history textbooks. Therefore, answerers are expected to judge whether each sentence is implied by such a knowledge source. Developing a method for answering such questions is a challenging task in artificial intelligence and a kind of question answering.
The Center test on Japanese History consists of six parts. A direction is given in each part, followed by several questions and, for some sets of questions, a context. Each question has an instruction and more than two choices. For example, Fig. 1 shows such a part of the Center test. The part begins with a direction and a context, followed by several questions; in Fig. 1 we show only Question 2. The underlined text ⓐ in context A is referred to by the instruction. Question 2 requires answerers to choose the correct sentence among the four given sentences.

∗ 連絡先 (Contact): 京都大学 情報学研究科, 〒606-8501 京都市左京区吉田本町, 総合研究7号館 3階 324・327号室, E-mail: [email protected]
Several methods have been proposed to answer this type of question. Tian and Miyao [6] proposed an approach based on semantic representations suited for natural language inference. Kano [2] proposed answering Center test questions on History by keyword distribution. Kano's method obtained the best result in the Mock Center Test challenge held by the Todai Robot Project [4] and the highest score in the NTCIR-10 RITE2 Exam Search Task [7].
The method we propose here does not need a large collection of historical documents as knowledge sources, but only a history textbook used in Japanese high schools. This is because the Center test questions on History are basically constructed so that the right answer to every question is entailed by any history textbook.
Our method is summarized as follows. We make clear what subject is discussed in the question and what historical event each sentence describes. In order to represent the subject and the historical event, we take a tuple-based approach [3] [5] under the following assumption: a description of a historical event can be represented with the five attributes “time”, “person”, “place”, “topic” and “others”, each of which has a set of keywords as its value. We extract keywords from a text and generate a tuple, a data structure that consists of the five attributes above. Next, we find out whether or not a history textbook includes a description of such a historical event related to the subject, and assign a score to each sentence. The score indicates how well the sentence fits the content of the textbook. We compare these scores and assign a score to each choice. By doing so, we can generate our answer by comparing all the choices quantitatively. Our method uses the scoring function and the score-based answering manner proposed by Kano [2].

In Section 2, we define an event tuple and explain how to generate it. In Section 3, we explain our method. Section 4 shows experimental results and discussion. Finally, we give our conclusion and future plans in Section 5.

第3問 中世の文化・政治・社会に関する次の文章A・Bを読み,下の問い(問1∼6)に答えよ。(配点 18)

A 平安末期のあいつぐ戦乱や社会の変化を体験した人々は,心の支えを求めていた。そのようななかで仏教界でも,武士や庶民など広い層を救済の対象にする動きが起こった。ⓐ鎌倉新仏教の開祖のなかで最初に登場した法然は,旧来のような難しい修行をしなくとも往生できると説いた。以後,…

問2 下線部ⓐに関連して,法然が活躍した時期の政治状況について述べた文として正しいものを次の①∼④のうちから一つ選べ。
① 白河天皇が,堀河天皇に譲位し,自らは上皇となって院政を開始した。
② 源義仲が,北陸道から入京し,平氏一門を西国へ追った。
③ 北条時頼が,御家人の訴訟を専門に扱う機関として引付を設置した。
④ 北条時宗が,元の襲来に備えて九州北部などを御家人に防備させた。

Fig. 1: Question 2, Part 3, 2010 Academic Year Main Examination: Japanese History B

time: ∅
person: { 源義仲, 平氏 }
place: { 北陸道, 西国 }
topic: ∅
others: { 京, 一門 }

Table 1: The event tuple generated from the sentence of choice ② in Fig. 1

2 Preliminaries

2.1 Event Tuple

We call the following data structure T, consisting of five sets of keywords, an event tuple:

T = (T_time, T_person, T_place, T_topic, T_others),

where T_time, T_person, T_place and T_topic are the sets of keywords for “time”, “person”, “place” and “topic” respectively, and T_others is the set of other keywords. An event tuple is generated from a natural language text. For example, Table 1 shows the event tuple generated from the sentence of choice ② in Fig. 1.

Definition 1. Let T be an event tuple. If the following condition holds, we call T the empty event tuple and denote it by T = ⊘:
for all a ∈ {time, person, place, topic, others}, T_a = ∅.

Definition 2. Let S and T be event tuples. The union of S and T, denoted by S ⊔ T, is defined by
(S ⊔ T)_a = S_a ∪ T_a
for all a ∈ {time, person, place, topic, others}.

The operation ⊔ is used for joining the subject described in a question and the historical event described in a sentence of the question.

Definition 3. Let S and T be event tuples. We write S ⊗ T if the following condition holds:
for all b ∈ {time, person, place, topic}, S_b = ∅ ∨ T_b = ∅ ∨ S_b ∩ T_b ≠ ∅.

In this study, when comparing two event tuples, we regard synonyms as the same word. We treat a word and a Wikipedia page redirected to it as synonyms. We also prepare a synonym relation dictionary and make use of it.

The relation ⊗ is used for filtering the snippets of the textbook when searching for a description of the historical event described in a sentence of a question that is related to the subject of the question.

2.2 How to Generate an Event Tuple from a Text
First, we extract a set of keywords from a natural language text. We extract the morphemes that match the headings of entries in Japanese Wikipedia and let them be the keywords of the text. For morphological analysis we use the JUMAN system, a Japanese morphological analyzer [1], having registered the headings of the entries in Japanese Wikipedia in a JUMAN dictionary in advance.
For example, we obtain the following set of keywords from the sentence of choice ② in Fig. 1:

{ 源義仲, 北陸道, 京, 平氏, 一門, 西国 }
After extracting keywords from a text, we generate an event tuple from them. We classify the extracted keywords into the three attributes “time”, “person” and “place”. This classification is conducted by checking whether a keyword is included in the word dictionaries that we prepare in advance. For the “time” attribute in particular, we add numerical keywords by pattern matching against the original text, because the JUMAN system does not extract such keywords well. Also, if hyponyms of “政治”, “経済”, “文化”, “戦争”, “外交”, “社会”, “教育” or “産業” occur among the extracted keywords, we add the corresponding topic words to the “topic” attribute. These topic words were chosen by referring to past Center tests. Keywords that are not relevant to any of the above four attributes are assigned to “others”. For example, the keywords “源義仲”, “北陸道”, “京”, “平氏”, “一門” and “西国” are assigned to the attributes “person”, “place”, “others”, “person”, “others” and “place”, respectively.
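The classification above can be sketched as follows. The attribute dictionaries here are small hypothetical stand-ins for the word dictionaries prepared in advance, not the authors' actual resources:

```python
# Hypothetical stand-ins for the prepared word dictionaries (not the
# actual resources used in the paper).
TIME_WORDS = {"中世", "鎌倉時代"}
PERSON_WORDS = {"源義仲", "平氏", "法然"}
PLACE_WORDS = {"北陸道", "西国"}
TOPIC_ROOTS = {"政治", "経済", "文化", "戦争", "外交", "社会", "教育", "産業"}

def classify(keywords):
    """Assign each extracted keyword to one of the five attributes
    by dictionary lookup; unmatched keywords go to "others"."""
    tup = {"time": set(), "person": set(), "place": set(),
           "topic": set(), "others": set()}
    for w in keywords:
        if w in TIME_WORDS:
            tup["time"].add(w)
        elif w in PERSON_WORDS:
            tup["person"].add(w)
        elif w in PLACE_WORDS:
            tup["place"].add(w)
        elif w in TOPIC_ROOTS:
            tup["topic"].add(w)
        else:
            tup["others"].add(w)
    return tup

# Keywords extracted from the sentence of choice 2 in Fig. 1.
t = classify({"源義仲", "北陸道", "京", "平氏", "一門", "西国"})
# person: 源義仲, 平氏; place: 北陸道, 西国; others: 京, 一門 (cf. Table 1)
```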
Moreover, if a keyword in the “time” or “topic” attribute has hypernyms, we add them to the attribute to take notation variability into account. With this operation, when comparing two event tuples with the relation ⊗, we can recognize that their original texts may describe the same content as to “time” or “topic” in the case that one sentence includes a more general keyword, in the same semantic field, than a keyword included in the other text. For example, applying this operation to two event tuples S and T with S_time = { 中世 } and T_time = { 鎌倉時代, 12世紀 }, the word “中世” is added to T_time because it is a hypernym of “鎌倉時代”. Hence we can recognize that the two event tuples S and T may describe the same content as to “time”.
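Definitions 2 and 3 and the hypernym expansion just described can be sketched as follows; event tuples are represented here as dicts mapping attribute names to keyword sets (a minimal illustration, not the authors' code):

```python
ATTRS = ("time", "person", "place", "topic", "others")

def union(s, t):
    """Definition 2: (S ⊔ T)_a = S_a ∪ T_a for every attribute a."""
    return {a: s[a] | t[a] for a in ATTRS}

def related(s, t):
    """Definition 3 (S ⊗ T): for each of time/person/place/topic,
    either side is empty or the two sides share a keyword."""
    return all(not s[b] or not t[b] or bool(s[b] & t[b])
               for b in ("time", "person", "place", "topic"))

def empty_tuple():
    return {a: set() for a in ATTRS}

# Hypernym expansion on "time", as in the example above: adding 中世
# (a hypernym of 鎌倉時代) to T makes S and T comparable under ⊗.
S = empty_tuple(); S["time"] = {"中世"}
T = empty_tuple(); T["time"] = {"鎌倉時代", "12世紀"}
T["time"].add("中世")  # added because 中世 is a hypernym of 鎌倉時代
```

Filtering snippets with the relation ⊗ then amounts to keeping every snippet tuple `R` for which `related(R, C)` holds.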
3 Proposed Method

In this section, we explain the steps of the proposed method in order.

time: { 中世 }
person: { 法然 }
place: ∅
topic: { 文化, 政治, 社会 }
others: { 状況 }

Table 2: The event tuple Q generated from the question in Fig. 1
3.1 Generating an Event Tuple from a Question

We generate an event tuple Q from a question. A question basically includes three texts: a direction, an underlined text and an instruction. First, we generate three event tuples D, UT and I from the direction, the underlined text and the instruction, respectively.

Next, we select one of them as Q according to their importance to the question. We assume that the three texts are ordered by importance as follows: instruction, underlined text and direction. For example, in the question in Fig. 1, the direction and the underlined text are less important to the question than the instruction. This selection is formalized as follows:
Q = I    if I ≠ ⊘,
    UT   if I = ⊘ ∧ UT ≠ ⊘,
    D    otherwise.

In the case of the question in Fig. 1, Q = I because I ≠ ⊘.
Moreover, we add D_time to Q_time and D_topic to Q_topic. This is because Center test questions tend to put keywords for “time” and “topic” in the direction. For example, the event tuple Q generated from the question in Fig. 1 is shown in Table 2.
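The selection of Q and the merging of the direction's “time” and “topic” keywords can be sketched as follows. The example tuples D, UT and I are a plausible decomposition consistent with Table 2, not values taken from the paper:

```python
ATTRS = ("time", "person", "place", "topic", "others")

def is_empty(t):
    """Definition 1: every attribute of the empty event tuple is empty."""
    return all(not t[a] for a in ATTRS)

def question_tuple(d, ut, i):
    """Choose I, then UT, then D by the stated order of importance,
    and always merge the direction's time/topic keywords into Q."""
    base = i if not is_empty(i) else ut if not is_empty(ut) else d
    q = {a: set(base[a]) for a in ATTRS}
    q["time"] |= d["time"]
    q["topic"] |= d["topic"]
    return q

# Hypothetical tuples for the question in Fig. 1 (chosen so that the
# resulting Q matches Table 2).
D = {"time": {"中世"}, "person": set(), "place": set(),
     "topic": {"文化", "政治", "社会"}, "others": set()}
UT = {"time": set(), "person": {"法然"}, "place": set(),
      "topic": set(), "others": set()}
I = {"time": set(), "person": {"法然"}, "place": set(),
     "topic": set(), "others": {"状況"}}

Q = question_tuple(D, UT, I)  # Q = I plus D's time/topic keywords
```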
3.2 Generating an Event Tuple for Each Choice Sentence

We generate an event tuple from each choice sentence. First, we generate an event tuple CS_i from the i-th choice sentence. Then we unite the two event tuples Q and CS_i, and let the result be C_i:

C_i = Q ⊔ CS_i.
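With the tuples of Tables 1 and 2, this union can be checked directly as attribute-wise set union:

```python
# Q from Table 2 and CS2 from Table 1.
Q = {"time": {"中世"}, "person": {"法然"}, "place": set(),
     "topic": {"文化", "政治", "社会"}, "others": {"状況"}}
CS2 = {"time": set(), "person": {"源義仲", "平氏"},
       "place": {"北陸道", "西国"}, "topic": set(),
       "others": {"京", "一門"}}

# C_2 = Q ⊔ CS_2: attribute-wise set union (cf. Table 3).
C2 = {a: Q[a] | CS2[a] for a in Q}
```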
For example, Table 3 shows the union C_2 of the event tuple Q generated from the question in Fig. 1 and the event tuple CS_2 generated from the sentence of choice ②.

time: { 中世 }
person: { 源義仲, 平氏, 法然 }
place: { 北陸道, 西国 }
topic: { 文化, 政治, 社会 }
others: { 京, 一門, 状況 }

Table 3: The union C_2 of the two event tuples in Tables 1 and 2

3.3 Searching for a Snippet from a Textbook

After obtaining an event tuple C_i, we search the snippets of the textbook for a description related to the content of the event tuple. Snippets can be sections, subsections, paragraphs or sentences. We append the chapter, section and subsection titles related to each snippet to it, and generate an event tuple R_j from the j-th snippet in the same way as described above.

From all the snippets in the textbook, we extract the snippets that can be relevant to the content of C_i. This extraction is conducted with the relation ⊗ defined in Section 2. The set of extracted snippets ES_i is

ES_i = { R_j | R_j ⊗ C_i }.

After filtering the snippets, if ES_i ≠ ∅, we select the snippet most suitable for the content of the choice sentence and use it to score the choice sentence. We assign a score to each extracted snippet based on how suitable it is for the content of the event tuple C_i, and select the snippet whose score is highest. To do so, we apply the method in [2]. In the following, let T_words denote T_person ∪ T_place ∪ T_others for an event tuple T. First, we assign a weight w_l to each keyword l in C_words using the following weight function:

w_l = 1 / (c_l z),

where c_l is the frequency of the keyword l in the textbook and z is the normalization constant z = Σ_l 1/c_l. Each weight reflects how important a keyword is to the content of the textbook.

Next, we assign a score S_ij to each snippet using the following score function:

S_ij = Σ_{l ∈ C_i,words ∩ R_j,words} w_l − Σ_{m ∈ C_i,words \ R_j,words} w_m    if R_j ∈ ES_i,
S_ij = −1    otherwise.

Obviously, −1 ≤ S_ij ≤ 1 holds. This function assumes that, for a correct sentence, the keywords of the choice sentence are densely included in some snippet of the textbook, whereas for a wrong sentence the keywords are dispersed over many snippets.

             COR     WRO     COM     total
section      63.42   58.50   15.17   137.09
subsection   78.83   58.75   19.00   156.58
paragraph    79.00   62.00   15.50   156.50
sentence     70.50   63.00   17.25   150.75

Table 4: The expected number of correct answers with our method

             COR     WRO     COM     total
section      71.33   56.50   14.25   142.08
subsection   78.17   51.25   17.25   146.67
paragraph    75.92   59.00   15.75   150.67
sentence     75.00   59.00   14.75   148.75

Table 5: The expected number of correct answers with Kano's method
3.4 Generating Our Answer

Let the score of each choice sentence be the maximum S_ij over all snippets. The snippet with the highest score is the most likely description of the content of the event tuple C_i. Hence we consider that the higher the score, the better the choice sentence matches the content of the textbook, and we generate our answer accordingly. In the case that two or more choices have the same score, we choose one of them randomly.
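The weight function, score function and answer selection of Sections 3.3 and 3.4 can be sketched as follows; snippets are reduced to keyword sets, and the toy data is illustrative only, not from the paper:

```python
from collections import Counter

def weights(snippet_words):
    """w_l = 1 / (c_l * z), where c_l is the frequency of keyword l in
    the textbook and z = sum_l 1/c_l, so the weights sum to one."""
    counts = Counter(w for s in snippet_words for w in s)
    z = sum(1.0 / c for c in counts.values())
    return {l: 1.0 / (c * z) for l, c in counts.items()}

def score(c_words, r_words, w, in_es):
    """S_ij: weights of shared keywords minus weights of the choice's
    keywords missing from the snippet; -1 if the snippet was filtered
    out (R_j not in ES_i). Keywords absent from the textbook get
    weight 0 here, a simplifying assumption."""
    if not in_es:
        return -1.0
    return (sum(w.get(l, 0.0) for l in c_words & r_words)
            - sum(w.get(m, 0.0) for m in c_words - r_words))

def answer(scores_per_choice):
    """Section 3.4: pick the choice whose best snippet score is highest."""
    return max(range(len(scores_per_choice)),
               key=lambda i: max(scores_per_choice[i]))

# Toy textbook of two snippets; 源義仲 appears twice, so its weight is
# halved relative to the other keywords.
snippets = [{"源義仲", "平氏"}, {"源義仲", "北陸道"}]
w = weights(snippets)                               # weights sum to 1
s = score({"源義仲", "平氏", "京"}, snippets[0], w, True)
```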
4 Experiment and Discussion

We conduct experiments on questions of the types choosing the correct sentence (COR), choosing the wrong sentence (WRO) and choosing the combination of correct sentences (COM) from the 1992–2015 Academic Year Main Examinations: Japanese History B. The total numbers of questions are 206 for COR, 169 for WRO and 42 for COM.
We use “Nihonshi B”, published by Tokyo Shoseki, as the textbook. The total numbers of snippets are 109 sections, 404 subsections, 1,408 paragraphs and 5,300 sentences.

The expected number of correct answers with our method is shown in Table 4, and the results of the method proposed by Kano [2] are shown in Table 5.
Kano's method is a state-of-the-art method for answering Center test questions on Japanese History. It calculates each choice score not with event tuples but with sets of keywords obtained from the question and from snippets of a textbook. Our method uses its scoring function and its score-based answering manner. Kano's method treats a word and a Wikipedia page redirected to it as synonyms, but it does not use any synonym relation dictionary, nor does it use any is-a relations between words.

Although the expected number of correct answers differs according to the snippet type and the question type, in most cases our method outperforms Kano's method. As a whole, the difference between the average score of correct sentences and that of wrong sentences tends to be bigger in our method than in Kano's method. Tables 6 and 7 show the average scores of correct and wrong sentences for our method and Kano's method.

                   section   subsection   paragraph   sentence
correct sentence   0.626     0.509        0.389       0.205
wrong sentence     0.495     0.315        0.206       0.022
difference         0.131     0.194        0.182       0.182

Table 6: The average score of correct/wrong sentences on our method

                   section   subsection   paragraph   sentence
correct sentence   0.699     0.587        0.490       0.388
wrong sentence     0.603     0.484        0.390       0.300
difference         0.095     0.103        0.100       0.088

Table 7: The average score of correct/wrong sentences on Kano's method

問2 下線部ⓔ{アジア太平洋戦争末期の沖縄戦や,それに続く米軍統治の歴史}に関して述べた文として誤っているものを次の①∼④のうちから一つ選べ。
② 日米行政協定により,沖縄はGHQの施政権下におかれることが確定した。

Fig. 2: Question 6, Part 1, 2013 Academic Year Main Examination: Japanese History B

問5 下線部ⓒ{西欧の知識や学問を吸収して,歴史研究を行ったり,新たに自国認識を深めようとしたりした知識人・文化人が多く現れた}に関連して述べた文として正しいものを次の①∼④のうちから一つ選べ。
(Question text of Fig. 3; the choice sentence and the caption of Fig. 3 appear below.)
In Kano's method, the choice score strongly depends on highly weighted keywords. If a highly weighted keyword is extracted from a choice sentence, the score of that sentence becomes very high even if the sentence is wrong. For example, in the case of the question in Fig. 2, the weight of the word “日米行政協定” is high because it appears only once in the textbook. Therefore the score S_ij of the subsection including it is high, and the choice score becomes high even though the choice is a wrong sentence.

On the other hand, our method extracts snippets based on the four attributes “time”, “person”, “place” and “topic” before calculating the score of a choice sentence. In the above question, the snippet including the word “日米行政協定” does not include the word “沖縄” in its “place” attribute. Therefore our method calculates the choice score without being much affected by highly weighted keywords. As a result, our method answers the question in Fig. 2 correctly, while Kano's method does not.
However, our method sometimes misses the snippet suitable for the description of the choice sentence in the textbook because of the filtering. For example, the choice sentence in Fig. 3 is correct and has a suitable snippet when subsections are used as snippets, but our method discards that snippet in the filtering step because the event tuple of the snippet has the word “欧米” in its “place” attribute while that of the choice sentence does not. To improve our method, we need to prepare a suitable thesaurus, because the word obtained from the question is “西欧”.

① 田口卯吉は,文明史論を叙述する立場から『日本開化小史』を著した。

Fig. 3: Question 5, Part 1, 2002 Academic Year Main Examination: Japanese History B
Also, our method does not give a high score to a correct sentence containing a word that does not appear in the textbook. For example, in the case of the choice sentence in Fig. 4, even though it is correct, the choice score becomes low because the word “残留孤児” does not appear in the textbook. Such sentences are often observed in Center test questions on Japanese History.

Moreover, the number of questions which our method answers randomly tends to be larger than that of Kano's method. Tables 8 and 9 show the numbers of questions which our method and Kano's method answer randomly. These figures show that the smaller the snippet type, the smaller the number of such questions.
問5 下線部ⓖ{「尋ね人」の放送や,朝鮮特需を論じた新聞記事}に関連して,占領期の社会状況について述べた文として正しいものを次の①∼④のうちから一つ選べ。
① 敗戦による混乱で中国大陸から帰国できず,残留孤児となる人もいた。

Fig. 4: Question 8, Part 6, 2015 Academic Year Main Examination: Japanese History B

             COR   WRO   COM
section      75    14    9
subsection   36    3     4
paragraph    17    0     2
sentence     6     2     2

Table 8: The number of questions which our method answers randomly

             COR   WRO   COM
section      63    10    11
subsection   21    1     5
paragraph    8     0     4
sentence     2     0     3

Table 9: The number of questions which Kano's method answers randomly

5 Conclusion and Future Work

This paper proposed a method of answering Center test questions on Japanese History by using event tuples generated from texts. Our method focuses on the attributes “time”, “person”, “place”, “topic” and “others” of a natural language text and assumes that these attributes represent the historical event described in it.

Our method tends to calculate scores so that the difference between the average score of correct sentences and that of wrong sentences is bigger than with Kano's method. This is not strongly reflected in the results, but it may be related to the superiority of our method over Kano's.

As future work, we intend to develop a method of answering other types of questions, such as chronological questions. The synonym and hypernym relation dictionaries should be improved further. Additionally, we will analyze the results in more detail and improve the way the score of a choice sentence is calculated.

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number 26280085.

References

[1] JUMAN (a User-Extensible Morphological Analyzer for Japanese), Ver. 7.0. http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN.

[2] Kano, Y.: Solving History Problems of the National Center Test for University Admissions [in Japanese], The 28th Annual Conference of the Japanese Society for Artificial Intelligence (2014).

[3] Kitano, T. and Yamamoto, A.: Evaluating Documents in Historical Events by Comparing Tuple Sets Generated from Predicate-Argument Structure [in Japanese], IEICE Technical Report, Vol. 111, No. 474 (2012).

[4] Arai, N.: ロボットは東大に入れるか?- 国立情報学研究所「人工頭脳」プロジェクト, Transactions of the Japanese Society for Artificial Intelligence (2012).

[5] Shibata, T., Kurohashi, S., Kohama, S. and Yamamoto, A.: Predicate-argument Structure based Textual Entailment Recognition System of KYOTO Team for NTCIR-10 RITE-2, Proceedings of the 10th NTCIR Workshop Meeting on Evaluation of Information Access Technologies (NTCIR-10) (2013).

[6] Tian, R. and Miyao, Y.: Answering Center-exam Questions on History by Textual Inference [in Japanese], The 28th Annual Conference of the Japanese Society for Artificial Intelligence (2014), 2A1-4.

[7] Watanabe, Y., Miyao, Y., Mizuno, J., Shibata, T., Kanayama, H., Lee, C.-W., Lin, C.-J., Shi, S., Mitamura, T., Kando, N., Shima, H. and Takeda, K.: Overview of the Recognizing Inference in Text (RITE-2) at NTCIR-10, The 10th Conference of NII Testbeds and Community for Information access Research (NTCIR-10), pp. 385–404 (2013).