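Word Sense Disambiguation Using Hypernym Semantic Class Estimation

Sanae Fujita,♥ Francis Bond,♠ Akinori Fujino♥
{sanae,a.fujino}@cslab.kecl.ntt.co.jp, ♠ [email protected]
♥ NTT Communication Science Laboratories, ♠ National Institute of Information and Communications Technology

1 Introduction

Most words have several senses, which differ according to context. Word sense disambiguation (WSD), the task of selecting the sense that fits the context, has recently been shown to improve the accuracy of parse selection (Fujita et al., 2007) and of machine translation (Chan et al., 2007).

There is a large body of previous work on WSD. For Japanese, proposals include a method based on extended Lesk (Baldwin et al., 2008), which achieves high accuracy for an unsupervised method although it falls short of supervised ones, and a supervised method that uses both semantic and syntactic information (Tanaka et al., 2007). However, because we also want to use WSD results in parsing, this paper proposes a method that does not rely on parse results.

One reason WSD is difficult is that it is hard to build enough training data to estimate a huge number of senses. In addition, when the number of classes is huge, many machine learning tools become difficult to use. The proposed method therefore splits WSD into two stages. The first stage estimates (hypernym) semantic classes such as agent, place, concrete object and abstract object. Because there are far fewer semantic classes than senses, sufficient accuracy can be obtained from relatively little training data. Moreover, once the hypernym semantic class is fixed, the sense is often uniquely determined. The second stage estimates the sense itself, using the semantic class estimated in the first stage.

The experiments use the Hinoki Corpus (Bond et al., 2006). The Hinoki Corpus consists of a treebank and a sensebank over the definition sentences and example sentences of a dictionary (Lexeed (Kasahara et al., 2004)) and over newspaper text (the Kyoto University Corpus; hereafter KC). In this dictionary, each sense is also annotated with semantic classes (semantic attributes) from Goi-Taikei (Ikehara et al., 1997), a Japanese thesaurus.

2 Estimating Hypernym Semantic Classes

This section describes how hypernym semantic classes are estimated. Goi-Taikei consists of 2,710 semantic classes arranged in a hierarchy with depths (levels) from 0 to 11. Of these, we run experiments with the semantic classes at levels 2 to 5¹ and investigate which level is effective for WSD.

2.1 Training/Test Data

Content words in the Hinoki Sensebank are tagged with Lexeed senses. In this section, however, because we estimate (hypernym) semantic classes, we replace the sense tags of the Hinoki Sensebank with hypernym semantic classes to create the training/test data. As an example, the definition sentence of 運転手1 (untenshu 'driver') is shown in (1). Below (1), the row marked cat gives the Goi-Taikei semantic class linked to each sense tag, and the rows marked lvl X give the hypernym semantic class at level X. When a sense is linked to more than one semantic class, we use only the first.

(1) 電車1 や 自動車1 を 運転1 する 人4
    'a person who drives trains or cars'

        電車1, 自動車1                                  運転1                     人4
cat     ⟨988: vehicle (main body (movement (land)))⟩   ⟨2003: operating⟩        ⟨4: person⟩
lvl 5   ⟨986: vehicle⟩                                  ⟨1920: labor⟩            ⟨4: person⟩
lvl 4   ⟨760: artifact⟩                                 ⟨1560: action⟩           ⟨4: person⟩
lvl 3   ⟨706: inanimate object⟩                         ⟨1236: human activity⟩   ⟨4: person⟩
lvl 2   ⟨533: concrete object⟩                          ⟨1235: matter⟩           ⟨3: agent⟩

The words や, を and する have no semantic class (marked - in the original layout).

Table 1 shows the size of the training/test data. The training data contains on average 5.1 (example sentences) to 17.7 (KC) sentences per sense. After aggregation into hypernym semantic classes, however, even at level 5 there are on average 340.9 (example sentences) to 539.7 (KC) sentences per class. Aggregating into hypernym semantic classes thus makes the method more robust to data sparseness.

Corpus         Definition sentences    Example sentences     KC
Set            Train      Test         Train      Test       Train      Test
Sentences      67,202     4,942        106,528    8,942      141,968    5,408
Target words   175,709    15,932       133,616    12,416     211,567    12,581
All words      613,216    54,276       432,514    41,019     947,298    53,703

Table 1: Size of the training/test data. Target words are the words tagged with Lexeed senses.

2.2 Experiment: Estimating Hypernym Semantic Classes

As machine learning methods we used the Maximum Entropy Method (MEM; Nigam et al., 1999) and Conditional Random Fields (CRF; Suzuki et al., 2006)². As input we use morphological analyses produced by ChaSen and then corrected.

The features are as follows. For CRF we use uni-grams, bi-grams, and combinations over the two words before and after the target word (Table 2). For MEM we use the target word itself and the words before and after it, all content words in the target sentence, and character strings of up to three characters before and after the target word (Table 3). We also ran an experiment using exactly the same features as for CRF, but accuracy clearly dropped, so we do not report it here. In Tables 2 and 3, b_k is the base form of the k-th word, w_k its surface form, and p1_k, p2_k, p3_k its part of speech, POS subcategory 1 and POS subcategory 2, respectively. The Sample column shows some of the features obtained when the fifth word (i = 5) of sentence (1), 運転 (unten 'driving'), is the target word.

Type          Template                                               Sample
uni-gram      ⟨b_k⟩, ⟨w_k⟩, ⟨p1_k⟩, ⟨p2_k⟩, ⟨p3_k⟩                    ⟨自動車⟩, ⟨自動車⟩, ⟨noun⟩, ⟨noun-general⟩
combination   ⟨b_k, w_k⟩, ⟨b_k, p1_k⟩, ⟨b_k, p2_k⟩, ⟨b_k, p3_k⟩,      ⟨自動車, 自動車⟩, ⟨自動車, noun-general⟩,
              ⟨w_k, p1_k⟩, ⟨w_k, p2_k⟩, ⟨w_k, p3_k⟩,                  ⟨自動車, noun⟩, ⟨自動車, noun-general-*⟩,
              ⟨p1_k, p2_k⟩, ⟨p1_k, p3_k⟩, ⟨p2_k, p3_k⟩                ⟨noun, noun-general-*⟩
bi-gram       ⟨b_k, b_k+1⟩, ⟨w_k, w_k+1⟩, ⟨p1_k, p1_k+1⟩,             ⟨自動車, を⟩, ⟨自動車, を⟩, ⟨noun, particle⟩,
              ⟨p2_k, p2_k+1⟩, ⟨p3_k, p3_k+1⟩                          ⟨noun-general, particle-case⟩,
                                                                     ⟨noun-general-*, particle-case-general⟩

Table 2: Features used for CRF. When the i-th word is the target word, k = i−2, ..., i+2 for uni-grams and combinations, and k = i−2, ..., i+1 for bi-grams.

Type             Template                             Sample
content words    ⟨b_l⟩                                ⟨電車⟩, ⟨自動車⟩, ⟨人⟩
uni-gram         ⟨b_j⟩, ⟨w_j⟩, ⟨p1_j⟩, ⟨p3_j⟩          ⟨運転⟩, ⟨運転⟩, ⟨noun⟩, ⟨noun-suru verbal-*⟩
strings          ⟨cb1_i⟩, ⟨cb2_i⟩, ⟨cb3_i⟩             ⟨を⟩, ⟨車を⟩, ⟨動車を⟩
                 ⟨ca1_i⟩, ⟨ca2_i⟩, ⟨ca3_i⟩             ⟨す⟩, ⟨する⟩, ⟨する人⟩

Table 3: Features used for MEM. When the i-th word is the target word, j = i−1, ..., i+1 for uni-grams.

2.3 Results and Discussion: Hypernym Semantic Class Estimation

Table 4 shows the results of hypernym semantic class estimation. The baseline (BL) is the accuracy obtained by selecting the most frequent semantic class in the training data. In our method, all classes at each level are candidates for selection, so for some words an impossible class may be selected. For example, at level 2 the word ドライバー (doraibā 'driver') can only take ⟨3: agent⟩ or ⟨533: concrete object⟩, yet our method can also select classes such as ⟨388: place⟩. To correct such errors, when an impossible class is selected, for CRF we change it to the most frequent class among the possible classes, and for MEM to the class with the highest probability among the possible classes³. In Table 4, "Before correction" is the accuracy of the raw estimates, and "After correction" is the accuracy after impossible classes have been corrected.

The before-correction results in Table 4 show that CRF is relatively more accurate at deeper levels. However, CRF needs more time and memory than MEM, so some values (those marked ∗) were obtained without p2. Without p2, accuracy drops by about 0.1-0.2%⁴.

In the before-correction results in Table 4, MEM is more accurate than CRF under some conditions. After correction, however, the CRF-based accuracy is higher in every case, even though the correction method favours MEM⁵. A possible reason is that MEM is a learning method that emphasises frequency from the outset, whereas CRF places relatively more emphasis on the plausibility of the sequence, so a frequency-based correction is more effective for CRF.

Table 4 shows that accuracy after correction is higher than before correction in every case. The word sense disambiguation in the next section therefore uses the corrected results.

¹ The semantic classes are aggregated into 9 classes at level 2, such as ⟨3: agent⟩ and ⟨533: concrete object⟩; 30 classes at level 3, such as ⟨4: person⟩ and ⟨706: inanimate object⟩; 136 classes at level 4, such as ⟨5: human⟩ and ⟨760: artifact⟩; and 392 classes at level 5, such as ⟨6: human <person>⟩ and ⟨838: food⟩.
² We also experimented with a Support Vector Machine (SVM; Chang and Lin, 2001), but its accuracy was lower than that of MEM and it took a very long time, so we do not report it here.
³ This is because the probabilities of all classes are easy to obtain with MEM.
⁴ When compared under conditions other than those marked ∗.
⁵ We also tried correcting the MEM results to the most frequent possible semantic class, as for CRF, but accuracy dropped slightly.

3 Word Sense Disambiguation

This section describes word sense disambiguation (WSD) using the hypernym semantic classes obtained in the previous section. We first describe the features used for WSD. We largely reimplemented the system of Murata et al. (2003) (hereafter MRT), which achieved the highest accuracy in the Japanese dictionary task of the SENSEVAL-2 WSD contest, and compare it with our system. The differences between our reimplementation (hereafter CRL) and MRT are described below.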
Corpus                  Lvl   BL     Before correction   After correction
                                     CRF      MEM        CRF      MEM
Definition sentences    2     91.3   96.0     95.4       96.3     95.7
                        3     83.5   92.0     90.8       92.5     91.4
                        4     79.2   90.6     89.3       91.2     90.2
                        5     70.1   85.9∗    85.1       86.7∗    86.6
Example sentences       2     87.4   88.7     89.4       92.0     91.8
                        3     80.1   84.0     84.3       87.6     87.4
                        4     76.7   82.0     80.8       85.7     84.9
                        5     67.7   77.9∗    75.4       81.9∗    81.0
KC                      2     90.0   93.3     95.3       96.3     95.8
                        3     83.0   89.8∗    91.8       93.4∗    92.8
                        4     80.0   88.2∗    89.4       91.9∗    90.8
                        5     70.6   -        86.6       -        88.8

Table 4: Results of hypernym semantic class estimation (CRF/MEM). Values marked ∗ were obtained without using p2 as a feature.
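The correction of impossible class predictions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based inputs and the toy counts and probabilities are assumptions made for the example, and the frequency counts are simply whatever the caller supplies. A CRF-style prediction (a single label) is replaced by the most frequent class among those the word can take, while an MEM-style prediction (one probability per class) is replaced by the most probable possible class.

```python
from collections import Counter


def correct_by_frequency(predicted, possible, train_counts):
    """CRF-style correction (sketch).

    Keep the predicted class if the word can take it; otherwise fall back
    to the most frequent class among the possible ones.
    """
    if predicted in possible:
        return predicted
    # sorted() makes the choice deterministic if two classes tie
    return max(sorted(possible), key=lambda c: train_counts.get(c, 0))


def correct_by_probability(probs, possible):
    """MEM-style correction (sketch).

    Choose the class with the highest model probability among the
    classes the word can take.
    """
    return max(sorted(possible), key=lambda c: probs.get(c, 0.0))


# Toy data (hypothetical): a word whose only level-2 classes are 3 and 533.
possible_classes = {3, 533}
train_counts = Counter({3: 12, 533: 30, 388: 100})

print(correct_by_frequency(388, possible_classes, train_counts))   # 533
print(correct_by_frequency(3, possible_classes, train_counts))     # 3
print(correct_by_probability({388: 0.5, 3: 0.3, 533: 0.2}, possible_classes))  # 3
```

Both functions only ever return a class from the word's possible set (or the original prediction when it is already possible), which is the property the correction step relies on.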
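In addition to the features in Table 3, MRT builds features from the following information (a)-(c): (a) parse results from KNP; (b) codes of the Universal Decimal Classification (UDC), which is used in libraries and elsewhere to classify documents; (c) category numbers from Bunrui Goihyo (National Institute for Japanese Language, 2004), a Japanese thesaurus.

Of (a)-(c), our experiments do not use (a) or (b). We do not use (a) because we want to use WSD results in parsing, so parsing is not run as a preprocessing step for WSD. We do not use (b) because the Hinoki Corpus is not annotated with UDC codes.

Bunrui Goihyo, used in (c), contains about 96,000 Japanese words arranged in a tree of depth 5. The first level has 4 classes, level 3 has 95 classes and level 5 has 895 classes. Goi-Taikei and Bunrui Goihyo are both Japanese thesauri, but whereas Goi-Taikei was built mainly to classify common nouns, Bunrui Goihyo classifies all words, including function words. For (c), MRT uses the classes at both level 3 and level 5. However, they use the first class that matches the surface string and do not, as we do in this paper, estimate a more appropriate class. The types and granularity obtained from Goi-Taikei and Bunrui Goihyo differ, and because Bunrui Goihyo in particular includes function words, we expect it to have a different effect; we therefore also use the features in (c).

MRT also uses morphological analyses from both JUMAN and RWC, whereas we use only the morphological analyses from ChaSen.

In short, for CRL we add the features obtained from Bunrui Goihyo to those in Table 3, and for our system we add features from Bunrui Goihyo and from the estimated hypernym semantic classes. Here we also use the semantic classes at levels above the estimated level. For example, when the level 3 semantic class was estimated in the previous section, the level 2 semantic class is added as a feature as well.

The experiments use the SENSEVAL-2 target words (50 nouns and 50 verbs), excluding two words that are not in Lexeed and words that did not occur in either the training or the test data. Table 5 shows the actual number of target words.

Like MRT, we built one model for each combination of word and part of speech. MRT combines an SVM with naive Bayes, whereas our experiments use only an SVM (Chang and Lin, 2001). MRT also uses a polynomial kernel, but in our experiments a linear kernel gave higher accuracy, so we used a linear kernel.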
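The way the estimated hypernym class and its higher-level classes are added as WSD features can be sketched as follows. This is an illustrative sketch only: the function name, the feature-string format and the representation of a class path as a level-to-ID dictionary are assumptions, not part of the described system. The toy path uses the class IDs shown for 運転 in example (1).

```python
def hypernym_class_features(path_by_level, estimated_level, min_level=2):
    """Return one feature string per level from min_level up to estimated_level.

    path_by_level: dict mapping a hierarchy level to the class ID on the
                   path of the estimated class (hypothetical representation).
    """
    return [
        f"class_lvl{level}={path_by_level[level]}"
        for level in range(min_level, estimated_level + 1)
    ]


# Class path for the target word in example (1): levels 2 to 5.
unten_path = {2: 1235, 3: 1236, 4: 1560, 5: 1920}

# Estimating at level 3 also contributes the level 2 class.
print(hypernym_class_features(unten_path, 3))
# ['class_lvl2=1235', 'class_lvl3=1236']
```

In a full system these strings would simply be appended to the other features of the target word before training the per-word classifier.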
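Corpus   No.   Definition sentences   Example sentences   KC
Nouns    Wd    44                     41                  49
         Pol   6.4                    6.6                 6.3
Verbs    Wd    46                     46                  49
         Pol   9.6                    9.4                 10.4
Total    Wd    90                     87                  98
         Pol   8.1                    8.1                 8.4

Table 5: WSD target words. Wd is the number of target words and Pol is the average number of senses per word.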
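3.1 Results and Discussion: Word Sense Disambiguation

Table 6 shows the WSD results. The baseline (BL) is the accuracy obtained by selecting the most frequent sense in the training data. BL2 is the accuracy obtained by selecting the most frequent sense that is consistent with the hypernym semantic class estimated in the previous section.

In Table 6, SCRF and SMEM are the systems that use the hypernym semantic classes estimated and corrected with CRF and MEM, respectively. All results are significantly better than the baseline (BL). Except on the definition sentences, SCRF gives the best results.

Table 6 also shows that senses can be estimated with high accuracy even by selecting the most frequent sense consistent with the hypernym semantic class (BL2). In general, the accuracy of hypernym semantic class estimation itself falls as the hierarchy gets deeper; nevertheless, WSD accuracy improves when semantic classes at deeper levels are used.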
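The BL2 baseline can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the data structures, the sense labels and the counts are hypothetical, and the fallback to the overall most frequent sense when no sense matches the estimated class is an assumption that the text does not specify.

```python
def bl2_sense(word, estimated_class, sense_counts, sense_class):
    """Pick the most frequent training sense of `word` whose hypernym class
    equals `estimated_class`.

    sense_counts: dict word -> dict sense -> training frequency
    sense_class:  dict sense -> hypernym class ID at the level being used
    Falls back to the overall most frequent sense if nothing matches
    (assumed behaviour).
    """
    counts = sense_counts[word]
    matching = [s for s in counts if sense_class[s] == estimated_class]
    candidates = matching or list(counts)
    return max(candidates, key=lambda s: counts[s])


# Toy data (hypothetical senses and counts) at level 2.
sense_counts = {"doraibaa": {"driver_person": 10, "screwdriver": 5, "golf_club": 2}}
sense_class = {"driver_person": 3, "screwdriver": 533, "golf_club": 533}

print(bl2_sense("doraibaa", 533, sense_counts, sense_class))  # screwdriver
print(bl2_sense("doraibaa", 3, sense_counts, sense_class))    # driver_person
```

With a correct class estimate, the candidate set shrinks from all senses of the word to those sharing that class, which is why BL2 can outperform the plain most-frequent-sense baseline.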
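4 Discussion and Future Work

This paper has shown that hypernym semantic class estimation is effective for WSD. There are, however, cases where hypernym semantic classes do not help to reduce ambiguity. For example, アイロン (airon 'iron') has two senses: "a heavy metal tool used to smooth wrinkles out of cloth" and "a metal tool for curling hair". Annotators can distinguish these senses perfectly, but both are linked to ⟨915: household utensils⟩ and ⟨969: electrical equipment⟩, so the semantic class cannot narrow down the sense. For most polysemous words, however, the method is effective.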
System                 Lvl   Definition sentences    Example sentences       KC
                             Noun   Verb   Avg       Noun   Verb   Avg       Noun   Verb   Avg
BL                     -     74.5   56.8   63.8      63.7   56.2   58.3      69.2   62.1   66.1
CRL                    -     81.1   65.3   71.5      79.5   68.5   71.6      80.9   67.0   74.7
BL2 (CRF, corrected)   2     76.8   59.9   66.5      66.9   58.8   61.0      69.9   63.4   67.0
                       3     80.8   60.6   68.5      69.1   60.5   62.8      75.0   65.4   70.7
                       4     80.9   61.6   69.2      71.0   61.3   64.0      76.7   68.0   72.8
                       5     83.4   67.4   73.7      76.3   65.2   68.3      -      -      -
BL2 (MEM, corrected)   2     77.0   58.5   65.8      65.0   58.0   59.9      70.5   62.4   66.9
                       3     81.1   60.3   68.5      69.1   60.5   62.9      74.2   63.3   69.3
                       4     81.3   61.3   69.1      69.7   61.6   63.9      75.4   64.3   70.4
                       5     82.6   66.6   72.8      72.6   63.1   65.7      77.2   67.5   72.9
SCRF                   2     81.3   65.6   71.8      79.5   68.3   71.4      81.3   67.0   74.9
                       3     81.5   66.1   72.2      79.5   68.5   71.6      81.5   67.0   75.1
                       4     81.6   66.3   72.3      79.5   68.8   71.7      81.3   67.0   74.9
                       5     81.7   67.2   72.9      80.1   69.2   72.3      -      -      -
SMEM                   2     81.5   65.3   71.7      78.5   68.3   71.1      79.9   66.9   74.1
                       3     81.3   65.2   71.6      78.5   68.3   71.1      79.8   66.9   74.1
                       4     81.7   65.2   71.7      78.9   68.3   71.2      79.8   66.6   73.9
                       5     81.6   65.5   71.8      79.2   67.9   71.0      79.7   66.7   73.9

Table 6: Word sense disambiguation results (SVM)
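The results of hypernym semantic class estimation can also be used on their own, for example in parsing. In fact, according to Fujita et al. (2007), level 2 semantic classes contribute most to improving accuracy in parse selection.

As future work, we would like to test whether the same effect can be obtained for other languages such as English. In this paper we used a packaged CRF training tool. As in the implementation of the morphological analyser mecab (Kudo et al., 2004), we would like to register the possible combinations of surface string and hypernym semantic class in a dictionary and restrict the combinations, in order to improve both accuracy and training speed.

5 Conclusion

This paper proposed a word sense disambiguation (WSD) method that uses hypernym semantic class estimation. The method first estimates hypernym semantic classes and then performs WSD using the estimates. For hypernym semantic class estimation, we ran experiments with CRF and MEM and obtained high accuracy with both. For WSD, we also obtained higher accuracy than the method that achieved the highest accuracy in SENSEVAL-2. We conclude that WSD using the proposed hypernym semantic class estimation is effective.

Acknowledgments

The CRF training tool used in this work was kindly provided by Jun Suzuki (Suzuki et al., 2006). We take this opportunity to thank him.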
References

Timothy Baldwin, Su Nam Kim, Francis Bond, Sanae Fujita, David Martinez, and Takaaki Tanaka. 2008. MRD-based word sense disambiguation: Further extending Lesk. In The Third International Joint Conference on Natural Language Processing (IJCNLP-2008).
Francis Bond, Sanae Fujita, and Takaaki Tanaka. 2006. The Hinoki syntactic and semantic treebank of Japanese. Language Resources and Evaluation, 40(3–4):253–261.
Yee Seng Chan, Hwee Tou Ng, and David Chiang. 2007. Word sense disambiguation improves statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 33–40.
Chih-Chung Chang and Chih-Jen Lin. 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Sanae Fujita, Francis Bond, Stephan Oepen, and Takaaki Tanaka. 2007. Exploiting semantic information for HPSG parse selection. In ACL 2007 Workshop on Deep Linguistic Processing, pages 25–32.
Taku Kudo, Kaoru Yamamoto, and Yuji Matsumoto. 2004. Applying conditional random fields to Japanese morphological analysis. In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP 2004, pages 230–237.
Kamal Nigam, John Lafferty, and Andrew McCallum. 1999. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67.
Jun Suzuki, Erik McDermott, and Hideki Isozaki. 2006. Training conditional random fields with multivariate evaluation measures. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 217–224.
Takaaki Tanaka, Francis Bond, Timothy Baldwin, Sanae Fujita, and Chikara Hashimoto. 2007. Word sense disambiguation incorporating lexical and structural semantic information. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 477–485.
Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama, and Yoshihiko Hayashi. 1997. Goi-Taikei: A Japanese Lexicon. Iwanami Shoten. (In Japanese.)
National Institute for Japanese Language. 2004. Bunrui Goihyo CD-ROM (revised and enlarged edition). Dainippon Tosho. (In Japanese.)
Masaki Murata, Masao Utiyama, Kiyotaka Uchimoto, Qing Ma, and Hitoshi Isahara. 2003. Technical note: CRL's approach to the SENSEVAL-2J dictionary task: a comparison of various machine learning methods and features for Japanese word sense disambiguation. Journal of Natural Language Processing, 10(3):115–134. (In Japanese.)
Kaname Kasahara, Hiroshi Sato, Francis Bond, Takaaki Tanaka, Sanae Fujita, Tomoko Kanasugi, and Shigeaki Amano. 2004. Construction of "Lexeed: a semantic database of basic Japanese words". 2004-NLC-159, pages 75–82. (In Japanese.)