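Word Sense Disambiguation Using Upper-Level Semantic Class Estimation
Sanae Fujita,♥ Francis Bond,♠ Akinori Fujino♥
{sanae,a.fujino}@cslab.kecl.ntt.co.jp, ♠ [email protected]
♥ NTT Communication Science Laboratories, ♠ National Institute of Information and Communications Technology
1 Introduction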
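A word usually has several senses that differ according to context. Word sense disambiguation (WSD), which selects the correct sense from among these according to context, has been shown in recent years to help improve the accuracy of parse selection in syntactic parsing (Fujita et al., 2007) and of machine translation (Chan et al., 2007).
There is a large body of prior work on WSD. For Japanese, proposed methods include one based on extended Lesk (Baldwin et al., 2008), which is unsupervised and achieves high accuracy although it does not match supervised methods, and a supervised method that uses both semantic and syntactic information (Tanaka et al., 2007). However, because we also want to use WSD results in syntactic parsing, this paper proposes a method that does not use parsing results.
One reason WSD is difficult is that it is hard to build enough training data to estimate a very large number of senses. In addition, when the number of classes is very large, many machine learning models become difficult to apply. The proposed method therefore splits WSD into two stages. The first stage estimates (upper-level) semantic classes such as agent, place, concrete thing, and abstract thing.
যմɈʂɒȑᜯᏈȍතțȈହǮ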
ȃnjʅɲʄ 2 ǭȫ 5 ȠȉȐযմɈʂɒ1 ȴၠǦȈފᮍȴ
ᙦȌǦnjᜯᏈொஆॊᛩืȍұȌʅɲʄȴǻȭǍ
2.1 Training/Test Data
േɔʌɒɩʌɈȐФޗᜯȑnj Lexeed ȐᜯᏈȍȪȅȈɘ
ɉ̝dzǷȮȈǦȭǍǹǭǹnj௵ሱȉȑnj (ʾ͇) যմɈ
ʂɒȴલކǻȭȁȣnjേɔʌɒɩʌɈȐᜯᏈɘɉȴʾ
͇যմɈʂɒȚȊᎰǹnjᛷፙᴂɟɒɡɠʑɘȴ͓ਓ
ǻȭǍͣȊǹȈnjᣜᡜਯ 1 ȐᜯᏈୁȴୁ (1) ȍᆟǻǍୁ
(1) Ȑʿȍ cat ȉᆟǹȁᙦȑnjՋᜯᏈɘɉȍʃʌɈǷȮ
ȈǦȭᜯࣴ۔ያȐযմɈʂɒnj lvl X ȉᆟǹȁᙦȑnj
ʅɲʄ X ȍǬdzȭʾ͇যմɈʂɒȴᆟǹȈǦȭǍ௵ᇵ
ȉȑnjᜯᏈǮᚩହȐযմɈʂɒȍʃʌɈǷȮȈǦȭٿ
ՌnjѱȐযմɈʂɒȐȡȴၠǦȈǦȭǍ
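The conversion described above, from a sense's linked semantic class to its ancestor at a chosen level, can be sketched as follows. This is a minimal illustration, not the authors' code: the `PARENT` table is a hypothetical fragment built from the classes in example (1), and the links 988 -> 986 and 533 -> 2 -> 1 are assumptions made only so that the sketch runs.

```python
# Hypothetical child -> parent links for a tiny slice of the class hierarchy.
# Levels follow example (1): 533 (level 2), 706 (3), 760 (4), 986 (5).
# The links 988 -> 986 and 533 -> 2 -> 1 are assumptions for this sketch.
PARENT = {988: 986, 986: 760, 760: 706, 706: 533, 533: 2, 2: 1, 1: None}


def depth(cls):
    """Number of steps from `cls` up to the root class."""
    d = 0
    while PARENT[cls] is not None:
        cls = PARENT[cls]
        d += 1
    return d


def upper_class(cls, level):
    """Return the ancestor of `cls` at `level` (or `cls` itself if shallower)."""
    while depth(cls) > level:
        cls = PARENT[cls]
    return cls


def sense_to_upper_class(linked_classes, level):
    """Map a sense tag to one upper-level class.

    When a sense is linked to several classes, only the first one is used,
    as stated in the text.
    """
    return upper_class(linked_classes[0], level)


print(sense_to_upper_class([988], 5))  # class used as the level-5 label
print(sense_to_upper_class([988], 2))  # class used as the level-2 label
```

With a full hierarchy table in place of the toy `PARENT`, the same function would relabel every sense tag in the training and test data for each of levels 2 to 5.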
ᙲ 1ȍnjᛷፙᴂɟɒɡɠʑɘȐହȴᆟǻǍᛷፙɠʑ
ɘ ȑnj 1 ᜯ Ꮘ ࢈ ف5.1 ୁ (ͣ ୁ) ǭ ȫ 17.7 ୁ (KC) ȉ Ǥ
ȭǍǹǭǹnjʾ͇যմɈʂɒȍ᪕ዯǹȁٿՌnjʅɲʄ 5
ȉȤ 1 Ɉʂɒ࢈ ف340.9 ୁ (ͣୁ) ǭȫ 539.7 ୁ (KC) Ȋ
ȌȭǍǵȐȪǨȍnjʾ͇যմɈʂɒȍ᪕ዯǻȭ˲ȍȪ
ȬnjɠʑɘɒɪʑɒɦɒȍࣦDZȌȭǍ
ȑȭǭȍߋȌǦȁȣnjතᡤყߋȌǦݯᏕɠʑɘǭȫȉ
ȤӰѧȌዖࢠȴखȭǵȊǮȉǯȭǍȠȁnjʾ͇যմɈ
ʂɒǮކȠȮȒnjᜯᏈǮʸযȍෘȠȭ˲ȤۏǦǍ 2 ඟ
Corpus
᩷ᄑȉȑnj 1 ඟ᩷ᄑȉલކǹȁযմɈʂɒȴၠǦȈᜯ
ᏈǿȐȤȐȴલކǻȭǍ
KC
ȌǬnjފᮍȍȑnjേɌʑɪɒ (Bond et al., 2006) ȴ
ၠǦȭǍേɌʑɪɒȑnjᢍ (Lexeed(ԗȫ, 2004)) Ȑ
ᜯᏈୁnjͣୁnj୕ (̇۔Ɍʑɪɒnj̥ʿnj KC) ȍǻ
ȭɝʃʑɩʌɈnjǬȪȕnjɔʌɒɩʌɈǭȫഁਓǷȮ
ȭǍ ȠȁnjǵȐᢍȍȑnjᜯᏈඪȍ୩௵ᜯɐɖʑʂɒ
ȉǤȭ୩௵ᜯᜯࣴ۔ያ (ැԗȫ, 1997) ȐযմɈʂɒ (য
մ߬ॊ) Ǯ̝ˁǷȮȈǦȭǍ
Corpus                Set    Sentences  Target words  All words
Definition sentences  Train     67,202       175,709    613,216
                      Test       4,942        15,932     54,276
Example sentences     Train    106,528       133,616    432,514
                      Test       8,942        12,416     41,019
KC                    Train    141,968       211,567    947,298
                      Test       5,408        12,581     53,703
Table 1: Training/test data sizes. Here, target words are words tagged with Lexeed senses.
2.2 Experiments: Upper-Level Semantic Class Estimation
ഷ ಓ ݯᏕ ਯ Ȋ ǹ Ȉnj Maximum Entropy Method:
MEM(Nigam et al., 1999) ǬȪȕnj Conditional Random
Fields: CRF(Suzuki et al., 2006) ȴၠǦȈފᮍȴᙦȌȅ
2 Upper-Level Semantic Class Estimation
௵ሱȉȑnjʾ͇যմɈʂɒȐલކୗȍȆǦȈț
ȭǍᜯࣴ۔ያȑnj 2,710 ȐযմɈʂɒǭȫȌȬnj๛Ƿ 0
ǭȫ 11 ȠȉȐ᩷߯ (ʅɲʄ) ȍѧdzȫȮȈǦȭǍǿȐǨ
1 য մ Ɉ ʂ ɒ ȑnj ʅ ɲ ʄ 2 Ȑ ٿՌ �3: ˗ ͎� Ȧ �533: С
͎ ྦ� Ȍ ȋ 9 Ɉ ʂ ɒnj ʅ ɲ ʄ 3 Ȑ ٿՌ �4: ̏� Ȧ �706: ཊ ၞ
ྦ� Ȍȋ 30 Ɉʂɒnjʅɲʄ 4 ȐٿՌ �5: ̏ᨮ� Ȧ �760: ̏ࡖ
ྦ� Ȍȋ 136 Ɉʂɒnjʅɲʄ 5 ȐٿՌ �6: ̏ᨮᴑ̏ᇠᴓ� Ȧ
�838: ᬽୋ� Ȍȋ 392 Ɉʂɒȍ᪕ዯǷȮȭǍ
(1) ᪬ᡔ 1
cat
lvl 5
lvl 4
lvl 3
lvl 2
�988: ˣȮྦ (௷͎ (ᇡҾ (ᩰ�)))ز
�986: ˣȮྦ�
�760: ̏ࡖྦ�
�706: ཊၞྦ�
�533: С͎ྦ�
Ȧ ᒈҼᡔ 1
�988: ˣȮྦ (௷͎ (ᇡҾ (ᩰ�)))ز
�986: ˣȮྦ�
�760: ̏ࡖྦ�
�706: ཊၞྦ�
�533: С͎ྦ�
-
ȁ2 ǍȌǬnj௵ᇵȉȑnjᔀቜȍȪȭൿȡࣸዾᛩథ
ȴДҢȊǹȈѵၠǻȭǍ
̥ʿnjѵၠǻȭዾॊȍȆǦȈțȭǍ CRFȐዾॊȍ
ȑnj uni-gram, bi-gram, ឤᜯȐ҆ऑ 2 ᜯȐጎՌǽȴၠ
Ǧȭ (ᙲ 2)ǍMEMȐዾॊȍȑnjឤᜯᒈᡋȊǿȐ҆ऑ
ȐᜯnjឤୁˏȐǻțȈȐФޗᜯnjǬȪȕnjឤᜯȐ
҆ऑ 3 ୁݥȠȉȐୁݥѰȴၠǦȭ (ᙲ 3)ǍCRFȊЕDZՐ
ǺዾॊȴѵၠǹȁފᮍȤᙦȌȅȁǮnjȫǭȍዖࢠǮ
ʿǮȅȁȁȣnjǵǵȉȑȊȬǤǴȌǦǍᙲ 2, 3ȉnj bk
ȑ k ႃᄑȐᜯȐԗࣸnj wk ȑᙲ߯ࣸnj p1k , p2k , p3k ȑ
ǿȮȀȮnj֕ᜓ, ֕ᜓጅѧᬨ 1, ֕ᜓጅѧᬨ 2 ȴᆟǻǍ
Sample ȑnjୁ (1) Ȑ 5 ႃᄑȐᜯ (i = 5) ǖᣜᡜǗȴឤ
ᜯȊǹȁٿՌȐዾॊȐʸᤘȉǤȭǍ
Type         Template                            Sample
uni-gram     ⟨b_k⟩, ⟨w_k⟩,                       ⟨自動車⟩, ⟨自動車⟩,
             ⟨p1_k⟩, ⟨p2_k⟩, ⟨p3_k⟩              ⟨名詞⟩, ⟨名詞-一般⟩
combination  ⟨b_k, w_k⟩, ⟨b_k, p1_k⟩,            ⟨自動車, 自動車⟩,
             ⟨b_k, p2_k⟩, ⟨b_k, p3_k⟩,           ⟨自動車, 名詞-一般⟩,
             ⟨w_k, p1_k⟩, ⟨w_k, p2_k⟩,           ⟨自動車, 名詞⟩,
             ⟨w_k, p3_k⟩, ⟨p1_k, p2_k⟩,          ⟨自動車, 名詞-一般-*⟩,
             ⟨p1_k, p3_k⟩, ⟨p2_k, p3_k⟩          ⟨名詞, 名詞-一般-*⟩
bi-gram      ⟨b_k, b_k+1⟩,                       ⟨自動車, を⟩,
             ⟨w_k, w_k+1⟩,                       ⟨自動車, を⟩,
             ⟨p1_k, p1_k+1⟩,                     ⟨名詞, 助詞⟩,
             ⟨p2_k, p2_k+1⟩,                     ⟨名詞-一般, 助詞-格助詞⟩,
             ⟨p3_k, p3_k+1⟩                      ⟨名詞-一般-*, 助詞-格助詞-一般⟩
Table 2: Features used for the CRF. When the i-th word is the target word, k = i−2, ..., i+2 for uni-gram and combination features, and k = i−2, ..., i+1 for bi-gram features. (自動車 "automobile"; を accusative particle; 名詞 "noun"; 一般 "general"; 助詞 "particle"; 格助詞 "case particle".)
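The templates in Table 2 can be read as a feature-extraction routine. The sketch below is an illustration under assumptions, not the authors' implementation: the feature-string format, the `token` helper, and the part-of-speech tags given to 電車, や, する, and 人 are hypothetical; only the window sizes, the five fields (b, w, p1, p2, p3), and the ten field pairs come from the table.

```python
from itertools import combinations

# Base form, surface form, POS, POS subcategory 1, POS subcategory 2.
FIELDS = ("b", "w", "p1", "p2", "p3")


def token(base, surface, p1, p2, p3):
    return {"b": base, "w": surface, "p1": p1, "p2": p2, "p3": p3}


def crf_features(tokens, i):
    """Build uni-gram, combination, and bi-gram features around target index i."""
    feats = []
    n = len(tokens)
    # uni-gram and combination features: k = i-2, ..., i+2
    for k in range(i - 2, i + 3):
        if not 0 <= k < n:
            continue
        off = k - i
        tok = tokens[k]
        for f in FIELDS:
            feats.append(f"uni:{f}[{off}]={tok[f]}")
        for f, g in combinations(FIELDS, 2):  # the 10 field pairs of Table 2
            feats.append(f"comb:{f},{g}[{off}]={tok[f]}|{tok[g]}")
    # bi-gram features: k = i-2, ..., i+1, pairing position k with k+1
    for k in range(i - 2, i + 2):
        if k < 0 or k + 1 >= n:
            continue
        off = k - i
        for f in FIELDS:
            feats.append(f"bi:{f}[{off}]={tokens[k][f]}|{tokens[k + 1][f]}")
    return feats


# Sentence (1); tags for 電車, や, する, 人 are illustrative assumptions.
sentence = [
    token("電車", "電車", "名詞", "名詞-一般", "名詞-一般-*"),
    token("や", "や", "助詞", "助詞-並立助詞", "助詞-並立助詞-*"),
    token("自動車", "自動車", "名詞", "名詞-一般", "名詞-一般-*"),
    token("を", "を", "助詞", "助詞-格助詞", "助詞-格助詞-一般"),
    token("運転", "運転", "名詞", "名詞-サ変接続", "名詞-サ変接続-*"),
    token("する", "する", "動詞", "動詞-自立", "動詞-自立-*"),
    token("人", "人", "名詞", "名詞-一般", "名詞-一般-*"),
]
features = crf_features(sentence, 4)  # fifth word, 運転, is the target
```

Positions that fall outside the sentence are simply skipped here; padding them with boundary symbols would be an equally reasonable choice.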
ȴ ᣜᡜ 1
-
�2003: ଁ፪�
�1920: ұϜ�
�1560: ᙨ༸�
�1236: ̏ᨮยҾ�
�1235: ˴�
ǻȭ ̍ 4
-
�4: ̏�
�4: ̏�
�4: ̏�
�4: ̏�
�3: ˗͎�
ǖɢʂȽɩʑǗȊǦǨᜯȑnjʅɲʄ 2 ȉȑnj �3: ˗͎�
ǭ �533: С͎ྦ� ǹǭԮȬखȌǦǍǹǭǹnj௵ਯȉ
ȑnj �388: ځਬ� ȌȋȐɈʂɒȤੑՀᐺȉǤȭǍǿ
ǵȉnjǵȐȪǨȌɁʂʑȴൿǻȭȁȣnjȬखȌǦ
ɈʂɒǮੑǷȮȁٿՌnjCRFȉȑnjՀᐺȌɈʂɒȐ
ˏȉᬚȐɈʂɒȚȊۆǻȭǍȠȁnjMEMȉȑnjՀ
ᐺȌɈʂɒȐˏȉȤᆂȐ᮰ǦɈʂɒȚȊۆǻ
ȭ3 Ǎᙲ 4ȍǬǦȈnjǖൿ҆Ǘȉᆟǹȁዖࢠȑnjલކ
ǿȐȠȠȐዖࢠȉǤȬnjǖൿऑǗȉᆟǹȁዖࢠ
ȑnjȬखȌǦɈʂɒȴൿǹȁٿՌȐዖࢠȉǤȭǍ
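The correction step described above can be sketched as two small functions. This is a hedged illustration: the function names, the frequency table, and the probability values are hypothetical, and only the two rules themselves (most frequent possible class for the CRF, most probable possible class for MEM) are taken from the text.

```python
def correct_crf(predicted, possible, train_freq):
    """CRF-style correction: if the predicted class is impossible for the word,
    replace it with the most frequent class among the possible ones."""
    if predicted in possible:
        return predicted
    return max(sorted(possible), key=lambda c: train_freq.get(c, 0))


def correct_mem(class_probs, possible):
    """MEM-style correction: choose the possible class with the highest
    model probability (equal to the plain argmax whenever that is possible)."""
    return max(sorted(possible), key=lambda c: class_probs.get(c, 0.0))


# At level 2 the word can only take class 3 or class 533 (example in the text).
possible = {3, 533}
train_freq = {3: 10, 533: 4}             # hypothetical training counts
probs = {388: 0.5, 3: 0.2, 533: 0.3}     # hypothetical MEM output

print(correct_crf(388, possible, train_freq))  # impossible class 388 is replaced
print(correct_mem(probs, possible))
```

The two rules can disagree, as in this toy input: the frequency-based rule falls back to class 3, while the probability-based rule picks class 533.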
ᙲ 4Ȑ ൿ ҆ Ȑ ǭ ȫnjCRFȑ Ȫ Ȭ ๛ Ǧ ʅ
ɲ ʄ ȉ Ȑ ዖ ࢠ Ǯ ත ᡤ ყ ᮰ Ǧ ǵ Ȋ Ǯ ȱ ǭ ȭǍ ǹ ǭ
ǹnjCRFȑMEMȪ Ȭ ۏDZ Ȑ ஓ ᨬ Ȋ ɺ ɻ ʃ ȴ न ᛋ Ȋ
ǻȭǍǿǵȉnjǦDZȆǭȐΥ (∗ ȴ̝ˁǹȁହΥ) ȑnj
p2 ȴၠǦȌǦȉखȁǍͅǹnj p2 ȴၠǦȌǦٿՌnjዖ
ࢠȑ 0.1-0.2 % ᇢࢠDZȌȭ4 Ǎ
ᙲ 4Ȑ ൿ ҆ Ȑ ȉ ȑnj Ǧ DZ Ȇ ǭ Ȑ ఎ ̪
ȉnjMEMȐ ୗ ǮnjCRFȪ Ȭ ᮰ Ǧ ዖ ࢠ ȴ ѡ ǹ Ȉ Ǧ
ȭǍǹǭǹnjൿୗȑMEMȐୗǮѵȉǤȭȍȤᨲ
ȱȫǼnjൿऑȑnjCRFɲʑɒȐዖࢠȐୗǮЕȈ᮰DZ
ȌȅȈǦȭ5 Ǎ ൿऑȑCRFȐୗǮЕȈ᮰DZȌȅȁထၥ
ȊǹȈnjMEMȑѱǭȫᬚࢠȍᥔ༵ǮǬǭȮȁݯᏕୗ
ȉǤȭȐȍǹnjCRFȑතᡤყያѰȊǹȈȐࣱ܋ॊ
ȍᥔ༵ǮǬǭȮȁݯᏕୗȉǤȭȁȣnjᬚࢠȍȪȭ
ൿȑCRFȍǹȈȪȬұყȉǤȭȊǦǨထၥǮᏤǪ
ȫȮȭǍ
Sample
content words      ⟨電車⟩, ⟨自動車⟩, ⟨人⟩
uni-gram           ⟨運転⟩, ⟨運転⟩, ⟨名詞⟩, ⟨名詞-サ変接続-*⟩
character strings  ⟨を⟩, ⟨車を⟩, ⟨動車を⟩, ⟨す⟩, ⟨する⟩, ⟨する人⟩
ᙲ 4ǭȫnjൿऑȐዖࢠȑnjൿ҆ȍතțȈnjǦǼ
ȮȤʾ୷ǹȈǦȭǍǿȐȁȣnj൨ሱȐᜯᏈொஆॊᛩื
ȉȑnjൿऑȐȴѵၠǻȭǍ
Table 3: Features used for MEM. When the i-th word is the target word, j = i−1, ..., i+1 for uni-gram features.
௵ሱȉȑnj҆ሱȉखǹȁʾ͇যմɈʂɒȴၠǦȁᜯ
Ꮘொஆॊᛩื (WSD) ȍȆǦȈțȭǍȠǼnj WSD Ȑ
Type               Template
content words      ⟨b_l⟩
uni-gram           ⟨b_j⟩, ⟨w_j⟩, ⟨p1_j⟩, ⟨p3_j⟩
character strings  ⟨cb1_i⟩, ⟨cb2_i⟩, ⟨cb3_i⟩,
                   ⟨ca1_i⟩, ⟨ca2_i⟩, ⟨ca3_i⟩
2.3 Results and Discussion: Upper-Level Semantic Class Estimation
ʾ͇যմɈʂɒȐલކȴᙲ 4ȍᆟǻǍɲʑɒʂȽʌ
(BL) ȑnjᛷፙɠʑɘˏȐᬚȐযմɈʂɒȴੑǹȁ
ٿՌȐዖࢠȉǤȭǍ௵ਯȉȑnjՋʅɲʄȠȉȐЕȈ
ȐɈʂɒǮੑՀᐺȍȌȅȈǦȭȁȣnjᜯȍȪȅȈȑ
ȬखȌǦɈʂɒǮੑǷȮȭٿՌǮǤȭǍͣǪȒnj
2 ފȑnj Support Vector Machine (SVM, (Chang and Lin,
2001)), ȉȤފᮍǹȁǮnjMEMȪȬዖࢠǮ͈DZnjஓᨬȤࡶȍ
ǭǭȅȁȁȣǵǵȉȑԮȬʾǴȌǦ
3 Word Sense Disambiguation
ȁȣȐዾॊȍȆǦȈțȭǍ WSD ȐɌʌɟɒɡȉǤ
ȭ SENSEVAL-2 ୩௵ᜯᢍɘɒɈȍǬǦȈnjȤ᮰Ǧ
ዖࢠȴखȁఆၤȫ (2003) Ȑɐɒɟɹ (̥ʿnjMRT) ȴȝ
ȞЩފᚓǹnjਔǏȐɐɒɟɹȊතᡤǻȭǍਔǏǮЩފ
ᚓǹȁɐɒɟɹ (̥ʿnjCRL) ȊnjMRTȐᣥǦȍȆǦȈ
ȑ̥ʿȍțȭǍ
3 MEMȉȑnjЕɈʂɒȐᆂǮአӾȍखȉǯȁȁȣǍ
4 ∗ ȴ̝ˁǹȁఎ̪̥یȉතᡤǹȁٿՌǍ
5 ފȍȑnjMEMȐȴnj CRF ȊՐഇȍnjՀᐺȌᬚ
ȐযմɈʂɒȍൿǻȭୗȤǹȁǮnjዖࢠȑȦȦʿǮȅ
ȁǍ
Corpus                Method                    Lvl 2   Lvl 3   Lvl 4   Lvl 5
Definition sentences  BL                         91.3    83.5    79.2    70.1
                      Before correction  CRF     96.0    92.0    90.6    85.9∗
                                         MEM     95.4    90.8    89.3    85.1
                      After correction   CRF     96.3    92.5    91.2    86.7∗
                                         MEM     95.7    91.4    90.2    86.6
Example sentences     BL                         87.4    80.1    76.7    67.7
                      Before correction  CRF     88.7    84.0    82.0    77.9∗
                                         MEM     89.4    84.3    80.8    75.4
                      After correction   CRF     92.0    87.6    85.7    81.9∗
                                         MEM     91.8    87.4    84.9    81.0
KC                    BL                         90.0    83.0    80.0    70.6
                      Before correction  CRF     93.3    89.8∗   88.2∗   -
                                         MEM     95.3    91.8    89.4    86.6
                      After correction   CRF     96.3    93.4∗   91.9∗   -
                                         MEM     95.8    92.8    90.8    88.8
Table 4: Upper-level semantic class estimation results (CRF/MEM), accuracy in %. Values marked ∗ were obtained without using p2 as a feature.
MRTȑnjᙲ 3Ȑዾॊ̥یȍnj൨Ȑ (a)-(c) Ȑپȴѵ
ၠǹȈዾॊȴ͓ਓǹȈǦȭǍ (a) KNP ȍȪȭഁୁᛩథ
nj (b) ؤ᭗ȌȋȉᬨȐѧᬨȍၠǦȫȮȭاӰ
ᣏѧᬨ (UDC) ȐɌʑɢnj (c) ୩௵ᜯȐɐɖʑʂɒȉ
Ǥȭѧᬨᜯࣴᙲ (اሥاᜯᅛሇਪ, 2004) ȐѧᬨႃՆǍ
ʾ (a)-(c) ȐǨȃnj௵ފᮍȉȑ (a) Ȋ (b) ȑѵၠǹȌ
ǦǍ (a) ȴѵၠǹȌǦထၥȑnj WSD Ȑȴഁୁᛩ
థȍѵၠǻȭȁȣnjഁୁᛩథȴ WSD Ȑ҆ѕထȊǹȈ
ȑᙦȌȱȌǦȁȣȉǤȭǍ (b) ȴѵၠǹȌǦထၥȑnj
UDC ɌʑɢǮേɌʑɪɒȍȑ̝ˁǷȮȈǦȌǦȁȣȉ
ǤȭǍ
ǵǵȉnj (c) Ȑѧᬨᜯࣴᙲȑ୩௵ᜯዯ 96,000 ᜯǮԬ
ǷȮȈǬȬnj๛Ƿ 5 Ȑ௲ഁᣈȍȌȅȈǦȭǍѱȐ
ʅɲʄȉȑ 4 ɈʂɒȍȱdzȫȮnjʅɲʄ 3 ȉ 95 Ɉʂ
ɒnjʅɲʄ 5 ȉ 895 ɈʂɒȍѧdzȫȮȭǍᜯࣴ۔ያȤ
ѧᬨᜯࣴᙲȤnjМȍ୩௵ᜯȐɐɖʑʂɒȉǤȭǮnjᜯ
ࣴ۔ያǮ˕ȍʸᒦՑᜓȴѧᬨǻȭȁȣȍ͓ȫȮȁȐ
ȍǹnjѧᬨᜯࣴᙲȑnjഷᐺᜯȴ՜ȢЕȈȐᜯȴѧᬨ
ឤȊǹȈǦȭǍ (c) ȍȆǦȈnjMRTȑnjʅɲʄ 3 Ȋ 5
ȐɈʂɒȴˋୗѵၠǹȈǦȭǍǹǭǹnjअȫȑݥȍ
ɷɜɚǹȁѱȐɈʂɒȴѵၠǹȈǬȬnj௵ᇵȐȪǨ
ȍȪȬᣱѨȌɈʂɒȴલކǻȭȪǨȌǵȊȑǹȈǦȌ
ǦǍᜯࣴ۔ያȊѧᬨᜯࣴᙲǭȫखȫȮȭɘȽɰȊዂࢠ
ȑႇȌȬnjྦྷȍѧᬨᜯࣴᙲȍȑഷᐺᜯǮ՜ȠȮȭǵȊ
ǭȫnjႇȌȭұǮखȫȮȭȊᏤǪȫȮȭȁȣnjਔǏ
Ȥ (c) ȐዾॊȑѵၠǻȭǍ
ȠȁnjMRTȑnj JUMAN/RWC Ȑࣸዾᛩథȴ
ˋୗѵၠǹȈǦȭǮnj௵ᇵȉȑᔀቜȍȪȭࣸዾᛩథ
ȐȡѵၠǹȈǦȭǍ
ȆȠȬnjCRL ၠȍȑѧᬨᜯࣴᙲǭȫखǹȁዾॊȴ
ᙲ 3ȍᢲҥǹnjਔǏȐɐɒɟɹȍȑnjѧᬨᜯࣴᙲǬȪȕ
લކǹȁʾ͇যմɈʂɒȍȪȭዾॊȴᢲҥǻȭǍǵȐ
ஓnjલކǹȁʅɲʄȪȬʾ͇ʅɲʄȐযմɈʂɒȤѵ
ၠǻȭǍͣǪȒnj҆ሱȉʅɲʄ 3 ȐযմɈʂɒȴલކ
ǹȁٿՌnjʅɲʄ 2 ȐযմɈʂɒȤዾॊȊǹȈᢲҥǹ
ȈǦȭǍ
ފᮍȍȑnj SENSEVAL-2 ȉȐឤᜯ (ՑᜓnjҼᜓՋ
50 ᜯ) ȴၠǦȁǍͅǹnj Lexeed ȍȌǦ 2 ᜯnjǬȪȕnj
ᛷፙ / ɟɒɡɠʑɘȐǦǼȮǭȍѡဎǹȌǭȅȁᜯȴᩣ
ǦȈǦȭǍފȐឤᜯହȴᙲ 5ȍᆟǻǍ
ਔǏȑnjMRTȊՐഇnjᜯȊ֕ᜓȐጎՌǽඪȐɻɠʄ
ȴ͓ਓǹȁǍȠȁnjMRTȑnj SVM ȊɣȽʑɯɲȽɓ
ȐˋୗȴጎȡՌȱǽȈѵၠǹȈǦȭǮnj௵ފᮍȉȑ
SVM(Chang and Lin, 2001) ȐȡȴѵၠǹȈǦȭǍ
ȠȁnjMRTȉȑnjۏᬇ࣑ɄʑɦʄȴѵၠǹȈǦȭǮnj
ਔǏȐފᮍȉȑፍࣸɄʑɦʄȐୗǮዖࢠǮ᮰DZȌȅȁ
ȁȣnjፍࣸɄʑɦʄȴѵၠǹȁǍ
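Corpus                  Noun          Verb          Total
No.                   Wd    Pol     Wd    Pol     Wd    Pol
Definition sentences  44    6.4     46    9.6     90    8.1
Example sentences     41    6.6     46    9.4     87    8.1
KC                    49    6.3     49   10.4     98    8.4
Table 5: Number of WSD target words. Wd is the number of target words; Pol is the average number of senses per word (polysemy).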
3.1 Results and Discussion: Word Sense Disambiguation
ᙲ 6 ȍ WSD ȐȴᆟǻǍɲʑɒʂȽʌ (BL) ȑnjᛷ
ፙɠʑɘȐˏȉȐᬚᜯᏈȴੑǹȁٿՌȐዖࢠȉǤ
ȭǍȠȁ BL2 ȑnj҆ሱȉલކǹȁʾ͇যմɈʂɒȴຕ
ȁǻᬚᜯᏈȴੑǹȁٿՌȐዖࢠȉǤȭǍ
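The two baselines described above differ only in the candidate set over which the most frequent sense is chosen. The sketch below illustrates this under assumptions: the sense identifiers, counts, and class assignments are hypothetical, and the fallback to plain BL when no sense matches the estimated class is an assumption of this sketch rather than something the text specifies.

```python
def bl(senses, sense_freq):
    """BL: the most frequent sense of the word in the training data."""
    return max(senses, key=lambda s: sense_freq.get(s, 0))


def bl2(senses, sense_freq, sense_class, estimated_class):
    """BL2: the most frequent sense among those whose upper-level semantic
    class matches the class estimated in the previous step."""
    candidates = [s for s in senses if sense_class[s] == estimated_class]
    if not candidates:
        # Assumption of this sketch: fall back to plain BL.
        candidates = list(senses)
    return max(candidates, key=lambda s: sense_freq.get(s, 0))


# Hypothetical senses of one word with level-2 classes and training counts.
senses = ["driver_1", "driver_2", "driver_3"]
sense_class = {"driver_1": 3, "driver_2": 533, "driver_3": 533}
sense_freq = {"driver_1": 12, "driver_2": 5, "driver_3": 2}

print(bl(senses, sense_freq))                     # driver_1
print(bl2(senses, sense_freq, sense_class, 533))  # driver_2
```

When the estimated class singles out exactly one sense, BL2 determines the sense with no further modelling, which matches the observation in the introduction that fixing the upper-level class often fixes the sense.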
ᙲ 6ȍ Ǭ Ǧ ȈnjSCRFȑCRFnjSMEMȑMEMȍ Ȫȅ
Ȉલ ކ/ ൿǹȁʾ͇যմɈʂɒȴၠǦȁɐɒɟɹȉǤ
ȭǍЕȈȐȑɲʑɒʂȽʌ (BL) ȪȬযȍତᒼǷ
ȮȈǦȭǍᜯᏈୁȴᩣǯnjSCRF ȐǮȤȪǦǍ
ᙲ 6ǭȫnjʾ͇যմɈʂɒȴຕȁǻᬚᜯᏈȴੑ
ǹȁٿՌ (BL2) ȉȤnj᮰ǦዖࢠȉᜯᏈȴલކǻȭǵȊ
ǮȉǯȁǍʸᒦȍnj᩷߯Ǯ๛DZȌȭȊnjʾ͇যմɈʂ
ɒȐલކᒈ͌ȐዖࢠȑʿǮȭȍȤǭǭȱȫǼnjȪȬ๛
ǦʅɲʄȐযմɈʂɒȴၠǦȭୗǮnj WSD Ȑዖࢠᒈ
͌ȑՕʾǹȈǦȭǍ
4 Discussion and Future Work
௵ᇵȉȑnj WSD ȍǬdzȭʾ͇যմɈʂɒલކȐұ
ॊȴᆟǹȁǍǹǭǹnjʾ͇যմɈʂɒǮொஆॊȐ҃
ȍұǮȌǦٿՌȤعݦǻȭǍͣǪȒǖȻȽʆʌǗ
ȑnj ”ࡤػȐǹȱȴȐȒǻȐȍ͞ǨᦀᚥȐᥔǦǵȈǍ”
Ȋnj “᮸ȐධȴȃȄȫǽȭᦀᚥȐǵȈǍ” Ȑ˳ȆȐযմ
ȴȆǍȻɧɟȽɘʑȑǵȮȫȐᜯᏈȴݿЕȍӬѳȉ
ǯȭǮnjˋୗȊȤnj �915: ࢥޗၢС� Ȋ �969: Ҥഹ
�ȍʃʌɈǷȮȈǬȬnjযմɈʂɒǭȫᜯᏈȴጚȭ
ǵȊȑȉǯȌǦǍǹǭǹnjȝȊȵȋȐۏᏈᜯȍǹȈ
ȑnj௵ਯȑұȉǤȭǍ
Corpus                           Definition sentences    Example sentences       KC
Method                    Lvl    Noun   Verb   Avg       Noun   Verb   Avg       Noun   Verb   Avg
BL                               74.5   56.8   63.8      63.7   56.2   58.3      69.2   62.1   66.1
CRL                              81.1   65.3   71.5      79.5   68.5   71.6      80.9   67.0   74.7
BL2 (CRF, after correction) 2    76.8   59.9   66.5      66.9   58.8   61.0      69.9   63.4   67.0
                            3    80.8   60.6   68.5      69.1   60.5   62.8      75.0   65.4   70.7
                            4    80.9   61.6   69.2      71.0   61.3   64.0      76.7   68.0   72.8
                            5    83.4   67.4   73.7      76.3   65.2   68.3      -      -      -
BL2 (MEM, after correction) 2    77.0   58.5   65.8      65.0   58.0   59.9      70.5   62.4   66.9
                            3    81.1   60.3   68.5      69.1   60.5   62.9      74.2   63.3   69.3
                            4    81.3   61.3   69.1      69.7   61.6   63.9      75.4   64.3   70.4
                            5    82.6   66.6   72.8      72.6   63.1   65.7      77.2   67.5   72.9
SCRF                        2    81.3   65.6   71.8      79.5   68.3   71.4      81.3   67.0   74.9
                            3    81.5   66.1   72.2      79.5   68.5   71.6      81.5   67.0   75.1
                            4    81.6   66.3   72.3      79.5   68.8   71.7      81.3   67.0   74.9
                            5    81.7   67.2   72.9      80.1   69.2   72.3      -      -      -
SMEM                        2    81.5   65.3   71.7      78.5   68.3   71.1      79.9   66.9   74.1
                            3    81.3   65.2   71.6      78.5   68.3   71.1      79.8   66.9   74.1
                            4    81.7   65.2   71.7      78.9   68.3   71.2      79.8   66.6   73.9
                            5    81.6   65.5   71.8      79.2   67.9   71.0      79.7   66.7   73.9
Table 6: Word sense disambiguation (SVM)
Ƞȁnjʾ͇যմɈʂɒલކȐȐȡȴഁୁᛩథȌ
ȋȍѵၠǻȭ˲ȤȉǯȭǍފnj Fujita et al. (2007) ȍ
ȪȭȊnjഁୁᛩథȐൿᛩੑȍǬǦȈʅɲʄ 2 Ȑযմ
ɈʂɒǮȤዖࢠՕʾȍޜˁǹȈǦȭǍ
̔ऑȐᬟȊǹȈȑnjᓦᜯቚȐ̛ȐᛮᜯȉȤՐഇȐ
ұȴखȫȮȭǭȋǨǭފᮍȴᙦȌǦȁǦǍȠȁnj௵
ᇵȉȑnjɪɜɊʑɑӝǷȮȁ CRF ȐݯᏕɝʑʄȴၠ
ǦȁǮnjࣸዾᛩథɝʑʄȉǤȭ mecab(Kudo et al.,
2004) ȉȐފᚓȊՐഇȍnjՀᐺȌݥȊʾ͇যմɈʂɒ
ȐጎՌǽȴᢍȊǹȈრǹnjጎȡՌȱǽȴѺᩛǻȭ
ǵȊȉnjዖࢠȊݯᏕᣇࢠȐՕʾȴȑǭȬȁǦǍ
5 Conclusion
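This paper proposed a word sense disambiguation (WSD) method that uses upper-level semantic class estimation. The method first estimates upper-level semantic classes and then performs WSD using the estimated classes. For upper-level semantic class estimation, we ran experiments with CRF and MEM, and both achieved high accuracy. For WSD as well, we obtained higher accuracy than the method that achieved the highest accuracy at SENSEVAL-2. We therefore consider the proposed method, WSD using upper-level semantic class estimation, to be effective.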
Acknowledgments
The CRF training tool used in this work was kindly provided by Jun Suzuki (Suzuki et al., 2006). We take this opportunity to express our gratitude.
References
Timothy Baldwin, Su Nam Kim, Francis Bond, Sanae Fujita, David Martinez, and Takaaki Tanaka. 2008. MRD-based word sense disambiguation: Further extending Lesk. In The Third International Joint Conference on Natural Language Processing (IJCNLP-2008).
Francis Bond, Sanae Fujita, and Takaaki Tanaka. 2006. The Hinoki syntactic and semantic treebank of Japanese. Language Resources and Evaluation, 40(3–4):253–261.
Yee Seng Chan, Hwee Tou Ng, and David Chiang. 2007. Word sense disambiguation improves statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 33–40.
Chih-Chung Chang and Chih-Jen Lin. 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Sanae Fujita, Francis Bond, Stephan Oepen, and Takaaki Tanaka. 2007. Exploiting semantic information for HPSG parse selection. In ACL 2007 Workshop on Deep Linguistic Processing, pages 25–32.
Taku Kudo, Kaoru Yamamoto, and Yuji Matsumoto. 2004. Applying conditional random fields to Japanese morphological analysis. In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP 2004, pages 230–237.
Kamal Nigam, John Lafferty, and Andrew McCallum. 1999. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67.
Jun Suzuki, Erik McDermott, and Hideki Isozaki. 2006. Training conditional random fields with multivariate evaluation measures. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 217–224.
Takaaki Tanaka, Francis Bond, Timothy Baldwin, Sanae Fujita, and Chikara Hashimoto. 2007. Word sense disambiguation incorporating lexical and structural semantic information. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 477–485.
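Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama, and Yoshihiko Hayashi. 1997. Goi-Taikei: A Japanese Lexicon (in Japanese). Iwanami Shoten.
National Institute for Japanese Language. 2004. Bunrui Goihyo CD-ROM, enlarged and revised edition (in Japanese). Dainippon Tosho.
Masaki Murata, Masao Utiyama, Kiyotaka Uchimoto, Qing Ma, and Hitoshi Isahara. 2003. CRL at the SENSEVAL-2J dictionary task: Comparison of various machine learning methods and features in Japanese word sense disambiguation (in Japanese). Journal of Natural Language Processing, 10(3):115–134.
Kaname Kasahara, Hiroshi Sato, Francis Bond, Takaaki Tanaka, Sanae Fujita, Tomoko Kanasugi, and Shigeaki Amano. 2004. Construction of a Japanese semantic lexicon: Lexeed (in Japanese). 2004-NLC-159, pages 75–82.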