ˀ͉նɊʄɔވȶၢǨȃᜱᏊௌஈौ᛫ู ♥ 1 ᗍၦ ୯ᓜ,♥ Francis Bond,♠ ᗍᥗ ஊТ ♥ {sanae,a.fujino}@cslab.kecl.ntt.co.jp, ♠ [email protected] ♥ NTT ɌɸɾɤɊʑɐʀʌᇒٮݯᆖᅛሇਪ, ♠ پᣄ·ᅛሇഷഁ ȓǼȥȏ ᣄࡶnjۏDZȐᜯȑୁᐿȍȪȅȈႇȌȭᚩହȐᜯᏈȴ ȆǍǵȮȫȐᜯᏈǭȫୁᐿȍयǺȈൿǹǦᜯᏈȴੑ ǻȭᜯᏈொஆॊᛩื (WSD) ȑnjᢡࢉnjഁୁᛩథȐ ൿᛩੑ (Fujita et al., 2007) ȦnjഷಓᏠᜃ (Chan et al., 2007) ȐዖࢠՕʾȍұȉǤȭǵȊǮᆟǷȮȈǦȭǍ WSD ȍȆǦȈȑnjۏDZȐІᙦᅛሇǮǤȭǍ୩௵ᜯ ȉȑnjଲǤȬȐਯȍȑԨȒȌǦȤȐȐnjଲȌǹ ȉ᮰Ǧዖࢠȴखȁnjੱࣥ Lesk ȴၠǦȭਯ (Baldwin et al., 2008) ȦnjযմپȊഁୁپȐˋୗȴၠǦȁଲ ǤȬȐਯ (Tanaka et al., 2007) ȌȋǮૂ౮ǷȮȈǦ ȭǍǹǭǹnjਔǏȑ WSD ȐȴഁୁᛩథȍȤѵၠ ǻȭȁȣnj௵ᇵȉȑnjഁୁᛩథȴѵၠǹȌǦਯ ȴૂ౮ǻȭǍ WSD Ȑ᪣ǹǷȐԗ؟ȐʸȆȊǹȈnjᑭ۔ȌᥖȐᜯᏈ ȴલކǻȭȁȣȍӰѧȌݯᏕɠʑɘȐഁǮ᪣ǹǦȊ ǦǨǵȊǮǤǴȫȮȭǍҥǪȈnjɈʂɒହǮᑭ۔ȉǤ ȮȒnjۏDZȐഷಓݯᏕɝʑʄȑѵၠǮآ᪣ȉǤȭǍǿ Ȑȁȣnjૂ౮ਯȉȑ WSD ȴ 2 ඟ᩷ȍѧdzȭǍ 1 ඟ ᩷ᄑȉȑnj˕͌, ٿਪ, П͌ྤ, ਗ਼ឤྤȌȋȐ (ʾ͇) য մɈʂɒȴલކǻȭǍযմɈʂɒȑᜯᏈȍතțȈହǮ ȃnjʅɲʄ 2 ǭȫ 5 ȠȉȐযմɈʂɒ1 ȴၠǦȈފᮍȴ ᙦȌǦnjᜯᏈொஆॊᛩืȍұȌʅɲʄȴǻȭǍ 2.1 ᴄɡɔɣɢʓɚ േɔʌɒɩʌɈȐФޗᜯȑnj Lexeed ȐᜯᏈȍȪȅȈɘ ɉ̝dzǷȮȈǦȭǍǹǭǹnj௵ሱȉȑnj (ʾ͇) যմɈ ʂɒȴલކǻȭȁȣnjേɔʌɒɩʌɈȐᜯᏈɘɉȴʾ ͇যմɈʂɒȚȊᎰǹnjᛷፙᴂɟɒɡɠʑɘȴ͓ਓ ǻȭǍͣȊǹȈnjᣜᡜਯ 1 ȐᜯᏈୁȴୁ (1) ȍᆟǻǍୁ (1) Ȑʿȍ cat ȉᆟǹȁᙦȑnjՋᜯᏈɘɉȍʃʌɈǷȮ ȈǦȭᜯࣴ۔ያȐযմɈʂɒnj lvl X ȉᆟǹȁᙦȑnj ʅɲʄ X ȍǬdzȭʾ͇যմɈʂɒȴᆟǹȈǦȭǍ௵ᇵ ȉȑnjᜯᏈǮᚩହȐযմɈʂɒȍʃʌɈǷȮȈǦȭٿ ՌnjѱȐযմɈʂɒȐȡȴၠǦȈǦȭǍ ᙲ 1ȍnjᛷፙᴂɟɒɡɠʑɘȐହȴᆟǻǍᛷፙɠʑ ɘ ȑnj 1 ᜯ Ꮘ ࢈ ف5.1 ୁ (ͣ ୁ) ǭ ȫ 17.7 ୁ (KC) ȉ Ǥ ȭǍǹǭǹnjʾ͇যմɈʂɒȍ᪕ዯǹȁٿՌnjʅɲʄ 5 ȉȤ 1 Ɉʂɒ࢈ ف340.9 ୁ (ͣୁ) ǭȫ 539.7 ୁ (KC) Ȋ ȌȭǍǵȐȪǨȍnjʾ͇যմɈʂɒȍ᪕ዯǻȭ˲ȍȪ ȬnjɠʑɘɒɪʑɒɦɒȍࣦDZȌȭǍ ȑȭǭȍߋȌǦȁȣnjතᡤყߋȌǦݯᏕɠʑɘǭȫȉ ȤӰѧȌዖࢠȴखȭǵȊǮȉǯȭǍȠȁnjʾ͇যմɈ ʂɒǮކȠȮȒnjᜯᏈǮʸযȍෘȠȭ˲ȤۏǦǍ 2 ඟ Corpus ᩷ᄑȉȑnj 1 ඟ᩷ᄑȉલކǹȁযմɈʂɒȴၠǦȈᜯ ᏈǿȐȤȐȴલކǻȭǍ KC ȌǬnjފᮍȍȑnjേɌʑɪɒ (Bond et al., 2006) ȴ ၠǦȭǍേɌʑɪɒȑnjᢍ (Lexeed(ԗȫ, 2004)) Ȑ ᜯᏈୁnjͣୁnj୕ (̇۔Ɍʑɪɒnj̥ʿnj KC) ȍǻ ȭɝʃʑɩʌɈnjǬȪȕnjɔʌɒɩʌɈǭȫഁਓǷȮ ȭǍ ȠȁnjǵȐᢍȍȑnjᜯᏈඪȍ୩௵ᜯɐɖʑʂɒ ȉǤȭ୩௵ᜯᜯࣴ۔ያ (ැԗȫ, 1997) ȐযմɈʂɒ (য մ߬ॊ) Ǯ̝ˁǷȮȈǦȭǍ 2 ᜯᏈୁ ͣୁ Set Train Test Train Test Train Test ୁହ ឤᜯହ Еᜯହ 67,202 4,942 106,528 8,942 141,968 5,408 175,709 15,932 133,616 12,416 211,567 12,581 613,216 54,276 432,514 41,019 947,298 53,703 ᙲ 1: ᛷፙ / ɟɒɡɠʑɘହ: ǵǵȉឤᜯȊȑnj Lexeed ȐᜯᏈȉɘɉ̝ˁǷȮȁᜯ 2.2 ތᮏ: ˀ͉նɊʄɔȒވ ഷ ಓ ݯᏕ ਯ Ȋ ǹ Ȉnj Maximum Entropy Method: MEM(Nigam et al., 1999) ǬȪȕnj Conditional Random Fields: CRF(Suzuki et al., 2006) ȴၠǦȈފᮍȴᙦȌȅ ˀ͉նɊʄɔȒވ 
௵ሱȉȑnjʾ͇যմɈʂɒȐલކୗȍȆǦȈț ȭǍᜯࣴ۔ያȑnj 2,710 ȐযմɈʂɒǭȫȌȬnj๛Ƿ 0 ǭȫ 11 ȠȉȐ᩷߯ (ʅɲʄ) ȍѧdzȫȮȈǦȭǍǿȐǨ 1 য մ Ɉ ʂ ɒ ȑnj ʅ ɲ ʄ 2 Ȑ ٿՌ �3: ˗ ͎� Ȧ �533: С ͎ ྦ� Ȍ ȋ 9 Ɉ ʂ ɒnj ʅ ɲ ʄ 3 Ȑ ٿՌ �4: ̏� Ȧ �706: ཊ ၞ ྦ� Ȍȋ 30 Ɉʂɒnjʅɲʄ 4 ȐٿՌ �5: ̏ᨮ� Ȧ �760: ̏ࡖ ྦ� Ȍȋ 136 Ɉʂɒnjʅɲʄ 5 ȐٿՌ �6: ̏ᨮᴑ̏ᇠᴓ� Ȧ �838: ᬽୋ� Ȍȋ 392 Ɉʂɒȍ᪕ዯǷȮȭǍ - 568 - (1) ᪬ᡔ 1 cat lvl 5 lvl 4 lvl 3 lvl 2 �988: ˣȮྦ (௷͎ (ᇡҾ (ᩰ�)))ز �986: ˣȮྦ� �760: ̏ࡖྦ� �706: ཊၞྦ� �533: С͎ྦ� Ȧ ᒈҼᡔ 1 �988: ˣȮྦ (௷͎ (ᇡҾ (ᩰ�)))ز �986: ˣȮྦ� �760: ̏ࡖྦ� �706: ཊၞྦ� �533: С͎ྦ� - ȁ2 ǍȌǬnj௵ᇵȉȑnjᔀቜȍȪȭൿȡࣸዾᛩథ ȴДҢȊǹȈѵၠǻȭǍ ̥ʿnjѵၠǻȭዾॊȍȆǦȈțȭǍ CRFȐዾॊȍ ȑnj uni-gram, bi-gram, ឤᜯȐ҆ऑ 2 ᜯȐጎՌǽȴၠ Ǧȭ (ᙲ 2)ǍMEMȐዾॊȍȑnjឤᜯᒈᡋȊǿȐ҆ऑ ȐᜯnjឤୁˏȐǻțȈȐФޗᜯnjǬȪȕnjឤᜯȐ ҆ऑ 3 ୁݥȠȉȐୁݥѰȴၠǦȭ (ᙲ 3)ǍCRFȊЕDZՐ ǺዾॊȴѵၠǹȁފᮍȤᙦȌȅȁǮnjȫǭȍዖࢠǮ ʿǮȅȁȁȣnjǵǵȉȑȊȬǤǴȌǦǍᙲ 2, 3ȉnj bk ȑ k ႃᄑȐᜯȐԗࣸnj wk ȑᙲ߯ࣸnj p1k , p2k , p3k ȑ ǿȮȀȮnj֕ᜓ, ֕ᜓጅѧᬨ 1, ֕ᜓጅѧᬨ 2 ȴᆟǻǍ Sample ȑnjୁ (1) Ȑ 5 ႃᄑȐᜯ (i = 5) ǖᣜᡜǗȴឤ ᜯȊǹȁٿՌȐዾॊȐʸᤘȉǤȭǍ Type Template uni- �bk �, �wk �, gram �p1k �, �p2k �, �p3k � ጎ �bk , wk �, �bk , p1k �, Ռ �bk , p2k �, �bk , p3k �, ǽ �wk , p1k �, �wk , p2k �, �wk , p3k �, �p1k , p2k �, �p1k , p3k �, �p2k , p3k � bi�bk , bk+1 �, gram �wk , wk+1 �, �p1k , p1k+1 �, �p2k , p2k+1 �, �p3k , p3k+1 � Sample �ᒈҼᡔ�, �ᒈҼᡔ�, �Ցᜓ�, �Ցᜓ - ʸᒦ� �ᒈҼᡔ, ᒈҼᡔ�, �ᒈҼᡔ, Ցᜓ - ʸᒦ�, �ᒈҼᡔ, Ցᜓ�, �ᒈҼᡔ, Ցᜓ - ʸᒦ -*�, �Ցᜓ, Ցᜓ - ʸᒦ -*� �ᒈҼᡔ, ȴ�, �ᒈҼᡔ, ȴ�, �Ցᜓ, Ҩᜓ�, �Ցᜓ - ʸᒦ, Ҩᜓ - ౦Ҩᜓ�, �Ց - ʸᒦ -*, Ҩ - ౦Ҩᜓ - ʸᒦ� ᙲ 2: CRF ȉѵၠǹȁዾॊ: ǵǵȉnj i ႃᄑȐᜯǮឤᜯȊ ǻȭȊnj uni-gram ȊጎՌǽȉȑnj k = i−2, ..., i+2nj bi-gram ȉȑnj k = i − 2, ..., i + 1Ǎ ȴ ᣜᡜ 1 - �2003: ଁ፪� �1920: ұϜ� �1560: ᙨ༸� �1236: ̏ᨮยҾ� �1235: ˴� ǻȭ ̍ 4 - �4: ̏� �4: ̏� �4: ̏� �4: ̏� �3: ˗͎� ǖɢʂȽɩʑǗȊǦǨᜯȑnjʅɲʄ 2 ȉȑnj �3: ˗͎� ǭ �533: С͎ྦ� ǹǭԮȬखȌǦǍǹǭǹnj௵ਯȉ ȑnj �388: ځਬ� ȌȋȐɈʂɒȤੑՀᐺȉǤȭǍǿ ǵȉnjǵȐȪǨȌɁʂʑȴൿǻȭȁȣnjȬखȌǦ ɈʂɒǮੑǷȮȁٿՌnjCRFȉȑnjՀᐺȌɈʂɒȐ ˏȉᬚȐɈʂɒȚȊۆǻȭǍȠȁnjMEMȉȑnjՀ ᐺȌɈʂɒȐˏȉȤᆂȐ᮰ǦɈʂɒȚȊۆǻ ȭ3 Ǎᙲ 4ȍǬǦȈnjǖൿ҆Ǘȉᆟǹȁዖࢠȑnjલކ ǿȐȠȠȐዖࢠȉǤȬnjǖൿऑǗȉᆟǹȁዖࢠ ȑnjȬखȌǦɈʂɒȴൿǹȁٿՌȐዖࢠȉǤȭǍ ᙲ 4Ȑ ൿ ҆ Ȑ ǭ ȫnjCRFȑ Ȫ Ȭ ๛ Ǧ ʅ ɲ ʄ ȉ Ȑ ዖ ࢠ Ǯ ත ᡤ ყ ᮰ Ǧ ǵ Ȋ Ǯ ȱ ǭ ȭǍ ǹ ǭ ǹnjCRFȑMEMȪ Ȭ ۏDZ Ȑ ஓ ᨬ Ȋ ɺ ɻ ʃ ȴ न ᛋ Ȋ ǻȭǍǿǵȉnjǦDZȆǭȐΥ (∗ ȴ̝ˁǹȁହΥ) ȑnj p2 ȴၠǦȌǦȉखȁǍͅǹnj p2 
ȴၠǦȌǦٿՌnjዖ ࢠȑ 0.1-0.2 % ᇢࢠDZȌȭ4 Ǎ ᙲ 4Ȑ ൿ ҆ Ȑ ȉ ȑnj Ǧ DZ Ȇ ǭ Ȑ ఎ ̪ ȉnjMEMȐ ୗ ǮnjCRFȪ Ȭ ᮰ Ǧ ዖ ࢠ ȴ ѡ ǹ Ȉ Ǧ ȭǍǹǭǹnjൿୗȑMEMȐୗǮѵȉǤȭȍȤᨲ ȱȫǼnjൿऑȑnjCRFɲʑɒȐዖࢠȐୗǮЕȈ᮰DZ ȌȅȈǦȭ5 Ǎ ൿऑȑCRFȐୗǮЕȈ᮰DZȌȅȁထၥ ȊǹȈnjMEMȑѱǭȫᬚࢠȍᥔ༵ǮǬǭȮȁݯᏕୗ ȉǤȭȐȍǹnjCRFȑතᡤყያѰȊǹȈȐࣱ܋ॊ ȍᥔ༵ǮǬǭȮȁݯᏕୗȉǤȭȁȣnjᬚࢠȍȪȭ ൿȑCRFȍǹȈȪȬұყȉǤȭȊǦǨထၥǮᏤǪ ȫȮȭǍ Sample �᪬ᡔ�, �ᒈҼᡔ�, �̍� �ᣜᡜ�, �ᣜᡜ�, �Ցᜓ�, �Ցᜓ - Ɏۆરጬ -*� �ȴ�, �ᡔȴ�, �Ҽᡔȴ� �ǻ�, �ǻȭ�, �ǻȭ̍� ᙲ 4ǭȫnjൿऑȐዖࢠȑnjൿ҆ȍතțȈnjǦǼ ȮȤʾ୷ǹȈǦȭǍǿȐȁȣnj൨ሱȐᜯᏈொஆॊᛩื ȉȑnjൿऑȐȴѵၠǻȭǍ ᙲ 3: MEM ȉѵၠǹȁዾॊ: ǵǵȉnj i ႃᄑȐᜯǮឤᜯ ȊǻȭȊnj uni-gram ȉȑnj j = i − 1, ..., i + 1Ǎ ௵ሱȉȑnj҆ሱȉखǹȁʾ͇যմɈʂɒȴၠǦȁᜯ Ꮘொஆॊᛩื (WSD) ȍȆǦȈțȭǍȠǼnj WSD Ȑ Type Template �b � � l� � � uni-gram �b j ,� w � j ,� p1 j , p3 j ୁݥѰ �cb1i �, �cb2i �, �cb3i � �ca1i �, �ca2i �, �ca3i � Фޗᜯ 2.3 ጘఫȌឈᝊ: ˀ͉նɊʄɔވ ʾ͇যմɈʂɒȐલކȴᙲ 4ȍᆟǻǍɲʑɒʂȽʌ (BL) ȑnjᛷፙɠʑɘˏȐᬚȐযմɈʂɒȴੑǹȁ ٿՌȐዖࢠȉǤȭǍ௵ਯȉȑnjՋʅɲʄȠȉȐЕȈ ȐɈʂɒǮੑՀᐺȍȌȅȈǦȭȁȣnjᜯȍȪȅȈȑ ȬखȌǦɈʂɒǮੑǷȮȭٿՌǮǤȭǍͣǪȒnj 2 ފȑnj Support Vector Machine (SVM, (Chang and Lin, 2001)), ȉȤފᮍǹȁǮnjMEMȪȬዖࢠǮ͈DZnjஓᨬȤࡶȍ ǭǭȅȁȁȣǵǵȉȑԮȬʾǴȌǦ 3 ᜱᏊௌஈौ᛫ู ȁȣȐዾॊȍȆǦȈțȭǍ WSD ȐɌʌɟɒɡȉǤ ȭ SENSEVAL-2 ୩௵ᜯᢍɘɒɈȍǬǦȈnjȤ᮰Ǧ ዖࢠȴखȁఆၤȫ (2003) Ȑɐɒɟɹ (̥ʿnjMRT) ȴȝ ȞЩފᚓǹnjਔǏȐɐɒɟɹȊතᡤǻȭǍਔǏǮЩފ ᚓǹȁɐɒɟɹ (̥ʿnjCRL) ȊnjMRTȐᣥǦȍȆǦȈ ȑ̥ʿȍțȭǍ 3 MEMȉȑnjЕɈʂɒȐᆂǮአӾȍखȉǯȁȁȣǍ 4 ∗ ȴ̝ˁǹȁఎ̪̥یȉතᡤǹȁٿՌǍ 5 ފȍȑnjMEMȐȴnj CRF ȊՐഇȍnjՀᐺȌᬚ ȐযմɈʂɒȍൿǻȭୗȤǹȁǮnjዖࢠȑȦȦʿǮȅ ȁǍ - 569 - Corpus Lvl 2 3 4 5 BL 91.3 83.5 79.2 70.1 ᜯᏈୁ ൿ҆ CRF 96.0 92.0 90.6 85.9∗ MEM 95.4 90.8 89.3 85.1 ൿऑ CRF 96.3 92.5 91.2 86.7∗ MEM 95.7 91.4 90.2 86.6 BL 87.4 80.1 76.7 67.7 ͣୁ ൿ҆ CRF 88.7 84.0 82.0 77.9∗ KC ൿऑ MEM 89.4 84.3 80.8 75.4 CRF 92.0 87.6 85.7 81.9∗ MEM 91.8 87.4 84.9 81.0 BL 90.0 83.0 80.0 70.6 ൿ҆ CRF 93.3 89.8∗ 88.2∗ MEM 95.3 91.8 89.4 86.6 ൿऑ CRF 96.3 93.4∗ 91.9∗ MEM 95.8 92.8 90.8 88.8 ᙲ 4: ʾ͇যմɈʂɒલކ (CRF/MEM): ͅǹnj ∗ ȴ̝ˁǹȁହΥȑnj p2 ȴዾॊȊǹȈѵၠǹȈǦȌǦǍ MRTȑnjᙲ 3Ȑዾॊ̥یȍnj൨Ȑ (a)-(c) Ȑپȴѵ ၠǹȈዾॊȴ͓ਓǹȈǦȭǍ (a) KNP ȍȪȭഁୁᛩథ nj (b) ؤ᭗ȌȋȉᬨȐѧᬨȍၠǦȫȮȭاӰ ᣏѧᬨ (UDC) ȐɌʑɢnj (c) ୩௵ᜯȐɐɖʑʂɒȉ Ǥȭѧᬨᜯࣴᙲ (اሥاᜯᅛሇਪ, 2004) ȐѧᬨႃՆǍ ʾ (a)-(c) ȐǨȃnj௵ފᮍȉȑ (a) Ȋ (b) ȑѵၠǹȌ ǦǍ (a) ȴѵၠǹȌǦထၥȑnj WSD Ȑȴഁୁᛩ థȍѵၠǻȭȁȣnjഁୁᛩథȴ WSD Ȑ҆ѕထȊǹȈ ȑᙦȌȱȌǦȁȣȉǤȭǍ (b) ȴѵၠǹȌǦထၥȑnj UDC 
ɌʑɢǮേɌʑɪɒȍȑ̝ˁǷȮȈǦȌǦȁȣȉ ǤȭǍ ǵǵȉnj (c) Ȑѧᬨᜯࣴᙲȑ୩௵ᜯዯ 96,000 ᜯǮԬ ǷȮȈǬȬnj๛Ƿ 5 Ȑ௲ഁᣈȍȌȅȈǦȭǍѱȐ ʅɲʄȉȑ 4 ɈʂɒȍȱdzȫȮnjʅɲʄ 3 ȉ 95 Ɉʂ ɒnjʅɲʄ 5 ȉ 895 ɈʂɒȍѧdzȫȮȭǍᜯࣴ۔ያȤ ѧᬨᜯࣴᙲȤnjМȍ୩௵ᜯȐɐɖʑʂɒȉǤȭǮnjᜯ ࣴ۔ያǮ˕ȍʸᒦՑᜓȴѧᬨǻȭȁȣȍ͓ȫȮȁȐ ȍǹnjѧᬨᜯࣴᙲȑnjഷᐺᜯȴ՜ȢЕȈȐᜯȴѧᬨ ឤȊǹȈǦȭǍ (c) ȍȆǦȈnjMRTȑnjʅɲʄ 3 Ȋ 5 ȐɈʂɒȴˋୗѵၠǹȈǦȭǍǹǭǹnjअȫȑݥȍ ɷɜɚǹȁѱȐɈʂɒȴѵၠǹȈǬȬnj௵ᇵȐȪǨ ȍȪȬᣱѨȌɈʂɒȴલކǻȭȪǨȌǵȊȑǹȈǦȌ ǦǍᜯࣴ۔ያȊѧᬨᜯࣴᙲǭȫखȫȮȭɘȽɰȊዂࢠ ȑႇȌȬnjྦྷȍѧᬨᜯࣴᙲȍȑഷᐺᜯǮ՜ȠȮȭǵȊ ǭȫnjႇȌȭұǮखȫȮȭȊᏤǪȫȮȭȁȣnjਔǏ Ȥ (c) ȐዾॊȑѵၠǻȭǍ ȠȁnjMRTȑnj JUMAN/RWC Ȑࣸዾᛩథȴ ˋୗѵၠǹȈǦȭǮnj௵ᇵȉȑᔀቜȍȪȭࣸዾᛩథ ȐȡѵၠǹȈǦȭǍ ȆȠȬnjCRL ၠȍȑѧᬨᜯࣴᙲǭȫखǹȁዾॊȴ ᙲ 3ȍᢲҥǹnjਔǏȐɐɒɟɹȍȑnjѧᬨᜯࣴᙲǬȪȕ લކǹȁʾ͇যմɈʂɒȍȪȭዾॊȴᢲҥǻȭǍǵȐ ஓnjલކǹȁʅɲʄȪȬʾ͇ʅɲʄȐযմɈʂɒȤѵ ၠǻȭǍͣǪȒnj҆ሱȉʅɲʄ 3 ȐযմɈʂɒȴલކ ǹȁٿՌnjʅɲʄ 2 ȐযմɈʂɒȤዾॊȊǹȈᢲҥǹ ȈǦȭǍ ފᮍȍȑnj SENSEVAL-2 ȉȐឤᜯ (ՑᜓnjҼᜓՋ 50 ᜯ) ȴၠǦȁǍͅǹnj Lexeed ȍȌǦ 2 ᜯnjǬȪȕnj ᛷፙ / ɟɒɡɠʑɘȐǦǼȮǭȍѡဎǹȌǭȅȁᜯȴᩣ ǦȈǦȭǍފȐឤᜯହȴᙲ 5ȍᆟǻǍ ਔǏȑnjMRTȊՐഇnjᜯȊ֕ᜓȐጎՌǽඪȐɻɠʄ ȴ͓ਓǹȁǍȠȁnjMRTȑnj SVM ȊɣȽʑɯɲȽɓ ȐˋୗȴጎȡՌȱǽȈѵၠǹȈǦȭǮnj௵ފᮍȉȑ SVM(Chang and Lin, 2001) ȐȡȴѵၠǹȈǦȭǍ ȠȁnjMRTȉȑnjۏᬇ࣑ɄʑɦʄȴѵၠǹȈǦȭǮnj ਔǏȐފᮍȉȑፍࣸɄʑɦʄȐୗǮዖࢠǮ᮰DZȌȅȁ ȁȣnjፍࣸɄʑɦʄȴѵၠǹȁǍ Corpus No. ᜯᏈୁ ͣୁ KC Ցᜓ Wd 44 41 49 Pol 6.4 6.6 6.3 Ҽᜓ Wd 46 46 49 Pol 9.6 9.4 10.4 Ռᛱ Wd 90 87 98 Pol 8.1 8.1 8.4 ᙲ 5: WSD Ȑឤᜯହ:Wd ȑឤᜯହnj Pol ȑ࢈ۏفᏈହ 3.1 ጘఫȌឈᝊᴏᜱᏊௌஈौ᛫ู ᙲ 6 ȍ WSD ȐȴᆟǻǍɲʑɒʂȽʌ (BL) ȑnjᛷ ፙɠʑɘȐˏȉȐᬚᜯᏈȴੑǹȁٿՌȐዖࢠȉǤ ȭǍȠȁ BL2 ȑnj҆ሱȉલކǹȁʾ͇যմɈʂɒȴຕ ȁǻᬚᜯᏈȴੑǹȁٿՌȐዖࢠȉǤȭǍ ᙲ 6ȍ Ǭ Ǧ ȈnjSCRFȑCRFnjSMEMȑMEMȍ Ȫȅ Ȉલ ކ/ ൿǹȁʾ͇যմɈʂɒȴၠǦȁɐɒɟɹȉǤ ȭǍЕȈȐȑɲʑɒʂȽʌ (BL) ȪȬযȍତᒼǷ ȮȈǦȭǍᜯᏈୁȴᩣǯnjSCRF ȐǮȤȪǦǍ ᙲ 6ǭȫnjʾ͇যմɈʂɒȴຕȁǻᬚᜯᏈȴੑ ǹȁٿՌ (BL2) ȉȤnj᮰ǦዖࢠȉᜯᏈȴલކǻȭǵȊ ǮȉǯȁǍʸᒦȍnj᩷߯Ǯ๛DZȌȭȊnjʾ͇যմɈʂ ɒȐલކᒈ͌ȐዖࢠȑʿǮȭȍȤǭǭȱȫǼnjȪȬ๛ ǦʅɲʄȐযմɈʂɒȴၠǦȭୗǮnj WSD Ȑዖࢠᒈ ͌ȑՕʾǹȈǦȭǍ 4 ឈᝊȌ̖ओȒᬡ ௵ᇵȉȑnj WSD ȍǬdzȭʾ͇যմɈʂɒલކȐұ ॊȴᆟǹȁǍǹǭǹnjʾ͇যմɈʂɒǮொஆॊȐ҃ ȍұǮȌǦٿՌȤعݦǻȭǍͣǪȒǖȻȽʆʌǗ ȑnj ”ࡤػȐǹȱȴȐȒǻȐȍ͞ǨᦀᚥȐᥔǦǵȈǍ” Ȋnj “᮸ȐධȴȃȄȫǽȭᦀᚥȐǵȈǍ” Ȑ˳ȆȐযմ ȴȆǍȻɧɟȽɘʑȑǵȮȫȐᜯᏈȴݿЕȍӬѳȉ ǯȭǮnjˋୗȊȤnj �915: ࢥޗၢС� Ȋ �969: Ҥഹ �ȍʃʌɈǷȮȈǬȬnjযմɈʂɒǭȫᜯᏈȴጚȭ ǵȊȑȉǯȌǦǍǹǭǹnjȝȊȵȋȐۏᏈᜯȍǹȈ ȑnj௵ਯȑұȉǤȭǍ - 570 - Corpus BL CRL BL2 (CRF ൿऑ ѵၠ) BL2 (MEM ൿऑ ѵၠ) SCRF SMEM Lvl 2 3 4 5 2 3 4 5 2 3 4 5 2 3 4 5 Ցᜓ 74.5 81.1 76.8 80.8 80.9 83.4 77.0 81.1 81.3 82.6 81.3 81.5 81.6 81.7 81.5 81.3 81.7 81.6 ᜯᏈୁ Ҽᜓ ࢈ف 56.8 65.3 59.9 60.6 61.6 67.4 58.5 
60.3 61.3 66.6 65.6 66.1 66.3 67.2 65.3 65.2 65.2 65.5 63.8 71.5 66.5 68.5 69.2 73.7 65.8 68.5 69.1 72.8 71.8 72.2 72.3 72.9 71.7 71.6 71.7 71.8 KC Ցᜓ ͣୁ Ҽᜓ ࢈ف Ցᜓ Ҽᜓ ࢈ف 63.7 79.5 66.9 69.1 71.0 76.3 65.0 69.1 69.7 72.6 79.5 79.5 79.5 80.1 78.5 78.5 78.9 79.2 56.2 68.5 58.8 60.5 61.3 65.2 58.0 60.5 61.6 63.1 68.3 68.5 68.8 69.2 68.3 68.3 68.3 67.9 58.3 71.6 61.0 62.8 64.0 68.3 59.9 62.9 63.9 65.7 71.4 71.6 71.7 72.3 71.1 71.1 71.2 71.0 69.2 80.9 69.9 75.0 76.7 62.1 67.0 63.4 65.4 68.0 66.1 74.7 67.0 70.7 72.8 70.5 74.2 75.4 77.2 81.3 81.5 81.3 62.4 63.3 64.3 67.5 67.0 67.0 67.0 66.9 69.3 70.4 72.9 74.9 75.1 74.9 79.9 79.8 79.8 79.7 66.9 66.9 66.6 66.7 74.1 74.1 73.9 73.9 ᙲ 6: ᜯᏈொஆॊᛩื (SVM) Ƞȁnjʾ͇যմɈʂɒલކȐȐȡȴഁୁᛩథȌ ȋȍѵၠǻȭ˲ȤȉǯȭǍފnj Fujita et al. (2007) ȍ ȪȭȊnjഁୁᛩథȐൿᛩੑȍǬǦȈʅɲʄ 2 Ȑযմ ɈʂɒǮȤዖࢠՕʾȍޜˁǹȈǦȭǍ ̔ऑȐᬟȊǹȈȑnjᓦᜯቚȐ̛ȐᛮᜯȉȤՐഇȐ ұȴखȫȮȭǭȋǨǭފᮍȴᙦȌǦȁǦǍȠȁnj௵ ᇵȉȑnjɪɜɊʑɑӝǷȮȁ CRF ȐݯᏕɝʑʄȴၠ ǦȁǮnjࣸዾᛩథɝʑʄȉǤȭ mecab(Kudo et al., 2004) ȉȐފᚓȊՐഇȍnjՀᐺȌݥȊʾ͇যմɈʂɒ ȐጎՌǽȴᢍȊǹȈრǹnjጎȡՌȱǽȴѺᩛǻȭ ǵȊȉnjዖࢠȊݯᏕᣇࢠȐՕʾȴȑǭȬȁǦǍ 5 ǮȳȮȏ ௵ᇵȉȑnjʾ͇যմɈʂɒલކȴၠǦȁᜯᏈொஆॊᛩ ื (WSD) ୗȴૂ౮ǹȁǍ௵ਯȉȑnjȠǼʾ͇যմ ɈʂɒȴલކǹȈǭȫnjǿȐલކȴၠǦȈ WSD ȴᙦȌǨǍʾ͇যմɈʂɒȐલކȉȑnj CRF Ȋ MEM ȴၠǦȁފᮍȴᙦȌǦnjМȍ᮰ǦዖࢠȴखȁǍȠȁnj WSD ȉȤnj SENSEVAL-2 ȉȤ᮰Ǧዖࢠȴѡǹȁୗ ȪȬ᮰Ǧዖࢠȴखȭ˲ǮȉǯȁǍǵȮȍȪȬnjૂ౮ ਯȉǤȭʾ͇যմɈʂɒલކȴၠǦȁ WSD ȑұ ყȉǤȭȊǦǪȭǍ ᝮᢏ ௵ᅛሇȉѵၠǹȁ CRF ȐݯᏕɝʑʄȑnj௲໗භȍछૂͬǦ ȁȂǦȁྤ (Suzuki et al., 2006) ȉǻǍǵȐٿȴΡȬȈǬᆠၧ ǹʾǴȠǻǍ ԦᏦୃ Timothy Baldwin, Su Nam Kim, Francis Bond, Sanae Fujita, David Martinez, and Takaaki Tanaka. 2008. Mrd-based word sense disambiguation: Further extending lesk. In The Third International Joint Conference on Natural Language Processing (IJCNLP-2008). Francis Bond, Sanae Fujita, and Takaaki Tanaka. 2006. The Hinoki syntactic and semantic treebank of Japanese. Language Resources and Evaluation, 40(3–4):253–261. Yee Seng Chan, Hwee Tou Ng, and David Chiang. 2007. Word sense disambiguation improves statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 33–40. Chih-Chung Chang and Chih-Jen Lin. 2001. 
LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Sanae Fujita, Francis Bond, Stephan Oepen, and Takaaki Tanaka. 2007. Exploiting semantic information for HPSG parse selection. In ACL 2007 Workshop on Deep Linguistic Processing, pages 25–32.
Taku Kudo, Kaoru Yamamoto, and Yuji Matsumoto. 2004. Applying conditional random fields to Japanese morphological analysis. In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP 2004, pages 230–237.
Kamal Nigam, John Lafferty, and Andrew McCallum. 1999. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67.
Jun Suzuki, Erik McDermott, and Hideki Isozaki. 2006. Training conditional random fields with multivariate evaluation measures. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 217–224.
Takaaki Tanaka, Francis Bond, Timothy Baldwin, Sanae Fujita, and Chikara Hashimoto. 2007. Word sense disambiguation incorporating lexical and structural semantic information. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 477–485.
Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama, and Yoshihiko Hayashi. 1997. Nihongo Goi-Taikei (A Japanese Lexicon). Iwanami Shoten. (in Japanese)
National Institute for Japanese Language. 2004. Bunrui Goi Hyo, CD-ROM (enlarged and revised edition). Dainippon Tosho. (in Japanese)
Masaki Murata, Masao Utiyama, Kiyotaka Uchimoto, Qing Ma, and Hitoshi Isahara. 2003. CRL at Japanese dictionary-based task of SENSEVAL-2: Comparison of various types of machine learning methods and features in Japanese word sense disambiguation. Journal of Natural Language Processing, 10(3):115–134. (in Japanese)
Kaname Kasahara, Hiroshi Sato, Francis Bond, Takaaki Tanaka, Sanae Fujita, Tomoko Kanasugi, and Shigeaki Amano. 2004. Construction of a Japanese semantic lexicon: Lexeed. 2004-NLC-159, pages 75–82. (in Japanese)