Toward Probabilistic Language Generation That Describes Human Actions Observed with Kinect

ARG WI2 No.3, 2013
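Mizuki Kobayashi †, a
Ichiro Kobayashi †, b
Hideki Asoh ††, c
† Graduate School of Humanities and Sciences, Ochanomizu University (Science Major, Information Science Course)
†† Intelligent Systems Research Institute, National Institute of Advanced Industrial Science and Technology (AIST)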
a) [email protected] b) [email protected] c) [email protected]
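Abstract: Most of the information observed by sensors and similar devices is time-series data, and in the era of big data it is important to develop methods that extract useful information from observed time series and understand their content. Time-series data can be analyzed in many ways, such as trend prediction and analysis of correlations among multiple series, while visualization and similar techniques are used to understand their content. However, when a situation must be recognized from time-series data acquired by multiple sensors, as on a robot, the observed data must be represented at a higher level of abstraction. With this in mind, we aim to develop methods that describe the behavior of observed time-series data in language, and as one such method we propose a probabilistic text generation method that takes as input video information obtained from Kinect.
Keywords: Kinect, time-series data, SAX, log-linear model, bigram model, dynamic programming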
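1 Introduction

Most of the information observed by sensors and similar devices is time-series data, and in the era of big data it is important to develop methods that extract useful information from observed time series and understand their content. Time-series data can be analyzed in many ways, such as trend prediction and analysis of correlations among multiple series, while visualization and similar techniques are used to understand their content. However, when a situation must be recognized from time-series data acquired by multiple sensors, as on a robot, the observed data must be represented at a higher level of abstraction. With this in mind, we aim to develop methods that describe the behavior of observed time-series data in language, and as one such method we propose a probabilistic text generation method that takes as input video information obtained from Kinect.

2 Related Work

In recent years, research on expressing multimodal information in language has become increasingly active. The best paper award at ACL 2013, regarded as the premier international conference in natural language processing, went to the work of Yu et al. [1] on generating text that describes interactions between people and objects in video. To recognize object motion, they represent the motion of an object as a single time series, learn and recognize motion models with hidden Markov models, and verbalize the motion by assigning semantic labels that represent language. As another example of generating text that describes human actions, Regneri et al. [2] generate text describing the actions of a person cooking in video. Like Yu et al., they use pairs of human actions and sentences describing them to examine which expressions are appropriate for each action. Ushiku et al. [3] learn pairs of still images and the captions describing them, and propose a method that generates similar captions for similar images. For verbalizing video, Takano et al. [4, 5] represent a sequence of human behavior as n-grams of motion symbols, and propose a sentence generation method that builds word sequences through a model of word association structure derived from motion symbols, that is, semantic labels indicating human actions.

Liang et al. [6] propose a method for learning the relationship between text and meaning: assuming that events can be represented by database records, they use machine learning to acquire the correspondence between records and descriptions written in natural language. Angeli et al. [7] propose a method that generates text from latent and surface information based on the model proposed by Liang et al. [6]. Konstas et al. [8, 9] define a probabilistic context-free grammar that describes the structure specific to the input information and, like Liang et al. [6] and Angeli et al. [7], use database records and their descriptions. They represent the grammar as a weighted graph and generate text by finding the most appropriate derivation tree for a given input.

In this study, following the method of Liang et al. [6], we use a log-linear model to learn the correspondence between time-series data representing the behavior of people and objects, acquired as visual information, and natural-language descriptions of the actions, and we classify intermediate representations that express the meaning of each action. Using the bigram network prepared for each intermediate representation, we then generate the most likely sentence expressing the meaning of the action. The proposed method cannot generate complex sentences that require a grammar, but it can easily generate, from video input, sentences composed of high-likelihood expressions.

Copyright is held by the author(s).
The article has been published without reviewing.
Web Intelligence and Interaction Study Group Proceedings
Proceedings of ARG WI2

3 Framework for Verbalizing Visual Information

Figure 1 gives an overview of this study. First, using the human skeleton-tracking library available for Kinect¹ and a particle filter, we acquire the motion of people and objects as time-series data. The acquired time series undergo several dimensionality-reduction steps and are stored in a database together with intermediate representations, which mediate between the data and natural language. A motion classifier is then built by machine learning of the correspondence between the stored time series and the intermediate representations. The language resources for text generation are collected as expressions of human actions in a subject experiment, and a bigram model is built for each intermediate representation. When an intermediate representation is selected, the corresponding bigram model is selected as well, and applying dynamic programming to that model chooses the most likely combination of words describing the human action.

Figure 1: Framework for probabilistic text generation from video input

3.1 Acquiring and Processing Time-Series Data

Time-series data of human actions are acquired with a Kinect camera. Microsoft, the developer of Kinect, also provides a standard library that estimates the human skeleton, from which the 3-D positions of a person's joints can be estimated. In this study we use the RGB images and the depth sensor, together with the joint-position estimation based on them, and acquire the RGB video along with time series of the xyz coordinates of five points on the person: the right hand, left hand, right elbow, left elbow, and shoulder center (see Figure 2). Time series of object motion are acquired with a particle filter (detailed in Section 3.1.1).

Figure 2: Time-series data acquisition using Kinect and a particle filter

¹ http://www.microsoft.com/en-us/kinectforwindows/

3.1.1 Particle filter

The particle filter is a time-series filter that places no constraints such as unimodality or Gaussianity on the probability density of the state, and can therefore estimate nonlinear, non-Gaussian state spaces. In computer vision, many methods that use particle filters for tasks such as object tracking have been proposed. A particle filter estimates an unobservable state vector $x_t$ from an observable observation vector $y_t$. The state $x_t$ and the observation $y_t$ are given by the system model (1) and the observation model (2), respectively:

$$P(x_t \mid x_{t-1}) \tag{1}$$

$$P(y_t \mid x_t) \tag{2}$$

The posterior distribution of the state $x_t$ is represented by a set of K weighted particles $X_t = \{(x_t^{(k)}, \pi_t^{(k)})\}_{k=1}^{K}$, where $\pi_t^{(k)}$ denotes the weight of each particle. A general particle filter algorithm is as follows.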
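step1 Initialization: generate K random particles $x_{t-1}^{(k)}$ and use them as the initial particles.
step2 Prediction: following Eq. (1), sample $x_t^{(k)}$ from the distribution $P(x_t \mid x_{t-1}^{(k)})$.
step3 Likelihood computation: compute the weight $\pi_t^{(k)}$ of each particle as
$$w_t^{(k)} = P(y_t \mid x_t^{(k)}), \qquad \pi_t^{(k)} = \frac{w_t^{(k)}}{\sum_{k'=1}^{K} w_t^{(k')}}$$
step4 Resampling: draw K particles $x_t^{(k)}$ with probability proportional to $\pi_t^{(k)}$.
step5 Time update: set $t \to t+1$ and repeat steps 2 to 4.

Figure 4: An example of SAX using frame segmentation by dynamic programming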
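The five steps above form a bootstrap (sampling-importance-resampling) particle filter. The following is a minimal sketch, not the authors' tracker: it assumes a one-dimensional state, a Gaussian random-walk system model for Eq. (1) and a Gaussian observation model for Eq. (2), whereas this study scores object color in the likelihood. The function name `particle_filter` and its parameters are hypothetical.

```python
import math
import random


def particle_filter(observations, num_particles=500, process_std=1.0, obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D state (steps 1 to 5 above).

    Assumed models: x_t = x_{t-1} + N(0, process_std^2) for Eq. (1) and
    y_t = x_t + N(0, obs_std^2) for Eq. (2). Returns the weighted-mean
    state estimate at each time step.
    """
    rng = random.Random(seed)
    # step1: K random initial particles
    particles = [rng.uniform(-10.0, 10.0) for _ in range(num_particles)]
    estimates = []
    for y in observations:
        # step2: sample x_t^(k) from P(x_t | x_{t-1}^(k))
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # step3: w_t^(k) = P(y_t | x_t^(k)) (unnormalized Gaussian), then normalize to pi_t^(k)
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # every likelihood underflowed: fall back to uniform weights
            weights = [1.0 / num_particles] * num_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # step4: resample K particles with probability proportional to pi_t^(k)
        particles = rng.choices(particles, weights=weights, k=num_particles)
        # step5: the next loop iteration is t + 1
    return estimates
```

Because the Gaussian normalizing constant cancels when the weights are normalized in step 3, the sketch omits it; a color-based likelihood would simply replace the `weights` line.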
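Figure 3: Particle filter

As shown in Figure 3, the particle filter performs the time update by repeating prediction, likelihood computation, and resampling. In this way the observation $y_t$ is tracked by the many particles $X_t$. In this study, objects are tracked by using the object's color as the target of the likelihood computation.

3.1.2 Processing the time-series data

The time series obtained by tracking the human skeleton and the object color are converted into strings using Symbolic Aggregate approXimation (SAX) [10].

SAX is an approximate representation of time series that converts a time series into a string. SAX first applies a data compression step called PAA (Piecewise Aggregate Approximation). Suppose a time series $C$ of length $n$ is converted into a $w$-dimensional vector $\bar{C} = (\bar{C}_1, \ldots, \bar{C}_w)$. The $i$-th element of $\bar{C}$ is computed by Eq. (3):

$$\bar{C}_i = \frac{w}{n} \sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w}i} C_j \tag{3}$$

The data are usually divided into $w$ equally spaced frames. To improve the accuracy of motion classification, this study instead applies dynamic programming to each series to obtain the most likely segmentation, which yields strings that follow the data more closely (see Figure 4).

From the strings obtained by the SAX conversion, we extract the portions that appear to be motion. For every string in a given video, a position is regarded as "no movement" if its character is unchanged from the character three positions earlier, and as "movement" otherwise (see Figure 5).

Figure 5: Example of motion extraction

The strings of the portions judged as "movement" are then converted into change values (the numbers below the letters in Figure 6) and compressed (see Figure 7). This is needed because, even for the same action, differences in position or speed can shift the string by some interval or change its length, so that the occurrences would not be learned as the same movement. Compression allows sequences that are shifted by some interval or differ in length to be treated as the same movement. In addition, to extract more distinctive actions, sequences whose largest compressed change is below a threshold are removed (see Figure 7).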
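Eq. (3) and the symbol mapping of SAX can be illustrated with a short sketch. It follows the standard SAX recipe of Lin et al. [10]: z-normalize the series, apply PAA with equal frames, then map each frame mean to a letter using breakpoints that cut the standard normal distribution into equiprobable regions. It does not include the dynamic-programming segmentation used in this study, and the alphabet size of 4 is an assumption, since the paper does not state it.

```python
from statistics import NormalDist, fmean, pstdev


def paa(series, w):
    """Eq. (3): mean of each of w equal frames (this sketch assumes w divides n)."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes that w divides len(series)")
    step = n // w
    return [sum(series[i * step:(i + 1) * step]) / step for i in range(w)]


def sax(series, w, alphabet_size=4):
    """Convert a numeric series into a SAX word of length w."""
    mean, std = fmean(series), pstdev(series)
    # z-normalize; a flat series maps to all zeros
    normalized = [(x - mean) / std if std > 0 else 0.0 for x in series]
    # breakpoints splitting N(0, 1) into alphabet_size equiprobable regions
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    # letter index = number of breakpoints lying below the frame mean
    return "".join(letters[sum(value > b for b in breakpoints)] for value in paa(normalized, w))
```

For a steadily rising joint coordinate such as `list(range(1, 9))`, `sax(..., 4)` yields `"abcd"`: each frame mean falls into the next equiprobable band.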
Figure 6: Change values of the strings
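The paper replaces the equal frames of Eq. (3) with a segmentation found by dynamic programming but does not state the objective being optimized. Purely as an assumption for illustration, the sketch below uses the classic optimal-segmentation criterion: split the series into w contiguous segments so that the total squared error of each segment around its own mean is minimal. The function name `optimal_segments` is hypothetical.

```python
def optimal_segments(series, w):
    """Split series into w contiguous segments with minimal total squared error.

    Assumed objective (not stated in the paper): each segment is approximated
    by its mean, as in PAA. Returns (start, end) pairs with end exclusive.
    Runs in O(w * n^2) time.
    """
    n = len(series)
    if not 1 <= w <= n:
        raise ValueError("need 1 <= w <= len(series)")
    # prefix sums of x and x^2 give each segment's cost in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(series):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a, b):
        """Squared error of series[a:b] around its own mean."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting series[:j] into m segments
    best = [[inf] * (n + 1) for _ in range(w + 1)]
    back = [[0] * (n + 1) for _ in range(w + 1)]
    best[0][0] = 0.0
    for m in range(1, w + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # the last segment is series[i:j]
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    back[m][j] = i
    # follow the back-pointers from the end to recover the boundaries
    segments = []
    j = n
    for m in range(w, 0, -1):
        i = back[m][j]
        segments.append((i, j))
        j = i
    return segments[::-1]
```

Averaging `series[a:b]` for each returned pair gives frame values that would take the place of the equal-frame means of Eq. (3) before symbolization; unlike equal frames, the boundaries move to where the signal actually changes.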
[Figures 5 and 6 (image content not recoverable): SAX strings for the X, Y and Z coordinates and their change values, e.g. X: cccccbbcccccc → 0,0,0,0,-1,0,1,0,0,0,0,0]
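The movement test of Figure 5, the change values of Figure 6, and the compression and threshold selection of Figure 7 can be sketched as below. The rule in `compress` (drop zeros, then sum consecutive changes of the same sign) is our reading of the Figure 7 example, in which X: 0,0,0,0,-1,0,1,0,0,0,0,0 becomes -1,1 and Z becomes -3,2; the text does not define it formally. All function names are hypothetical and the threshold is left as a parameter.

```python
def movement_flags(word, lag=3):
    """Figure 5 rule: position i counts as movement if word[i] != word[i - lag]."""
    return [word[i] != word[i - lag] for i in range(lag, len(word))]


def letter_changes(word):
    """Figure 6: signed change between consecutive SAX letters ('c' -> 'b' is -1)."""
    return [ord(b) - ord(a) for a, b in zip(word, word[1:])]


def compress(changes):
    """Figure 7 (our reading): drop zeros, then sum runs of same-sign changes."""
    runs = []
    for c in changes:
        if c == 0:
            continue
        if runs and (runs[-1] > 0) == (c > 0):
            runs[-1] += c  # same direction: merge into the current run
        else:
            runs.append(c)  # direction reversed: start a new run
    return runs


def keep(compressed, threshold):
    """Keep a sequence only if its largest absolute change reaches the threshold."""
    return bool(compressed) and max(abs(c) for c in compressed) >= threshold
```

Merging same-sign runs is what makes the representation insensitive to speed: a hand that rises over two frames or over five frames produces the same single compressed value.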
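Figure 7: Example of data compression and selection
[Figure 7 residue: compressed change values X: -1,1; Y: 3; Z: -3,2, shown before and after selection]

3.2 Intermediate representation

In text generation, the language resource used to generate text is selected through an intermediate representation that connects time-series data and natural-language sentences. The intermediate representations are defined as in Table 1.

Table 1: Intermediate representations

| Action | Intermediate representation | Meaning |
|---|---|---|
| up | "up(joint,null)" | upward movement |
| down | "down(joint,null)" | downward movement |
| pick | "up(joint,object)" | pick-up movement |
| put | "down(joint,object)" | put-down movement |

3.3 Motion classification of time-series data

To classify human actions, this study uses a log-linear model and trains it on the correspondence between the processed time series and the intermediate representations. Using a feature vector $\phi$ built from the data $d$, processed as described in Section 3.1, and the intermediate representation $r$ expressing a human action, we construct the log-linear model of Eq. (4), which models the probability $P(r \mid d)$ that each intermediate representation is chosen given the data. Here $w$ is the weight vector (not the number of PAA frames) and $Z_{d,w} = \sum_{r'} \exp(w \cdot \phi(d, r'))$ is the normalization factor.

$$P(r \mid d) = \frac{1}{Z_{d,w}} \exp\bigl(w \cdot \phi(d, r)\bigr) \tag{4}$$

3.4 Text generation with bigram models

In this study we perform simple text generation with bigram models. To build a bigram model for each action, we conducted a subject experiment and collected various natural-language expressions for specific actions. Thus, when a specific intermediate representation is assigned to the observed time series, the corresponding bigram model is selected as the language resource and text is generated.

However, people describe the same action differently: one person may use 10 words and another 15. Because the generation probability of a sentence is the product of the occurrence probabilities of the selected words and the transition probabilities between words on the bigram network, the more words a sentence contains, the lower its generation probability. We therefore introduce null labels into the bigram model so that text generation does not depend on sentence length.

A null label is treated as a word in the sentence and, like other words, becomes a component of unigrams and bigrams. To handle null labels, the following preprocessing is applied to each sentence before dynamic programming is applied to the constructed bigram model. First, obtain the maximum (max) and minimum (min) number of words over all sentences. Next, subtract min from max to obtain null max, the largest number assigned to a null label. Finally, for each sentence with fewer than max words, insert null labels numbered null max, null max - 1, and so on, decreasing by one, from the end of the sentence toward its beginning, as many as are missing. Figure 8 illustrates the introduction of null labels.

Figure 8: Illustration of null-label introduction

Because each null label in a sentence receives a different number, the labels are treated as distinct words, each an element of the bigram model. In addition, since this study does not filter the sentences used to build the bigram model, the model can associate many words with one another, which allows more complex text to be generated.

To generate sentences that are commonly used to describe human actions, dynamic programming is applied to this bigram model to choose the sentence made of the word combination with the highest likelihood.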
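The null-label preprocessing and the dynamic-programming decoding of Section 3.4 can be sketched together as below. This is a hedged illustration, not the authors' code: using the unigram probability for the first word and maximum-likelihood bigram transitions is our reading of "occurrence probability times transition probabilities", the decoder is a Viterbi-style search over sequences of exactly max tokens, the handling of the EOF symbol is not modeled, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def pad_with_nulls(sentences):
    """Section 3.4 preprocessing: pad every token list to the longest length.

    null_max = max - min; missing slots are filled from the end of the sentence
    toward the front with null_max, null_max - 1, ..., so numbers rise toward the end.
    """
    longest = max(len(s) for s in sentences)
    shortest = min(len(s) for s in sentences)
    null_max = longest - shortest
    padded = []
    for s in sentences:
        missing = longest - len(s)
        nulls = [f"null{null_max - k}" for k in range(missing)]
        padded.append(list(s) + nulls[::-1])
    return padded, longest


def train_bigram(padded):
    """Assumed estimates: unigram P(w) for the first word, MLE bigram transitions."""
    unigram = Counter(tok for s in padded for tok in s)
    total = sum(unigram.values())
    first = {tok: c / total for tok, c in unigram.items()}
    pairs = defaultdict(Counter)
    for s in padded:
        for a, b in zip(s, s[1:]):
            pairs[a][b] += 1
    trans = {a: {b: c / sum(nxt.values()) for b, c in nxt.items()} for a, nxt in pairs.items()}
    return first, trans


def best_sentence(first, trans, length):
    """Viterbi-style DP for the most likely token sequence of exactly `length` tokens."""
    # best[token] = (log-probability, path) of the best partial sequence ending in token
    best = {w: (math.log(p), [w]) for w, p in first.items()}
    for _ in range(length - 1):
        nxt = {}
        for w, (score, path) in best.items():
            for v, p in trans.get(w, {}).items():
                cand = score + math.log(p)
                if v not in nxt or cand > nxt[v][0]:
                    nxt[v] = (cand, path + [v])
        best = nxt
    score, path = max(best.values())
    return path, math.exp(score)
```

With the toy corpus raise/left/hand (twice) and raise/the/left/hand, padding gives every sentence four tokens, and decoding returns raise, left, hand, null1 with probability 1/6. Null labels can be dropped for display with `[t for t in path if not t.startswith("null")]`.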
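Eq. (4) in Section 3.3 is a multiclass log-linear (softmax) model. The sketch below shows how it scores intermediate representations and how its weights could be fitted; it is an illustration under stated assumptions, not the authors' classifier. The paper does not list its features, so `features` (one indicator per pair of observed symbol and label) and symbols such as `"rhand_y:+"` are hypothetical, and plain stochastic gradient ascent on the conditional log-likelihood is an assumed training choice.

```python
import math
from collections import defaultdict


def features(d, r):
    """Hypothetical phi(d, r): one indicator per (observed symbol, label) pair."""
    return {(symbol, r): 1.0 for symbol in d}


def predict_proba(weights, d, labels):
    """Eq. (4): P(r | d) = exp(w . phi(d, r)) / Z_{d,w}, normalized over labels."""
    scores = {r: sum(weights.get(f, 0.0) * v for f, v in features(d, r).items()) for r in labels}
    top = max(scores.values())
    exps = {r: math.exp(s - top) for r, s in scores.items()}  # shift by max for numerical stability
    z = sum(exps.values())  # plays the role of Z_{d,w}; the common shift cancels
    return {r: e / z for r, e in exps.items()}


def train(data, labels, epochs=50, lr=0.5):
    """Stochastic gradient ascent on the sum of log P(gold | d)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for d, gold in data:
            probs = predict_proba(weights, d, labels)
            # gradient = observed features minus expected features under the model
            for f, v in features(d, gold).items():
                weights[f] += lr * v
            for r, p in probs.items():
                for f, v in features(d, r).items():
                    weights[f] -= lr * p * v
    return weights
```

The gradient has the familiar "observed minus expected" form, so training stops moving a weight once the model's expected feature counts match the labeled data.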
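4 Experiments

The goal here is to describe in words simple actions (Figure 9) such as "raise a hand", "lower a hand", "pick up a ball", and "put down a ball".

Figure 9: Actions to be verbalized

4.1 Experimental setup

The actions to be verbalized are defined as consisting of eight basic actions: "raise the left hand", "raise the right hand", "lower both hands", "pick up the ball", "put down the ball", "raise both hands", "lower the right hand", and "lower the left hand". For each action, a natural-language description is to be generated. In a subject experiment, 12 participants watched Kinect videos of the target actions and described them in natural language. The collected Japanese descriptions were segmented into words with the morphological analyzer MeCab, and bigram models were built using them as language resources. Table 2 shows the total number of sentences, the number of words, and the number of distinct words in the descriptions used as language resources.

Table 2: Statistics of the collected sentences

| Action | Sentences | Words | Distinct words |
|---|---|---|---|
| Raise left hand | 27 | 145 | 43 |
| Lower left hand | 28 | 146 | 47 |
| Raise right hand | 25 | 131 | 44 |
| Lower right hand | 31 | 174 | 51 |
| Raise both hands | 32 | 163 | 50 |
| Lower both hands | 30 | 165 | 53 |
| Pick up ball | 29 | 162 | 37 |
| Put down ball | 29 | 170 | 43 |

As test data, we used a Kinect video in which the actions were performed in the order "raise the left hand", "raise the right hand", "lower both hands", "pick up the ball", "put down the ball", "raise both hands", "lower the right hand", and "lower the left hand". For motion classification we applied the log-linear model described in Section 3.3 and used it to classify the intermediate representations that drive text generation.

4.2 Experimental results

The intermediate representations classified from the test data by the trained classifier were, in order:
1. "up((left hand),null)"
2. "up((right hand),null)"
3. "down((left hand,right hand),null)"
4. "up((right hand),green)"
5. "down((right hand),green)"
6. "up((left hand,right hand),null)"
7. "down((right hand),null)"
8. "down((left hand),null)"

Next, for each selected intermediate representation, dynamic programming was applied to the pre-built bigram model to generate the most likely sentence describing the action. Table 3 shows the three sentences with the highest likelihood for each action.

Table 3: Top three generated sentences for each action (tokens as generated, with an English gloss)

| Action | Generated sentence | Gloss | Likelihood |
|---|---|---|---|
| 1 | 左手, を, 上げる, 。, null 5, null 6, null 7, null 8, EOF | Raise the left hand. | 1.76e-12 |
| | 左手, を, 上げる, 。, null 4, null 5, null 6, null 7, null 8 | Raise the left hand. | 1.57e-12 |
| | 左手, を, ひじ, を, 上げる, 。, null 4, null 5, null 6 | (ungrammatical) left hand, elbow, raise | 1.19e-14 |
| 2 | 右手, を, 上げる, 。, null 4, null 5, null 6, null 7, null 8 | Raise the right hand. | 8.63e-13 |
| | 右手, を, 上げる, 。, null 5, null 6, null 7, null 8, EOF | Raise the right hand. | 5.40e-13 |
| | 右手, を, すこし, あげる, 。, null 4, null 5, null 6, null 7 | Raise the right hand a little. | 3.98e-15 |
| 3 | 両手, を, 下ろす, 。, null 4, null 5, null 6, null 7, null 8, null 9 | Lower both hands. | 2.91e-14 |
| | 両手, を, 下ろす, 。, null 5, null 6, null 7, null 8, null 9, EOF | Lower both hands. | 2.68e-14 |
| | 両手, を, 同時に, 下げる, 。, null 4, null 5, null 6, null 7, null 8 | Lower both hands at the same time. | 1.49e-16 |
| 4 | ボール, を, 持ち上げる, 。, null 5, null 6, null 7, null 8, null 9, EOF | Lift the ball. | 2.64e-14 |
| | 左手, で, ボール, を, 持ち上げる, 。, null 6, null 7, null 8, null 9 | Lift the ball with the left hand. | 2.03e-14 |
| | ボール, を, 左手, で, ボール, を, 持ち上げる, 。, null 6, null 7 | (ungrammatical) ball, with the left hand, lift the ball | 2.77e-15 |
| 5 | ボール, を, 置く, 。, null 5, null 6, null 7, null 8, null 9, EOF | Put down the ball. | 3.99e-15 |
| | ボール, を, 置く, 。, null 4, null 5, null 6, null 7, null 8, null 9 | Put down the ball. | 2.79e-15 |
| | 左手, で, ボール, を, 置く, 。, null 5, null 6, null 7, null 8 | Put down the ball with the left hand. | 2.57e-16 |
| 6 | 両手, を, 上げる, 。, null 5, null 6, null 7, null 8, null 9, EOF | Raise both hands. | 2.53e-14 |
| | 両手, を, 上げる, 。, null 4, null 5, null 6, null 7, null 8, null 9 | Raise both hands. | 1.41e-14 |
| | 両手, を, 同時に, 挙げる, 。, null 4, null 5, null 6, null 7, null 8 | Raise both hands at the same time. | 1.32e-15 |
| 7 | 右手, を, 下ろす, 。, null 4, null 5, null 6, null 7, null 8 | Lower the right hand. | 2.05e-12 |
| | 右手, を, 下ろす, 。, null 5, null 6, null 7, null 8, EOF | Lower the right hand. | 5.96e-13 |
| | 右手, を, 弧, を, 下ろす, 。, null 4, null 5, null 6 | (ungrammatical) right hand, arc, lower | 3.71e-15 |
| 8 | 左手, を, 下げる, 。, null 4, null 5, null 6, null 7, null 8, null 9, null 10 | Lower the left hand. | 2.67e-15 |
| | 左手, を, 下げる, 。, null 5, null 6, null 7, null 8, null 9, null 10, EOF | Lower the left hand. | 8.90e-16 |
| | 左手, を, 伸ばし, た, 。, null 4, null 5, null 6, null 7, null 8, null 9 | Stretched out the left hand. | 9.53e-18 |

4.3 Discussion

The results confirm that sentences accurately expressing the human actions were generated. Table 3 also shows that several generated sentences do not contain the end symbol "EOF". This is because the generated sentences are composed of combinations of the bigrams of words that appear in the collected sentences. Consequently, a sentence formed by adding null labels to the bigram model could be generated longer than any of the collected sentences. On the other hand, the longer a sentence becomes, the lower its likelihood. We therefore set the number of words in a generated sentence to the maximum number of words among the collected sentences, under the assumption that no sentence longer than the collected ones is generated.

5 Conclusion and Future Work

In this study we proposed a framework for probabilistic language generation that expresses human actions in video. Human actions extracted from Kinect video and object trajectories obtained with a particle filter are acquired as time-series data and converted into a form suitable for machine learning by several dimensionality-reduction techniques, including a symbolization method that introduces the most likely segmentation, found by dynamic programming, into SAX. To express the observed human motion, bigram models are built from natural-language sentences collected in a subject experiment, and dynamic programming is applied to obtain the most likely combination of words. Furthermore, introducing numbered null labels into the bigram model made it possible to generate natural-language sentences without restricting the number of words. Because the proposed method generates text with a probabilistic model rather than with templates, it can produce a variety of natural-language expressions depending on the source documents; for example, collecting more sentences would change the output sentences accordingly.

At present, however, syntactic constraints are not incorporated. As future work, we intend to introduce such knowledge and extend the method to generate text that describes events more accurately. We also plan to make the mapping between intermediate representations and bigram models more flexible, and to address the problem of segmenting a continuous sequence of motion into the actions that natural-language sentences describe.

References

[1] Haonan Yu and Jeffrey Mark Siskind. Grounded Language Learning from Video Described with Sentences. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Bulgaria, 2013.
[2] M. Regneri, M. Rohrbach, D. Wetzel, S. Thater, B. Schiele, and M. Pinkal. Grounding Action Descriptions in Videos. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Bulgaria, 2013.
[3] Yoshitaka Ushiku, Tatsuya Harada, and Yasuo Kuniyoshi. Understanding Images with Natural Sentences. Proceedings of the 19th Annual ACM International Conference on Multimedia (ACMMM 2011), pp. 679-682, 2011.
[4] W. Takano and Y. Nakamura. Integrating Whole Body Motion Primitives and Natural Language for Humanoid Robots. Proceedings of the IEEE-RAS International Conference on Humanoid Robots, pp. 708-713, 2008.
[5] W. Takano and Y. Nakamura. Incremental Learning of Integrated Semiotics Based on Linguistic and Behavioral Symbols. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1780-1785, 2010.
[6] Percy Liang, Michael I. Jordan, and Dan Klein. Learning Semantic Correspondences with Less Supervision. Proceedings of ACL-IJCNLP, 2009.
[7] Gabor Angeli, Percy Liang, and Dan Klein. A Simple Domain-Independent Probabilistic Approach to Generation. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 502-512, Cambridge, Massachusetts, 2010.
[8] Ioannis Konstas and Mirella Lapata. Unsupervised Concept-to-Text Generation with Hypergraphs. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 752-761, Montreal, Canada, 2012.
[9] Ioannis Konstas and Mirella Lapata. Concept-to-Text Generation via Discriminative Reranking. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Volume 1, pp. 369-378, Jeju Island, Korea, 2012.
[10] J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. DMKD '03, 2003.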