
Probabilistic Semantic Similarity Measurements
for Noisy Short Texts Using Wikipedia Entities
Masumi Shirakawa¹, Kotaro Nakayama², Takahiro Hara¹, Shojiro Nishio¹
¹ Osaka University, Osaka, Japan
² University of Tokyo, Tokyo, Japan
Challenge in short text analysis
Statistics are not always enough.

Sentence 1: "A year and a half after Google pulled its popular search engine out of mainland China"
Sentence 2: "Baidu and Microsoft did not disclose terms of the agreement"

Both sentences are talking about search engines and China. How can machines know that the two sentences mention a similar topic?
Reasonable solution
Use external knowledge.

Sentence 1: "A year and a half after Google pulled its popular search engine out of mainland China"
Sentence 2: "Baidu and Microsoft did not disclose terms of the agreement"

Wikipedia Thesaurus [Nakayama06]
Related work
ESA: Explicit Semantic Analysis [Gabrilovich07]
Add Wikipedia articles (entities) to a text as its semantic representation.
1. Get the Wikipedia search ranking for each term (i.e., Wiki articles and scores).
2. Simply sum up the scores for aggregation.

[Figure: pipeline from the input text T ("Apple sells a new product") through key term extraction (terms t: Apple, product, sells, new) and related entity finding (entities c: pear, iPhone, pricing, iPad, Apple Inc., business) to aggregation, which outputs a ranked list of entities c (Apple Inc., iPhone, pear, ...).]
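The ESA aggregation step above can be sketched as follows: each term contributes a scored list of Wikipedia entities, and the scores are simply summed. The entity names and scores below are hypothetical illustrations, not values from the paper.

```python
from collections import defaultdict

def esa_aggregate(term_entity_scores):
    """Sum per-term entity scores into one ranked entity list (ESA step 2)."""
    totals = defaultdict(float)
    for scores in term_entity_scores.values():
        for entity, score in scores.items():
            totals[entity] += score
    # Rank entities by aggregated score, highest first.
    return sorted(totals.items(), key=lambda kv: -kv[1])

# Hypothetical per-term related entities (step 1 output).
related = {
    "Apple":   {"Apple Inc.": 0.8, "pear": 0.6, "iPhone": 0.5},
    "product": {"pricing": 0.7, "iPhone": 0.4, "business": 0.3},
}
ranking = esa_aggregate(related)
# "iPhone" is boosted because it appears under both terms (0.5 + 0.4).
```

Note how plain summation lets a strongly weighted but ambiguous entity (e.g., "pear") stay in the ranking, which motivates the probabilistic aggregation proposed next.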
Problems in real world noisy short texts
"Noisy" means semantically noisy in this work.
(We do not handle informal or casual surface forms, or misspellings.)
Term ambiguity
• Apple (the fruit) should not be related to Microsoft.
Fluctuation of term dominance
• A term is not always important in every text.
We explore a more effective aggregation method.
Probabilistic method
We propose extended naïve Bayes to aggregate related entities.
From text T to related entities c:

$P(c \mid T) = \frac{\prod_{k=1}^{K} \left[ P(t_k \in T)\, P(c \mid t_k) + \left(1 - P(t_k \in T)\right) P(c) \right]}{P(c)^{K-1}}$

[Figure: pipeline from the input text T ("Apple sells a new product") to a ranked list of related entities c. Key term extraction yields terms t with P(t ∈ T); related entity finding maps each term to entities via P(e|t) [Mihalcea07][Milne08] and P(c|e) [Nakayama06], giving P(c|t); aggregation [Song11] combines these into P(c|T) and outputs a ranked list of entities c (Apple Inc., iPhone, iPad, ...).]
When input is multiple terms
Apply naïve Bayes [Song11] to multiple terms $t_1, \ldots, t_K$ to obtain related entity c using each probability $P(c \mid t_k)$:

$P(c \mid t_1, \ldots, t_K) = \frac{P(t_1, \ldots, t_K \mid c)\, P(c)}{P(t_1, \ldots, t_K)} = \frac{P(c) \prod_k P(t_k \mid c)}{P(t_1, \ldots, t_K)} = \frac{\prod_k P(c \mid t_k)}{P(c)^{K-1}}$

[Figure: each term t₁ = "Apple", t₂ = "product", …, t_K = "new" contributes P(c|t_k); P(c|t₁, ⋯, t_K) is computed for each related entity c, e.g., c = "iPhone".]

By using naïve Bayes, entities that are related to multiple terms can be boosted.
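The naïve Bayes combination above can be sketched in a few lines: multiply the per-term conditionals and divide by the prior raised to K−1. All probabilities below are hypothetical, chosen only to illustrate the boosting effect.

```python
from math import prod

def naive_bayes_combine(p_c_given_t, p_c):
    """Score P(c|t_1..t_K) ∝ prod_k P(c|t_k) / P(c)^(K-1)."""
    K = len(p_c_given_t)
    return prod(p_c_given_t) / p_c ** (K - 1)

# Entity c = "iPhone": moderately related to both "Apple" and "product".
score_iphone = naive_bayes_combine([0.20, 0.10], p_c=0.01)
# Entity c = "pear": strongly related to "Apple" but not to "product".
score_pear = naive_bayes_combine([0.30, 0.01], p_c=0.01)
# The entity supported by multiple terms wins despite the lower peak.
```

This is exactly why a single ambiguous term (Apple → pear) cannot dominate the ranking once several terms vote.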
When input is text
The input is not "multiple terms" but "text," i.e., we don't know which terms are key terms.
We developed extended naïve Bayes to solve this problem.

[Figure: the same diagram as before (t₁ = "Apple", t₂ = "product", …, t_K = "new", each contributing P(c|t_k) to entity c = "iPhone"), but we cannot observe which terms are key terms.]
Extended naïve Bayes
Enumerate the candidates of key terms (Apple, product, …, new) and consider every possible state T′ of the key term set T: T′ = {t₁}, T′ = {t₁, t₂}, …, T′ = {t₁, ⋯, t_K}.
Apply naïve Bayes to each state T′, weighted by the probability that the set of key terms T is the state T′, P(T = T′):

$\sum_{T'} P(c \mid T')\, P(T = T') = \frac{\prod_k \left[ P(t_k \in T)\, P(c \mid t_k) + \left(1 - P(t_k \in T)\right) P(c) \right]}{P(c)^{K-1}}$

In this way, term dominance is incorporated into naïve Bayes.
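The closed form above can be sketched directly: each factor mixes the conditional P(c|t_k) with the prior P(c), weighted by the key-term probability P(t_k ∈ T). The numbers below are hypothetical illustrations.

```python
def extended_naive_bayes(terms, p_c):
    """terms: list of (P(t_k in T), P(c|t_k)) pairs; p_c: prior P(c).

    Each factor interpolates between the conditional and the prior,
    so uncertain terms contribute little beyond the prior.
    """
    score = 1.0
    for p_key, p_c_given_t in terms:
        score *= p_key * p_c_given_t + (1.0 - p_key) * p_c
    K = len(terms)
    return score / p_c ** (K - 1)

# A dominant term ("Apple", P(t in T) = 0.9) contributes its conditional
# almost fully; a weak term ("new", P(t in T) = 0.1) mostly falls back
# to the prior and adds little noise.
score = extended_naive_bayes([(0.9, 0.20), (0.1, 0.02)], p_c=0.01)
```

When every P(t_k ∈ T) is 1, the formula reduces to the plain naïve Bayes combination of the previous slide, which is a useful sanity check.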
Experiments on short text similarity datasets
[Datasets] Four datasets derived from word similarity datasets using a dictionary
[Comparative methods] Original ESA [Gabrilovich07], ESA with 16 parameter settings
[Metric] Spearman's rank correlation coefficient
ESA with a well-adjusted parameter is superior to our method on "clean" texts.
Tweet clustering
K-means clustering using the vector of related entities for measuring distance
[Dataset] 12,385 tweets covering 13 topics:
#MacBook (1,251), #Silverlight (221), #VMWare (890), #MySQL (1,241), #Ubuntu (988), #Chrome (1,018), #NFL (1,044), #NHL (1,045), #MLB (752), #MLS (981), #NASCAR (878), #NBA (1,085), #UFC (991)
[Comparative methods] Bag-of-words (BOW), ESA with the same parameter, ESA with a well-adjusted parameter
[Metric] Average of Normalized Mutual Information (NMI) over 20 runs
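The evaluation metric can be sketched from scratch. NMI normalizations vary across papers; this sketch uses the geometric mean of the two entropies, and the cluster labels below are toy examples, not the tweet data.

```python
from collections import Counter
from math import log

def nmi(pred, true):
    """Normalized Mutual Information, geometric-mean normalization."""
    n = len(pred)
    cp, ct = Counter(pred), Counter(true)
    joint = Counter(zip(pred, true))
    # Mutual information between cluster assignments and true labels.
    mi = sum(v / n * log((v / n) / ((cp[a] / n) * (ct[b] / n)))
             for (a, b), v in joint.items())
    h = lambda c: -sum(v / n * log(v / n) for v in c.values())
    denom = (h(cp) * h(ct)) ** 0.5
    return mi / denom if denom > 0 else 1.0

# A clustering that recovers the true topics exactly (up to cluster
# renaming) scores NMI = 1.0.
perfect = nmi([0, 0, 1, 1], ["NFL", "NFL", "MLB", "MLB"])
```

NMI is invariant to cluster label permutation, which is why it suits K-means output where cluster IDs are arbitrary.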
Results
[Figure: NMI score vs. number of related entities (10 to 2,000) for BOW, ESA-same, ESA-adjusted, and our method. Peak NMI: 0.567 for our method vs. 0.524 for ESA-adjusted; 0.429 and 0.421 for the remaining baselines (p-value < 0.01).]

Our method outperformed ESA with a well-adjusted parameter on noisy short texts.
Conclusion
We proposed extended naïve Bayes to derive related Wikipedia entities given a real-world noisy short text.
[Future work]
Tackle multilingual short texts
Develop applications of the method