Measuring Closeness of Search Engine

- Identification of Outliers
- Visualization of Closeness
Wang Hua 王化
Department of Information Science, fourth-year student
Motivation
Too many search engines
More than 20 major general-purpose engines
 More specific-purpose engines

Simple aggregation of rankings is popular.
We address the need to quantify and visualize the closeness between search engines.
Too Many Search Engines with Different Policies
Major search engines

Yahoo, AltaVista, Google, Lycos, etc.
Distinct ranking policies
Directory type
Robot (crawler) type
PageRank type, based on hyperlinks

Outline of Methods
Ranking
List distance measure
Distance between search engines
Ranking
Partial List
Cases for WWW web sites
 Top 100 list

List of results from search engines
Footrule Distance among Ranking Lists

s, t: ranking lists; s(i) is the position of item i in s

F(s, t) = \sum_i |s(i) - t(i)|
s = [a,b,c,d,e]
t = [a,d,e,c,b]
Per item a, b, c, d, e: 0+3+1+2+2 = 8
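
The footrule sum can be checked on the example lists with a short Python sketch. The function name `footrule_distance` is illustrative and not from the slides, and the sketch assumes both lists rank exactly the same items (full lists, not partial ones).

```python
def footrule_distance(s, t):
    """Spearman footrule distance between two rankings of the same items."""
    if len(set(s)) != len(s) or len(set(t)) != len(t) or set(s) != set(t):
        raise ValueError("both lists must rank the same items exactly once")
    # Position of each item in t, counted from 0.
    pos_t = {item: rank for rank, item in enumerate(t)}
    # Sum over all items of the absolute difference in position.
    return sum(abs(rank - pos_t[item]) for rank, item in enumerate(s))


print(footrule_distance(list("abcde"), list("adecb")))  # 8
```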

Kendall-tau Distance
Definition [Dwork, WWW10, 2001]

Counts the number of pairwise disagreements
between two lists
| { i < j | s(i) < s(j) but t(i) > t(j) } |
 [a,b,c,d]
[a,d,c,b]
6 pairs: (a,b) (a,c) (a,d) (b,c) (b,d) (c,d)
0+0+0+1+1+1=3
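
The pair counting above can be written directly as a quadratic-time Python sketch. The name `kendall_tau_distance` is illustrative, and the sketch assumes both lists contain the same items.

```python
from itertools import combinations


def kendall_tau_distance(s, t):
    """Number of item pairs that s and t order differently."""
    pos_s = {item: rank for rank, item in enumerate(s)}
    pos_t = {item: rank for rank, item in enumerate(t)}
    # A pair disagrees when the sign of the rank difference flips.
    return sum(
        1
        for x, y in combinations(s, 2)
        if (pos_s[x] - pos_s[y]) * (pos_t[x] - pos_t[y]) < 0
    )


print(kendall_tau_distance(list("abcd"), list("adcb")))  # 3
```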
Properties of the Distance
Kendall-tau distance can be computed in O(n log n) time
It satisfies the triangle inequality and is a proper distance (metric)
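
One standard way to reach O(n log n) is to count inversions with merge sort; the slides do not say which algorithm was used, so the following is only a sketch under that assumption. The name `kendall_tau_fast` is illustrative, and both lists are assumed to contain the same items.

```python
def kendall_tau_fast(s, t):
    """Kendall-tau distance via merge-sort inversion counting."""
    pos_t = {item: rank for rank, item in enumerate(t)}
    # Ranks in t, read in the order given by s; each inversion in this
    # sequence is one pair on which the two lists disagree.
    seq = [pos_t[item] for item in s]

    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] jumps ahead of every remaining element of left.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(seq)[1]


print(kendall_tau_fast(list("abcd"), list("adcb")))  # 3
```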
Matrix of Distance
Keyword = “university”
Engines (rows and columns): Dmos, Alta, Yahoo, Overture, Excite, Lycos, Aol, Sprinks, Galaxy
Pairwise distance values as listed:
441, 100, 132, 121, 190, 213, 211, 42, 490, 737, 574, 895,
915, 100, 720, 2324, 2123, 1349, 879, 1221, 1766, 7162, 7113, 6254,
945, 312, 8927, 9699, 282, 192, 8712, 462, 354, 461, 365, 123
Table 4.2 The Closeness of Search Engines
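
A matrix of this kind can be built by applying a list distance to every pair of engines. The sketch below is a guess at the procedure, not the author's code: it handles partial (top-k) lists by comparing only the items two lists share, which is one simple convention the slides do not confirm. The engine names and result lists are made-up toy data.

```python
from itertools import combinations


def kendall_tau_common(s, t):
    """Kendall-tau distance restricted to items present in both lists."""
    common = set(s) & set(t)
    s_common = [x for x in s if x in common]
    t_common = [x for x in t if x in common]
    pos_t = {item: rank for rank, item in enumerate(t_common)}
    # x precedes y in s, so the pair disagrees when y precedes x in t.
    return sum(1 for x, y in combinations(s_common, 2) if pos_t[x] > pos_t[y])


def distance_matrix(rankings):
    """Symmetric matrix of pairwise distances, keyed by engine name."""
    names = sorted(rankings)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = kendall_tau_common(rankings[a], rankings[b])
        matrix[a][b] = matrix[b][a] = d
    return names, matrix


# Hypothetical top-4 result lists for three hypothetical engines.
toy = {
    "engine_a": ["p1", "p2", "p3", "p4"],
    "engine_b": ["p2", "p1", "p3", "p5"],
    "engine_c": ["p4", "p3", "p2", "p1"],
}
names, matrix = distance_matrix(toy)
for a in names:
    print(a, [matrix[a][b] for b in names])
```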
Visualization
Kernighan-Lin Algorithm
Kamada Spring Model
Comparison of the 2 methods
Kernighan-Lin Method
Brief explanation
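
The Kernighan-Lin heuristic splits a weighted graph into two groups by repeatedly swapping pairs of nodes to reduce the total weight of edges crossing the split. The slides do not spell out how it is applied here, so this sketch assumes that engines are nodes, that edge weights are similarities (for instance a maximum distance minus the measured distance), and that the two resulting groups are drawn in two colors. All names and the toy graph are illustrative.

```python
def kernighan_lin(weights, part_a, part_b, max_passes=10):
    """Refine a two-way split; weights[u][v] is a symmetric similarity."""
    a, b = set(part_a), set(part_b)

    def d_value(v, own, other):
        # External minus internal connection weight of node v.
        external = sum(weights[v][u] for u in other)
        internal = sum(weights[v][u] for u in own if u != v)
        return external - internal

    for _ in range(max_passes):
        work_a, work_b = set(a), set(b)   # tentative split during this pass
        free_a, free_b = set(a), set(b)   # nodes not yet locked
        gains, swaps = [], []
        while free_a and free_b:
            d = {v: d_value(v, work_a, work_b) for v in free_a}
            d.update({v: d_value(v, work_b, work_a) for v in free_b})
            # Pick the unlocked pair whose swap reduces the cut the most.
            gain, x, y = max(
                ((d[x] + d[y] - 2 * weights[x][y], x, y)
                 for x in free_a for y in free_b),
                key=lambda item: item[0],
            )
            work_a.remove(x)
            work_a.add(y)
            work_b.remove(y)
            work_b.add(x)
            free_a.remove(x)
            free_b.remove(y)
            gains.append(gain)
            swaps.append((x, y))
        # Keep only the prefix of swaps with the best cumulative gain.
        best_k, best_total, total = 0, 0, 0
        for k, g in enumerate(gains, start=1):
            total += g
            if total > best_total:
                best_k, best_total = k, total
        if best_k == 0:
            break  # no improving sequence of swaps remains
        for x, y in swaps[:best_k]:
            a.remove(x)
            a.add(y)
            b.remove(y)
            b.add(x)
    return a, b


# Toy similarity graph: two tight groups {0,1,2} and {3,4,5}.
groups = [{0, 1, 2}, {3, 4, 5}]
n = 6
weights = [[0] * n for _ in range(n)]
for u in range(n):
    for v in range(n):
        if u != v:
            same = any(u in g and v in g for g in groups)
            weights[u][v] = 10 if same else 1
side_a, side_b = kernighan_lin(weights, {0, 1, 3}, {2, 4, 5})
print(sorted(side_a), sorted(side_b))  # [0, 1, 2] [3, 4, 5]
```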
Kernighan-Lin by Color Coding
Keyword1 =“Totti”
Keyword2=“Nakata”
Kernighan-Lin by Color Coding
Keyword1=“Gucci”
Keyword2=“Hermes”
Kamada Spring Model
Brief explanation
An example
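
In a spring model of the Kamada and Kawai kind, every pair of nodes is joined by a spring whose natural length is the target distance and whose stiffness is proportional to 1/distance^2; the layout is a low-energy arrangement of those springs. The original method moves one node at a time with Newton-Raphson steps. The sketch below swaps in a simpler damped relaxation of the same energy, so it should be read as an illustration of the idea rather than the method used in the slides. The function name and parameters are assumptions.

```python
import math
import random


def spring_layout(dist, iterations=2000, rate=0.5, seed=0):
    """2-D positions whose pairwise distances approximate dist[i][j]."""
    rng = random.Random(seed)
    n = len(dist)
    scale = max(max(row) for row in dist)
    pos = [[rng.uniform(0.0, scale), rng.uniform(0.0, scale)] for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):
            gx = gy = weight = 0.0
            for j in range(n):
                if j == i:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                cur = math.hypot(dx, dy) or 1e-9
                k = 1.0 / dist[i][j] ** 2          # spring stiffness
                pull = k * (cur - dist[i][j]) / cur
                gx += pull * dx
                gy += pull * dy
                weight += k
            # Damped step toward the position that balances the springs.
            pos[i][0] -= rate * gx / weight
            pos[i][1] -= rate * gy / weight
    return pos


# Three nodes whose target distances form a 3-4-5 triangle.
target = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
layout = spring_layout(target)
for i in range(3):
    for j in range(i + 1, 3):
        got = math.hypot(layout[i][0] - layout[j][0], layout[i][1] - layout[j][1])
        print(i, j, round(got, 2))
```

For real engine distances the targets usually cannot all be met in the plane, so the layout only approximates them.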
Kamada Spring Model
Keyword1=“Totti”
Keyword2=“Nakata”
Comparison of the 2 methods
Results
Distances differ from one pair of search engines to another.
Different topic fields show different characteristics.
Some search engines, such as Sprinks, are far from the others.
Excite and Aol are close to each other in most cases.
Conclusion
We addressed the need to quantify and visualize the closeness between search engines.
We provide users with a GUI for viewing the closeness of search engines.
It helps users select suitable search engines.
It helps users see the features of each search engine in various fields.
Future Work
Use more search engines
Use both general-purpose and special-purpose search engines
Use hyperlinks to find the resemblance
Apply this idea to other fields