Measuring Closeness of Search Engines
Identification of Outliers, Visualization of Closeness

Wang Hua (王化)
Department of Information Science, fourth-year student

Motivation
- Too many search engines
  - More than 20 major general-purpose engines
  - Even more special-purpose engines
- Simple aggregation of rankings is popular.
- We address the need to quantify and visualize the closeness between search engines.

Too Many Search Engines with Different Policies
- Major search engines: Yahoo, Altavista, Google, Lycos, etc.
- Distinct ranking policies
  - Directory type
  - Robot type
  - PageRank type (uses hyperlinks)

Outline of Methods
- Ranking
- List distance measure
- Distance between search engines

Ranking
- Partial list
- Cases for WWW web sites
  - Top 100 list
  - List of results from search engines

Footrule Distance among Ranking Lists
- s, t: ranking lists
- Distance: Σ_i |s(i) - t(i)|
- Example: [a,b,c,d,e] and [a,d,e,c,b]
  0+3+1+2+2 = 8 (position differences for a, b, c, d, e)

Kendall-tau Distance
- Definition [Dwork, WWW10, 2001]
- Counts the number of pairwise disagreements between two lists
- |{ (i, j) : i < j, s(i) < s(j) but t(i) > t(j) }|
- Example: [a,b,c,d] and [a,d,c,b]
  6 pairs: (a,b) (a,c) (a,d) (b,c) (b,d) (c,d)
  0+0+0+1+1+1 = 3

Character of Distance
- Kendall-tau can be computed in O(n log n) time
- Satisfies the triangle inequality and the properties of a norm distance

Matrix of Distance
Keyword = "university"

Engines   Dmos     Alta     Yahoo    Overture Excite   Lycos    Aol      Sprinks  Galaxy
Dmos               441      100      132      121      190      213      211      42
Alta                        490      737      574      895      915      100      720
Yahoo                                2324     2123     1349     879      1221     1766
Overture                                      7162     7113     6254     945      312
Excite                                                 8927     9699     282      192
Lycos                                                           8712     462      354
Aol                                                                      461      365
Sprinks                                                                           123
Galaxy

Table 4.2 The Closeness of Search Engines

Visualization
- Kernighan-Lin Algorithm
- Kamada Spring Model
- Comparison of the 2 methods

Kernighan-Lin Method
- Brief explanation

Kernighan-Lin by Color Coding
Keyword1 = "Totti", Keyword2 = "Nakata"

Kernighan-Lin by Color Coding
Keyword1 = "Gucci", Keyword2 = "Hermes"

Kamada Spring Model
- Brief explanation
- An example

Kamada Spring Model
Keyword1 = "Totti", Keyword2 = "Nakata"

Comparison of the 2 methods

Results
- Distances between search engines are different.
- Different fields have different characteristics.
- Some search engines, such as Sprinks, are far away from the others.
- Excite and Aol are near each other in most cases.

Conclusion
- We address the need to quantify and visualize the closeness between search engines.
- Provide users with a GUI to see the closeness of search engines.
  - Helps users select the proper search engines
  - Helps users see the features of each search engine in various fields

Future Work
- Use more search engines
- Use both general-purpose and special-purpose search engines
- Use hyperlinks to find the resemblance
- Apply this idea to other fields
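
Appendix sketch: checking the distance examples

The worked examples on the Footrule and Kendall-tau slides can be reproduced with a few lines of Python. This is a minimal sketch, not the implementation used in the study. The function names `footrule` and `kendall_tau` are illustrative, and the code assumes both lists are full rankings of exactly the same items, so the partial-list case is not handled. The pairwise loop is the straightforward O(n^2) count, not the O(n log n) method mentioned on the Character of Distance slide.

```python
from itertools import combinations


def footrule(s, t):
    """Footrule distance: sum over items of |position in s - position in t|.

    Assumes s and t contain exactly the same items (full rankings).
    """
    pos_t = {item: i for i, item in enumerate(t)}
    return sum(abs(i - pos_t[item]) for i, item in enumerate(s))


def kendall_tau(s, t):
    """Kendall-tau distance: number of item pairs that s and t order differently.

    Assumes s and t contain exactly the same items (full rankings).
    """
    pos_t = {item: i for i, item in enumerate(t)}
    # combinations(s, 2) yields pairs (a, b) with a ranked before b in s;
    # the pair is a disagreement when t ranks a after b.
    return sum(1 for a, b in combinations(s, 2) if pos_t[a] > pos_t[b])


# Examples taken from the slides.
print(footrule(list("abcde"), list("adecb")))   # 8
print(kendall_tau(list("abcd"), list("adcb")))  # 3
```

Applying either function to every pair of engine result lists for one keyword would give a symmetric matrix of the kind shown in the Matrix of Distance slide, under the same full-ranking assumption.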