A Recommendation System for Software Function Discovery

Naoki Ohsugi
Software Engineering Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology
Tuesday 16 December, 2003. International Workshop on Community-Driven Evolution of Knowledge Artifacts

Growth of Software Functions
Application software is getting more complicated and providing more functions.
Total number of menu items (Microsoft Office):
  Word 2000: 660          Word 2002: 772
  Excel 2000: 705         Excel 2002: 792
  PowerPoint 2000: 565    PowerPoint 2002: 646
Users cannot find the useful functions among so many functions.
[Screenshot of MS-Word 2002]
2 of 14

Users Could Not Find Some Useful Functions!
Subjects: 32 users in our lab. Period: 22 months
[Figure: bar chart for Excel2000, Excel2002, PPT2000, PPT2002, Word2000 and Word2002. Y-axis: Number of Functions (0 to 900). Series: Total Number of Different Functions, Maximum Number of Functions Used, Average Number of Functions Used, Minimum Number of Functions Used, with the used functions also given as a percentage of the total.]
3 of 14

A Recommendation System for Software Function Discovery
The system recommends to each individual user a set of candidate functions that may be useful.
Our solution is a Collaborative Filtering approach.
Here’s my recommendation:
  Tools->Word Count…     21 pts
  Insert->Date Time…     20 pts
  Tools->Thesaurus…      18 pts
  Insert->Footnote…      18 pts
  Tools->Spelling…       17 pts
4 of 14

What is Collaborative Filtering (CF)?
“Collaborative” means using other users’ knowledge for filtering.
“Filtering” means selecting useful items from a large number of items.
[Figure: a large set of items (A to T). Using some users’ knowledge (“F is good!”, “K is cool!”), the useful items F and K are selected for the target user.]
5 of 14

Voting-based Recommendation Systems with CF
The systems collect explicit votes as users’ knowledge.
  Amazon.com (book recommendation system): http://www.amazon.com
  MovieLens (movie recommendation system): http://www.movielens.umn.edu
6 of 14

Logging Usage as Users’ Knowledge
The proposed system automatically collects records of executed functions (usage logs) as users' knowledge.
Usage logs are collected from multiple users via the Internet.
[Diagram: User -> Application Software (e.g. MS-Word, Excel) with Log Collector (VBA Plug-In) -> The Internet -> Server of the System]
Usage log as shown below:
  2002/02/03 18:50:41 Formatting->Font…
  2002/02/03 18:50:45 File->Save As…
7 of 14

Step 1: Computing Similarities
Compute the similarities between the target user and the other users.
  Target user: Functions A, B, C
  Similar users:
    User 1: Functions A, B, C, D
    User 2: Functions A, B, C, D
  Dissimilar users:
    User 3: Functions E, F, G
    User 4: Functions H, I, J, K
8 of 14

Step 2: Delivering Knowledge
Deliver the candidate useful functions that were frequently used by the similar users.
  Target user: Functions A, B, C, plus Function D delivered from the similar users
  Similar users:
    User 1: Functions A, B, C, D
    User 2: Functions A, B, C, D
  Dissimilar users:
    User 3: Functions E, F, G
    User 4: Functions H, I, J, K
9 of 14

Conventional Similarity Calculation
Calculating Similarities by Correlation Coefficient
The dominant frequencies (e.g., “Undo” or “Save”) have an excessive effect on the similarity computation.
  Rank   Target user    User 2         User 3
  1      Undo   60%     Save   55%     Undo   60%
  2      Save   20%     Undo   25%     Save   20%
  3      Redo   10%     Redo   10%     Clear   6%
  4      Copy    4%     Copy    4%     Cut     5%
  5      Paste   3%     Paste   3%     Copy    4%
  6      Cut     2%     Cut     2%     Paste   3%
  7      Clear   1%     Clear   1%     Redo    2%
Correlation based similarity to the target user (range of value [-1.00, +1.00]):
  User 2: +0.41
  User 3: +0.97
10 of 14

Better Similarity Calculation
Calculating Similarities by Rank Correlation
The dominant frequencies ("Undo" & "Save") do not affect similarity computations.
Usage data: same target user, User 2 and User 3 as on the previous slide.
                                       User 2    User 3
  Correlation based similarity         +0.41     +0.97
  Rank correlation based similarity    +0.90     +0.05
11 of 14

Evaluating Accuracy of Recommendation
Yao’s ndpm measure *
  Ndpm range: [0.0, 1.0]. 0.0 is the best, 1.0 is the worst.
[Diagram: an interview with the user yields the User’s Ideal Recommendation (1. Function A, 2. Function B, 3. Function C, 4. Function D). The system, from usage logs (6 users, 22 months), yields the System’s Recommendation (1. Function A, 2. Function B, 3. Function C, 4. Function D). The comparison of the two gives the ndpm.]
* Y.Y. Yao, “Measuring Retrieval Effectiveness Based on User Preference of Documents”, J. of American Society for Information Science, 46, 2, 1995, pp.133-145.
12 of 14

Experimental Result
Collected usage logs of MS-Word 2000
Subjects: 6 users in our lab. Period: 22 months
[Figure: chart of each user’s ndpm and the average of ndpm per algorithm. Y-axis: Ndpm (0.2 to 0.6). X-axis: Algorithms (Random User Count Base Case, Correlation based Similarity, Rank Correlation based Similarity). Values labeled in the chart: 0.514, 0.404, 0.396, 0.383, 0.355.]
13 of 14

Conclusion
I proposed a recommendation system to help users discover useful functions.
I evaluated the accuracy of recommendation.
The result suggested that the proposed system has the potential to provide useful recommendations for software function discovery.
14 of 14
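
Appendix (sketch): correlation vs. rank correlation
The contrast between the two similarity measures on slides 10 and 11 can be illustrated with a minimal sketch. It assumes that "correlation" means the Pearson coefficient over the usage percentages and that "rank correlation" means Spearman's coefficient over the usage ranks; the slides do not state the exact formulas, so the numbers it prints need not match the slide values. The assignment of the usage data to User 2 and User 3 is read from the slide figure, and the function names (pearson, ranks, spearman) are illustrative only.

```python
from math import sqrt

# Usage percentages per function, as read from slides 10 and 11.
FUNCTIONS = ["Undo", "Save", "Redo", "Copy", "Paste", "Cut", "Clear"]
target = {"Undo": 60, "Save": 20, "Redo": 10, "Copy": 4, "Paste": 3, "Cut": 2, "Clear": 1}
user2 = {"Save": 55, "Undo": 25, "Redo": 10, "Copy": 4, "Paste": 3, "Cut": 2, "Clear": 1}
user3 = {"Undo": 60, "Save": 20, "Clear": 6, "Cut": 5, "Copy": 4, "Paste": 3, "Redo": 2}


def vector(usage):
    """Order a user's usage percentages by the shared function list."""
    return [usage[name] for name in FUNCTIONS]


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither vector is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank 1 is the most used function; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


p2 = pearson(vector(target), vector(user2))
p3 = pearson(vector(target), vector(user3))
s2 = spearman(vector(target), vector(user2))
s3 = spearman(vector(target), vector(user3))

print(f"Pearson:  user2={p2:+.2f}  user3={p3:+.2f}")
print(f"Spearman: user2={s2:+.2f}  user3={s3:+.2f}")
```

Under these assumptions the Pearson coefficient rates User 3 as closer to the target user, because the large "Undo" and "Save" percentages dominate the sums, while the rank-based coefficient rates User 2 as closer, because it depends only on the ordering of the functions.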