A Recommendation System for
Software Function Discovery
Naoki Ohsugi
Software Engineering Laboratory,
Graduate School of Information Science,
Nara Institute of Science and Technology
Tuesday 16 December, 2003.
International Workshop on Community-Driven Evolution of Knowledge Artifacts
Growth of Software Functions
• Application software is getting more complicated and providing more functions.

Total number of menu items (Microsoft Office)
• Word 2000: 660
• Word 2002: 772
• Excel 2000: 705
• Excel 2002: 792
• PowerPoint 2000: 565
• PowerPoint 2002: 646
Users cannot find the useful functions among so many.
Screenshot of MS-Word 2002
Users Could Not Find Some Useful Functions!
Subjects: 32 users in our lab.
Period: 22 months
[Figure: bar chart, y-axis "Number of Functions" (0 to 900), one group per application. Series: total number of different functions, and the maximum, average, and minimum number of functions used. Percentages are shares of the total.]

Application        Total   Maximum used   Average used   Minimum used
Excel 2000         705     75 (10.6%)     38 (5.4%)      10 (1.4%)
Excel 2002         792     83 (10.5%)     26 (3.3%)      12 (1.5%)
PowerPoint 2000    565     189 (33.5%)    80 (14.2%)     18 (3.2%)
PowerPoint 2002    646     147 (22.8%)    67 (10.4%)     31 (4.8%)
Word 2000          660     143 (21.7%)    66 (10.0%)     22 (3.3%)
Word 2002          772     120 (15.5%)    32 (4.1%)      11 (1.4%)
A Recommendation System for
Software Function Discovery
• The system recommends to each user a set of candidate functions that may be useful.
• Our solution is a Collaborative Filtering approach.
Here’s my recommendation:
Tools->Word Count…      21 pts
Insert->Date Time…      20 pts
Tools->Thesaurus…       18 pts
Insert->Footnote…       18 pts
Tools->Spelling…        17 pts
What is Collaborative Filtering (CF)?
“Collaborative” means using other users’ knowledge for filtering.
“Filtering” means selecting useful items from a large number of items.
[Figure: from a large number of items (A to T), items F and K are selected as useful for the target user by using other users’ knowledge ("F is good!", "K is cool!").]
Voting-based Recommendation Systems
with CF
These systems collect explicit votes as users’ knowledge.
Amazon.com (book recommendation system): http://www.amazon.com
MovieLens (movie recommendation system): http://www.movielens.umn.edu
Logging Usage as Users’ Knowledge
The proposed system automatically collects the records of
executed functions (usage logs) as users' knowledge.
Usage logs are collected from multiple users via the Internet.
[Diagram: User -> Application Software (e.g. MS-Word, Excel) -> Log Collector VBA Plug-In -> The Internet -> Server of the System]
Usage log as shown below:
2002/02/03 18:50:41 Formatting->Font…
2002/02/03 18:50:45 File->Save As…
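A minimal sketch of how such usage-log lines might be turned into per-function counts. The line layout (date, time, then the menu path) is taken from the two sample lines above; the function names `parse_log_line` and `count_functions` are hypothetical and are not part of the proposed system.

```python
from collections import Counter
from datetime import datetime


def parse_log_line(line):
    """Split one usage-log line into (timestamp, function name).

    Assumes the layout of the sample lines: "YYYY/MM/DD HH:MM:SS Menu->Item".
    """
    date_part, time_part, function = line.strip().split(" ", 2)
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
    return timestamp, function


def count_functions(lines):
    """Count how often each function appears in one user's usage log."""
    return Counter(parse_log_line(line)[1] for line in lines if line.strip())


log = [
    "2002/02/03 18:50:41 Formatting->Font…",
    "2002/02/03 18:50:45 File->Save As…",
]
counts = count_functions(log)
for function, count in counts.items():
    print(function, count)
```

Counts of this kind, one table per user, are the sort of input the similarity computation in the following slides would need.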
Step 1: Computing Similarities
• Compute the similarity between the target user and each of the other users.
[Figure: functions used by each user]
Target user: Function A, Function B, Function C
Similar users
  User 1: Function A, Function B, Function C, Function D
  User 2: Function A, Function B, Function C, Function D
Dissimilar users
  User 3: Function E, Function F, Function G
  User 4: Function H, Function I, Function J, Function K
Step 2: Delivering Knowledge
• Deliver to the target user the candidate functions that were frequently used by the similar users.
[Figure: Function D, which the similar users (User 1 and User 2) use, is delivered to the target user]
Target user: Function A, Function B, Function C, Function D (delivered)
Similar users
  User 1: Function A, Function B, Function C, Function D
  User 2: Function A, Function B, Function C, Function D
Dissimilar users
  User 3: Function E, Function F, Function G
  User 4: Function H, Function I, Function J, Function K
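Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: the Jaccard overlap is only a placeholder similarity (the slides that follow use correlation-based measures), the usage frequencies are invented for the example, and the names `jaccard`, `recommend`, `k`, and `top_n` are hypothetical. The sketch also assumes that only functions the target user has not yet used are worth recommending.

```python
def jaccard(a, b):
    """Placeholder similarity: overlap between two sets of used functions."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, others, similarity=jaccard, k=2, top_n=5):
    """Score functions unused by the target with the usage of the k most similar users."""
    # Step 1: rank the other users by their similarity to the target user.
    neighbours = sorted(
        others, key=lambda user: similarity(target, others[user]), reverse=True
    )[:k]
    # Step 2: accumulate similarity-weighted frequencies of the unused functions.
    scores = {}
    for user in neighbours:
        weight = similarity(target, others[user])
        for function, frequency in others[user].items():
            if function not in target:
                scores[function] = scores.get(function, 0.0) + weight * frequency
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Invented usage frequencies that mirror the figure (function -> times used).
target = {"A": 5, "B": 3, "C": 2}
others = {
    "User 1": {"A": 4, "B": 2, "C": 1, "D": 6},
    "User 2": {"A": 3, "B": 3, "C": 2, "D": 4},
    "User 3": {"E": 7, "F": 2, "G": 1},
    "User 4": {"H": 3, "I": 3, "J": 2, "K": 1},
}
recommendations = recommend(target, others)
print(recommendations)
```

With this toy data, Users 1 and 2 are selected as neighbours and Function D is the only candidate, as in the figure.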
Conventional Similarity Calculation
• Calculating similarities by correlation coefficient
• Highly frequent functions (e.g., “Undo” or “Save”) have an outsized effect on the similarity computation.

Rank   Target user    User 2        User 3
1      Undo   60%     Save   55%    Undo   60%
2      Save   20%     Undo   25%    Save   20%
3      Redo   10%     Redo   10%    Clear   6%
4      Copy    4%     Copy    4%    Cut     5%
5      Paste   3%     Paste   3%    Copy    4%
6      Cut     2%     Cut     2%    Paste   3%
7      Clear   1%     Clear   1%    Redo    2%

Correlation based similarity to the target user (range of value [-1.00, +1.00])
User 2: +0.41
User 3: +0.97
Better Similarity Calculation
• Calculating similarities by rank correlation
• With rank correlation, the dominant frequencies ("Undo" and "Save") no longer distort the similarity computation.

Rank   Target user    User 2        User 3
1      Undo   60%     Save   55%    Undo   60%
2      Save   20%     Undo   25%    Save   20%
3      Redo   10%     Redo   10%    Clear   6%
4      Copy    4%     Copy    4%    Cut     5%
5      Paste   3%     Paste   3%    Copy    4%
6      Cut     2%     Cut     2%    Paste   3%
7      Clear   1%     Clear   1%    Redo    2%

Similarity to the target user          User 2    User 3
Correlation based similarity           +0.41     +0.97
Rank correlation based similarity      +0.90     +0.05
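The contrast between the two measures can be tried on the percentages from the slide. The sketch below is an illustration only: the slides do not say which rank correlation statistic is used, so Spearman's coefficient (Pearson correlation applied to ranks) is assumed here, and the function names are hypothetical. The values it prints need not match the figures on the slide, whose exact computation is not given; only the qualitative reversal between User 2 and User 3 is the point.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from largest (rank 1) downward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (assumed here as the rank-based measure)."""
    return pearson(ranks(x), ranks(y))


# Usage percentages from the slide, aligned to the same function order.
functions = ["Undo", "Save", "Redo", "Copy", "Paste", "Cut", "Clear"]
target = [60, 20, 10, 4, 3, 2, 1]
user2 = [25, 55, 10, 4, 3, 2, 1]
user3 = [60, 20, 2, 4, 3, 5, 6]

for name, user in (("User 2", user2), ("User 3", user3)):
    print(name, round(pearson(target, user), 2), round(spearman(target, user), 2))
```

On the raw percentages, User 3 looks closer to the target because "Undo" and "Save" dominate the sums. On ranks, User 2 is closer, since only the top two functions are swapped.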
Evaluating Accuracy of Recommendation
• Yao’s ndpm measure*
[Diagram: the system produces its recommendation from the usage logs (6 users, 22 months); an interview with each user yields that user’s ideal recommendation. The two ranked lists (1. Function A, 2. Function B, 3. Function C, 4. Function D) are compared using ndpm.]
ndpm: range [0.0, 1.0]; 0.0 is the best, 1.0 is the worst.

* Y.Y. Yao, “Measuring Retrieval Effectiveness Based on User Preference of Documents”,
J. of American Society for Information Science, 46, 2, 1995, pp. 133-145.
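A sketch of how ndpm might be computed, assuming the commonly cited form of the measure, (2 × contradictory pairs + compatible pairs) / (2 × preferred pairs), counted over the pairs of items that the user ranks differently. The function name and the dictionary representation of a ranking (item -> rank, 1 = most preferred) are hypothetical, and the system ranking is assumed to cover every item in the user's ranking.

```python
from itertools import combinations


def ndpm(user_rank, system_rank):
    """ndpm between a user's ideal ranking and the system's ranking."""
    contradictory = compatible = preferred = 0
    for a, b in combinations(user_rank, 2):
        user_diff = user_rank[a] - user_rank[b]
        if user_diff == 0:
            continue  # the user has no preference between a and b
        preferred += 1
        system_diff = system_rank[a] - system_rank[b]
        if system_diff == 0:
            compatible += 1  # the system ties a pair that the user orders
        elif (user_diff > 0) != (system_diff > 0):
            contradictory += 1  # the system orders the pair the opposite way
    if preferred == 0:
        raise ValueError("the user's ranking expresses no preferences")
    return (2 * contradictory + compatible) / (2 * preferred)


ideal = {"Function A": 1, "Function B": 2, "Function C": 3, "Function D": 4}
reversed_list = {"Function A": 4, "Function B": 3, "Function C": 2, "Function D": 1}
one_swap = {"Function A": 1, "Function B": 3, "Function C": 2, "Function D": 4}

print(ndpm(ideal, ideal))          # identical lists
print(ndpm(ideal, reversed_list))  # fully reversed list
print(ndpm(ideal, one_swap))       # one adjacent pair swapped
```

Under this definition, a system ranking that agrees with the user on every pair scores 0.0 and a fully reversed ranking scores 1.0, consistent with the range stated on the slide.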
Experimental Result
Collected usage logs of MS-Word 2000
Subjects: 6 users in our lab.
Period: 22 months
[Figure: chart of ndpm (y-axis, 0.2 to 0.6) by algorithm (x-axis). Series: each user’s ndpm, average of ndpm, and a reference line at ndpm = 0.5.]
Algorithms (x-axis labels): Random; User Count; Base Case; Correlation based Similarity; Rank Correlation based Similarity
Data labels on the chart: 0.514, 0.404, 0.396, 0.383, 0.355
Conclusion
• I proposed a recommendation system to help users discover useful functions.
• I evaluated the accuracy of its recommendations.
• The results suggest that the proposed system has the potential to provide useful recommendations for software function discovery.