Building Domain Specific Dictionaries of Verb-Object Relation from Source Code
Yasuhiro Hayase†, Yu Kashima‡, Yuki Manabe‡, Katsuro Inoue‡
†: Faculty of Information Sciences and Arts, Toyo University
‡: Graduate School of Information Science and Technology, Osaka University

Program Comprehension
• Program comprehension consumes at least half the time allocated to the software maintenance process.[1]
• Identifiers in source code are very important for program comprehension.[2][3]
  – Software developers try to understand a program by guessing the roles of the program elements from their identifiers.
[1]: R. K. Fjeldstad and W. T. Hamlen. Application Program Maintenance Study: Report to Our Respondents
[2]: A. von Mayrhauser and A. M. Vans. Identification of Dynamic Comprehension Processes During Large Scale Maintenance
[3]: N. Pennington. Comprehension Strategies in Programming

Presenting Behavior Using Combinations of Identifiers
Combinations of multiple identifiers in source code represent program behaviors.
Ex.
    class JMenu {
        void addMenuListener(MenuListener) {
            …
    → add (Verb) MenuListener (Direct Object, DO) to JMenu (Indirect Object, IO)
• A method has Verb-Object (V-O) relations.[4]
• Complex combinations of identifiers carry rich meaning.
• Understanding these combinations is important for program comprehension.
[4]: D. Shepherd, L. Pollock, and K. Vijay-Shanker. Analyzing Source Code: Looking for Useful Verb-Direct Object Pairs in All the Right Places

Problem for Naming
• Developers need to learn the naming rules for various words and their combinations in different domains.
  – Programming language
  – Organization
  – Application domain
• If the rules are not documented, the only way to learn them is through examples.

Approach for Supporting Naming
• Problem
  – Learning through examples is a difficult and time-consuming task.
• Approach
  – Build a dictionary by collecting V-O relations from software products in a domain
    • The dictionary presents good examples for appropriate naming
• Input
  – Software products in the same domain, written in an object-oriented programming language
• Output
  – A dictionary of <Verb, DO, IO> tuples
    Verb   DO         IO
    Add    Product    Stock
    Set    Password   User

Overview of Method
Input: software products in the same domain; extraction patterns (prepared by hand)
Step 1. Obtaining Method Properties
    Ex. void addProduct(Product) in class Stock
        Return Type   void
        Method Name   add Product (Verb Noun)
        Argument      Product (Noun)
        Class Name    Stock (Noun)
Step 2. Extracting V-O Relations
    Ex. extraction pattern
        Return Type   void
        Method Name   Verb1 Noun2
        Argument      Noun2
        Class Name    Noun3
        Verb = Verb1, DO = Noun2, IO = Noun3
Step 3. Filtering V-O Relations
    Verb    DO         IO              # of Products
    Add     Product    Stock           3
    Build   Data       BooleanMatrix   1
    Set     Password   User            4
Output: a dictionary of <Verb, DO, IO> tuples

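The filtering in Step 3 can be sketched as counting, for each tuple, the number of distinct products it was extracted from and discarding tuples below a threshold. The slides do not state the threshold or the exact criterion, so `minProducts`, the class name `TupleFilter`, and the product labels `p1` to `p4` are illustrative assumptions, not the authors' implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class TupleFilter {
    // A <Verb, DO, IO> tuple; a record supplies equals/hashCode for use as a map key.
    record Tuple(String verb, String directObject, String indirectObject) {}

    // Counts in how many products each tuple occurs and keeps only tuples
    // that occur in at least minProducts products (hypothetical threshold).
    static Map<Tuple, Integer> filter(Map<String, Set<Tuple>> tuplesByProduct, int minProducts) {
        Map<Tuple, Integer> counts = new LinkedHashMap<>();
        for (Set<Tuple> tuples : tuplesByProduct.values()) {
            for (Tuple t : tuples) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        counts.values().removeIf(n -> n < minProducts);
        return counts;
    }

    public static void main(String[] args) {
        Tuple add = new Tuple("Add", "Product", "Stock");
        Tuple build = new Tuple("Build", "Data", "BooleanMatrix");
        Tuple set = new Tuple("Set", "Password", "User");
        // Product counts mirror the slide's table: Add = 3, Build = 1, Set = 4.
        Map<String, Set<Tuple>> input = new LinkedHashMap<>();
        input.put("p1", Set.of(add, build, set));
        input.put("p2", Set.of(add, set));
        input.put("p3", Set.of(add, set));
        input.put("p4", Set.of(set));
        System.out.println(filter(input, 2));
    }
}
```

With an assumed threshold of 2, the tuple seen in only one product (Build, Data, BooleanMatrix) is dropped, while the other two are kept with their product counts.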
Method Property
A tuple of four sequences of words, each word tagged with its part of speech (POS, i.e., word class)
    Return Type   void or Noun
    Method Name   sequence of words with POS
    Argument      Noun Sequence
    Class Name    Noun Sequence
Composed words are split, then POS tagging is performed (OpenNLP [5] + several heuristics).
Ex. Server class: void createTicketForUser(User)
    Field         Identifier            Words                    POS
    Return Type   void                  void                     void
    Method Name   createTicketForUser   create Ticket For User   Verb Noun PrePos Noun
    Argument      User                  User                     Noun
    Class Name    Server                Server                   Noun
[5]: http://opennlp.sourceforge.net/projects.html

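The word-splitting half of this step can be illustrated with a small sketch. It assumes identifiers follow Java camel-case conventions and splits them with a regular expression. The class name `IdentifierSplitter` is hypothetical, and the POS tagging itself (OpenNLP plus heuristics in the slides) is not reproduced here.

```java
import java.util.Arrays;
import java.util.List;

class IdentifierSplitter {
    // Zero-width boundaries: lower case or digit followed by upper case ("eT"),
    // or the end of an acronym run before a capitalized word ("XMLDocument" -> "XML|Document").
    private static final String BOUNDARY =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    // Splits a camel-case identifier into its component words.
    static List<String> split(String identifier) {
        return Arrays.asList(identifier.split(BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(split("createTicketForUser")); // prints [create, Ticket, For, User]
    }
}
```

Each resulting word would then be passed to a POS tagger to obtain a sequence such as Verb Noun PrePos Noun.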
Extraction Pattern
Structure Spec
    Return Type            void, Noun, or Wild Card
    Method Name            POS Sequence
    Argument, Class Name   Noun Sequence, Noun, or Wild Card
Extraction Spec
    Verb, DO, IO           three words in the structure spec
Ex.
    Structure Spec
        Return Type   void
        Method Name   Verb1 Noun2 PrePos3 Noun4
        Argument      Wild Card
        Class Name    Wild Card
    Extraction Spec
        Verb = Verb1, DO = Noun2, IO = Noun4

Extracting V-O Relations
1. Match a method property against a structure spec
2. If the matching succeeds, extract a <Verb, DO, IO> tuple according to the extraction spec
Ex.
    Method Property
        Return Type   void
        Method Name   createTicketForUser → create Ticket For User (Verb Noun PrePos Noun)
        Argument      User (Noun)
        Class Name    Server (Noun)
    Extraction Pattern
        Structure Spec
            Return Type   void
            Method Name   Verb1 Noun2 PrePos3 Noun4
            Argument      Wild Card
            Class Name    Wild Card
        Extraction Spec
            Verb = Verb1, DO = Noun2, IO = Noun4
    Extracted tuple
        Verb     DO       IO
        create   Ticket   User

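The two steps above can be sketched in a simplified form. The sketch assumes the argument and class name are wild cards (as in the example pattern), so only the return type and the POS sequence of the method name are checked, and the extraction spec is given as zero-based word positions in the method name. All type and field names (`VoExtractor`, `ExtractionPattern`, `verbIdx`, and so on) are hypothetical and chosen for illustration only.

```java
import java.util.List;
import java.util.Optional;

class VoExtractor {
    record TaggedWord(String word, String pos) {}
    record MethodProperty(String returnType, List<TaggedWord> methodName) {}
    record Tuple(String verb, String directObject, String indirectObject) {}
    // Structure spec: returnType + posSequence; extraction spec: the three indexes.
    record ExtractionPattern(String returnType, List<String> posSequence,
                             int verbIdx, int doIdx, int ioIdx) {}

    // Step 1: match the method property against the structure spec.
    // Step 2: on success, pick the Verb, DO, and IO words named by the extraction spec.
    static Optional<Tuple> extract(MethodProperty m, ExtractionPattern p) {
        if (!p.returnType().equals(m.returnType())) {
            return Optional.empty();
        }
        List<String> tags = m.methodName().stream().map(TaggedWord::pos).toList();
        if (!tags.equals(p.posSequence())) {
            return Optional.empty();
        }
        List<TaggedWord> words = m.methodName();
        return Optional.of(new Tuple(
                words.get(p.verbIdx()).word(),
                words.get(p.doIdx()).word(),
                words.get(p.ioIdx()).word()));
    }

    public static void main(String[] args) {
        MethodProperty m = new MethodProperty("void", List.of(
                new TaggedWord("create", "Verb"), new TaggedWord("Ticket", "Noun"),
                new TaggedWord("For", "PrePos"), new TaggedWord("User", "Noun")));
        ExtractionPattern p = new ExtractionPattern(
                "void", List.of("Verb", "Noun", "PrePos", "Noun"), 0, 1, 3);
        System.out.println(extract(m, p));
    }
}
```

For the `createTicketForUser` example, the pattern matches and yields the tuple (create, Ticket, User); a method with a different return type or POS sequence yields no tuple. Patterns that constrain the argument or class name, or that take the IO from the class name, would need additional fields.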
Evaluation
Evaluate the validity of the dictionaries built with our method
• Overview
  1. Prepare 31 extraction patterns by hand
  2. Build 4 domain dictionaries using the patterns as the experimental target
  3. Evaluate tuples in the dictionaries through a questionnaire answered by 6 students in a software engineering laboratory

Experimental Target
Built 4 domain dictionaries
  – Web application (WEB)
  – XML processing (XML)
  – Database (DB)
  – GUI
    Domain   # of Software Products (input)   # of Methods in the Products (input)   # of Tuples in the Dictionary (output)
    WEB      10                               74707                                  282
    XML      11                               55812                                  547
    DB       9                                74127                                  672
    GUI      7                                298696                                 407

Questionnaire Investigation
1. Extract 90 tuples randomly from each dictionary
2. Have 6 students in a software engineering laboratory evaluate the tuples
    Q1. Is the V-O relation of the tuple actually used in the dictionary domain or in common Java programs?
    Q2. Does the tuple include an inappropriate Verb, DO, or IO?
    Q3. Is the tuple useful for appropriate naming of identifiers?

Task Assignment
• The 90 tuples from each dictionary were divided among three participants.
• Each participant was assigned two dictionaries in domains where he or she had experience.
• Each participant was assigned 30 tuples per dictionary.

Result (Q1)
Q1. Is the V-O relation of the tuple actually used in the dictionary domain or in common Java programs?
[Figure: ratio of tuples (0% to 100%, higher is better) for WEB, XML, DB, and GUI; series: The Dictionary Domain, Java]
• Ratios of tuples used in the dictionary domain: 62% ~ 75%
• Ratios of tuples used in common Java programs: 38% ~ 76%
The dictionaries include:
• Many tuples used in the dictionary domain
• Tuples used in common Java programs

Result (Q2)
Q2. Does the tuple include an inappropriate Verb, DO, or IO?
[Figure: ratio of tuples (0% to 100%, lower is better) for WEB, XML, DB, and GUI]
• Ratios of tuples including an inappropriate Verb, DO, or IO: 6% ~ 13%
• Most tuples contain only appropriate words.

Result (Q3)
Q3. Is the tuple useful for appropriate naming of identifiers?
[Figure: ratio of tuples (0% to 100%, higher is better) for WEB, XML, DB, and GUI; series: The Dictionary Domain, Java]
• Ratios of useful tuples in the dictionary domain: 53% ~ 71%
• Ratios of useful tuples in common Java programs: 30% ~ 61%
• The dictionaries include many useful tuples used in each domain.

Tuples Evaluated as Useful in Q3
Tuples evaluated as useful in the dictionary domain
    Domain   Verb      DO           IO
    WEB      Destroy   Session      HttpSessionEvent
    XML      Declare   Prefix       NamespaceSupport
    DB       Add       Constraint   Table
    GUI      Click     Mouse        MouseEvent

Tuples Evaluated as Not Useful in Q3
    Domain   Verb      DO      IO
    DB       Release   Mouse   MouseEvent
    GUI      Gain      Focus   Fe
Reasons why tuples were evaluated as not useful
These tuples:
– belong to other domains
– contain uncertain words
– are common sense for average developers
– are used not in the whole domain, but only in programs that depend on a specific library

Discussion
• The dictionaries included tuples from other domains.
  – The threshold for filtering was too low to remove noise.
    → More input products are needed to use a higher threshold.
  – Some of the input products belong to multiple domains (e.g., both WEB and DB).
    → If a tuple appears in multiple dictionaries, treat it specially.
• The POS tagger assigned inaccurate POS tags to words in method names.
  – Our POS tagger uses OpenNLP with several heuristics, but it was not effective in some cases.
    → Optimize the POS tagging method for words in a method name.

Conclusion and Future Work
• Conclusion
  – Proposed an approach for building domain-specific dictionaries of V-O relations in methods
• Future Work
  – Develop a method for filtering out tuples from other domains
  – Develop an environment that supports naming with a dictionary built by our method