Building Domain Specific Dictionaries of Verb-Object Relation from Source Code
Yasuhiro Hayase†, Yu Kashima‡, Yuki Manabe‡, Katsuro Inoue‡
†: Faculty of Information Sciences and Arts, Toyo University
‡: Graduate School of Information Science and Technology, Osaka University

Program Comprehension
• Program comprehension consumes at least half of the time allocated to the software maintenance process.[1]
• Identifiers in source code are very important for program comprehension.[2][3]
  – Software developers try to understand a program by guessing the roles of program elements from their identifiers.
[1]: R. K. Fjeldstad and W. T. Hamlen. Application Program Maintenance Study: Report to Our Respondents
[2]: A. von Mayrhauser and A. M. Vans. Identification of Dynamic Comprehension Processes During Large Scale Maintenance
[3]: N. Pennington. Comprehension Strategies in Programming

Presenting Behavior using Combinations of Identifiers
Combinations of multiple identifiers in source code represent program behaviors.
Ex.
  class JMenu {
    void addMenuListener(MenuListener) { …
  "add MenuListener to JMenu": add = Verb, MenuListener = Direct Object (DO), JMenu = Indirect Object (IO)
• Methods have Verb-Object (V-O) relations.[4]
• Complex combinations of identifiers represent rich meaning.
• Understanding these combinations is important for program comprehension.
[4]: D. Shepherd, L. Pollock, and K. Vijay-Shanker. Analyzing Source Code: Looking for Useful Verb-Direct Object Pairs in All the Right Places

Problem for Naming
• Developers need to learn the rules for various words and their combinations in different domains:
  – Programming language
  – Organization
  – Application domain
• If the rules are not documented, the only way to learn them is through examples.

Approach for Supporting Naming
• Problem
  – Learning through examples is a difficult and time-consuming task.
• Approach
  – Build a dictionary by collecting V-O relations from software products in a domain
  – Present good examples for appropriate naming
• Input
  – Software products in the same domain, written in an object-oriented programming language
• Output
  – A dictionary of <Verb, DO, IO> tuples, e.g.:
      Verb | DO       | IO
      Add  | Product  | Stock
      Set  | Password | User

Overview of Method
Input: software products in the same domain. Output: a dictionary of <Verb, DO, IO> tuples.
Step 1. Obtaining method properties
  Ex. void addProduct(Product) in class Stock
    Return type: void; Method name: add (Verb) Product (Noun); Argument: Product (Noun); Class name: Stock (Noun)
Step 2. Extracting V-O relations using extraction patterns prepared by hand
  Ex. Method name: Verb1 Noun2; Argument: Noun2; Class name: Noun3 → <Verb1, Noun2, Noun3> = <add, Product, Stock>
Step 3. Filtering V-O relations by the number of products in which each tuple appears
  Verb  | DO       | IO            | # of Products
  Add   | Product  | Stock         | 3
  Build | Data     | BooleanMatrix | 1
  Set   | Password | User          | 4

Method Property
A tuple of four word sequences (return type, method name, argument, class name), where each word carries its part of speech (i.e., word class, POS).
• Composed words are split, then POS tagging is performed (OpenNLP [5] + several heuristics).
Ex. class Server : void createTicketForUser(User)
  Return type: void (void)
  Method name: create (Verb) Ticket (Noun) For (PrePos) User (Noun)
  Argument: User (Noun)
  Class name: Server (Noun)
[5]: http://opennlp.sourceforge.net/projects.html

Extraction Pattern
An extraction pattern consists of a structure spec and an extraction spec.
• Structure spec: a constraint on each element of a method property
  – Return type: void, Noun, or wild card
  – Method name: POS sequence
  – Argument: Noun or wild card
  – Class name: Noun sequence or wild card
• Extraction spec: three words in the structure spec, used as Verb, DO, and IO
Ex.
  Structure spec: Return type: void; Method name: Verb1 Noun2 PrePos3 Noun4; Argument: wild card; Class name: wild card
  Extraction spec: Verb = Verb1, DO = Noun2, IO = Noun4

Extracting V-O Relations
1. Match the method property against a structure spec.
2. If the match succeeds, extract a <Verb, DO, IO> tuple according to the extraction spec.
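The two steps above can be sketched in Java. This is a minimal illustration under stated assumptions, not the authors' implementation: the class VoExtractor and its methods are hypothetical names, the camelCase splitting regex is an assumption, and the toy tagger (first method-name word = Verb, a small hard-coded preposition list = PrePos, everything else = Noun) merely stands in for OpenNLP plus the heuristics used in the original work. A structure spec is written here as one string per element (return type, method name, argument, class name), with "*" as the wild card.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of Step 1 (method property) and Step 2 (pattern matching).
final class VoExtractor {

    record Word(String text, String pos) {}

    // Assumed preposition list for the toy tagger (stand-in for OpenNLP).
    private static final Set<String> PREPOSITIONS =
            Set.of("for", "to", "from", "with", "in", "on", "of", "by", "at", "into");

    // Slot such as "Verb1", "PrePos3", or "void": POS name plus optional number.
    private static final Pattern SLOT = Pattern.compile("([A-Za-z]+)(\\d*)");

    // Split a composed identifier at camelCase boundaries,
    // e.g. "createTicketForUser" -> [create, Ticket, For, User].
    static List<String> split(String identifier) {
        return List.of(identifier.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"));
    }

    // Toy tagger: "void" keeps its own tag, the first word of a method name
    // is a Verb, known prepositions are PrePos, everything else is a Noun.
    static List<Word> tag(List<String> words, boolean isMethodName) {
        List<Word> tagged = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            String lower = w.toLowerCase(Locale.ROOT);
            String pos;
            if (lower.equals("void")) pos = "void";
            else if (isMethodName && i == 0) pos = "Verb";
            else if (PREPOSITIONS.contains(lower)) pos = "PrePos";
            else pos = "Noun";
            tagged.add(new Word(w, pos));
        }
        return tagged;
    }

    // Method property: return type, method name, argument, class name.
    static List<List<Word>> methodProperty(String returnType, String methodName,
                                           String argument, String className) {
        return List.of(tag(split(returnType), false), tag(split(methodName), true),
                       tag(split(argument), false), tag(split(className), false));
    }

    // Match the property against the structure spec ("*" = wild card); on success
    // return {verb, do, io} chosen by the slot numbers in extractionSpec, else null.
    static String[] extract(List<List<Word>> property, String[] structureSpec,
                            int[] extractionSpec) {
        Map<Integer, String> bound = new HashMap<>();
        for (int e = 0; e < structureSpec.length; e++) {
            if (structureSpec[e].equals("*")) continue;
            String[] slots = structureSpec[e].trim().split("\\s+");
            List<Word> words = property.get(e);
            if (slots.length != words.size()) return null;
            for (int i = 0; i < slots.length; i++) {
                Matcher m = SLOT.matcher(slots[i]);
                if (!m.matches() || !m.group(1).equals(words.get(i).pos())) return null;
                if (!m.group(2).isEmpty()) {
                    int n = Integer.parseInt(m.group(2));
                    String prev = bound.putIfAbsent(n, words.get(i).text());
                    // A repeated slot number must bind the same word everywhere.
                    if (prev != null && !prev.equalsIgnoreCase(words.get(i).text())) return null;
                }
            }
        }
        String[] tuple = new String[3];
        for (int k = 0; k < 3; k++) {
            tuple[k] = bound.get(extractionSpec[k]);
            if (tuple[k] == null) return null;
        }
        return tuple;
    }

    public static void main(String[] args) {
        // void createTicketForUser(User) in class Server
        List<List<Word>> prop = methodProperty("void", "createTicketForUser", "User", "Server");
        String[] spec = {"void", "Verb1 Noun2 PrePos3 Noun4", "*", "*"};
        String[] tuple = extract(prop, spec, new int[] {1, 2, 4});
        System.out.println(String.join(", ", tuple)); // create, Ticket, User
    }
}
```

Repeated slot numbers let a pattern such as Method name: Verb1 Noun2; Argument: Noun2; Class name: Noun3 require that the argument type equals the direct object, as in void addProduct(Product) in class Stock.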
Applying the two steps to void createTicketForUser(User) in class Server:
  Method property: Return type: void; Method name: create (Verb) Ticket (Noun) For (PrePos) User (Noun); Argument: User (Noun); Class name: Server (Noun)
  Extraction pattern: structure spec (Return type: void; Method name: Verb1 Noun2 PrePos3 Noun4; Argument: wild card; Class name: wild card), extraction spec (Verb = Verb1, DO = Noun2, IO = Noun4)
  → <Verb, DO, IO> = <create, Ticket, User>

Evaluation
Evaluate the validity of the dictionaries built with our method.
• Overview
  1. Prepare 31 extraction patterns by hand
  2. Build 4 domain dictionaries using the patterns as the experimental target
  3. Evaluate the tuples in the dictionaries through a questionnaire answered by 6 students in a software engineering laboratory

Experimental Target
Built 4 domain dictionaries:
  – Web application (WEB)
  – XML processing (XML)
  – Database (DB)
  – GUI

  Domain | Input: # of software products | Input: # of methods in the products | Output: # of tuples in the dictionary
  WEB    | 10 | 74,707  | 282
  XML    | 11 | 55,812  | 547
  DB     | 9  | 74,127  | 672
  GUI    | 7  | 298,696 | 407

Questionnaire Investigation
1. Randomly extract 90 tuples from each dictionary.
2. Have 6 students in a software engineering laboratory evaluate the tuples:
  Q1. Is the V-O relation of the tuple actually used in the dictionary domain or in common Java programs?
  Q2. Does the tuple include an inappropriate Verb, DO, or IO?
  Q3. Is the tuple useful for appropriate naming of identifiers?

Task Assignment
• Each participant was assigned two dictionaries whose domains he/she has experience with.
• The 90 tuples of each dictionary were assigned to three participants.
• Each participant was assigned 30 tuples per dictionary.

Result (Q1)
Q1. Is the V-O relation of the tuple actually used in the dictionary domain or in common Java programs?
[Figure: ratio of tuples per dictionary (WEB, XML, DB, GUI), for the dictionary domain and for common Java programs; higher is better]
• Ratio of tuples used in the dictionary domain: 62% to 75%
• Ratio of tuples used in common Java programs: 38% to 76%
The dictionaries include:
• Many tuples used in the dictionary domain
• Tuples used in common Java programs

Result (Q2)
Q2. Does the tuple include an inappropriate Verb, DO, or IO?
[Figure: ratio of tuples per dictionary (WEB, XML, DB, GUI); lower is better]
• Ratio of tuples including an inappropriate Verb, DO, or IO: 6% to 13%
• Most tuples consist of appropriate words.

Result (Q3)
Q3. Is the tuple useful for appropriate naming of identifiers?
[Figure: ratio of tuples per dictionary (WEB, XML, DB, GUI), for the dictionary domain and for common Java programs; higher is better]
• Ratio of tuples useful in the dictionary domain: 53% to 71%
• Ratio of tuples useful in common Java programs: 30% to 61%
• The dictionaries include many tuples useful in each domain.

Tuples Evaluated Useful in Q3
Tuples evaluated useful in the dictionary domain:
  Domain | Verb    | DO         | IO
  WEB    | Destroy | Session    | HttpSessionEvent
  XML    | Declare | Prefix     | NamespaceSupport
  DB     | Add     | Constraint | Table
  GUI    | Click   | Mouse      | MouseEvent

Tuples Evaluated Not Useful in Q3
  Domain | Verb    | DO    | IO
  DB     | Release | Mouse | MouseEvent
  GUI    | Gain    | Focus | Fe
Reasons why tuples were evaluated not useful. These tuples:
  – belong to other domains
  – contain unclear words
  – are common sense for average developers
  – are used not throughout the domain, but only in programs that depend on a specific library

Discussion
• The dictionaries included tuples from other domains.
  – The filtering threshold was too low to remove noise.
    → More input products are needed to use a higher threshold.
  – Some of the input products belong to multiple domains (e.g., both WEB and DB).
    → If a tuple appears in multiple dictionaries, treat the tuple specially.
• The POS tagger assigned inaccurate POSs to words in a method.
  – Our POS tagger uses OpenNLP with several heuristics, but the tagger was not effective in some cases.
    → Optimize the POS tagging method for words in a method.

Conclusion and Future Work
• Conclusion
  – Proposed an approach for building domain specific dictionaries of V-O relations in methods
• Future Work
  – Develop a method for filtering out tuples from other domains
  – Develop an environment that supports naming with a dictionary built by our method
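Step 3 of the method overview (filtering V-O relations by the number of products in which they appear), together with the Discussion's suggestion to treat tuples that appear in several domain dictionaries specially, can be sketched as below. This is an illustrative sketch, not the authors' code: TupleFilter, VoTuple, filterByProducts, sharedAcrossDomains, and the threshold parameter minProducts are hypothetical names, and the slides do not state which threshold was used. The sketch only flags shared tuples; how to treat them is left open, as in the Future Work.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Step 3 (filtering) and a cross-domain check.
final class TupleFilter {

    // A <Verb, DO, IO> tuple extracted from some method.
    record VoTuple(String verb, String directObject, String indirectObject) {}

    // Count the distinct products in which each tuple occurs and keep the
    // tuples reaching minProducts (an assumed parameter, not from the slides).
    static Map<VoTuple, Integer> filterByProducts(Map<String, List<VoTuple>> tuplesByProduct,
                                                  int minProducts) {
        Map<VoTuple, Integer> counts = new HashMap<>();
        for (List<VoTuple> tuples : tuplesByProduct.values()) {
            for (VoTuple t : new HashSet<>(tuples)) {   // count each product once
                counts.merge(t, 1, Integer::sum);
            }
        }
        counts.values().removeIf(n -> n < minProducts);
        return counts;
    }

    // Flag tuples that occur in more than one domain dictionary.
    static Set<VoTuple> sharedAcrossDomains(Map<String, Set<VoTuple>> dictionaries) {
        Map<VoTuple, Integer> domainCount = new HashMap<>();
        for (Set<VoTuple> dictionary : dictionaries.values()) {
            for (VoTuple t : dictionary) domainCount.merge(t, 1, Integer::sum);
        }
        Set<VoTuple> shared = new HashSet<>();
        domainCount.forEach((t, n) -> { if (n > 1) shared.add(t); });
        return shared;
    }

    public static void main(String[] args) {
        // Toy input: three hypothetical products.
        VoTuple addProduct = new VoTuple("add", "Product", "Stock");
        VoTuple buildData = new VoTuple("build", "Data", "BooleanMatrix");
        Map<String, List<VoTuple>> products = Map.of(
                "productA", List.of(addProduct, buildData),
                "productB", List.of(addProduct),
                "productC", List.of(addProduct, addProduct));
        System.out.println(filterByProducts(products, 2));
        // {VoTuple[verb=add, directObject=Product, indirectObject=Stock]=3}
    }
}
```

Raising minProducts removes more noise but, as the Discussion notes, only pays off when enough input products are available.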