コードクローン解析に基づくリファクタリング支援 (Refactoring Support Based on Code Clone Analysis) 肥後 芳樹,神谷 年洋,楠本 真二,井上 克郎 (Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue) 大阪大学 大学院情報科学研究科 (Graduate School of Information Science and Technology, Osaka University) 科学技術振興機構 さきがけ (Presto, Japan Science and Technology Agency) {y-higo,kamiya,kusumoto,inoue}@ist.osaka-u.ac.jp Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Background What is code clone? a code fragment that has identical or similar fragments in the same or different files in a system introduced in the source program because of various reasons such as reusing code by `copy-and-paste’ makes software maintenance more difficult. copy-and-paste copy-andpaste Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 2 Requirements for Code Clone Detection Appropriate code clones should be detected in compliance with demands. To understand the amount and distribution of code clones, it is desirable to detect all code clones To remove code clones (Restructuring or Refactoring), it is useful to detect code clones that can be removed, and also removing them improves software maintainability Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 3 Research Objective and Approach We aim to extract code clones which can be easily refactored Approach To detect code clones efficiently, we use a code clone detection tool, CCFinder. Then, we extract the specific code clones easily refactored and provide applicable refactoring patterns for the code clones. Finally, we develop a refactoring support tool and apply it to an open source program. Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 4 Refactoring Process Support Commonly used refactoring process Step 1: Determine where refactoring should be applied Step 2: Determine which refactoring patterns can/should be applied Step 3: Investigate the effectiveness of the refactoring patterns Step 4: Modify source code Step 5: Conduct regression tests Proposed Method supports Steps1 and 2 High scalability: it take less of high time complexity. Detect fine-graded clone: it detect more fine-graded code clone than method unit. Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 5 Outline of CCFinder CCFinder directly compares source code on token unit, and detects code clones Normalization of name space Replacement of names defined by user Removal of table initialization Consideration of module delimiter CCFinder can analyze the system of millions line scale in practical use time Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 6 CCFinder: Clone Detection Process Source files throws {{ String (( (( )) )) throws $$ $$foo foofoo() 1. static void throws RESyntaxException static throws$$RESyntaxException RESyntaxException String aa { static void throws {{ $$ $$ new ]{{] {{$$ "123,400" , [[ [[]] String $$String $$ = new ]] == a[] String $[] { "123,400", 2. [[String new "abc", "orange 100" }; "abc" , "orange 100" } ; org . apache . regexp ; } } ; 3. org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+"); new $$ == $$ pat ==new new 4. .intRE sum 0; org . apache . regexp . RE ) ; int == $$00 $ $$ (( "[0-9,]+" "[0-9,]+" int sum $$ sum $$ = 5. for (int i$ = 0; i) < ;a.length; ++i) ;; for < $$ i$$i == int for (( int = 0$$0 ;;; i$$i << 6. a$ if. (pat.match(a[i])) ;; ++ if ifif(( (($$ pat $$ ;; ++ length ++$$ ii)) ))if pat $ . length ++ 7. .. match sum += Sample.parseNumber(pat.getParen(0)); $$ (( (( $$ aa [[ [[ $$ ii ]] ]] )) )) )) )) $$sum sum 8. += System.out.println("sum "getParen + sum); (( 00 (( $$ .. $$ (( ((pat $$ .. $.$. parseNumber parseNumber pat$$ .= . getParen += Sample 9. } )) )) ;; System .. .$$. println $$ .. .$$. out System out println "sum=="" (( $$ (( "sum static String $$ $$goo }}}} static ))) ;;;; goo(String $$ void 10. static [] ((a)(( $$throws RESyntaxException { ++ sum static void void goo String static ]] ))) =throws = {{ RE $$RE("[0-9,]+"); [[ exp throws RESyntaxException RE exp exp == {{ $$ $$ = 11. a$$RE newRESyntaxException =sum$ == 00 $ ) ; )) $;; $int $ (( "[0-9,]+" int sum 12. newintRE sum"[0-9,]+" = 0; for (( int for 0$ ;; i$i << 13. ;; for (intint$i =i$i 0;== i 0< a.length; ++i) $ ; ++ ;; ++ length a$ .. length ++$ ii) ))if ifif( (($ exp exp 14. . match if (exp.match(a[i])) $ ( (( $ aa [ [[ $ ii ] ]] ) )) ) )) $sum sum 15. += parseNumber sum += parseNumber(exp.getParen(0)); ( . $ getParen $ ( ( $exp. (. $exp getParen () 0) )( )0 ) ) $$ .. parseNumber 16. ;; System.out.println("sum + sum sum); ( $ ((+ "sum . .$. println $ . $.. out ==""" ++ out $ = System println "sum sum 17. })) ;; }} Lexical Lexical analysis analysis Token Token sequence sequence Transformation Transformation Transformed Transformed token token sequence sequence Match Match detection detection Clones Clones on on transformed transformed sequence sequence Formatting Formatting Clone pairs Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 7 Definitions: Clone Pair and Clone Set Clone Pair: a pair of identical or similar fragments Clone Set: a set of identical or similar fragments Clone Pair Clone Set (C1, C4) {C1, C4, C5} C2 (C1, C5) {C2, C3} C3 (C2, C3) C4 (C4, C5) C1 C5 CCFinder detects code clones as a clone pair After detection process, clone pairs are transformed into clone sets Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 8 Extraction of code clones easily refactored Structural code clones are regarded as the target of refactoring Detect clone pairs by CCFinder Transform the detected clone pairs into clone sets Extract structural parts as structural code clones from the detected clone sets What is a structural code clone ? example: Java language Declaration: class declaration, interface declaration Method: method body, constructor, static initializer statement: do, for, if, switch, synchronized, try, while Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 9 fragment 1 609: 610: 611: 612: 613: 614: 615: 616: 617: 618: 619: 620: 621: 622: 623: 624: 625: 626: 627: 628: reset(); grammar = g; // Lookup make-switch threshold in the grammar generic options if (grammar.hasOption("codeGenMakeSwitchThreshold")) { try { makeSwitchThreshold = grammar.getIntegerOption("codeGenMakeSwitchThreshold"); //System.out.println("setting codeGenMakeSwitchThreshold to " + makeSwitchThreshold); } catch (NumberFormatException e) { tool.error( "option 'codeGenMakeSwitchThreshold' must be an integer", grammar.getClassName(), grammar.getOption("codeGenMakeSwitchThreshold").getLine() ); } } Code clones which CCFinder detects // Lookup bitset-test threshold in the grammar generic options if (grammar.hasOption("codeGenBitsetTestThreshold")) { try { bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold"); fragment 2 623: 624: 625: 626: 627: 628: 629: 630: 631: 632: 633: 634: 635: 636: 637: 638: 639: 640: 641: 642: } // Lookup bitset-test threshold in the grammar generic options if (grammar.hasOption("codeGenBitsetTestThreshold")) { try { bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold"); //System.out.println("setting codeGenBitsetTestThreshold to " + bitsetTestThreshold); } catch (NumberFormatException e) { tool.error( "option 'codeGenBitsetTestThreshold' must be an integer", grammar.getClassName(), grammar.getOption("codeGenBitsetTestThreshold").getLine() ); } } Code clones which proposed method detects // Lookup debug code-gen in the grammar generic options if (grammar.hasOption("codeGenDebug")) { Token t = grammar.getOption("codeGenDebug"); if (t.getText().equals("true")) { Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 10 fragment 3 1007: 1008: 1009: 1010: 1011: 1012: 1013: 1014: 1015: 1016: 1017: 1018: 1019: if ( inputState.guessing==0 ) { buf.append(a.getText()); } { _loop144: do { if ((LA(1)==WILDCARD)) { match(WILDCARD); a=id(); if ( inputState.guessing==0 ) { buf.append('.'); buf.append(a.getText()); } } Code clones which CCFinder detects fragment 4 1527: 1528: 1529: 1530: 1531: 1532: 1533: 1534: 1535: 1536: 1537: 1538: 1539: if ( inputState.guessing==0 ) { t=a.getText(); } { _loop84: do { if ((LA(1)==COMMA)) { match(COMMA); id(); if ( inputState.guessing==0 ) { t+=","+b.getText(); } } Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 11 Provision of applicable refactoring patterns Following refactoring patterns[1][2] can be used to remove code sets including structural code clones Extract Class, Extract Method, Extract Super Class, Form Template Method, Move Method, Parameterize Method, Pull Up Constructor, Pull Up Method, For each clone set, the proposed method determines which refactoring pattern is applicable by using several metrics. [1]: M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999. [2]: http://www.refactoring.com/, 2004. Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 12 Metrics(1):Volume Metrics for Clone Set LEN, POP, DFL LEN(S): is the average length of token sequence for a clone set S POP(S): is the number of elements (code fragments) of a clone set S DFL(S): indicates an estimation of how many tokens would be removed from source files when all code fragments in a clone set S are reconstructed new sub routine caller statements Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 13 example: Metrics(2): Coupling Metrics for Clone Set ・Clone set S includes fragments f and f NRV, NSV ・In fragment f , externally defined variable b and c are 1 2. 1 referred and a is assigned to. NRV(S): represents the average number of externally ・Fragment f2 is same as f1. defined variables referred in the fragment of a clone set S then,NRV(S) = ( 2 + 2 ) / 2 = 2 NSV(S): represents the average NSV(S) = ( 1 +number 1 ) / 2 = 1of externally defined variables assigned to in the fragment of a clone set S Definition int a , b, c; Fragment f1 … if( … ){ reference …; assignment Clone set fragment f1, f2, ・・・, fn … =Sbincludes + c; si is thea number = …; of externally defined variable which fragment fi refers …; ti is the number of externally defined variable which } fragment fi assigns … Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 14 example 2: Metrics(3):Inheritance example 3: Metric for Clone Set ・Clone set S includes fragments f and f . ・Clone set S includes fragments f and f . ・If all fragments of clone set S are included in a DCH ・If all classes which include f and f don’t have example 1: 1 1 2 2 1 2 class and its direct child classes, common class, DCH(S): represents the position and distance ・Clone setparent S includes fragmentsbetween f1 and f2. each of clone=set fragment of a clone set・IfSall fragments then,DCH(S) 1 S are included in a then,DCH(S) = ∞ same class, then, DCH(S) = 0 Definition class A S includes fragment f1, f2, ・・・,fn B class AClone set class class A f exists in class C Fragment i fragment f1Class C fragment f2 fragment fp1 is a class on classf hierarchy fragment 2 class B i which locates lowest position in C1, C2, ・・・,Cn class C If no common parent class of C1,C2,・・・,Cn exists, the value of fragment f2 DCH(S) is ∞ This metric is measured for only the class hierarchy where target software exists. fragment f1 Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 15 Aries: Refactoring Support Tool Overview Target: Java programs Runtime environment: JDK1.4 or above Implementation Analysis component: Java 32,000 Lines CCFinder is used as code clone detection component JavaCC is used to construct syntax and semantic analysis component GUI component: Java14,000 Lines User can specify target clone sets through GUI operations. Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 16 Case Study: Ant Overview Ant is one of build tools like ‘make’ Input for Aries Source files of Ant: 627 LOC: about 180,000 It took 30 seconds to extract structural code clones We got 151 clone sets. Environment OS: FreeBSD 4.9 CPU: Xeon 2.8G x 2 Memory: 4GB Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 17 Case Study: Ant Extract Method (conditions) To apply ‘Extract Method’ pattern, we filtered clone sets by using following conditions The unit of clone is statement (do, for, if, …) Set the value of DCH(S) = 0 All fragments of a clone set are included in a class Set the value of NSV(S) < 2 Each fragment of a clone set assigns any value to 1 or no externally defined variable. 32 clone sets satisfied these conditions Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 18 Case Study: Ant Extract Method(result) 32 clone set can be categorized as followings category number No parameter, no return value 3 Addition of some parameters, no return value 18 Addition of some parameters and return the value 7 assignment Others 4 if (!isChecked()) { =={null) { if (iSaveMenuItem == null) // javacopts if (name sure we have anull) circular try { //ifmake (javacopts != don't null && !javacopts.equals("")) { if (other.name != { reference here StackgenicTask.createArg().setValue("-javacopts"); stk = new iSaveMenuItem = Stack(); newfalse; MenuItem(); return stk.push(this); iSaveMenuItem.setLabel("Save BuildInfo To Repository"); genicTask.createArg().setLine(javacopts); } getProject()); } catch dieOnCircularReference(stk, iExc) { }(Throwable } else if (!name.equals(other.name)) { } handleException(iExc); return false; local } } variable } Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 19 Conclusion We have proposed refactoring support method implemented a refactoring support tool, Aries conducted a case study to Ant, which is an open source program, and most of filtered clone sets could be removed. Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 20 Future Works As future works, we are going to evaluate whether or not each refactoring should be done as the viewpoint of software quality (support Step 3) find a group of clone sets that can be refactored at once to conduct refactoring more effectively Commonly used refactoring process Step 1: Determine where refactoring should be applied Step 2: Determine which refactoring patterns can/should be applied Step 3: Investigate the effectiveness of the refactoring patterns Step 4: Modify source code Step 5: Conduct regression tests Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 21 Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 22 Code clone detection for refactoring: Related Works Detect similar sub-graphs as clone on program dependency graph [1]. High accuracy: This approach finds out data-dependence and control dependence in source codes. High time complexity: It takes O(n2) time to construct program dependency graph. Detect similar methods and functions as clone using metrics [2]. Low accuracy: if the size of target method or function is small, the values of metric make no difference. detection unit restriction: only method and function unit clone can be detected. [1] R. Komondoor and S. Horwitz, “Using slicing to identify duplication in source code”, In Proc. of the 8th International Symposium on Static Analysis, Paris, France, July 16-18, 2001. [2] Magdalena Balazinska, Ettore Merlo, Michel Dagenais, Bruno Lague, and Lostas Kontogiannis, “Advanced Clone-Analysis to Support Object-Oriented System Refactoring”, WCRE 2000, pp. 98-107 Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 23
© Copyright 2024 ExpyDoc