Application of Collaborative Filtering for

コードクローン解析に基づくリファクタリング支援
(Refactoring Support Based on Code Clone Analysis)
肥後 芳樹,神谷 年洋,楠本 真二,井上 克郎
(Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue)
大阪大学 大学院情報科学研究科
(Graduate School of Information Science and Technology,
Osaka University)
科学技術振興機構 さきがけ
(Presto, Japan Science and Technology Agency)
{y-higo,kamiya,kusumoto,inoue}@ist.osaka-u.ac.jp
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
1
Background
What is code clone?
a code fragment that has identical or similar fragments
in the same or different files in a system
introduced in the source program because of various
reasons such as reusing code by `copy-and-paste’
makes software maintenance more difficult.
copy-and-paste
copy-andpaste
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
2
Requirements for Code Clone Detection
Appropriate code clones should be detected in compliance
with demands.
To understand the amount and distribution of code clones,
it is desirable to detect all code clones
To remove code clones (Restructuring or Refactoring), it
is useful to detect code clones that can be removed, and
also removing them improves software maintainability
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
3
Research Objective and Approach
We aim to extract code clones which can be easily refactored
Approach
To detect code clones efficiently, we use a code clone
detection tool, CCFinder.
Then, we extract the specific code clones easily
refactored and provide applicable refactoring patterns for
the code clones.
Finally, we develop a refactoring support tool and apply it
to an open source program.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
4
Refactoring Process Support
Commonly used refactoring process
Step 1: Determine where refactoring should be applied
Step 2: Determine which refactoring patterns can/should be
applied
Step 3: Investigate the effectiveness of the refactoring patterns
Step 4: Modify source code
Step 5: Conduct regression tests
Proposed Method supports Steps1 and 2
High scalability: it take less of high time complexity.
Detect fine-graded clone: it detect more fine-graded code
clone than method unit.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
5
Outline of CCFinder
CCFinder directly compares source code on token unit, and
detects code clones
Normalization of name space
Replacement of names defined by user
Removal of table initialization
Consideration of module delimiter
CCFinder can analyze the system of millions line scale in
practical use time
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
6
CCFinder:
Clone Detection Process
Source files
throws
{{ String
(( (( )) )) throws
$$ $$foo
foofoo()
1. static
void
throws
RESyntaxException
static
throws$$RESyntaxException
RESyntaxException
String aa {
static void
throws
{{ $$ $$
new
]{{] {{$$ "123,400"
,
[[ [[]] String
$$String
$$ =
new
]] == a[]
String
$[] { "123,400",
2. [[String
new
"abc", "orange 100" };
"abc"
,
"orange
100"
}
;
org
.
apache
.
regexp
;
}
} ;
3. org.apache.regexp.RE
pat = new org.apache.regexp.RE("[0-9,]+");
new
$$ ==
$$ pat
==new
new
4. .intRE
sum
0; org . apache . regexp
. RE
) ; int
== $$00
$
$$ (( "[0-9,]+"
"[0-9,]+"
int
sum
$$ sum
$$ =
5. for
(int i$ = 0; i) < ;a.length;
++i)
;; for
<
$$ i$$i ==
int
for (( int
= 0$$0 ;;; i$$i <<
6. a$ if. (pat.match(a[i]))
;; ++
if ifif(( (($$ pat
$$ ;; ++
length
++$$ ii)) ))if
pat
$ . length
++
7. .. match
sum
+=
Sample.parseNumber(pat.getParen(0));
$$ (( (( $$ aa [[ [[ $$ ii ]] ]] )) )) )) )) $$sum
sum
8. +=
System.out.println("sum
"getParen
+ sum);
(( 00
(( $$ .. $$ (( ((pat
$$ .. $.$. parseNumber
parseNumber
pat$$ .=
. getParen
+= Sample
9. } )) )) ;; System
.. .$$. println
$$ .. .$$. out
System
out
println
"sum==""
(( $$ (( "sum
static
String
$$ $$goo
}}}} static
))) ;;;; goo(String
$$ void
10. static
[] ((a)(( $$throws
RESyntaxException {
++ sum
static void
void
goo
String
static
]] ))) =throws
= {{ RE
$$RE("[0-9,]+");
[[ exp
throws
RESyntaxException
RE exp
exp ==
{{ $$ $$ =
11. a$$RE
newRESyntaxException
=sum$ == 00
$ ) ; )) $;; $int
$ (( "[0-9,]+"
int sum
12. newintRE
sum"[0-9,]+"
= 0;
for (( int
for
0$ ;; i$i <<
13. ;; for
(intint$i =i$i 0;== i 0<
a.length; ++i)
$ ; ++
;; ++
length
a$ .. length
++$ ii) ))if ifif( (($ exp
exp
14. . match
if
(exp.match(a[i]))
$ ( (( $ aa [ [[ $ ii ] ]] ) )) ) )) $sum
sum
15. += parseNumber
sum
+=
parseNumber(exp.getParen(0));
( . $ getParen
$ ( ( $exp. (. $exp
getParen
() 0) )( )0 ) )
$$ .. parseNumber
16. ;; System.out.println("sum
+ sum
sum);
( $ ((+ "sum
. .$. println
$ . $.. out
==""" ++
out
$ =
System
println
"sum
sum
17. })) ;; }}
Lexical
Lexical analysis
analysis
Token
Token sequence
sequence
Transformation
Transformation
Transformed
Transformed token
token sequence
sequence
Match
Match detection
detection
Clones
Clones on
on transformed
transformed sequence
sequence
Formatting
Formatting
Clone pairs
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
7
Definitions:
Clone Pair and Clone Set
Clone Pair: a pair of identical or similar fragments
Clone Set: a set of identical or similar fragments
Clone Pair
Clone Set
(C1, C4)
{C1, C4, C5}
C2
(C1, C5)
{C2, C3}
C3
(C2, C3)
C4
(C4, C5)
C1
C5
CCFinder detects code clones as a clone pair
After detection process, clone pairs are transformed into
clone sets
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
8
Extraction of code clones easily refactored
Structural code clones are regarded as the target of refactoring
Detect clone pairs by CCFinder
Transform the detected clone pairs into clone sets
Extract structural parts as structural code clones from the
detected clone sets
What is a structural code clone ?
example: Java language
Declaration: class declaration, interface declaration
Method: method body, constructor, static initializer
statement: do, for, if, switch, synchronized, try, while
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
9
fragment 1
609:
610:
611:
612:
613:
614:
615:
616:
617:
618:
619:
620:
621:
622:
623:
624:
625:
626:
627:
628:
reset();
grammar = g;
// Lookup make-switch threshold in the grammar generic options
if (grammar.hasOption("codeGenMakeSwitchThreshold")) {
try {
makeSwitchThreshold = grammar.getIntegerOption("codeGenMakeSwitchThreshold");
//System.out.println("setting codeGenMakeSwitchThreshold to " + makeSwitchThreshold);
} catch (NumberFormatException e) {
tool.error(
"option 'codeGenMakeSwitchThreshold' must be an integer",
grammar.getClassName(),
grammar.getOption("codeGenMakeSwitchThreshold").getLine()
);
}
}
Code clones which
CCFinder detects
// Lookup bitset-test threshold in the grammar generic options
if (grammar.hasOption("codeGenBitsetTestThreshold")) {
try {
bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
fragment 2
623:
624:
625:
626:
627:
628:
629:
630:
631:
632:
633:
634:
635:
636:
637:
638:
639:
640:
641:
642:
}
// Lookup bitset-test threshold in the grammar generic options
if (grammar.hasOption("codeGenBitsetTestThreshold")) {
try {
bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
//System.out.println("setting codeGenBitsetTestThreshold to " + bitsetTestThreshold);
} catch (NumberFormatException e) {
tool.error(
"option 'codeGenBitsetTestThreshold' must be an integer",
grammar.getClassName(),
grammar.getOption("codeGenBitsetTestThreshold").getLine()
);
}
}
Code clones which
proposed method
detects
// Lookup debug code-gen in the grammar generic options
if (grammar.hasOption("codeGenDebug")) {
Token t = grammar.getOption("codeGenDebug");
if (t.getText().equals("true")) {
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
10
fragment 3
1007:
1008:
1009:
1010:
1011:
1012:
1013:
1014:
1015:
1016:
1017:
1018:
1019:
if ( inputState.guessing==0 ) {
buf.append(a.getText());
}
{
_loop144:
do {
if ((LA(1)==WILDCARD)) {
match(WILDCARD);
a=id();
if ( inputState.guessing==0 ) {
buf.append('.'); buf.append(a.getText());
}
}
Code clones which
CCFinder detects
fragment 4
1527:
1528:
1529:
1530:
1531:
1532:
1533:
1534:
1535:
1536:
1537:
1538:
1539:
if ( inputState.guessing==0 ) {
t=a.getText();
}
{
_loop84:
do {
if ((LA(1)==COMMA)) {
match(COMMA);
id();
if ( inputState.guessing==0 ) {
t+=","+b.getText();
}
}
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
11
Provision of applicable refactoring patterns
Following refactoring patterns[1][2] can be used to remove
code sets including structural code clones
Extract Class,
Extract Method,
Extract Super Class,
Form Template Method,
Move Method,
Parameterize Method,
Pull Up Constructor,
Pull Up Method,
For each clone set, the proposed method determines which
refactoring pattern is applicable by using several metrics.
[1]: M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
[2]: http://www.refactoring.com/, 2004.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
12
Metrics(1):Volume Metrics for Clone Set
LEN, POP, DFL
LEN(S): is the average length of token sequence for a clone
set S
POP(S): is the number of elements (code fragments) of a
clone set S
DFL(S): indicates an estimation of how many tokens would
be removed from source files when all code fragments in a
clone set S are reconstructed
new sub routine
caller statements
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
13
example:
Metrics(2): Coupling
Metrics for Clone Set
・Clone set S includes fragments f and f
NRV, NSV ・In fragment f , externally defined variable b and c are
1
2.
1
referred and a is assigned to.
NRV(S): represents
the average number of externally
・Fragment f2 is same as f1.
defined variables referred in the fragment of a clone set S
then,NRV(S) = ( 2 + 2 ) / 2 = 2
NSV(S): represents the
average
NSV(S)
= ( 1 +number
1 ) / 2 = 1of externally
defined variables assigned to in the fragment of a clone set S
Definition
int a , b, c;
Fragment f1
…
if( … ){
reference
…;
assignment
Clone set
fragment f1, f2, ・・・, fn
… =Sbincludes
+ c;
si is thea number
= …; of externally defined variable which
fragment fi refers
…;
ti is the number of externally defined variable which
}
fragment
fi assigns
…
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
14
example 2:
Metrics(3):Inheritance
example 3: Metric for Clone Set
・Clone set S includes fragments f and f .
・Clone set S includes fragments f and f .
・If all fragments of clone set S are included in a
DCH
・If
all classes
which include f and f don’t have
example
1:
1
1
2
2
1
2
class and its direct child classes,
common
class,
DCH(S): represents the
position
and
distance
・Clone
setparent
S includes
fragmentsbetween
f1 and f2. each
of clone=set
fragment of a clone set・IfSall fragments
then,DCH(S)
1 S are included in a
then,DCH(S)
=
∞
same class,
then, DCH(S) = 0
Definition
class A
S includes
fragment f1, f2, ・・・,fn
B
class AClone set class
class A f exists in class C
Fragment
i
fragment f1Class C fragment
f2
fragment fp1 is a class
on classf hierarchy
fragment
2
class B
i
which locates lowest position in C1, C2, ・・・,Cn
class C
If no common parent class of C1,C2,・・・,Cn exists, the value of
fragment f2
DCH(S) is ∞
This metric is measured for only the class hierarchy where target
software exists.
fragment f1
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
15
Aries: Refactoring Support Tool
Overview
Target: Java programs
Runtime environment: JDK1.4 or above
Implementation
Analysis component: Java 32,000 Lines
CCFinder is used as code clone detection component
JavaCC is used to construct syntax and semantic
analysis component
GUI component: Java14,000 Lines
User can specify target clone sets through GUI operations.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
16
Case Study: Ant
Overview
Ant is one of build tools like ‘make’
Input for Aries
Source files of Ant: 627
LOC: about 180,000
It took 30 seconds to extract structural code clones
We got 151 clone sets.
Environment
OS: FreeBSD 4.9
CPU: Xeon 2.8G x 2
Memory: 4GB
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
17
Case Study: Ant
Extract Method (conditions)
To apply ‘Extract Method’ pattern, we filtered clone sets by
using following conditions
The unit of clone is statement (do, for, if, …)
Set the value of DCH(S) = 0
All fragments of a clone set are included in a class
Set the value of NSV(S) < 2
Each fragment of a clone set assigns any value to 1 or
no externally defined variable.
32 clone sets satisfied these conditions
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
18
Case Study: Ant
Extract Method(result)
32 clone set can be categorized as followings
category
number
No parameter, no return value
3
Addition of some parameters, no return value
18
Addition of some parameters and return the value
7
assignment
Others
4
if (!isChecked())
{ =={null) {
if (iSaveMenuItem
== null)
// javacopts
if (name
sure
we
have
anull)
circular
try { //ifmake
(javacopts
!= don't
null &&
!javacopts.equals(""))
{
if (other.name
!=
{ reference here
StackgenicTask.createArg().setValue("-javacopts");
stk = new
iSaveMenuItem
= Stack();
newfalse;
MenuItem();
return
stk.push(this);
iSaveMenuItem.setLabel("Save
BuildInfo To Repository");
genicTask.createArg().setLine(javacopts);
}
getProject());
} catch dieOnCircularReference(stk,
iExc) {
}(Throwable
} else if (!name.equals(other.name))
{
} handleException(iExc);
return false;
local
}
}
variable
}
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
19
Conclusion
We have
proposed refactoring support method
implemented a refactoring support tool, Aries
conducted a case study to Ant, which is an open source
program, and most of filtered clone sets could be
removed.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
20
Future Works
As future works, we are going to
evaluate whether or not each refactoring should be done
as the viewpoint of software quality (support Step 3)
find a group of clone sets that can be refactored at once to
conduct refactoring more effectively
Commonly used refactoring process
Step 1: Determine where refactoring should be applied
Step 2: Determine which refactoring patterns can/should be
applied
Step 3: Investigate the effectiveness of the refactoring patterns
Step 4: Modify source code
Step 5: Conduct regression tests
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
21
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
22
Code clone detection for refactoring:
Related Works
Detect similar sub-graphs as clone on program dependency graph [1].
High accuracy: This approach finds out data-dependence
and control dependence in source codes.
High time complexity: It takes O(n2) time to construct
program dependency graph.
Detect similar methods and functions as clone using metrics [2].
Low accuracy: if the size of target method or function is
small, the values of metric make no difference.
detection unit restriction: only method and function unit
clone can be detected.
[1] R. Komondoor and S. Horwitz, “Using slicing to identify duplication in source code”, In Proc. of
the 8th International Symposium on Static Analysis, Paris, France, July 16-18, 2001.
[2] Magdalena Balazinska, Ettore Merlo, Michel Dagenais, Bruno Lague, and Lostas Kontogiannis,
“Advanced Clone-Analysis to Support Object-Oriented System Refactoring”, WCRE 2000, pp. 98-107
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
23