Slide 1

Quality prediction model for
object oriented software using
UML metrics
Ana Erika Camargo,
Koichiro Ochimizu
Japan Advanced Institute of Science and Technology
4th World Congress for Software Quality – Bethesda, Maryland, USA – September 2008
Quality Prediction Model using UML metrics
[1] of [42]
Outline
• Objective
• Scope
• Our Approach
• Related Work
• Design complexity metrics and UML
• Prediction Technique
• Case Study
• Conclusions and Future work
Objective
To create a model that is able:
• To predict fault-prone* code in the early phases of the software life cycle
• To detect possible defects in the software
(*) Fault-prone code: Code capable of having bugs.
Scope
[Figure: causal-loop diagram. Variables: Misunderstanding of Requirements, Complex Specifications, Complex Design, Wrong Design, Complex Implementation, Wrong Implementation, Designers' experience, Developers' experience, Fault-prone code. Link labels: S = change in the same direction, O = change in the opposite direction. The diagram also marks the scope of this study.]
Our Approach
• Related existing works: Design Complexity Metrics (FROM: Code) → predict fault-prone code
• Approximation: to obtain good candidates for fault-proneness prediction
• Our approach: Design Complexity Metrics (FROM: UML Artifacts) → predict before coding
Related work: Fault prediction
Prediction models of fault-proneness:

Study                    | Input: Design Complexity Metrics | Output              | Prediction Technique
Basili et al. [1996]     | CK metrics, among others         | Fault-prone classes | Multivariate Logistic Regression
Briand et al. [2000]     | CK metrics, among others         | Fault-prone classes | Multivariate Logistic Regression
Kanmani et al. [2004]    | CK metrics, among others         | Fault ratio         | General Regression Neural Network
Nachiappan et al. [2005] | CK metrics, among others         | Fault ratio         | Multiple Linear Regression
Olague et al. [2007]     | CK, QMOOD                        | Fault-prone classes | Multivariate Logistic Regression

CK: Chidamber & Kemerer; QMOOD: Quality Metrics for Object Oriented Design
Related work: Fault prediction
From these studies, we identified useful metrics to predict
fault-proneness of code :
• Chidamber and Kemerer – CK
1. Depth of inheritance tree (DIT)
2. Number of children (NOC)
3. Weighted Methods per Class (WMC)
4. Coupling Between objects (CBO)
5. Response for class (RFC)
6. Lack of Cohesion of methods (LCOM)
• Bansiya and Davis's quality metrics – QMOOD
7. Average of DIT for all classes in the system (ANA)
8. Class Interface Size (CIS)
9. Data Access Metric (DAM)
10. Direct Class Coupling (DCC)
11. Measure of aggregation (MOA)
12. Measure of functionality abstraction (MFA)
13. Number of methods (NOM - same as WMC)
Related work: UML & Design Complexity
Metrics
• Tang et al. [2002]: measures CK metrics from data structures that are created from Rational Rose class, collaboration, and activity diagrams.
Issue:
To obtain accurate measures, assumptions are made about the level of detail in the diagrams. For example, one activity diagram is required per operation in the system.
Related work: UML & Design Complexity
Metrics
• Baroni [2002]: formal definition of CK and QMOOD metrics, among others. This work uses UML class diagrams.
Issues:
The RFC and LCOM calculations are code dependent.
The CBO calculation does not clearly account for the methods used or variables instantiated from other classes within each method of a class.
UML & Design Complexity Metrics
• Related existing works: Design Complexity Metrics (FROM: Code) → predict fault-prone code
• Approximation: to obtain good candidates for fault-proneness prediction
• Our approach: Design Complexity Metrics (FROM: UML Artifacts) → predict before coding
UML & Design Complexity Metrics
Design complexity metrics that can be approximated using UML class diagrams:
• Chidamber and Kemerer – CK
  - Weighted Methods per Class (WMC)
  - Depth of inheritance tree (DIT)
  - Number of children (NOC)
  - Coupling Between objects (CBO)
  - Response for class (RFC)
  - Lack of Cohesion of methods (LCOM)
• Bansiya and Davis's quality metrics – QMOOD
  - Average of DIT for all classes in the system (ANA)
  - Class Interface Size (CIS)
  - Data Access Metric (DAM)
  - Direct Class Coupling (DCC)
  - Measure of aggregation (MOA)
  - Measure of functionality abstraction (MFA)
  - Number of methods (NOM - same as WMC)
Legend:
  - Can be obtained directly from class diagrams
  - Cannot be calculated precisely from class diagrams; the implementation of the class bodies is needed
UML & Design Complexity Metrics
CBO Approximation
• CBO-code:
Number of classes coupled to a given class (*)
• CBO-UML Approach 1 (UML Collaboration Diagram):
A count of all messages sent to different objects
• CBO-UML Approach 2 (UML Collaboration Diagram):
The same as Approach 1, but excluding the messages that return a value.
(*) If a method within a class uses a method or an instance variable of a different class, the pair of classes is said to be coupled.
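The two CBO-UML approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` record, the `cbo_uml` helper, and the sample class names are hypothetical, and "messages sent to different objects" is read here as every message whose receiver differs from the sender.

```python
from typing import NamedTuple


class Message(NamedTuple):
    """One message of a collaboration diagram (hypothetical record)."""
    sender: str
    receiver: str
    returns_value: bool


def cbo_uml(messages, cls, skip_returning=False):
    """Count the messages that `cls` sends to other objects.

    skip_returning=False corresponds to Approach 1,
    skip_returning=True to Approach 2 (messages returning a value are dropped).
    """
    return sum(
        1
        for m in messages
        if m.sender == cls
        and m.receiver != cls
        and not (skip_returning and m.returns_value)
    )


# Hypothetical collaboration diagram, for illustration only.
messages = [
    Message("Order", "Customer", True),
    Message("Order", "Inventory", False),
    Message("Order", "Order", False),    # self-message, not counted
    Message("Customer", "Bank", False),  # sent by another class
]

cbo_1 = cbo_uml(messages, "Order")                       # Approach 1
cbo_2 = cbo_uml(messages, "Order", skip_returning=True)  # Approach 2
print(cbo_1, cbo_2)
```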
UML & Design Complexity Metrics
CBO Approximation
[Figure: UML collaboration diagram fragment showing the object aCustomer and the message R7: fundsStatus := CommtiFunds()]
UML & Design Complexity Metrics
CBO evaluation using an e-commerce system (*).
[Figure: CBO (y-axis) per class number (x-axis) for CBO-code, CBO-UML(1), and CBO-UML(2).]
(*) Described in: Hassan Gomaa, Designing Concurrent, Distributed, and Real-Time Applications with UML, Addison-Wesley Object Technology Series, July 2000.
UML & Design Complexity Metrics
CBO Evaluation
• For CBO-code and CBO-UML Approach 1:
correlation coefficient = 0.81
• For CBO-code and CBO-UML Approach 2:
correlation coefficient = 0.89
CBO-UML Approach 2 has a slightly stronger linear relationship with CBO-code.
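A correlation coefficient of this kind can be computed as sketched below. The sketch assumes the Pearson coefficient is meant, and the two value lists are hypothetical placeholders, not the measurements of the e-commerce system.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Hypothetical per-class values, for illustration only.
cbo_code = [0, 1, 2, 3, 4, 2]
cbo_uml = [0, 1, 1, 3, 5, 2]

r = pearson(cbo_code, cbo_uml)
print(round(r, 2))
```

A value close to 1 means the UML approximation rises and falls with the code metric, which is the property the evaluation looks for.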
UML & Design Complexity Metrics
RFC Approximation
• RFC-code:
Number of methods of a given class + number of methods of other classes directly called by any of the methods of the given class.
• RFC-UML Approach 1 (UML Collaboration Diagrams):
Messages Received + Messages Sent
• RFC-UML Approach 2 (UML Collaboration & Class Diagrams):
(Messages Received + Number of attributes*2) + Messages Sent, where
(Messages Received + Number of attributes*2) ~ number of methods of the given class, assuming 2 public methods per attribute (one to get and one to set its value).
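Both RFC-UML approximations reduce to simple arithmetic over counts read from the diagrams. The following sketch uses hypothetical function names and a hypothetical class that receives one message, sends one message, and has one attribute.

```python
def rfc_uml_1(received, sent):
    """Approach 1: messages received + messages sent."""
    return received + sent


def rfc_uml_2(received, sent, attributes):
    """Approach 2: the number of methods is approximated by
    messages received + 2 accessors (get/set) per attribute."""
    approx_methods = received + 2 * attributes
    return approx_methods + sent


# Hypothetical class: 1 message received, 1 message sent, 1 attribute.
rfc_1 = rfc_uml_1(received=1, sent=1)
rfc_2 = rfc_uml_2(received=1, sent=1, attributes=1)
print(rfc_1, rfc_2)
```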
UML & Design Complexity Metrics
RFC Approximation
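class C {
    A a;

    void m() {
        D d = new D();
        d.dosth();
        // ...
    }

    void setA(A a) {
        this.a = a;
    }

    A getA() {
        return a;
    }
}
[Figure: collaboration diagram with the objects x, c, and d and the messages m() and dosth().]
RFC(C) = 3 + 1 = 4 (the 3 methods of C plus the 1 method of another class that C calls)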
UML & Design Complexity Metrics
RFC evaluation using the same e-commerce system.
[Figure: RFC (y-axis) per class number (x-axis) for RFC-code, RFC-UML(1), and RFC-UML(2).]
UML & Design Complexity Metrics
RFC Evaluation
• For RFC-code and RFC-UML Approach 1:
correlation coefficient = -0.07
• For RFC-code and RFC-UML Approach 2:
correlation coefficient = 0.67
→ RFC-UML Approach 2 has a stronger linear relationship with RFC-code
UML & Design Complexity Metrics
Remark
Although the assumption behind our second approach may not always hold, the approach still achieved acceptable performance. This might be explained by the fact that the number of private attributes in a class is moderately correlated with its number of methods, according to Olague's research [2007].
UML & Design Complexity Metrics
Design complexity metrics that can be approximated using UML diagrams:
• Chidamber and Kemerer – CK
  - Weighted Methods per Class (WMC)
  - Depth of inheritance tree (DIT)
  - Number of children (NOC)
  - Coupling Between objects (CBO)
  - Response for class (RFC)
  - Lack of Cohesion of methods (LCOM)
• Bansiya and Davis's quality metrics – QMOOD
  - Average of DIT for all classes in the system (ANA)
  - Class Interface Size (CIS)
  - Data Access Metric (DAM)
  - Direct Class Coupling (DCC)
  - Measure of aggregation (MOA)
  - Measure of functionality abstraction (MFA)
  - Number of methods (NOM - same as WMC)
Legend:
  - Can be obtained directly from class diagrams
  - Can be approximated by using collaboration diagrams
  - Cannot be approximated precisely using UML diagrams
Prediction Technique
• Related existing works: Design Complexity Metrics (13), FROM: Code → predict fault-prone code
• Approximation: Design Complexity Metrics (12), FROM: UML Artifacts
• Predict: how?
Prediction Technique
Logistic Regression
• Use. When we have one variable (y) with two values (e.g. faulty / not faulty, 1/0) and one or more measurement variables (xs).
• Goal. To predict the probability of getting a particular value of y, given the xs variables, through a logit model.
• Key points. No assumptions are made about the distribution of the variables.
Prediction Technique
Example. We want to estimate the probability that a class is highly faulty, in terms of a design complexity metric, Mx.
Prediction Technique
Faulty: Most Faulty (MF) = 1, Least Faulty (LF) = 2
Design complexity metric: Mx

CLASS  FAULTY  Mx
-----------------
1      1       1
2      1       1
3      1       1
4      1       1
5      1       1
6      1       1
7      1       1
8      1       1
9      1       1
10     1       1
11     1       0
12     1       0
13     2       1
14     2       0
15     2       0
16     2       0
17     2       0
18     2       0
19     2       0
20     2       0
21     2       0
22     2       0
23     2       0
24     2       0

CLASS   Mx=1  Mx=0  Total
-------------------------
MF=1    10    2     12
LF=2    1     11    12
-------------------------
Total   11    13    24
Prediction Technique
CLASS   Mx=1  Mx=0  Total
-------------------------
MF      10    2     12
LF      1     11    12
-------------------------
Total   11    13    24

Probabilities
• The probability that any given class is MF:
P(MF) = 12/24 = 0.50
• The probability that any given class is MF, given that Mx=1:
P(MF|Mx=1) = 10/11 = 0.909
• The probability that any given class is MF, given that Mx=0:
P(MF|Mx=0) = 2/13 = 0.154
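The three probabilities follow directly from the counts in the table. A minimal sketch, in which the dictionary layout and the function name `p_mf_given` are illustrative choices:

```python
# Counts from the contingency table: rows MF / LF, columns Mx=1 / Mx=0.
table = {
    "MF": {1: 10, 0: 2},
    "LF": {1: 1, 0: 11},
}

total = sum(sum(row.values()) for row in table.values())

# Marginal probability that a class is MF.
p_mf = sum(table["MF"].values()) / total


def p_mf_given(mx):
    """Conditional probability P(MF | Mx = mx)."""
    column_total = table["MF"][mx] + table["LF"][mx]
    return table["MF"][mx] / column_total


print(round(p_mf, 3), round(p_mf_given(1), 3), round(p_mf_given(0), 3))
```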
Prediction Technique
CLASS   Mx=1  Mx=0  Total
-------------------------
MF      10    2     12
LF      1     11    12
-------------------------
Total   11    13    24
Odds
• The odds of a CLASS being MF:
Odds(MF) = 12 /12 = 1
• The odds of a CLASS being MF given that Mx=1 :
Odds(MF| Mx=1) = 10/1= 10 …. (1)
• The odds of a CLASS being MF given that Mx=0 :
Odds(MF| Mx=0) = 2/11= 0.182 … (2)
Prediction Technique
• Odds and Probabilities provide the same information but
in different ways.
• It is easy to convert odds to probabilities and vice versa, e.g.:
P(MF|Mx=1) = Odds(MF|Mx=1) / (1 + Odds(MF|Mx=1)) = 10 / (1 + 10) = 0.909
Odds(MF|Mx=1) = P(MF|Mx=1) / (1 - P(MF|Mx=1)) = 0.909 / (1 - 0.909) ≈ 10
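The two conversions are inverses of each other, as the short sketch below illustrates. The function names are hypothetical.

```python
def odds_to_prob(odds):
    """p = odds / (1 + odds)"""
    return odds / (1 + odds)


def prob_to_odds(p):
    """odds = p / (1 - p), defined for p < 1"""
    return p / (1 - p)


p = odds_to_prob(10)        # odds of MF given Mx=1
odds_back = prob_to_odds(p)  # converting back recovers the odds
print(round(p, 3), round(odds_back, 3))
```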
Prediction Technique
• Applying the natural log to (1) and (2):
ln[ Odds(MF|Mx=1) ] = ln(10) = 2.303 ... (3)
ln[ Odds(MF|Mx=0) ] = ln(0.182) = -1.704 ... (4)
• We can generalize (3) and (4) as follows:
ln[ Odds(MF|Mx) ] = A + B*Mx ... (5)
• From (3) and (5), when Mx = 1:
ln[ Odds(MF|Mx=1) ] = A + B = 2.303 ... (6)
• From (4) and (5), when Mx = 0:
ln[ Odds(MF|Mx=0) ] = A = -1.704 ... (7)
• From (6) and (7): A = -1.704, B = 4.007
• Finally, we can rewrite (5) as follows:
ln[ Odds(MF|Mx) ] = -1.704 + 4.007*Mx
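The coefficients can be checked numerically from the odds. In this sketch the odds are taken as exact fractions, which gives A ≈ -1.705; the slide's -1.704 results from rounding the odds to 0.182 before taking the logarithm. B agrees to three decimals either way.

```python
import math

odds_mx1 = 10 / 1   # Odds(MF | Mx=1), from (1)
odds_mx0 = 2 / 11   # Odds(MF | Mx=0), from (2)

# With a single binary predictor, the logit model ln(odds) = A + B*Mx
# is determined by the two log-odds values.
A = math.log(odds_mx0)       # Mx = 0  ->  ln(odds) = A
B = math.log(odds_mx1) - A   # Mx = 1  ->  ln(odds) = A + B

print(round(A, 3), round(B, 3))
```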
Prediction Technique
ln[ Odds(MF|Mx) ] = -1.704 + 4.007*Mx
• If:
Odds(MF|Mx) = p / (1 - p), where p = P(MF|Mx)
• We can rewrite our final equations as:
ln[ p / (1 - p) ] = -1.704 + 4.007*Mx
p = P(MF|Mx) = 1 / (1 + e^-(-1.704 + 4.007*Mx))
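As a quick check, evaluating the final equation at Mx = 1 and Mx = 0 reproduces the conditional probabilities obtained from the counts. The function name `p_mf` is an illustrative choice.

```python
import math

A, B = -1.704, 4.007  # coefficients of the logit model


def p_mf(mx):
    """P(MF | Mx) = 1 / (1 + e^-(A + B*Mx))"""
    return 1 / (1 + math.exp(-(A + B * mx)))


print(round(p_mf(1), 3))  # compare with 10/11
print(round(p_mf(0), 3))  # compare with 2/13
```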
Case study
• Related existing works: Design Complexity Metrics (13), FROM: Code → predict fault-prone code
• Approximation: Design Complexity Metrics (12), FROM: UML Artifacts → predict using Logistic Regression
• Are the candidate UML metrics good enough to predict fault-proneness?
Case study
Objective: Estimate the probability of having a faulty class
during the testing phase, using Logistic Regression.
Case study
Description. Using the design and implementation of the
e-commerce system described in Gomaa’s book, this
case study was carried out as follows:
• Collection of UML and code metrics (Xs)
• Collection of data related to the faults of the e-commerce system from the logs of the CVS repository used (Y)
• Evaluation of the relationship between each metric and fault-proneness, using univariate logistic models
Case study
Metrics to evaluate. The e-commerce system was designed and implemented without inheritance, so each metric is marked as an inheritance metric or not:

Suite      | Code metric                                                 | Level  | Inheritance metric
QMOOD      | Average Number of Ancestors (ANA)                           | System | Yes
QMOOD      | Measure of Aggregation (MOA)                                | Class  | No
QMOOD      | Class Interface Size (CIS)*                                 | Class  | No
QMOOD      | Data Access Metric (DAM)                                    | Class  | No
QMOOD      | Direct Class Coupling (DCC)                                 | Class  | No
QMOOD      | Measure of Functional Abstraction (MFA)                     | Class  | Yes
QMOOD / CK | Number of Methods (NOM) = Weighted Methods per Class (WMC)* | Class  | No
CK         | Depth of Inheritance (DIT)                                  | Class  | Yes
CK         | Number of Children (NOC)                                    | Class  | Yes
CK         | Response For Class (RFC)*                                   | Class  | No
CK         | Coupling Between Objects (CBO)                              | Class  | No

(*) Found to be good predictors of fault-prone code in Olague's work [2007].
Case study
Estimation of the probability of a class being faulty, using CBO-code.

Class number | Actual (y) | Predicted prob. using CBO-code | Predicted (y)
1            | Not faulty | 0.2                            | Not faulty
2            | Not faulty | 0.2                            | Not faulty
3            | Faulty     | 0.99903                        | Faulty
4            | Faulty     | 1                              | Faulty
5            | Faulty     | 1                              | Faulty
6            | Faulty     | 1                              | Faulty
7            | Faulty     | 0.2                            | Not faulty
8            | Faulty     | 1                              | Faulty
9            | Faulty     | 1                              | Faulty
10           | Faulty     | 1                              | Faulty
11           | Not faulty | 0.2                            | Not faulty
12           | Faulty     | 1                              | Faulty
13           | Not faulty | 0.2                            | Not faulty

p = 1 / (1 + e^-(-1.3863 + 8.3282*CBO))

Correctness: 12/13 classes (92.3% of classes correctly classified)
Sensitivity: 8/9 faulty classes (88.9% of faulty classes correctly classified)
Specificity: 4/4 non-faulty classes (100% of non-faulty classes correctly classified)
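The three figures can be recomputed from the actual labels and the predicted probabilities listed for the 13 classes. The sketch below assumes a cut-off of 0.5 for labelling a class as faulty; the cut-off is not stated in the slides.

```python
# 1 = faulty, 0 = not faulty, for classes 1..13.
actual = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
probabilities = [0.2, 0.2, 0.99903, 1, 1, 1, 0.2, 1, 1, 1, 0.2, 1, 0.2]

THRESHOLD = 0.5  # assumed cut-off
predicted = [1 if p >= THRESHOLD else 0 for p in probabilities]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # faulty, predicted faulty
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # not faulty, predicted not faulty
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

correctness = (tp + tn) / len(actual)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"{correctness:.1%} {sensitivity:.1%} {specificity:.1%}")
```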
Case study
Results from the univariate models using each of the proposed metrics.

Metrics    | Correctness [classes] | Sensitivity [faulty classes] | Specificity [non-faulty classes]
CBO-code   | 92.3%                 | 88.88%                       | 100%
CBO-UML(1) | 69.2%                 | 66.66%                       | 75%
CBO-UML(2) | 69.2%                 | 55.55%                       | 100%
RFC-code   | 84.61%                | 88.88%                       | 75%
RFC-UML(1) | 76.92%                | 77.77%                       | 75%
RFC-UML(2) | 84.61%                | 88.88%                       | 75%
WMC-code   | 90.9%                 | 85.7%                        | 100%
WMC-UML    | 72.7%                 | 71.42%                       | 75%
CIS-code   | 90.9%                 | 85.7%                        | 100%
CIS-UML    | 90.9%                 | 100%                         | 75%
DAM-code   | 36.3%                 | 57.14%                       | 0%
DAM-UML    | 72.7%                 | 85.7%                        | 50%

CIS: public methods in a class.
DAM: ratio of the number of private and protected attributes to the total number of attributes.
DCC measures were not significant for this study.
Case study
Results
• Our second approach to approximate RFC with UML
diagrams performed equally to the RFC metric
measured from code
• UML CIS approximation performed similarly to the CIS
metric measured from the code
• The rest of the UML metrics’ performance was
somewhat acceptable
Case study
Can we apply the obtained models to other case studies?

Metrics    | System     | Correctness [classes] | Sensitivity [faulty classes] | Specificity [non-faulty classes]
CBO-UML(1) | E-commerce | 69.2%                 | 66.66%                       | 75%
CBO-UML(1) | Banking    | 72.7%                 | 100%                         | 50%
RFC-UML(1) | E-commerce | 76.92%                | 77.77%                       | 75%
RFC-UML(1) | Banking    | 72.7%                 | 100%                         | 50%
CBO-UML(2) | E-commerce | 69.2%                 | 55.55%                       | 100%
CBO-UML(2) | Banking    | 63.6%                 | 80%                          | 50%
RFC-UML(2) | E-commerce | 84.61%                | 88.88%                       | 75%
RFC-UML(2) | Banking    | 72.7%                 | 80%                          | 66.6%
Conclusions and Future work
• UML metrics can be acceptable predictors of fault-prone code
• The UML CIS and UML RFC metrics showed a strong relationship to the fault-proneness of code
• We might be able to create a more robust model to predict fault-prone code before its implementation.
Conclusions and Future work
• Further study and evaluation of other metrics using other UML artifacts (e.g. sequence diagrams, state diagrams and descriptions of use cases) is needed.
• Construction of a more robust model using multivariate logistic regression
• Evaluation of the final model obtained, using different case studies