On Some Fault Prediction Issues
CAMARGO CRUZ Ana Erika and IIDA Hajimu
Nara Institute of Science and Technology
Graduate School of Information Science
Software Design Laboratory
Introduction
At the NATO Software Engineering Conference (1968), Mr. Nash from the IBM UK Laboratories suggested:
 Test Planning and Status Control as valuable tools in managing the later part of a software development cycle.
Test planning spans various research areas; one of them is:
 Fault Prediction
Fault Prediction
 Goal:
 Provide more effective criteria for the elaboration of test cases.
 Predict faulty components.
 State of the art:
 A large number of prediction models have been proposed in the last decades, but hardly put into practice.
 Projects: mostly open source software.
 Predictors: mainly product metrics.
Fault Prediction
 Akiyama et al. [1971]: LOC
 Halstead et al. [1975]: complexity metrics
 Basili et al. [1996]: design complexity metrics (coupling, inheritance, etc.)
 E. Arisholm et al. [2007], Moser et al. [2008], S. Shivaji et al. [2009], Y. Kamei et al. [2010]: process, history, and repository metrics (number of commits, LOC modified, past faults, times a file is refactored, etc.)
Logically, the components with the greatest number of LOC are also the most complex and the most frequently changed.
Latest literature review [1]: overall, LOC is useful for fault prediction.
Are we reinventing the wheel?
[1] Hall, T., Beecham, S., Bowes, D., Gray, D. and Counsell, S.: A Systematic Literature Review on Fault Prediction Performance in Software Engineering, IEEE Trans. on Softw. Eng., Vol. 38, No. 6, pp. 1276–1304 (2012)
Research Questions
 RQ1: Does multicollinearity exist among these metrics?
 If our supposition is right, these metrics should be highly inter-correlated.
 RQ2: How much more accurate is a model using inter-correlated metrics than a single-metric model?
[Figure: Prediction Model A takes metrics Metric1 through MetricN and classifies a component as faulty or not faulty; Prediction Model B does the same using Metric1 alone.]
 RQ3: How much is their usage worth?
 If Model A is more accurate than Model B, is its prediction accuracy worth the effort invested to construct it?
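To make RQ2 and RQ3 concrete, the following is a minimal sketch of the Model A vs. Model B comparison, assuming labeled per-component data and logistic regression; the synthetic data and metric names are purely illustrative, not a fixed experimental design.

```python
# Minimal sketch: Model A (all metrics) vs. Model B (a single metric).
# Synthetic data in which every metric mostly restates component size.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
loc = rng.exponential(200, n)                        # lines of code
ncommits = 0.05 * loc + rng.normal(0, 3, n)          # tied to size
complexity = 0.02 * loc + rng.normal(0, 2, n)        # tied to size
faulty = (loc + rng.normal(0, 80, n) > 250).astype(int)

X_a = np.column_stack([loc, ncommits, complexity])   # Model A: N metrics
X_b = loc.reshape(-1, 1)                             # Model B: one metric

model = LogisticRegression(max_iter=1000)
acc_a = cross_val_score(model, X_a, faulty, cv=10).mean()
acc_b = cross_val_score(model, X_b, faulty, cv=10).mean()
print(f"Model A (all metrics): {acc_a:.3f}")
print(f"Model B (LOC only):    {acc_b:.3f}")
# When the extra metrics mostly restate size, acc_a barely exceeds
# acc_b, which is exactly what RQ3 asks about effort vs. gain.
```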
The Multicollinearity Problem
 The most common prediction techniques used for fault prediction (Naive Bayes and Logistic Regression):
 assume the predictor variables to be independent of each other.
 For example, suppose the following correlation matrix:
                     Number of Commits   LOC    Complexity   Number of Faults
Number of Commits    1
LOC                  0.9                 1
Complexity           0.8                 0.7    1
Number of Faults     0.7                 0.9    0.5          1
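Checking RQ1 on a given project amounts to computing such a matrix before fitting any model. A minimal sketch, assuming pandas and hypothetical metric values:

```python
# Minimal sketch of an RQ1 check: the correlation matrix among candidate
# predictors. All metric values below are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "ncommits":   [12, 3, 45, 7, 30, 22],
    "loc":        [900, 150, 3200, 400, 2100, 1700],
    "complexity": [40, 8, 120, 20, 95, 70],
    "faults":     [5, 0, 14, 1, 9, 8],
})
corr = df.corr(method="pearson")   # Spearman is a common alternative
print(corr.round(2))
# Off-diagonal values near 1 signal multicollinearity: the metrics
# largely carry the same information about each component.
```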
The Multicollinearity Problem
Shared variance (r²): the amount of variation that two variables tend to share.
 r(LOC, Faults) = 0.9 → r²(LOC, Faults) = 0.81
 r(NCommits, Faults) = 0.7 → r²(NCommits, Faults) = 0.49
 r(NCommits, LOC) = 0.9 → r²(NCommits, LOC) = 0.81
Multiple correlation coefficient (R²): a measure of the fit of a multiple linear regression model, taking values in [0, 1].
Ideally, one would fit: Faults = a1·LOC + a2·NCommits + b
But since NCommits shares most of its variance with LOC:
 R²(Faults; NCommits, LOC) ≈ r²(LOC, Faults) = 0.81
→ NCommits adds poor significance to the model.
[Figure: Venn diagrams of the shared variance among LOC, NCommits, and Faults.]
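A minimal sketch of this effect on synthetic data in which NCommits is strongly tied to LOC; the coefficients are illustrative and not taken from any study:

```python
# Minimal sketch of the shared-variance argument: when NCommits and LOC
# are highly correlated, adding NCommits to a LOC-only regression
# barely raises R^2.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 300
loc = rng.exponential(500, n)
ncommits = 0.02 * loc + rng.normal(0, 1.5, n)   # nearly a proxy for LOC
faults = 0.01 * loc + rng.normal(0, 2, n)       # faults driven by size

X_loc = loc.reshape(-1, 1)
r2_loc = LinearRegression().fit(X_loc, faults).score(X_loc, faults)
X_both = np.column_stack([loc, ncommits])
r2_both = LinearRegression().fit(X_both, faults).score(X_both, faults)
print(f"R^2 with LOC only:         {r2_loc:.3f}")
print(f"R^2 with LOC and NCommits: {r2_both:.3f}")  # almost identical
```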
A Rapid Literature Review [1]
 From 208 research papers, only 36 passed their review:
 they reported sufficient contextual and methodological information.
 We reviewed 13 of the 36:
 (RQ1) Finding 1. They report multicollinearity among the predictors used, but provide no details.
 The correlation matrix among predictors is missing.
 Principal Component Analysis is used to alleviate the problem, but no major details are given.
A Rapid Literature Review
 (RQ2) Finding 2. The reviewed studies do not report their prediction results on single metrics, but on sets of metrics.
 We cannot know how much more accurate a model using a single metric is than another using multiple metrics.
 Exceptions:
 - LOC is a useful predictor with stable performance [2,3].
 - A LOC-only model was suggested as a viable alternative to more complex models [4].
 - One study exploring design complexity metrics (Chidamber and Kemerer) found that using a single-predictor model yielded better prediction accuracy than using multiple metrics [5].
[2] M. D'Ambros, M. Lanza, and R. Robbes, "An extensive comparison of bug prediction approaches," MSR 2010, 7th IEEE Working Conference on, 2010, pp. 31–41.
[3] Z. Hongyu, "An investigation of the relationships between lines of code and defects," ICSM 2009, IEEE International Conference on, 2009, pp. 274–283.
[4] R. Bell, T. Ostrand, and E. Weyuker, "Looking for bugs in all the right places," in Procs. of the 2006 International Symposium on Software Testing and Analysis, ACM, 2006.
[5] T. Gyimothy, R. Ferenc, and I. Siket, "Empirical validation of object-oriented metrics on open source software for fault prediction," IEEE Transactions on Software Engineering, vol. 31, no. 10, pp. 897–910, Oct. 2005.
A Rapid Literature Review
 (RQ3) Finding 3. The gain of using multiple metrics is not assessed either.
 An exception is the work of E. Arisholm et al. [7,8], which proposes a measure of cost-effectiveness relative to size.
 However, their results are reported on sets of metrics.
 If the only thing a prediction model does is to model the fact that the number of faults of a class is proportional to its size, there would likely be little gain from such a model.
[7] E. Arisholm, L. C. Briand, and M. Fuglerud, "Data mining techniques for building fault-proneness models in telecom java software," Software Reliability, '07, The 18th IEEE International Symposium on, Nov. 2007, pp. 215–224.
[8] E. Arisholm, L. C. Briand, and E. B. Johannessen, "A systematic and comprehensive investigation of methods to build and evaluate fault prediction models," Journal of Systems and Software, vol. 83, no. 1, pp. 2–17, 2010.
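A minimal sketch in the spirit of this cost-effectiveness idea, not Arisholm et al.'s exact measure: rank classes by a hypothetical model score and compare the fault coverage obtained per fraction of LOC inspected against a naive largest-first baseline.

```python
# Minimal sketch of a size-aware cost-effectiveness check. Data and the
# model score are synthetic placeholders, not results from [7,8].
import numpy as np

def fault_coverage_at_budget(order, loc, faults, budget=0.2):
    """Fraction of faults found when inspecting classes in `order`
    until `budget` (a fraction of total LOC) is spent."""
    cum_loc = np.cumsum(loc[order]) / loc.sum()
    inspected = order[cum_loc <= budget]
    return faults[inspected].sum() / faults.sum()

rng = np.random.default_rng(2)
loc = rng.exponential(400, 200)           # class sizes
faults = rng.poisson(loc / 200)           # faults roughly proportional to size
pred = loc / 200 + rng.normal(0, 1, 200)  # hypothetical model scores

by_model = np.argsort(-pred)   # inspect predicted fault-prone classes first
by_size = np.argsort(-loc)     # baseline: inspect largest classes first
print("model ranking:", fault_coverage_at_budget(by_model, loc, faults))
print("size ranking: ", fault_coverage_at_budget(by_size, loc, faults))
# A model is only worth its construction effort if it beats the
# size-based baseline at realistic inspection budgets.
```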
For Discussion
Course of Action
 Study other metrics. Which? Where from?
 Publicly available data comes only from open source software projects.
 Metrics mined from these projects (product, repository) may be telling us the same thing.
 Metrics on experience and team communication: where would they come from?
 Include results and analysis:
 on single-predictor models as opposed to multiple-predictor models.
 on cost-effectiveness measures.
Conclusions
 Although multicollinearity among predictors of faulty code is reported:
 little is known about the gain of using multiple predictors as opposed to single-predictor models.
 We think that researchers have exhausted the exploration of metrics from open source software.
 Other factors which may be related to faulty code are difficult to study due to the lack of publicly available data.