On the Empirical Environment Targeted by the EASE Project

Mega Software Engineering
and EASE Project
International Workshop on
Community-Driven Evolution of Knowledge Artifacts
UC Irvine, Dec. 16-18, 2003
Katsuro Inoue
Osaka University
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Overview
• Proposed a concept of Mega Software
Engineering, which shares experiences and
knowledge in community
• Introduced EASE project based on the concept
of MSE
• Presented the overview of Empirical
Environment and showed current
implementation of Empirical Project Monitor (EPM),
as a partial realization of Empirical Environment
• Predicted ongoing directions to deeper analyses
of empirical data
Empirical Software Engineering
• Various technologies in Software
Engineering based on empirical data
• Essential for scientific improvement of
project processes and products
3 Major Phases in Empirical SE
collection
analysis
improvement
Classification of ESE Technologies by Target Scale
[Figure: axis "scale"; phases Collection, Analysis, Improvement; region labeled Mega Software Engineering]
Mega Software Engineering (MSE)
• Targets many projects
• A new concept but not a new technology itself
• Collection of key technologies, both existing and emerging
– Distributed environment and data sharing
– Analysis and data mining
– Project monitoring and controlling
– Scalable computing
– ...
• Exploit advances in hardware performance, e.g., network
bandwidth, CPU clock, memory space, disk capacity, ...
– Software engineering technology should share in the hardware
advances that are now used mainly for multimedia, grid computing, simulators, ...
Characteristics of MSE
• Experience and knowledge of individual developers or
projects are collected, refined as assets, and reused in
the community
• Single-level flat static community for information sharing
• Automatic process: little burden is placed on each
developer or manager
• An organization-wide view of benefits can be obtained
directly (rather than an individual developer's or project's view)
• Open source development is a simple case of MSE
(MSE focuses on analysis and feedback)
EASE Project
• Empirical Approach to Software
Engineering
• Using the concept of MSE as its basis
• Funded by MEXT (Japanese government,
Ministry of Education, Culture, Sports, Science and Technology)
• 5-year project starting in 2003
Senri Lab.
Project Target
Empirical software development
environment from 1 to thousands
of projects
Empirical Environment
Project Objectives
1. Development of empirical environment
2. Application of empirical environment to
real projects
3. Collection of data and expertise of
empirical SE
4. Organizational benefits by applying
empirical environment
Concept of Empirical Environment
[Figure: a Collection, Analysis, Improvement cycle linking a software development organization, related organizations, and, via the Internet, public domain software and open source projects]
Implementing
Empirical Environment
(1) Policy for Collection
• Goal first (ideal case)
→ Data collection first (realistic approach)
• Collect mainly product data (derive process data from
product data)
• Minimize developers' overhead for collection
• Raw data without human tampering
• Real-time collection
• Applicable to various projects
– Small scale
– Non-waterfall processes such as XP
– Distributed development including sub-contracting
(2) Policy for Analysis
Step-wise implementation, from simple to difficult:
1. Process / product metrics inside a single project
2. Inter-project metrics
3. Classification and evolution
4. Reuse of components / expertise
5. …
(3) Policy for Improvement
• Feedback method for each objective
– Various mechanisms for various cases
• Currently constructing a browser for visualizing
collected data and measured metrics
Empirical Project Monitor (EPM)
• A partial implementation of Empirical
Environment
• Collect, measure, and show various data
for project control
• Data sources
– Versioning system: CVS
– Mailing list manager: Mailman
– Issue tracking tool: GNATS
Architecture of EPM
[Figure: CVS, Mailman, and GNATS (plus WinCVS, CorporateSource) supply versioning history, mail history, problem history, and other tool data; these are held as standardized empirical SE data (in XML) in a PostgreSQL repository; analysis tools measure intra- and inter-project metrics and show metrics values, prediction/schedule, etc. to developers and managers]
Characteristics of EPM
• Use open source development tools
→ Easy to introduce
• Small overhead of data collection
– Most data from versioning history
– Communication through e-mail, and recording of
issues by tracking tool
• Easy to transform other data formats to the
standardized empirical SE data format
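The translation into a common format might look roughly like the sketch below, which turns tool events into XML records. The slides do not give the actual EPM schema, so every element and attribute name here (`empirical-data`, `event`, `tool`, `kind`, and so on) is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET


def event_to_xml(tool, project, kind, timestamp, author, detail):
    """Build one XML element for a tool event (hypothetical schema)."""
    event = ET.Element("event", {"tool": tool, "project": project, "kind": kind})
    ET.SubElement(event, "timestamp").text = timestamp
    ET.SubElement(event, "author").text = author
    ET.SubElement(event, "detail").text = detail
    return event


def events_to_document(events):
    """Wrap a list of event dicts in a single root element and serialize it."""
    root = ET.Element("empirical-data")  # hypothetical root element name
    for fields in events:
        root.append(event_to_xml(**fields))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        dict(tool="CVS", project="EmpiriPrj", kind="check-in",
             timestamp="2003-10-01T09:00:00", author="dev1",
             detail="src/Main.java 1.2"),
        dict(tool="GNATS", project="EmpiriPrj", kind="issue-raised",
             timestamp="2003-10-02T14:30:00", author="dev2",
             detail="PR 17: crash on empty input"),
    ]
    print(events_to_document(sample))
```

Under this assumption, supporting another tool only requires a new translator that emits the same record shape.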
Application Area of EPM
• Large project
– Share project status immediately
– Reduce project management load
– Reduce risk of data tampering
• Small project
– Apply with small cost
– Apply to various projects, including XP and
distributed development
Features of
Empirical Project Monitor
EPM Analysis Tool
• Single activity view
– Source code size
– Issue resolution time
– Cumulative number of issues, number of
unsolved issues, ...
• Multiple activity view
– Check-in and check-out
– #issue and #mail
– Check-in and #issue
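As an illustration of the single activity view, the following sketch computes the cumulative number of issues, the number of unsolved issues, and the mean resolution time as of a given date. The input shape (pairs of opened and resolved dates) is an assumption made for this example, not the format EPM actually stores.

```python
from datetime import date


def issue_metrics(issues, as_of):
    """Summarize issues as of a date.

    issues: list of (opened, resolved) date pairs; resolved is None
            while the issue is still open (assumed record shape).
    Returns (cumulative issues, unsolved issues, mean resolution days).
    """
    opened = [i for i in issues if i[0] <= as_of]
    resolved = [i for i in opened if i[1] is not None and i[1] <= as_of]
    cumulative = len(opened)
    unsolved = cumulative - len(resolved)
    days = [(done - start).days for start, done in resolved]
    mean_days = sum(days) / len(days) if days else None
    return cumulative, unsolved, mean_days


if __name__ == "__main__":
    sample = [
        (date(2003, 9, 1), date(2003, 9, 11)),
        (date(2003, 9, 5), date(2003, 9, 25)),
        (date(2003, 10, 1), None),
    ]
    print(issue_metrics(sample, date(2003, 10, 15)))
```

Evaluating the function at the end of each month gives the three series that the issue chart plots together.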
Growth of LOC
• Progress monitoring
• Schedule vs. actual
[Figure: cumulative LOC by month, project EmpiriPrj]
Growth of LOC (3 months)
[Figure: LOC by month with check-in occurrences marked, project EmpiriPrj]
Growth of LOC
[Figure: LOC by month with check-in occurrences marked, open source project nkf (character-code converter)]
Cumulative Issues / Unsolved Issues / Mean Resolution Time
[Figure: cumulative issues, unsolved issues, and mean resolution days by month, project EmpiriPrj]
Check-in and Check-out
[Figure: number of check-outs by month with check-in occurrences marked, project EmpiriPrj]
CVS Log View
Growth of Mail and Issues
[Figure: cumulative number of mails by month with check-ins, issues raised, and issues resolved marked, project EmpiriPrj]
Mail Log View
Cumulative Issues and Check-in
[Figure: cumulative number of issues by month with check-in occurrences marked, project EmpiriPrj]
Future of Empirical
Environment
Extending Analysis Features
• Make deeper analysis and extract
organizational expertise
• Find and reuse expertise easily
[Figure: planned architecture. For projects x, y, z, ..., data from versioning (CVS), mailing (Mailman), issue tracking (GNATS), and other tools passes through format translators into a product data archive (CVS format) and a process data archive (XML format). EPM (under development), code clone detection, component search, metrics measurement, project categorization, and collaborative filtering operate on the archives and serve managers and developers through a GUI; CorporateSource also appears.]
Example Scenario (1)
1. Scheduled progress vs. actual progress of project X
2. Find projects similar to X
– Project categorization
– Collaborative filtering
[Figure: project X shown among other projects labeled A, C, E, P, Q, T, V, W, Y]
Example Scenario (2)
3. Compare project X's reuse rate with the average reuse rate
in similar projects
– Code-clone detection
4. Promote use of the software asset search engine in project X
– Software asset search engine
Expected Effect
• Productivity can be drastically improved by
reusing organizational assets
• Management of assets can be easily
performed
• Costs can be controlled precisely by
comparison with previous similar projects
• Reliability can be improved using issue
history
Analysis Technology (1)
Fast Code Clone Detection
Code clones = similar portions of program
[Figure: code clones detected among Linux 2.4.0, FreeBSD 4.0, and NetBSD 1.5]
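To make the idea of a code clone concrete, here is a deliberately simple sketch that reports runs of identical lines shared by two source texts after whitespace is removed. It is not the fast detection technique the slide refers to; the function names and the window size are assumptions chosen for illustration.

```python
from collections import defaultdict


def normalize(source):
    """Remove all whitespace and drop blank lines; keep original line numbers."""
    lines = []
    for number, line in enumerate(source.splitlines(), start=1):
        text = "".join(line.split())
        if text:
            lines.append((number, text))
    return lines


def find_clones(src_a, src_b, window=3):
    """Return sorted (start_line_a, start_line_b) pairs where `window`
    consecutive normalized lines are identical in both sources."""
    a, b = normalize(src_a), normalize(src_b)
    index = defaultdict(list)
    for i in range(len(a) - window + 1):
        key = tuple(text for _, text in a[i:i + window])
        index[key].append(a[i][0])
    pairs = []
    for j in range(len(b) - window + 1):
        key = tuple(text for _, text in b[j:j + window])
        for start_a in index.get(key, []):
            pairs.append((start_a, b[j][0]))
    return sorted(pairs)


if __name__ == "__main__":
    first = "int x = 0;\nfor (i = 0; i < n; i++)\n  x += a[i];\nreturn x;\n"
    second = "int y;\n\nint x = 0;\nfor (i = 0; i < n; i++)\n    x += a[i];\ndone();\n"
    print(find_clones(first, second))
```

Exact line matching misses clones with renamed identifiers, which is one reason practical detectors compare normalized token sequences instead.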
Analysis Technology (2)
System Similarity Using Code-Clone Detection
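Analysis Technology (3)
Collaborative Filtering
Column heading words on the slide: Focused, Representative, Outcome, Q&M, Collaborative, Adopted, Resources (their assignment to the five columns is not recoverable, so columns are numbered).

         (1)   (2)   (3)   (4)   (5)
App. A    9     9     9     7    7.5 (target)
App. B    8     7     8     ?     8
App. C    ?     8     8     8     7
App. D    7     6     ?     9     6

? = missing value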
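A minimal sketch of how collaborative filtering can estimate a value for one application from the others: each other application's value is weighted by its cosine similarity to the target over the columns both have filled in. The similarity measure and weighting are assumptions, since the slide does not state which variant is used, and the sketch does not claim to reproduce the 7.5 shown in the table.

```python
import math


def cosine(u, v):
    """Cosine similarity over the positions where both vectors have values."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, target, column):
    """Similarity-weighted average of the other rows' values in `column`."""
    masked = list(ratings[target])
    masked[column] = None  # never use the value being predicted
    numerator = denominator = 0.0
    for name, row in ratings.items():
        if name == target or row[column] is None:
            continue
        weight = cosine(masked, row)
        numerator += weight * row[column]
        denominator += weight
    return numerator / denominator if denominator else None


if __name__ == "__main__":
    # Values from the table; None marks a missing or to-be-predicted cell.
    ratings = {
        "App. A": [9, 9, 9, 7, None],
        "App. B": [8, 7, 8, None, 8],
        "App. C": [None, 8, 8, 8, 7],
        "App. D": [7, 6, None, 9, 6],
    }
    print(predict(ratings, "App. A", 4))
```

The same routine can fill any missing cell by choosing a different target row and column.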
Analysis Technology (4)
Java Class Search Engine SPARS-J
Markov Model
[Figure: component graph annotated with the values 0.02, 0.01, 0.01, 0.05, 0.03, 0.001, 0.1]
• The component rank model can be regarded as a
Markov chain of the user's focus
• The user's focus moves from one component to another
along a use relation at each fixed time step
• A node's weight represents the probability that the
user's focus is on that node in the limit of infinite time
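The long-run probability described above can be approximated by repeatedly propagating weight along use relations until it settles. The sketch below does this with a damping factor and an even spread for components that use nothing; both are assumptions borrowed from PageRank-style ranking to guarantee convergence, not details confirmed by the slides, and `rank_components` is a hypothetical name.

```python
def rank_components(uses, damping=0.85, iterations=100):
    """Approximate the stationary distribution of the user's focus.

    uses: dict mapping a component to the list of components it uses.
    damping: probability of following a use relation at each step
             (assumed; the remainder jumps to a random component).
    """
    nodes = sorted(set(uses) | {t for targets in uses.values() for t in targets})
    n = len(nodes)
    weight = {c: 1.0 / n for c in nodes}
    for _ in range(iterations):
        following = {c: (1.0 - damping) / n for c in nodes}
        for c in nodes:
            # A component that uses nothing spreads its weight evenly.
            targets = uses.get(c) or nodes
            share = damping * weight[c] / len(targets)
            for t in targets:
                following[t] += share
        weight = following
    return weight


if __name__ == "__main__":
    use_relations = {"A": ["C"], "B": ["C"], "C": ["A"]}
    for name, value in sorted(rank_components(use_relations).items()):
        print(name, round(value, 3))
```

Components that are used by many others, directly or indirectly, end up with the largest weights, which is the property a search engine can use to order results.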
Demo of SPARS-J
http://demo.spars.info
Current Status and Schedule
• Current: demo version of EPM
• First quarter of 2004: release of EPM
• First quarter of 2005: application of EPM in industry
• End of 2005: inclusion of analysis tools
• User group, consortium, interest group, ...
Summary
• Proposed a concept of Mega Software
Engineering, which shares experiences and
knowledge in community
• Introduced EASE project based on the concept
of MSE
• Presented the overview of Empirical
Environment and showed current
implementation of Empirical Project Monitor (EPM),
as a partial realization of Empirical Environment
• Predicted ongoing directions to deeper analyses
of empirical data