
Hardware/Software Co-Design Final Project
Emulation on Distributed Simulation Co-Verification System
Prof. 陳少傑
R91921081 黃鼎鈞
R91943004 尤建智
R91921089 林語亭
1
Agenda







1. Introduction to verification
2. Simulation / Emulation
3. Principle of co-verification
4. System Architecture
5. Experiment Result
6. Conclusion
7. Reference
2
Introduction to Verification







Check whether a design correctly implements the specified
behavior (usually done before manufacturing)
Classes
Logic design verification
simulation
emulation
formal verification
Physical design verification
3
Challenges in the SoC Era


Complexity and gate count skyrocket, following
Moore's Law
A chip integrates multiple modules (IP) and mixed-signal
blocks
4
Design and Verification Process
Design: write the design
specification and start the
design cycle
Verify: verify the
correctness of the
design
Implement: implement and refine
the design through all
phases
5
The Verification Bottleneck
The verification problem grows even faster than the design,
because gate count and vector count
increase at the same time
6
Approaches to Design Verification
Software Simulation
─ traditional software-based simulation
Hardware Accelerated Simulation
─ use special-purpose hardware to accelerate
simulation of the circuit
Emulation
─ emulate the actual circuit behavior
Rapid Prototyping
─ create a prototype of the actual hardware
Formal Verification
─ prove correctness with formal (mathematical) methods
7
Simulation / Emulation Verification

Software Simulation: very flexible, easily extended,
and cheaper than emulation
----Verilog, VHDL, C/C++, mixed language

Hardware Emulation: very high execution speed
-----FPGA, special hardware
8
Industrial Verification Issues
Intel: Processor project verification:
“Billions of generated vectors”
“Our VHDL regression tests take 27 days to run. ”
Sun: SPARC project verification:
Test suite: ~1500 tests, > 1 billion random simulation cycles
“A server ranch ~1200 SPARC CPUs”
Bull: Simulation including PowerPC 604
“Our simulations run at between 1-20 CPS.”
“We need 100-1000 cps.”
Cyrix: an x86-related project
“We need 50x Chronologic performance today.”
“170 CPUs running simulations continuously”
Kodak: “hundreds of 3-4 hour RTL functional simulations”
Xerox: “Simulation runtime occupies ~3 weeks of a design cycle”
Ross: 125 million vector regression tests
9
Software Verification Mechanism
[Figure: test patterns drive the design under test on a simulation engine (with its library); a monitor or rule check examines specific outputs]
10
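The verification loop on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the 8-bit adder (dut_adder8), its golden model (reference_adder8) and run_testbench are hypothetical stand-ins for the design under test, the library model and the monitor/rule check.

```python
from typing import Callable, Iterable, List, Tuple

Pattern = Tuple[int, int]  # one test pattern: (a, b)


def dut_adder8(a: int, b: int) -> int:
    """Hypothetical design under test: 8-bit adder with a 9-bit result (sum + carry)."""
    return (a + b) & 0x1FF


def reference_adder8(a: int, b: int) -> int:
    """Hypothetical golden model that the monitor compares against."""
    return a + b


def run_testbench(dut: Callable[[int, int], int],
                  ref: Callable[[int, int], int],
                  patterns: Iterable[Pattern]) -> List[Pattern]:
    """Apply every pattern to the DUT and return the patterns whose outputs mismatch."""
    failures: List[Pattern] = []
    for a, b in patterns:
        if dut(a, b) != ref(a, b):  # rule check on the specific output
            failures.append((a, b))
    return failures


patterns = [(a, b) for a in range(0, 256, 17) for b in range(0, 256, 51)]
print(run_testbench(dut_adder8, reference_adder8, patterns))  # -> []
```

A buggy variant that drops the carry, such as lambda a, b: (a + b) & 0xFF, would be flagged on every pattern whose sum exceeds 255.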
System/Abstract level simulation



Easier debugging and diagnosis
Reduced simulation time
1. Saves data-structure transfer time
2. Benefits from native (compiled) code
Richer memory-modeling functions
[Figure: a simulation engine connected to HDL, Perl, and C/C++ modules]
11
Emulation System


Advantages:
+ easiest to implement
(involves little change to the
simulation environment)
+ 10X to 100X faster than
traditional simulation
Disadvantages:
--All modules must be
synthesizable
--Difficult to handle
verification scripts or
mathematical formulas
--Cannot probe arbitrary
signals (only inputs/outputs)
12
SW/HW Co-Verification
[Figure: a software simulator runs the transaction-level HDL or C/C++ test bench; it communicates through a high-level protocol (via network or system bus) with dedicated hardware that holds the synthesizable DUT and transactor]
13
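To illustrate why a transactor reduces traffic between the simulator and the dedicated hardware, here is a minimal Python sketch. The bus protocol (one valid cycle carrying addr/data, then idle cycles) and the names WriteTxn, drive and monitor are assumptions for illustration; in the real setup the transactor would be synthesizable HDL on the hardware side.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WriteTxn:
    """High-level transaction sent by the software test bench (hypothetical)."""
    addr: int
    data: int


Cycle = Dict[str, int]  # pin name -> value during one clock cycle


def drive(txns: List[WriteTxn], idle_cycles: int = 1) -> List[Cycle]:
    """Transactor: expand each transaction into per-cycle pin values.

    Assumed protocol: one cycle with valid=1 carrying addr/data,
    followed by idle_cycles cycles with valid=0.
    """
    cycles: List[Cycle] = []
    for t in txns:
        cycles.append({"valid": 1, "addr": t.addr, "data": t.data})
        cycles.extend({"valid": 0, "addr": 0, "data": 0} for _ in range(idle_cycles))
    return cycles


def monitor(cycles: List[Cycle]) -> List[WriteTxn]:
    """Reverse direction: rebuild transactions from pin-level activity."""
    return [WriteTxn(c["addr"], c["data"]) for c in cycles if c["valid"]]


txns = [WriteTxn(0x10, 0xAB), WriteTxn(0x14, 0xCD)]
cycles = drive(txns)
print(len(txns), "transactions ->", len(cycles), "pin-level cycles")
print(monitor(cycles) == txns)  # -> True
```

In this model only the two transactions would cross the network or system bus; the four pin-level cycles are expanded next to the DUT.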
Principle of co-verification

How do we design a hardware/software co-verification system?
----The key issue is PARTITIONING
14
Partition constraints on the hardware part






Maximum gate count of the FPGA or emulator
Maximum number of input and output ports
Maximum number of registers in the FPGA or
emulator
Gate-count balance among emulators
Critical-path delay in the emulator
Whether monitored signals are suitable for hardware
15
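The hardware constraints listed above can be checked mechanically once each partition has a resource estimate. The following Python sketch is hypothetical: Device, Part, check_part and gate_imbalance are illustrative names, the capacities are made-up numbers rather than datasheet values, and the monitored-signal constraint is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """Capacity of one FPGA or emulator (illustrative numbers, not a datasheet)."""
    max_gates: int
    max_io: int
    max_regs: int


@dataclass
class Part:
    """Resource estimate for one hardware partition."""
    gates: int
    io: int
    regs: int
    critical_path_ns: float


def check_part(p: Part, d: Device, clock_period_ns: float) -> List[str]:
    """Return the names of the hardware constraints this partition violates."""
    violations = []
    if p.gates > d.max_gates:
        violations.append("gate count")
    if p.io > d.max_io:
        violations.append("I/O ports")
    if p.regs > d.max_regs:
        violations.append("registers")
    if p.critical_path_ns > clock_period_ns:
        violations.append("critical path")
    return violations


def gate_imbalance(parts: List[Part]) -> float:
    """Largest / smallest gate count across emulators; 1.0 means perfectly balanced."""
    counts = [p.gates for p in parts]
    return max(counts) / min(counts) if min(counts) > 0 else float("inf")


device = Device(max_gates=100_000, max_io=300, max_regs=5_000)
part = Part(gates=120_000, io=200, regs=4_000, critical_path_ns=12.0)
print(check_part(part, device, clock_period_ns=10.0))  # -> ['gate count', 'critical path']
```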
Partition constraints on the software part



Communication overhead among simulators and
emulators
Whether monitored signals are suitable for hardware
Tight or loose clock policy (multi-clock
systems)
All of these factors depend on the test patterns
16
Partition process flow
--Dynamic process
[Figure: the HDL file, test patterns, and hardware and software constraints feed a partition engine, which assigns parts to the emulator and the simulator]
17
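As a rough illustration of what a partition engine decides, here is a greedy heuristic in Python. It is a sketch under stated assumptions, not the project's partition engine: each module is reduced to a gate count and a synthesizable flag, the emulator capacity is a single number, the cut width (bits crossing the simulator/emulator boundary) stands in for communication overhead, and the dynamic, test-pattern-dependent feedback shown on the slide is not modeled. All module and net names are hypothetical.

```python
from typing import Dict, List, Tuple

Module = Tuple[int, bool]   # (gate_count, synthesizable)
Net = Tuple[str, str, int]  # (module_a, module_b, width_in_bits)


def partition(modules: Dict[str, Module], nets: List[Net],
              emu_capacity: int) -> Tuple[Dict[str, str], int]:
    """Greedy partition sketch: returns (assignment, cut_width).

    cut_width is the number of signal bits that cross between the
    simulator and the emulator, a rough proxy for communication overhead.
    """
    assign: Dict[str, str] = {}
    used = 0
    # Place the largest synthesizable modules first; they gain most from emulation.
    for name, (gates, synth) in sorted(modules.items(), key=lambda kv: -kv[1][0]):
        if synth and used + gates <= emu_capacity:
            assign[name] = "emulator"
            used += gates
        else:
            assign[name] = "simulator"
    cut = sum(width for a, b, width in nets if assign[a] != assign[b])
    return assign, cut


modules = {"alu": (300_000, True), "mac": (150_000, True),
           "ctrl": (80_000, True), "tb": (0, False)}
nets = [("tb", "alu", 32), ("alu", "mac", 16), ("ctrl", "alu", 8), ("tb", "ctrl", 4)]
print(partition(modules, nets, emu_capacity=400_000))
```

Here alu and ctrl fit on the emulator, mac does not, the non-synthesizable test bench stays in the simulator, and 52 bits cross the boundary. A real engine would also weigh how often those signals toggle under the test patterns.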
Motivation for the Project


Provide verification earlier in the IC design process
Co-verification across description levels:
  Physical
  Register transfer level (RTL)
  Behavioral
Accelerate verification
18
Our Goal
[Figure: Verilog, VHDL, and C descriptions are partitioned (manually or automatically) into simulation parts (Simulate 1, Simulate 2) and emulation parts (Emulate 1, Emulate 2) for co-verification]
19
System Architecture (I)

Distributed Simulation
[Figure: a master process communicates with child processes I, II, and III, each through its own TCP/IP port]
20
Features of the Simulator

A master process
must be set up to manage communication
Several child processes
each corresponding to one part of the design
Communication
through TCP/IP ports
21
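The master/child arrangement above can be sketched with Python's standard socket module. This is a minimal sketch under assumptions: children run as threads rather than separate processes, all traffic uses loopback TCP, and the line protocol (HELLO, STEP, DONE, STOP) and the per-cycle "simulation" (adding the part number to a counter) are hypothetical placeholders.

```python
import socket
import threading
from typing import Dict, List, TextIO, Tuple


def send(sock: socket.socket, line: str) -> None:
    """Send one newline-terminated text message."""
    sock.sendall((line + "\n").encode())


def child(port: int, part_id: int) -> None:
    """Stand-in for one child simulator (a thread here, a process in the real system)."""
    with socket.create_connection(("127.0.0.1", port)) as s, s.makefile("r") as rd:
        send(s, f"HELLO {part_id}")
        state = 0
        for line in rd:
            cmd, arg = line.split()
            if cmd == "STOP":
                break
            state += part_id  # placeholder for simulating one clock cycle of this part
            send(s, f"DONE {arg} {state}")


def master(n_children: int, n_cycles: int) -> List[List[int]]:
    """Accept the children, then advance all of them in lockstep for n_cycles."""
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        srv.listen(n_children)
        port = srv.getsockname()[1]
        threads = [threading.Thread(target=child, args=(port, i + 1))
                   for i in range(n_children)]
        for t in threads:
            t.start()
        conns: Dict[int, Tuple[socket.socket, TextIO]] = {}
        for _ in range(n_children):
            conn, _addr = srv.accept()
            rd = conn.makefile("r")
            part_id = int(rd.readline().split()[1])  # "HELLO <id>"
            conns[part_id] = (conn, rd)
        ids = sorted(conns)
        history = []
        for cycle in range(n_cycles):
            for pid in ids:
                send(conns[pid][0], f"STEP {cycle}")
            # Barrier: the next cycle starts only after every child has answered.
            history.append([int(conns[pid][1].readline().split()[2]) for pid in ids])
        for pid in ids:
            conn, rd = conns[pid]
            send(conn, "STOP 0")
            rd.close()
            conn.close()
    for t in threads:
        t.join()
    return history


print(master(n_children=3, n_cycles=4))  # -> [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
```

The master waits for every DONE before sending the next STEP, which is the simplest (lockstep) synchronization policy; it also makes the cost of the slowest child visible on every cycle.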
Potential Difficulties (I)

Distributed Simulation

Synchronization
Different parts simulate at different speeds
  The faster parts have to wait for the slower ones

Data communication
Communication overhead

Partition
Clocks are the bottleneck
Duplicating global clocks within each part speeds up simulation
22
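The synchronization cost described above can be put into a simple model. This Python sketch assumes lockstep synchronization, fixed per-cycle simulation times for each part, and a fixed per-cycle communication cost; all values are illustrative, not measurements from this project.

```python
from typing import List


def lockstep_time(step_times: List[float], comm_per_cycle: float, cycles: int) -> float:
    """Wall-clock time when all parts advance together, one cycle at a time.

    Every cycle costs as much as the slowest part plus one synchronization exchange.
    """
    return cycles * (max(step_times) + comm_per_cycle)


def idle_fraction(step_times: List[float]) -> List[float]:
    """Fraction of each cycle that each part spends waiting for the slowest part."""
    slowest = max(step_times)
    return [(slowest - t) / slowest for t in step_times]


# Illustrative per-cycle costs (seconds) for three child simulators.
steps = [1e-3, 4e-3, 2e-3]
print(lockstep_time(steps, comm_per_cycle=1e-3, cycles=1000))  # about 5 s
print(idle_fraction([1.0, 4.0, 2.0]))  # -> [0.75, 0.0, 0.5]
```

In this model, duplicating the global clock inside each part removes clock-edge messages from comm_per_cycle, but it does not remove the wait for the slowest part.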
System Architecture (II)

Emulation
[Figure: the master simulation communicates with child simulations I, II, and III through TCP/IP ports; the child simulations control FPGA emulation]
23
Features of the Emulator

Emulators must run as child processes
Each corresponds to one part, synthesized as EDIF
Each is under the control of a corresponding child
simulation
Communication
Through IDE to the corresponding simulation
process
24
Potential Difficulties (II)

Emulation

Synchronization
Between the FPGAs and the simulator
Among different FPGAs
The simulator synchronizes the verification progress

Data communication
Clock signals must be handled by the simulator

Multiple clocks
Also handled by the simulator
25
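When the simulator owns all clocks, it has to merge the edges of several clocks into one event schedule and tell the emulator which clocks tick at each event. A minimal Python sketch, assuming integer time units, positive clock periods, rising edges only, and hypothetical clock names:

```python
import heapq
from typing import Dict, Iterator, List, Tuple


def clock_edges(periods: Dict[str, int], until: int) -> Iterator[Tuple[int, List[str]]]:
    """Yield (time, clocks_with_rising_edge) in time order, up to and including `until`.

    periods maps clock name -> period in integer time units (must be > 0).
    The simulator would call the emulator once per yielded event.
    """
    heap = [(0, name) for name in periods]
    heapq.heapify(heap)
    while heap and heap[0][0] <= until:
        t = heap[0][0]
        ticking = []
        # Pop every clock that has an edge at time t, then schedule its next edge.
        while heap and heap[0][0] == t:
            _, name = heapq.heappop(heap)
            ticking.append(name)
            heapq.heappush(heap, (t + periods[name], name))
        yield t, sorted(ticking)


for event in clock_edges({"clk_a": 10, "clk_b": 15}, until=30):
    print(event)
```

With periods 10 and 15, the events fall at times 0, 10, 15, 20 and 30, and both clocks tick together at 0 and 30.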
Potential Difficulties (III)

Design Partition

Manual / Automatic
Emulation parts must be synthesizable
  Hardware constraints

Communication overhead
Among different emulation parts
Among different simulation parts

26
System Limitation

Emulation
Clocks are handled by the simulator, so the emulator advances
only one clock cycle per call.
  FPGAs therefore run intermittently instead of at their full
speed.
  Partitioning among emulation parts may make communication
overhead dominant.

27
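The effect of advancing the FPGA one cycle per call can be estimated with a simple model. In this Python sketch the per-call overhead and FPGA clock are illustrative values, not measurements from this project; the second function is the standard Amdahl's law, used here only as a general bound on what accelerating part of the work can achieve, not as the authors' analysis of their result.

```python
def effective_emulation_hz(call_overhead_s: float, fpga_clock_hz: float) -> float:
    """Emulated cycles per second when the simulator advances the FPGA one cycle per call.

    Each call pays a fixed software/communication overhead plus one FPGA clock period.
    """
    return 1.0 / (call_overhead_s + 1.0 / fpga_clock_hz)


def cosim_speedup(software_fraction: float, accel: float) -> float:
    """Amdahl's law: overall speedup when only (1 - software_fraction) of the
    original run time is accelerated by a factor `accel`."""
    return 1.0 / (software_fraction + (1.0 - software_fraction) / accel)


# Illustrative values only: 50 us overhead per call, 20 MHz FPGA clock.
print(round(effective_emulation_hz(50e-6, 20e6)))  # -> 19980
print(round(cosim_speedup(0.5, 100.0), 2))  # -> 1.98
```

Under these assumptions a 20 MHz FPGA delivers only about 20,000 cycles per second, and if half of the original run time stays in software, even a 100x faster emulator yields less than a 2x overall speedup.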
Experiment Results






RTL module: Jazz2020 (DSP core)
Gate count: 0.5M (estimated)
Number of test patterns: 374 (with verification
functions)
Pure software simulation: 183 sec
Co-simulation (with Xilinx Virtex 400E): 94 sec
Speedup: 183/94 ≈ 1.95x (almost 2x), not as fast as we expected
28
Future Work

We will separate the RTL code into a
non-synthesizable part and a synthesizable part
Non-synthesizable part: convert to C code
(compiled code) running on an embedded CPU
on the FPGA chip
Synthesizable part: map into the FPGA block
Goal: the whole process runs on a single
FPGA chip
29
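A first step toward the planned split is to flag modules that contain constructs synthesis tools reject or ignore. The Python sketch below is a keyword heuristic under stated assumptions, not a Verilog parser: the pattern list (system tasks and # delay controls) is illustrative and incomplete, and classify_modules is a hypothetical name. A real partition engine would rely on a proper parser or on synthesis reports.

```python
import re
from typing import Dict, List

# Constructs that synthesis tools reject or ignore (heuristic, not exhaustive).
NON_SYNTH_PATTERNS = {
    "system task": r"\$(display|write|monitor|fopen|fclose|finish|stop|random)\b",
    "delay control": r"#\s*\d",
}


def classify_modules(verilog: str) -> Dict[str, List[str]]:
    """Map each module name to the non-synthesizable constructs found in its body.

    An empty list marks a candidate for the FPGA block; otherwise the module
    would go to the C code / embedded CPU path.
    """
    result: Dict[str, List[str]] = {}
    for m in re.finditer(r"\bmodule\s+(\w+)(.*?)\bendmodule\b", verilog, re.DOTALL):
        name, body = m.group(1), m.group(2)
        result[name] = [label for label, pat in NON_SYNTH_PATTERNS.items()
                        if re.search(pat, body)]
    return result


src = """
module alu(input [7:0] a, b, output [7:0] y);
  assign y = a + b;
endmodule
module tb;
  initial begin
    #10 $display("done");
    $finish;
  end
endmodule
"""
print(classify_modules(src))  # -> {'alu': [], 'tb': ['system task', 'delay control']}
```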
Future Work
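[Figure: a partition engine splits the original RTL code; the non-synthesizable part becomes C code, which the embedded CPU compiler turns into compiled code for the embedded CPU; the synthesizable RTL code goes through the FPGA synthesizer to a gate-level netlist for the FPGA block; both run on the FPGA main board]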
30
Conclusion


Simulation is, and will remain, the most popular
verification method.
Emulation will stand out as an accelerator under
heavy simulation load.
31