Lecture 9 - Reuse - Tampere University of Technology

Lecture 9 - Reuse
Erno Salminen
TIE-50200 Logic Synthesis (Logiikkasynteesi)
Department of Pervasive Computing
Tampere University of Technology
Spring 2014
Erno Salminen - February 2014
Department of Pervasive Computing
Outline
- Intellectual property (IP) components
  - Basics
  - Design process
  - Technical matters, not legal
- Efficient IP reuse with Kactus2 tool
Size and architecture affect reuse
- Back in the day, a single designer did everything from scratch
- Not affordable or possible anymore
  - More functionality in SW
  - Buy and sell IP components
[Figure: example systems built from blocks such as i/f, mem, cpu, subsys, acc A and acc B]
System size | Example | Architecture | Reuse
Small | Accelerator, glue logic, peripheral | Ad-hoc | Little
Medium | CPU (sub-)system | Bus-based | Moderate
Large | Multi-processor system | Tiled NoC | Lots
SoC design size mandates reuse [ITRS]
ITRS is a consortium of the largest semiconductor companies and research institutes, www.itrs.net
[Figure: memory size, logic size and number of processing elements (#PEs) on a chip]
- Tens to hundreds of PEs on a single chip
- The same PE is instantiated many times
- Most PEs are reused; some are perhaps modified or reconfigured
- Increasing parallelism
- Moore's law is still going strong
- Memory takes a larger proportion of the chip
Verification takes >50% of project time
- 79% of defects are found at block level [Switzer, DesignCon, 2000]
- Reusing verified components is much better than re-inventing the wheel!
Intellectual Property (IP) components
- Reusable, pre-designed and verified components
- May be HW, SW, or HW+SW
  - (Especially HW) IPs are also called macros and cores
  - E.g. microprocessor, memory, HW accelerator, SW function…
- In-house developed or 3rd party
  - From scratch or by modifying an old component
  - Bought or even downloaded for free
- Simplify system design
  - Faster, better time estimates, less risk
- Hemani's law: the size of reused components grows 10x every decade
[Figure: example IPs: Linux, VideoEncoder, NiosII, MicroBlaze, AES encryption, PCIexpress, embedded SRAM, embedded Flash…]
IP (2)
- Solves a general problem, e.g. CPU, MPEG decoder, PCIe
- Fits many environments
  - Perhaps configurable, e.g. data width
- Standard interfaces instead of in-house proprietary interfaces
  - Easy usage from other HW and SW (clear register interface)
  - Specialized verification IPs available for standard interfaces (DDR2, USB, I2C…)
- Docs, commented code, verification suites, support scripts…
- Packaged properly
  - Sources, docs, scripts, ports, IP-XACT…
  - Executable usage examples
3 IP delivery types
1. Soft core
   - Delivered as synthesizable RTL code or C source
   - The integrator can modify it by changing the generics or the code
   - Fits many environments
2. Firm core
   - Somewhere between soft and hard
   - E.g. synthesized netlist (.vqm)
3. Hard core
   - Fully designed, placed, and routed
   - Cannot be modified, technology-specific
   - Delivered as a GDSII file (or similar) or fabricated into an FPGA chip
Proof of concept
- ARM Ltd., an English company, is fabless
  - It does not make or sell silicon chips; it sells IP
  - Customers buy licenses to use the CPUs
  - The customer gets the design data of the CPU and integrates it into their own chip
  - ARM is also very strict about license misconduct…
- Extremely high volumes!
  - 10e9 (10 billion) ARM-based chips shipped in 2013
  - Approx. 50e9 (50 billion) devices with ARM sold so far
  - E.g. nearly all cellphones to date have used ARM-based processors
IP integration phases
[Figure: IP integration phases 1) front-end, 2) back-end, 3) production (annotation: "by FPGA vendor"). Note that the integration phase used to be textual.]
- High-level synthesis (HLS) is not applicable above the IP level; IP reuse has proven itself at the system level.
Fig: [R. Wilson, SOCs: IP is the new abstraction, Electrical Design News (EDN), Aug. 2011, http://www.edn.com/electronics-news/4368295/SOCsIP-is-the-new-abstraction]
Verification steps
1. All IPs need thorough functional verification
   - Bricaud: "First big return on investment to the reusable effort"
   - Running all functional tests may take weeks
   - Done by the IP designer before integration starts
2. In integration, concentrate on the communication between IPs
   - "Test1: Foo sends to Bar, which acknowledges"
   - "Test2: Cpu0 can access all memory areas"
3. Then verify different application use cases
   - Ensure that the real value of the product works
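An integration check like "Test2" can be sketched in Python against a toy model of the interconnect instead of a real simulator. The memory map (`MEMORY_MAP`), the `SimulatedBus` class and the 32-bit word size are hypothetical. The sketch also treats every area as plain read/write memory, which real status or read-only registers are not. Writing everything first and only then reading back is what exposes address-decoding faults where two addresses alias.

```python
# Hypothetical memory map for this sketch: area name -> (base address, size in bytes).
MEMORY_MAP = {
    "sram": (0x0000, 0x100),
    "accel_regs": (0x1000, 0x10),
}


class SimulatedBus:
    """Toy interconnect model: 32-bit words at word-aligned byte addresses."""

    def __init__(self, memory_map):
        self.memory_map = memory_map
        self.words = {}

    def _check(self, addr):
        # Reject unaligned accesses and addresses outside every mapped area.
        mapped = any(base <= addr < base + size for base, size in self.memory_map.values())
        if addr % 4 != 0 or not mapped:
            raise ValueError(f"bad access at 0x{addr:04x}")

    def write(self, addr, value):
        self._check(addr)
        self.words[addr] = value & 0xFFFFFFFF

    def read(self, addr):
        self._check(addr)
        return self.words.get(addr, 0)


class AliasingBus(SimulatedBus):
    """Faulty interconnect for demonstration: the decoder ignores address bit 4."""

    def write(self, addr, value):
        super().write(addr & ~0x10, value)

    def read(self, addr):
        return super().read(addr & ~0x10)


def pattern(addr):
    # Address-dependent data, so two aliased addresses cannot both read back correctly.
    return addr ^ 0xA5A5A5A5


def check_cpu_can_access_all_memory_areas(bus, memory_map):
    """Write every word of every area first, then read everything back.
    Returns the failing addresses; an empty list means the test passed."""
    addresses = [a for base, size in memory_map.values() for a in range(base, base + size, 4)]
    for addr in addresses:
        bus.write(addr, pattern(addr))
    return [addr for addr in addresses if bus.read(addr) != pattern(addr)]
```

The `AliasingBus` subclass injects such a decoding fault to show that the check reports it.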
3rd party IP source examples
- http://www.design-reuse.com/
- http://opencores.org/
- Funbase by TUT
Tip: Transport Triggered Architecture (TTA)
- Easiest way to design an accelerator
- Application-specific instruction-set processor (ASIP)
  - Almost as low a cycle count as an ASIC
  - Still allows programmability, more flexible than HW
- Easily configurable
  - # of cores (multi-threading)
  - # and type of execution units
  - Connections between units
  - Many trade-offs between area and performance
- "Soft soft core": synthesizable HDL + application C code
- Download from www.tce.cs.tut.fi
[Screen captures (tce.cs.tut.fi): ProDE processor designer; cycle-accurate simulator]
IP integration and interfaces
- The IP must be adapted unless it natively supports the interface provided by the communication network
- Soft (white-box) IPs allow direct modification of the source code
- Without source code, an additional wrapper (or adapter) is needed
[Figure: original IP, integration operation, and outcome: (1) attach the IP block as such; (2) modify the IP block, then attach it; (3) create a wrapper around the IP block, then attach the wrapper. The blocks connect to the communication network through a standard or a heterogeneous interface.]
Adapted by A. Rasmus from [F. R. Wagner et al., "Strategies for the integration of hardware and software IP components in embedded systems-on-chip", Integration, the VLSI Journal, September 2004, Vol. 37, Iss. 4, pp. 223-252]
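A wrapper of this kind can be sketched in Python as follows. `LegacySumIp`, its `push`/`go` methods and the register offsets are all hypothetical. The adapter translates standard register reads and writes into the IP's own proprietary signalling, so the IP itself stays untouched.

```python
class LegacySumIp:
    """Hypothetical IP with a proprietary interface: push operands, pulse go, read sum_out."""

    def __init__(self):
        self._operands = []
        self.done = False
        self.sum_out = 0

    def push(self, operand):
        self._operands.append(operand)
        self.done = False

    def go(self):
        self.sum_out = sum(self._operands) & 0xFFFF  # 16-bit result
        self._operands.clear()
        self.done = True


class RegisterWrapper:
    """Adapter exposing LegacySumIp through memory-mapped registers (byte offsets)."""

    DATA_IN, CONTROL, STATUS, RESULT = 0x0, 0x4, 0x8, 0xC
    START_BIT = 0x1

    def __init__(self, ip):
        self.ip = ip

    def write(self, offset, value):
        if offset == self.DATA_IN:
            self.ip.push(value)
        elif offset == self.CONTROL:
            if value & self.START_BIT:
                self.ip.go()
        else:
            raise ValueError(f"register 0x{offset:x} is not writable")

    def read(self, offset):
        if offset == self.STATUS:
            return int(self.ip.done)
        if offset == self.RESULT:
            return self.ip.sum_out
        raise ValueError(f"register 0x{offset:x} is not readable")
```

The rest of the system now sees only four registers. Replacing the legacy IP later requires a new wrapper, not changes in every user of it.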
Std interface separates computation from communication
[Figure: a first-generation device (control processor, DSP processor, DSP RAM, audio decode, MPEG, system RAM, external I/O and peripherals on a processor bus) evolves in two ways. More bandwidth: change the network without affecting the processing. More flexibility: keep the network and change the processors.]
IP interface
- What goes in, what comes out, and when?
- Easy connectivity is critical for efficient reuse
- Interface definition includes
  1. Ports: names, types, directions
  2. Timing (data must have settled before write_en is driven high…)
  3. Usually also a memory map of the IP
- More and more IPs use some sort of standardized bus interface
  - Resembles a CPU's memory bus (addr, data, R/W, wait_req…)
  - Simplifies integrating them together
  - Avalon, AMBA, HIBI…
[Figures: interface signals; interface timing (at signal level)]
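The timing part of an interface definition can be checked automatically by a small protocol monitor. This sketch assumes a trace sampled once per clock cycle. It interprets "data must have settled before write_en is driven high" as follows: on every rising edge of `write_en`, `data` must equal its value in the previous cycle. The `Sample` record and this exact rule are assumptions for illustration. A real monitor would follow the IP's documented timing.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Signal values sampled once per clock cycle (signal names from the slide)."""
    write_en: int
    data: int


def check_data_settled(trace):
    """Protocol monitor sketch. Assumed rule: whenever write_en rises from 0 to 1,
    data must already have had the same value in the previous cycle.
    Returns the indices of violating cycles. Cycle 0 has no history and is not checked."""
    violations = []
    for cycle in range(1, len(trace)):
        prev, cur = trace[cycle - 1], trace[cycle]
        rising = prev.write_en == 0 and cur.write_en == 1
        if rising and cur.data != prev.data:
            violations.append(cycle)
    return violations


# Data is set up one cycle before write_en rises: no violation.
good = [Sample(0, 0x00), Sample(0, 0x2A), Sample(1, 0x2A), Sample(0, 0x2A)]
# Data changes in the same cycle as write_en rises: violation at cycle 1.
bad = [Sample(0, 0x00), Sample(1, 0x2A)]
```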
IP interface (2): memory map
- Nowadays most IPs are connected to a system-level interconnect
  - Instead of directly to each other with point-to-point connections
  - There are interconnection standards, such as AMBA, OCP-IP etc.
- The IP is seen as a set of registers
  - Data registers: for feeding in data and for reading the results
  - Control/status registers: for configuring and monitoring
  - Together they form a memory map
- An IP usually has a configurable base address, say 0x1000, and the registers are offsets from it
  - 0x1000, 0x1004, 0x1008 and so on
- IPs easily become pin-compatible, and the real design task is to access the registers correctly
  - SW driver functions can abstract the details of register accesses
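The base-address-plus-offset scheme can be sketched as follows. The system map, IP names and window sizes are hypothetical. `decode` plays the role of the interconnect's address decoder and `reg_addr` that of a driver helper. `regions_overlap_free` flags the classic integration error of two IPs mapped onto the same addresses.

```python
# Hypothetical system memory map: IP name -> (base address, size of its register window in bytes).
SYSTEM_MAP = {
    "uart": (0x0000, 0x100),
    "accelerator": (0x1000, 0x100),
    "timer": (0x2000, 0x100),
}


def decode(addr, system_map=SYSTEM_MAP):
    """Address decoder: map a system address to (IP name, register offset)."""
    for name, (base, size) in system_map.items():
        if base <= addr < base + size:
            return name, addr - base
    raise ValueError(f"no IP mapped at 0x{addr:04x}")


def reg_addr(ip_name, offset, system_map=SYSTEM_MAP):
    """Driver helper: register address = IP base address + register offset."""
    base, size = system_map[ip_name]
    if not 0 <= offset < size:
        raise ValueError(f"offset 0x{offset:x} is outside the register window of {ip_name}")
    return base + offset


def regions_overlap_free(system_map):
    """True if no two register windows share an address."""
    regions = sorted(system_map.values())
    return all(base + size <= next_base
               for (base, size), (next_base, _) in zip(regions, regions[1:]))
```

Moving the accelerator to another base address changes only its `SYSTEM_MAP` entry. Code written in terms of offsets stays the same.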
IP interface (3): comparison
a) Special interface with many application-specific signals
   - offers potentially the maximum performance
   - is hard to understand
   - is tightly coupled to neighbor blocks, hard to use in a different environment
   - is not accessible from a CPU unless a wrapper is developed
b) Standardized bus interface with memory-mapped registers
   - forces serial accesses (first write data_in, then control, then read status, and then read the results), hence is perhaps a little slower to access
   - is easier to understand
   - loose coupling makes porting into another system easier
   - matches how software views the system (a set of memory locations)
The next step in abstraction would be to standardize a set of basic registers.
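A SW driver function typically hides the serial access sequence of option b). This sketch assumes a hypothetical memory-mapped accelerator (`FakeSquarer`) with four registers. Its done flag appears after a few status polls. The offsets, bit masks and latency are invented for illustration. The access counter shows the cost of option b): six bus accesses for one operation.

```python
# Hypothetical register offsets and bit masks of a memory-mapped accelerator.
DATA_IN, CONTROL, STATUS, RESULT = 0x0, 0x4, 0x8, 0xC
START = 0x1  # CONTROL bit 0
DONE = 0x1   # STATUS bit 0


class FakeSquarer:
    """Stand-in device: squares DATA_IN; DONE appears after a few STATUS polls."""

    def __init__(self, latency_polls=3):
        self.regs = {DATA_IN: 0, CONTROL: 0, STATUS: 0, RESULT: 0}
        self.latency_polls = latency_polls
        self._remaining = 0
        self.accesses = 0  # number of bus accesses made by the driver

    def write(self, offset, value):
        self.accesses += 1
        self.regs[offset] = value
        if offset == CONTROL and value & START:
            self.regs[STATUS] = 0
            self._remaining = self.latency_polls

    def read(self, offset):
        self.accesses += 1
        if offset == STATUS and self._remaining > 0:
            self._remaining -= 1
            if self._remaining == 0:
                self.regs[RESULT] = (self.regs[DATA_IN] ** 2) & 0xFFFFFFFF
                self.regs[STATUS] = DONE
        return self.regs[offset]


def square_via_registers(dev, x, max_polls=100):
    """Driver function hiding the serial access sequence of option b)."""
    dev.write(DATA_IN, x)        # 1. write data_in
    dev.write(CONTROL, START)    # 2. write control: start
    for _ in range(max_polls):   # 3. read status until done
        if dev.read(STATUS) & DONE:
            return dev.read(RESULT)  # 4. read the result
    raise TimeoutError("accelerator did not signal done")
```

Application code calls only `square_via_registers(dev, 12)` and never touches the register offsets.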
Verification IP (VIP)
- VIP = a readily available testbench for verifying a certain (standardized) component during design
  - Commonly for a communication protocol
  - E.g. VIP for UART
  - Separate from the production IPs
- Typically consists of a bus functional model, traffic generators, protocol monitors and functional coverage blocks
  - E.g. a bus functional model mimics the interface but does not compute anything
- Typically high-level, so it runs fast on a simulator
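Two of the VIP parts listed above, a bus functional model and a functional coverage block, can be sketched in Python as follows. The sketch assumes a hypothetical simple bus. The pin names, the two-cycle transaction (one active cycle followed by one idle cycle) and the coverage bins are assumptions, not any particular protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pins:
    """Pin values of a hypothetical simple memory-mapped bus during one clock cycle."""
    addr: int
    data: int
    we: int
    re: int


IDLE = Pins(addr=0, data=0, we=0, re=0)


class BusFunctionalModel:
    """Turns transaction-level calls into cycle-by-cycle pin values.
    It computes nothing itself; it only reproduces the interface behaviour."""

    def __init__(self):
        self.cycles = []  # pin-level trace, one entry per clock cycle

    def write(self, addr, data):
        self.cycles += [Pins(addr, data, we=1, re=0), IDLE]

    def read(self, addr):
        self.cycles += [Pins(addr, 0, we=0, re=1), IDLE]


class CoverageCollector:
    """Functional coverage: which (operation, register) bins the traffic has hit."""

    def __init__(self, registers):
        self.bins = {(op, reg): 0 for op in ("write", "read") for reg in registers}

    def sample(self, pins):
        op = "write" if pins.we else "read" if pins.re else None
        if op is not None and (op, pins.addr) in self.bins:
            self.bins[(op, pins.addr)] += 1

    def percent(self):
        hit = sum(1 for count in self.bins.values() if count > 0)
        return 100.0 * hit / len(self.bins)


bfm = BusFunctionalModel()
bfm.write(0x0, 5)
bfm.read(0x8)
coverage = CoverageCollector(registers=[0x0, 0x4, 0x8, 0xC])
for pins in bfm.cycles:
    coverage.sample(pins)
```

Two transactions hit 2 of the 8 bins, so `coverage.percent()` reports 25.0. The uncovered bins tell which traffic the test still lacks.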
[Figure from the 2010 Wilson Research Group functional verification study] Harry Foster, Verification Horizons Blog, Mentor Graphics, [http://blogs.mentor.com/verificationhorizons/blog/2011/03/30/prologue-the-2010-wilson-research-groupfunctional-verification-study/]
Designers are reusing not only logic but also testbenches
- Designers can buy testbenches (TB) as well
[Figure: testbench reuse, 2010 Wilson Research Group functional verification study]
[http://blogs.mentor.com/verificationhorizons/blog/2011/04/01/part-3-the-2010-wilson-research-group-functional-verification-study/slide21-2/]
Reuse is increasingly critical for success
- A smaller and smaller fraction of logic is new
  - Reuse increases all the time
  - Earn extra income by selling IP
- Designer productivity must increase manifold!
  - but people are not getting much smarter
  - Hence: plug-and-play integration
Year | Relative logic size (portable SoC) | Required % of reused design | Required productivity, new design | Required productivity, reused design
2013 | 1.00 | 62 % | 1.00 | 2.00
2015 | 1.66 | 70 % | 1.56 | 3.12
2017 | 2.63 | 78 % | 2.33 | 4.65
2019 | 4.17 | 86 % | 3.45 | 6.90
2021 | 6.53 | 92 % | 5.11 | 10.22
Values normalized to year 2013.
Potential problems in reuse
- In general:
  - "Not invented here!" syndrome
  - Incomplete design information and docs
  - Unreadable, uncommented code
  - No supporting scripts, incomplete verification
  - Tools not supported anymore or poor interoperability
- In-house developed components
  - Preparing for reuse requires some extra effort
  - The full design was never properly archived, so pieces of the design are scattered over various disks on various machines, some of which no longer exist
- 3rd party IP
  - Expensive or complex licenses?
  - Vendor lock-in (tied to a certain vendor or chip family)
Design of a reusable macro
So how is it done?
Macro and sub-block design
[Figure: macro and sub-block design flow [Bricaud]; the term "decomposition" is also used]
Macro/sub-block design
- Capture the major requirements and use cases
- Develop the functional and technical specifications
  1. The functional specification describes the aspects of the sub-block that are visible to the rest of the macro
  2. The technical specification describes the internals of the sub-block
  - A good technical specification allows the designer to code once and quickly
- Develop RTL
  - Design teams work simultaneously on sub-blocks
  - Split into reasonably small units
  - Only small (1-2 person) design teams needed
- Develop testbenches
  - Remember readability and ease of modification
  - One TB shows the basic operation, the other tries to prove the DUV buggy
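The two-testbench idea can be sketched with Python stand-ins instead of an HDL simulation. The design (`duv_sat_add`, an 8-bit saturating adder), its reference model and the injected bug are hypothetical. The first testbench documents basic operation with readable directed cases. The second is self-checking against the reference model and combines corner cases with seeded random stimulus. It therefore catches a wrap-around bug that the first one misses.

```python
import random


def reference_sat_add(a, b, width=8):
    """Golden model: unsigned saturating addition."""
    return min(a + b, (1 << width) - 1)


def duv_sat_add(a, b, width=8):
    """Stand-in for the design under verification (a real flow would simulate HDL)."""
    total = a + b
    return (1 << width) - 1 if total >> width else total


def buggy_sat_add(a, b, width=8):
    """Injected bug: wraps around instead of saturating."""
    return (a + b) & ((1 << width) - 1)


def basic_tb(duv):
    """TB 1: shows the basic operation with a few readable, directed cases."""
    cases = [(1, 2, 3), (100, 27, 127)]
    return [(a, b) for a, b, expected in cases if duv(a, b) != expected]


def bug_hunting_tb(duv, n_random=1000, seed=1):
    """TB 2: tries to prove the DUV buggy with corner cases plus seeded random
    stimulus, checking every result against the reference model."""
    rng = random.Random(seed)
    corners = [0, 1, 127, 128, 254, 255]
    stimuli = [(a, b) for a in corners for b in corners]
    stimuli += [(rng.randrange(256), rng.randrange(256)) for _ in range(n_random)]
    return [(a, b) for a, b in stimuli if duv(a, b) != reference_sat_add(a, b)]
```

Both testbenches pass `duv_sat_add`. Only `bug_hunting_tb` reports failures for `buggy_sat_add`, e.g. the corner case (1, 255). The fixed seed keeps failures reproducible.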
Macro/sub-block Design (2)
- Develop synthesis scripts and synthesize
  - External timing constraints should be fully defined by the specification before coding begins
  - Synthesis scripts must be developed early in the design process
  - Synthesis should begin as soon as the RTL code passes the most basic functional tests (early synthesis gives insight into problem areas)
- Run a static language check (also called a "lint" tool)
  - Provides an automatic method for checking the RTL for violations of coding guidelines and other kinds of errors
  - Can report cyclomatic complexity, maintenance index etc.
- Measure testbench coverage (96% of lines were executed...)
- Perform power analysis
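A figure like "96% of lines were executed" can be illustrated with a Python analogue of an HDL simulator's line coverage. This sketch uses the standard `sys.settrace` hook and `dis.findlinestarts`. It compares the lines that actually ran against all lines that carry code. `dut_abs` is a hypothetical stand-in for the design. Commercial simulators do not implement coverage this way; the sketch only shows what the number counts.

```python
import dis
import sys


def dut_abs(x):
    # Stand-in "design" whose lines we want to cover.
    if x < 0:
        return -x
    return x


def line_coverage(func, stimuli):
    """Run func on each argument tuple; return (percent of lines executed, missed lines)."""
    code = func.__code__
    executed = set()

    def local_trace(frame, event, arg):
        if event == "line":
            executed.add(frame.f_lineno)
        return local_trace

    def global_trace(frame, event, arg):
        # Trace only frames that run func's own code object.
        return local_trace if frame.f_code is code else None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        for args in stimuli:
            func(*args)
    finally:
        sys.settrace(previous)

    # Lines that carry bytecode, excluding the 'def' line itself.
    lines = {ln for _, ln in dis.findlinestarts(code) if ln is not None}
    lines.discard(code.co_firstlineno)
    missed = sorted(lines - executed)
    return 100.0 * (len(lines) - len(missed)) / len(lines), missed
```

With only positive stimulus, `line_coverage(dut_abs, [(5,)])` reports one missed line (`return -x`). This is exactly the feedback that tells which stimulus the testbench still needs.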
Developers spend over 2x more time reading than writing code!
- "Code lives forever": easily tens of years or more
- Lots of maintenance time goes into navigating, i.e. seeking files, function and variable declarations etc.
- Aim for simplicity and readability
- Clear code is always better than any document
[Figure: breakdown of developers' time [Ko et al., An Exploratory Study of How Developers Seek…, IEEE Trans. SW Eng., 2006]. Annotations: interrupts incl. other duties, chatting with colleagues etc.; controlled variable in the study; variation between 10 monitored developers]
Coding Guidelines
- General recommendations:
  - Pay attention to names! Perhaps the single most important aspect of understandability
    - Be careful with units: pkt_bytes is better than packet_size, time_ms is better than a vague time
  - Use simple constructs, basic types, and simple clocking schemes
  - Use a consistent coding style, and a consistent structure for processes and FSMs
  - Use a regular partitioning scheme, with all module outputs registered and with the modules roughly of the same size
  - Use constants or parameters instead of hard-coded numbers, provide comments
  - Don't sweat the small stuff (e.g. on which line the curly brackets are)
- If automatic checking (lint) is used, there should be no errors/warnings from the IP
  - Adopt settings such that clean output is achievable, i.e. no warnings/errors
- In our department, we adhere to
  - http://www.cs.tut.fi/~ege/Misc/dcs_vhdl_coding_rules_es_v4_4.pdf
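The naming and constants advice applies to VHDL generics and constants as much as to software. Below is a small before/after sketch in Python; the link rate and all names are hypothetical.

```python
# Before: what is "size", bits or bytes? What are 8 and 100? What unit is the result?
def t(size):
    return size * 8 / 100


# After: units in the names, named constants instead of magic numbers.
BITS_PER_BYTE = 8
LINK_RATE_MBPS = 100  # hypothetical link speed in megabits per second


def transfer_time_us(pkt_bytes, link_rate_mbps=LINK_RATE_MBPS):
    """Time to send one packet, in microseconds (bits / (Mbit/s) = microseconds)."""
    return pkt_bytes * BITS_PER_BYTE / link_rate_mbps
```

Both functions compute the same value. Only the second tells the reader that 1500 bytes at 100 Mbit/s take 120 microseconds.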
Sub-block integration
- Very important step
- First grading of how reusable the sub-blocks are
  - Ease of integration
  - Speed of verification
- Otherwise, similar to sub-block design
- ATPG = automated test pattern generation for manufacturing test
- First just two sub-blocks, then a few more, and so on
[Figure: sub-block integration flow [Bricaud]]
Productization
- Ensure that the macro is synthesizable with multiple technologies
- Simulate at gate level
- Verify formally that the netlist from synthesis is equivalent to the RTL
- Document
- Ensure that all steps are repeatable
Macro Design Archive
- All the items that are needed when any change, upgrade, or modification is made to the macro
  - "Just the zipped code" is definitely not enough!
- A revision control system must be used for all files
- Five main categories
  1. Product files
  2. Verification files
  3. Documentation files
  4. System integration files
  5. Example use case with tutorial
Macro Design Archive.1-2 Src + verif
- Product files
  - Synthesizable VHDL + Verilog
  - Simulation and synthesis scripts and timing constraints, assuming a reference library
  - Installation scripts (e.g. copying files and setting paths)
- Verification files
  - Self-checking testbenches with high coverage
  - Compilation and simulation setup scripts
  - Sometimes also scripts for creating test data and checking results
Macro Design Archive.3 Documents
- Datasheet, flyer: basics briefly, only a few pages
- Functional specification / user guide: what it does, how to use it, how to connect it
  - The key piece of documentation. Even more important than the technical specification
  - Consider how many users a macro has versus how many develop it further. 10 to 1? 20 to 1?
- Technical specification: how it does it
- Verification plan: how we decided to verify it
- Simulation and coverage logs: how we know it works, and how confidently
- Synthesis results for multiple technologies: how small and fast it is
- Lint report: how well the code is written
- "Unquestioned assumptions are by far the biggest time-waster for system debuggers. It's why the word "arrgh!" was coined." (R. Colwell)
Macro Design Archive.3 user guide
Necessary contents of the user guide:
1. Functionality: The unit calculates foo() for the input stream…
2. Assumptions: Industrial conditions… maximum clk jitter is < 100 ps…
3. Directory structure: Source code is in directory vhd… The user knows whether everything is there
4. Recommended SW environment, including compilers and drivers: It worked at least with these…
5. Config info and parameters: Generics are divided into X categories… Data_width_g sets…
6. Detailed description of I/O: How to connect the signals, what their timing is, timing diagrams
7. Register map: How SW and other IPs can access the unit, regY is 32-bit read-only…
8. Recommended clocking and reset: The unit uses one clock (rising edge) and an active-low reset…
9. Recommended system verification strategy: After integration, write value X to reg Y and see if Z happens…
10. Architecture, block diagram: There are 4 sub-blocks… Mainly for developers and maintainers
11. Performance: Computation takes approx. 500-520 cycles depending…, the avg. throughput is…
12. Size / gate count: As a function of generic parameters
13. Power dissipation: As a function of computational load assuming the XXX silicon conditions…
14. Exceptions to coding/design guidelines: This is also mainly for developers tweaking the code
15. Debug strategy, including recommended debug tools: If you suspect something, please check that…
16. Test structures, testability, and test coverage: The unit has 2 test inputs… and a full scan chain…
17. Version history and known bugs: Major milestones are… At the moment, XX is not supported because…
The items are in rough order of importance. E.g. the interface and performance are more critical criteria in IP selection than the block diagram.
Macro Design Archive.4 Integration
4. System integration files
- SW driver for accessing the IP from a CPU
- Test program that reads and writes some registers of the IP
- Bus functional models of other system components
- Cycle-based simulator and HW emulator models
- Recommendation of commercially available software required for HW/SW cosimulation and system integration (as appropriate for the particular macro)
- High-level functional model
  - HDL, Matlab, C/C++, UML
Macro Design Archive.5 Example(s)
- Easy start is the key
  - Otherwise, potential users get frustrated and will not use the IP (unless management forces them to…)
- Users can learn the basic idea by repeating a well-prepared simulation setup
- A very simple testbench-like example that "does something useful"
  - Since it runs OK, I must have all the files, paths, necessary tools…
  - One can copy-paste code for one's own projects
  - Not necessarily full-blown testing here!
  - No corner cases, only the most common operation(s)
  - Not much input data, just enough to get the idea
- "Police instructions" with screen captures ensure that everyone can repeat the simulation easily
  - Annotated screen captures show what should happen during simulation
Reuse must have an easy start
- A TB is not that good for introducing the basics
- Written application notes are somewhat better
- Ready-made executable examples are excellent
  - The system does something useful in a few minutes
  - "Hello world" is the world's most important program
  - Allow imitation and copy-pasting, unlike documents
  - Give a feeling, e.g. of time usage
Reuse made easy with Kactus2
Motivation = there's lots of everything
- Product data management (PDM) = track and control data related to a particular product
- Lots of
  - products in a company
  - (changing) requirements
  - versions, e.g. 50 IP components per SoC, tens of tools, up to ten OSes
  - people, e.g. tens to hundreds, at sites across the world
  - files, e.g. 10k files per SoC, lots of legacy code
  - languages, e.g. VHDL, Verilog, C/C++, Python, asm…
- Long lifecycles, e.g. offering support for 5-10 years
  - Staff will change
- Must use a version management tool (Svn, Git…)
- The Kactus2 tool simplifies management and increases productivity
Vendor-specific integration tools are great but…
- Hard to transfer a design between different vendors
[Screenshots: Altera Qsys; Xilinx Platform Studio]
IP reuse and product data management
- We must enhance integration
- We must track "what's in there"
  - E.g. a deployed controller in a factory
  - IPs, versions, parameters…
- The IP core is the most common system-level abstraction nowadays
- IP-XACT is an IEEE standard for capturing component meta-data
  - Simplifies transfer between products, teams and companies
  - Simplifies integration
  - Captures interfaces, file sets, hierarchy, versions, configurations…
  - Machine-readable (XML)
[Figure: HW design view in the Kactus2 tool, http://sourceforge.net/projects/kactus2/]
IP-XACT
- A rather large standard (374 pages)
- Component, e.g. NiosII
  - Files, ports, parameters
- Design, e.g. example_soc
  - Instantiates, configures and connects components together
  - E.g. Nios, audio codec, VGA, PLL…
- Bus interface, e.g. Avalon
  - Collection of ports (e.g. data, addr, we, stall…)
  - Master and slave roles
  - Allows connecting multiple wires with a single connection, with type checking
- Vendor-independent, but supports vendor extensions
  - They make some things handier… and might cause vendor lock-in, accidentally or on purpose
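IP-XACT is machine-readable XML, so tools can extract component meta-data with an ordinary XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes the IP-XACT 1.5 "spirit" namespace URI and a heavily simplified, hand-written component. The VLNV and the `DM9000A` master interface follow the generated snippet shown at the end of this lecture. The ports `clk` and `rst_n` are hypothetical. A real file has many more mandatory elements, e.g. the bus type reference of each bus interface.

```python
import xml.etree.ElementTree as ET

# Assumption: IP-XACT 1.5 ("spirit") namespace URI.
SPIRIT = "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.5"
NS = {"spirit": SPIRIT}

# Simplified component description; the port list is hypothetical.
SAMPLE = f"""<spirit:component xmlns:spirit="{SPIRIT}">
  <spirit:vendor> TUT </spirit:vendor>
  <spirit:library>ip.hwp.interface</spirit:library>
  <spirit:name>udp_ip_dm9000a</spirit:name>
  <spirit:version>1.0</spirit:version>
  <spirit:busInterfaces>
    <spirit:busInterface>
      <spirit:name>DM9000A</spirit:name>
      <spirit:master/>
    </spirit:busInterface>
  </spirit:busInterfaces>
  <spirit:model>
    <spirit:ports>
      <spirit:port>
        <spirit:name>clk</spirit:name>
        <spirit:wire><spirit:direction>in</spirit:direction></spirit:wire>
      </spirit:port>
      <spirit:port>
        <spirit:name>rst_n</spirit:name>
        <spirit:wire><spirit:direction>in</spirit:direction></spirit:wire>
      </spirit:port>
    </spirit:ports>
  </spirit:model>
</spirit:component>"""


def vlnv(component):
    """Vendor:library:name:version identifier of a component element."""
    parts = (component.findtext(f"spirit:{tag}", default="", namespaces=NS).strip()
             for tag in ("vendor", "library", "name", "version"))
    return ":".join(parts)


def ports(component):
    """Port name -> direction, read from the model section."""
    return {
        port.findtext("spirit:name", namespaces=NS):
            port.findtext("spirit:wire/spirit:direction", namespaces=NS)
        for port in component.iterfind("spirit:model/spirit:ports/spirit:port", NS)
    }


def bus_interface_roles(component):
    """Bus interface name -> 'master' or 'slave' (other roles ignored in this sketch)."""
    roles = {}
    for bi in component.iterfind("spirit:busInterfaces/spirit:busInterface", NS):
        for role in ("master", "slave"):
            if bi.find(f"spirit:{role}", NS) is not None:
                roles[bi.findtext("spirit:name", namespaces=NS)] = role
    return roles
```

A checker built on such functions could, for example, refuse to connect two master interfaces to each other.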
Kactus IP-XACT tool (TUT 2009-)
- Manage IP library
  - Create "electronic data sheets"
  - Find out where an IP has been used
- Create hierarchical HW designs
  - Check pin interface compliance
  - Draft IP blueprints
  - Drag-and-drop integration, user-friendly GUI
- Document SW structure and mapping
- Generate structural VHDL, docs, synthesis/simulation scripts, code templates…
- Vendor-independent
- C++/Qt, open source, GPL2
- http://funbase.cs.tut.fi
Conclusions
- Reuse everything you can
  - Generality and reusability degrade a component's performance a little, but their impact on design effort is much larger
- Reusable IPs come in 3 forms: soft, firm and hard
- Pay attention to interfaces and simple first usage
- An IP package includes lots of stuff in addition to the code
- Kactus is an excellent framework for embedded systems
  - Package IP, handle IP libraries, light-weight product data management, hierarchical HW and SW design, design automation
More info: http://funbase.cs.tut.fi/ and http://tce.cs.tut.fi/
Extra material
Phases of IP creation and reuse
- Several phases precede the actual integration (phase 6):
  1. Creation: according to guidelines that simplify reuse
  2. Qualification: ensures that the IP has the qualities expected by consumers (integrators)
  3-4. Classification and search: consumers must find appropriate IP blocks from on-line catalogs where components are classified according to adequate criteria
  5. Transfer: delivers all the information needed for evaluation and integration, including design data, documentation, and test specification
  5B. Evaluation: a more accurate evaluation is usually needed before integrating the IP into a design, e.g. by instantiating the component into a testbench and simulating it
[F. Wagner et al., Strategies for the integration..., VLSI, 2004]
Macro Design Archive.5 Example(s)
- Include a readme.txt ASCII file
  - Can be opened everywhere
  - Purpose of the IP, file directories, instructions on how to start
- Makefile/compile_all.sh/syn.sh to compile/synthesize everything
- Ready-made waveform files
- HDL that instantiates the macro
  1. With easy-to-understand components
     - Minimize the dependencies on external components (CPUs, memories, etc.)
     - Minimize the number of languages used
  2. In a realistic environment, e.g. a CPU using the accelerator macro
     - Includes SW code compilation and execution
Header and license information
- All (code) files start with a header
  - File name, purpose
  - Author, date created, date updated
  - Actually, all your files should start with a header (text, .tex, .pptx, .xlsx, .html etc.)
- All open source files should have license information
  - Otherwise, potential users don't necessarily dare to use them
- The project root directory should have a file COPYING
  - For example http://www.gnu.org/licenses/lgpl2.1.txt
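A header rule like this is easy to check with a script run before each commit or release. The sketch below assumes VHDL-style `--` comments and a hypothetical set of `Field : value` entries. The field names are not taken from any particular coding-rule document.

```python
# Hypothetical header fields; a project's own coding rules define the real list.
REQUIRED_FIELDS = ("File", "Description", "Author", "Created", "Updated")


def missing_header_fields(source_text, comment="--", max_lines=20):
    """Return the required fields missing from the file's leading comment block.
    Only consecutive comment lines at the very top of the file count as the header."""
    header = []
    for line in source_text.splitlines()[:max_lines]:
        stripped = line.strip()
        if not stripped.startswith(comment):
            break
        header.append(stripped[len(comment):].strip())
    found = {entry.split(":", 1)[0].strip() for entry in header if ":" in entry}
    return [field for field in REQUIRED_FIELDS if field not in found]


# Hypothetical, complete header of a VHDL file.
EXAMPLE = """-- File        : example.vhd
-- Description : Hypothetical example entity
-- Author      : N.N.
-- Created     : 2014-02-01
-- Updated     : 2014-02-10
library ieee;
"""
```

Running such a check over every source file makes a missing header a build failure instead of a review comment.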
Open source license information
- A zillion different licenses available: GPL, LGPL, MIT, BSD…
- Consider 3 main points
  1. Does the code adhere to the free/open source SW criteria?
     - http://www.gnu.org/philosophy/free-sw.html
     - http://opensource.org/docs/osd
  2. (Weak) copyleft:
     - General method for making a program (or other work) free (libre), and requiring all modified and extended versions of the program to be free as well
     - http://en.wikipedia.org/wiki/Copyleft
  3. Non-viral or viral:
     - Will derived works inherit the same license, as if contaminated by a virus?
     - E.g. including GPL code makes the whole project GPL (viral license) but LGPL does not (non-viral)
License example (LGPL)
-- Funbase IP library Copyright (C) 2011 TUT Department of Computer Systems
--
-- This file is part of HIBI
--
-- This source file may be used and distributed without
-- restriction provided that this copyright statement is not
-- removed from the file and that any derivative work contains
-- the original copyright notice and the associated disclaimer.
--
-- This source file is free software; you can redistribute it
-- and/or modify it under the terms of the GNU Lesser General
-- Public License as published by the Free Software Foundation;
-- either version 2.1 of the License, or (at your option) any
-- later version.
--
-- This source is distributed in the hope that it will be
-- useful, but WITHOUT ANY WARRANTY; without even the implied
-- warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
-- PURPOSE. See the GNU Lesser General Public License for more
-- details.
--
-- You should have received a copy of the GNU Lesser General
-- Public License along with this source; if not, download it
-- from http://www.opencores.org/lgpl.shtml
User guide examples
- Jussi Nieminen, Traffic Generator Usage, TUT, 2010 (+ source code)
  - http://www.tkt.cs.tut.fi/research/nocbench/data/traffic_generator_20091201.zip
- Jussi Nieminen, Syntesoituva liikennegeneraattori piirinsisäisten tiedonsiirtoverkkojen testaukseen ja suorituskykyvertailuun [A synthesizable traffic generator for testing and performance comparison of on-chip communication networks], bachelor's thesis, Tampere University of Technology, 2010, 30 p.
  - http://www.tkt.cs.tut.fi/kurssit/1570/kandityot/Kandidaatintyo_Jussi_Nieminen.pdf
- Esko Pekkarinen, Transaction Generator 2 Tutorial, TUT, 2010
  - http://www.tkt.cs.tut.fi/research/nocbench/data/sctg2_tutorial.pdf
Reporting digital systems
- The result (area/power/performance/frequency/combination) is the sum of several factors
  - Comparison is not possible unless they are specified
- Process
  - Line width, supply voltage, worst/nominal/best case, low-power/high-speed process, output load, FPGA type number and LUT count
- Detail level
  - RTL/layout results, VHDL/Verilog/SystemC, memories included?, I/O pads and power supplies included?, memory usage for SW
  - Power simulation: supply voltage, RTL/gate-level/transistor sim., switching activity, leakage included?, crosstalk included?, glitches included?, memories included?
- Design method
  - Synthesis/full-custom design, tool version
- System parameters
  - Data width, buffer size, pin-limited/logic-limited system?, memory interface
- Others...
[E. Salminen, On Preparing Clear Publications, lecture slides on TKT-2410 Scientific publishing, Tampere, Finland, Nov. 2005.]
Design Verification Requirements
[Table: ITRS design verification requirements]
All values, except for the first row (productivity), are linearly increasing or decreasing trends, where the initial values in 2006 are derived from industry survey data and the final values are estimated.
[1] This requirement takes the SOC design productivity requirement specified in Table 20 (SOC logic Mtx per designer-year) and divides it by the percentage of design effort spent in verification, so as to obtain the amount of logic that can be verified in a year. The percentage of the effort spent in verification is a linear trend starting at 70% in 2006 and ending at 50% in 2020.
[2] The reuse data uses the equation: new verification infrastructure + reused + 3rd party verification IP = 100%. The table reports analytical values for the new verification infrastructure and acquired 3rd party IP, while the estimated size of reused components can be derived.
[3] That is, the number of assertions or checkers inserted in a design or referring to an aspect of the design, for each corresponding million transistors of logic generated when synthesizing that design.
ITRS 2006 update, http://www.itrs.net/Links/2006Update/FinalToPost/02_Design_2006Update.pdf
Processor core examples, 180 nm CMOS
[Salminen et al., Comparison of Hardware IP Components for System-on-Chip, Tampere SoC Symposium, 2004.]
HW Accelerator examples, 180 nm CMOS
- In this study, the average size was around 70-80 kilogates
Download and try
http://sourceforge.net/projects/kactus2/
Kactus screenshots: HW design
[Screenshot callouts: library of reusable IP components; graphical design area with components, connections, bidirectional pins and chip outputs; detailed settings of the selected component or the whole design; help window]
IP library
Library + import
[Screenshot callouts: the import wizard packages a new IP in a few minutes; in the simple case the user makes only one selection and the rest is automatic; filter; search; hierarchical component; contents of the top-level VHDL file; automatically detected generic parameters and ports, which will be saved in the IP-XACT XML; design (previous slide); subcomponent]
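To illustrate what "automatically detected generics and ports" involves, here is a small regex-based sketch that extracts them from a VHDL entity. It is not Kactus2's importer; it assumes a single entity, explicit port modes where present, and no parentheses inside string literals. The function names and the sample entity are hypothetical.

```python
import re


def _clause(text: str, keyword: str) -> str:
    """Return the text inside 'keyword ( ... )', honouring nested parentheses."""
    m = re.search(rf"\b{keyword}\s*\(", text, re.IGNORECASE)
    if not m:
        return ""
    depth, start = 1, m.end()
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start:i]
    raise ValueError(f"unbalanced parentheses in {keyword} clause")


def _declarations(clause: str):
    """Split a clause on ';' and yield (name, rest) for every declared name."""
    for decl in clause.split(";"):
        if ":" not in decl:
            continue
        names, rest = decl.split(":", 1)
        for name in names.split(","):
            yield name.strip(), rest.strip()


def parse_entity(vhdl: str) -> dict:
    """Find the entity name, its generics (name, type, default) and its
    ports (name, direction, type)."""
    text = re.sub(r"--[^\n]*", "", vhdl)  # drop VHDL comments
    m = re.search(r"\bentity\s+(\w+)\s+is\b(.*?)\bend\b", text,
                  re.IGNORECASE | re.DOTALL)
    if not m:
        raise ValueError("no entity declaration found")
    name, body = m.group(1), m.group(2)

    generics = []
    for gname, rest in _declarations(_clause(body, "generic")):
        gtype, _, default = rest.partition(":=")
        generics.append((gname, gtype.strip(), default.strip() or None))

    modes = {"in", "out", "inout", "buffer", "linkage"}
    ports = []
    for pname, rest in _declarations(_clause(body, "port")):
        parts = rest.split(None, 1)
        if len(parts) == 2 and parts[0].lower() in modes:
            direction, ptype = parts[0].lower(), parts[1]
        else:  # no explicit mode: VHDL defaults to 'in'
            direction, ptype = "in", rest
        ports.append((pname, direction, ptype.partition(":=")[0].strip()))
    return {"name": name, "generics": generics, "ports": ports}


SAMPLE = """
entity arp_demo is
  generic (
    WIDTH : integer := 8;   -- data width
    MODE  : string  := "fast"
  );
  port (
    clk, rst_n : in  std_logic;
    data_in    : in  std_logic_vector(WIDTH-1 downto 0);
    data_out   : out std_logic_vector(WIDTH-1 downto 0)
  );
end arp_demo;
"""

if __name__ == "__main__":
    print(parse_entity(SAMPLE))
```

A real importer would use a proper VHDL parser; the point here is only that generics and ports are recoverable from the entity declaration, which is what lets the wizard fill them in without user input.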
Component editor
[Screenshot callouts: component editor window, with the utilized fields in boldface; unique VLNV identifier, filled in by the import wizard (see previous slide); symbol preview; bus interfaces, which group many ports together; allowed but unutilized IP-XACT fields]
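The VLNV (vendor, library, name, version) identifier is what makes every IP-XACT item unique. The sketch below models it as an immutable key in a toy library that rejects duplicates. The class names, the colon-separated display form and the XML path are assumptions for illustration; the VLNV values are taken from the IP-XACT snippet in this lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLNV:
    """IP-XACT identifier: Vendor, Library, Name, Version."""
    vendor: str
    library: str
    name: str
    version: str

    def __str__(self) -> str:
        # Colon-separated form, used here only for display (assumed format).
        return f"{self.vendor}:{self.library}:{self.name}:{self.version}"


class IPLibrary:
    """Toy library that refuses to register two items with the same VLNV."""

    def __init__(self) -> None:
        self._items: dict[VLNV, str] = {}

    def add(self, vlnv: VLNV, xml_path: str) -> None:
        if vlnv in self._items:
            raise ValueError(f"VLNV {vlnv} is already in the library")
        self._items[vlnv] = xml_path

    def find(self, name: str) -> list[VLNV]:
        """All versions (and vendors/libraries) of a component with this name."""
        return [v for v in self._items if v.name == name]


if __name__ == "__main__":
    lib = IPLibrary()
    v = VLNV("TUT", "ip.hwp.interface", "udp_ip_dm9000a", "1.0")
    lib.add(v, "udp_ip_dm9000a.1.0.xml")  # hypothetical path
    print(v, lib.find("udp_ip_dm9000a"))
```

Because the whole tuple is the key, a new version of the same component can coexist with the old one, while an accidental second copy of the same VLNV is rejected.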
Component's filesets
[Screenshot callouts: files divided into 3 categories; VHDL files; arp3 depends on udp_ip_pkg; file status OK (found on disk, not modified); top-level component]
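A file dependency such as "arp3 depends on udp_ip_pkg" fixes the compilation order: a VHDL package must be analyzed before the units that use it. A minimal sketch using Python's `graphlib` (Python 3.9+) follows; `arp3.vhd` appears in the IP-XACT snippet of this lecture, while `udp_ip_pkg.vhd` and `top.vhd` are hypothetical file names.

```python
from graphlib import TopologicalSorter


def compile_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return the files so that each one comes after everything it depends on.
    Raises graphlib.CycleError if the dependencies are circular."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    deps = {
        "arp3.vhd": {"udp_ip_pkg.vhd"},             # arp3 uses the package
        "top.vhd": {"arp3.vhd", "udp_ip_pkg.vhd"},  # hypothetical top level
    }
    print(compile_order(deps))
```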
Anecdote of the day: the name "Kactus" is derived from the acronym "ACT" (architecture configuration tool), to get something more easily pronounceable.
System design
 Map SW onto CPUs
 Automatically shows all CPUs from all hierarchy levels
  Nios CPUs and dct on the DE2 board
  MicroBlaze (uBlaze) on the Xilinx board
  x86 on a laptop
 Kactus generates makefiles for each CPU according to the mapping
 IP-XACT extension, available only in Kactus
[Screenshot callouts: CPU; SW component; application-independent communication library; this CPU does nothing in this use case; accelerator, not used in this use case]
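Generating one makefile per CPU from the SW mapping can be sketched as below. This only illustrates the idea and is not Kactus2's generator: the mapping format, the CPU instance names and the makefile layout are all assumptions. A shared component, such as the communication library, may be mapped onto several CPUs, and a CPU with nothing mapped onto it gets no makefile.

```python
import os
from collections import defaultdict


def makefiles_from_mapping(mapping: list[tuple[str, str]],
                           sources: dict[str, list[str]]) -> dict[str, str]:
    """mapping: (SW component, CPU instance) pairs.
    sources: SW component -> list of C source files.
    Returns one makefile text per CPU that has at least one SW component."""
    per_cpu: dict[str, list[str]] = defaultdict(list)
    for sw, cpu in mapping:
        per_cpu[cpu].extend(sources[sw])

    makefiles = {}
    for cpu, files in sorted(per_cpu.items()):
        objs = " ".join(os.path.splitext(f)[0] + ".o" for f in files)
        makefiles[cpu] = (
            f"# Generated for CPU instance {cpu}\n"
            f"OBJS = {objs}\n"
            f"{cpu}.elf: $(OBJS)\n"
            "\t$(CC) -o $@ $(OBJS)\n"  # recipe lines must start with a tab
        )
    return makefiles


if __name__ == "__main__":
    mk = makefiles_from_mapping(
        [("app", "nios_0"), ("comm_lib", "nios_0"), ("comm_lib", "ublaze_0")],
        {"app": ["main.c"], "comm_lib": ["comm.c"]},
    )
    for cpu, text in mk.items():
        print(text)
```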
Snippet from generated IP-XACT file
<?xml version="1.0" ...?><!-- Created by Kactus2 ... -->
<spirit:component…">
<spirit:vendor>TUT</spirit:vendor>
<spirit:library>ip.hwp.interface</spirit:library>
<spirit:name>udp_ip_dm9000a</spirit:name>
<spirit:version>1.0</spirit:version>
<spirit:description>Receives/transmits data from/to…
...
<spirit:busInterface><spirit:name>DM9000A</spirit:name>
<spirit:master/>
...
<spirit:portMap>
<spirit:logicalPort>
<spirit:name>eth_chip_sel_out</spirit:name>
...
<spirit:fileSet><spirit:name>HDLsources</spirit:name>
<spirit:file><spirit:name>../vhd/arp3.vhd
...
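Because the component description is plain XML, other tools can read it with a standard parser. The sketch below extracts the VLNV and the file list with `xml.etree.ElementTree`. It assumes the IP-XACT 1.5 (IEEE 1685-2009) SPIRIT namespace and the usual `fileSets/fileSet/file` nesting; check the `xmlns:spirit` declaration of the actual file. The sample document is a trimmed, hypothetical reconstruction built from the values in the snippet above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace (IP-XACT 1.5); adjust to match the file's xmlns:spirit.
NS = {"spirit": "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.5"}

SAMPLE = """<spirit:component xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1.5">
  <spirit:vendor> TUT </spirit:vendor>
  <spirit:library>ip.hwp.interface</spirit:library>
  <spirit:name>udp_ip_dm9000a</spirit:name>
  <spirit:version>1.0</spirit:version>
  <spirit:fileSets>
    <spirit:fileSet>
      <spirit:name>HDLsources</spirit:name>
      <spirit:file><spirit:name>../vhd/arp3.vhd</spirit:name></spirit:file>
    </spirit:fileSet>
  </spirit:fileSets>
</spirit:component>"""


def read_component(xml_text: str):
    """Return ((vendor, library, name, version), [file names])."""
    root = ET.fromstring(xml_text)
    # findtext on the root matches direct children only, so the fileSet's
    # own <spirit:name> is not confused with the component name.
    vlnv = tuple(
        root.findtext(f"spirit:{tag}", default="", namespaces=NS).strip()
        for tag in ("vendor", "library", "name", "version")
    )
    files = [
        f.findtext("spirit:name", default="", namespaces=NS).strip()
        for f in root.iterfind("spirit:fileSets/spirit:fileSet/spirit:file", NS)
    ]
    return vlnv, files


if __name__ == "__main__":
    print(read_component(SAMPLE))
```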
Snippet from generated VHDL file
-- File: udp_flood_example_dm9000a.vhd ...
entity udp_flood_example_dm9000a is
port (
-- Interface: clk_in
-- Clock input.
clk_in_CLK : in std_logic; ...
DM9000A_eth_interrupt_in : in std_logic;
...
architecture kactusHierarchical of udp_flood_example_dm9000a is
signal pll_flooderCLK : std_logic;
signal floodertx_udpnew_tx_in : std_logic;
...
pll25 : altera_de2_pll_25
port map (
c0 => pll_flooderCLK,
inclk0 => clk_in_CLK
);
flooder : simple_udp_flood_example
port map (
clk => pll_flooderCLK,
...
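In the generated entity, the top-level port names appear to combine the bus interface name and the port name (`clk_in` + `CLK` gives `clk_in_CLK`, `DM9000A` + `eth_interrupt_in` gives `DM9000A_eth_interrupt_in`). The sketch below emits a VHDL port clause following that pattern. The naming rule is inferred from these two examples, not from Kactus2 documentation, and the function name is hypothetical.

```python
def top_level_ports(interfaces: dict[str, list[tuple[str, str, str]]]) -> str:
    """interfaces: bus interface name -> [(port, direction, VHDL type), ...].
    Emits a VHDL port clause using the <interface>_<port> naming pattern."""
    lines = ["port ("]
    decl_indices = []
    for iface, ports in interfaces.items():
        lines.append(f"  -- Interface: {iface}")
        for port, direction, vtype in ports:
            decl_indices.append(len(lines))
            lines.append(f"  {iface}_{port} : {direction} {vtype};")
    if not decl_indices:
        raise ValueError("the entity needs at least one port")
    last = decl_indices[-1]
    lines[last] = lines[last].rstrip(";")  # VHDL: no ';' after the last port
    lines.append(");")
    return "\n".join(lines)


if __name__ == "__main__":
    print(top_level_ports({
        "clk_in": [("CLK", "in", "std_logic")],
        "DM9000A": [("eth_interrupt_in", "in", "std_logic")],
    }))
```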
Benefits of Kactus
 Clear view on all products and components
  What IP do we have?
  Where have they been used?
  What are their details?
  What are the details of the product Gizmo?
 Excellent GUI
  Intuitive and easy to use
  Graphical view shows the structure well
  No need to edit XML manually
 Design automation
  Fewer mistakes
  Easy to try out different variants of the design
 Open source
Custom processors
 ASIP = Application-Specific Instruction-set Processor
 Extend the CPU with application- (or domain-) specific instructions
  E.g. MAC, sum with clipping, DCT
 Extension tightly coupled with the CPU pipeline
 Optimize internal communication within the CPU
 Remove unnecessary instructions
 Configure other CPU parameters (number of registers, data width, ...)
 Programs can still be compiled from C/C++
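The semantics of two of the example custom instructions can be modeled in a few lines. This is a behavioral sketch only, assuming signed 16-bit operands for the clipping sum; the function names are hypothetical. On a plain RISC, the clipping sum needs an add followed by compares and selects, whereas an ASIP can execute it as one instruction.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def add_clip(a: int, b: int) -> int:
    """'Sum with clipping': add two signed 16-bit values and saturate
    to the representable range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def mac(acc: int, a: int, b: int) -> int:
    """Multiply-accumulate: acc + a * b as a single operation."""
    return acc + a * b


def dot(xs: list[int], ys: list[int]) -> int:
    """Dot product written as a chain of MAC operations."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


if __name__ == "__main__":
    print(add_clip(30000, 10000), add_clip(-30000, -10000), dot([1, 2, 3], [4, 5, 6]))
```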
TTA (2)
 Harvard architecture
  Separate instruction and data memories
  Supports multiple data memories
 C compiler and simulator are automatically configured for the new micro-architecture
 Only one instruction: move
  E.g. "Add r2, r3, r3" becomes:
   move RF[2] -> ALU.op1
   move RF[3] -> ALU.trig
   move ALU.result -> RF[3]
 The instruction word has as many fields as there are internal buses
 Resembles VLIW: everything is scheduled at compile time
 Larger code size than RISC
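The move-only execution model can be illustrated with a toy interpreter for the example above. This is a simplified sketch, assuming one move at a time and a zero-latency adder; a real TTA issues several moves per instruction word (one per bus) and FUs may have pipeline latency. The class and function names are hypothetical.

```python
class ALU:
    """Function unit with an operand port, a trigger port and a result port.
    Writing the trigger port starts the operation (here: addition)."""

    def __init__(self) -> None:
        self.op1 = 0
        self.result = 0

    def trigger(self, value: int) -> None:
        self.result = self.op1 + value


def run_moves(rf: list[int], alu: ALU, program: list[tuple[str, str]]) -> None:
    """Execute (source, destination) moves in order. Locations are
    'RF[i]', 'ALU.op1', 'ALU.trig' or 'ALU.result'."""

    def read(src: str) -> int:
        if src.startswith("RF["):
            return rf[int(src[3:-1])]
        if src == "ALU.result":
            return alu.result
        raise ValueError(f"unknown source {src}")

    def write(dst: str, value: int) -> None:
        if dst.startswith("RF["):
            rf[int(dst[3:-1])] = value
        elif dst == "ALU.op1":
            alu.op1 = value
        elif dst == "ALU.trig":
            alu.trigger(value)  # the addition happens as a side effect
        else:
            raise ValueError(f"unknown destination {dst}")

    for src, dst in program:
        write(dst, read(src))


# "Add r2, r3, r3" expressed as three moves, as on the slide.
ADD_R2_R3_R3 = [("RF[2]", "ALU.op1"), ("RF[3]", "ALU.trig"), ("ALU.result", "RF[3]")]

if __name__ == "__main__":
    regs = [0, 0, 5, 7]
    run_moves(regs, ALU(), ADD_R2_R3_R3)
    print(regs)
```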
Move instruction is handy
 Instructions control the internal buses, and operations happen as a side effect
 Resource sharing for buses
 A result can be moved from an FU's output directly to the next FU's input instead of going through the register file -> fewer registers and register-file ports
 More freedom in code scheduling than in traditional CPUs: a move can happen later (or earlier) if the result (or input) register is not needed for other data -> fewer buses needed; supports different pipeline depths in FUs
 Number of units and buses easily configurable
 Number of inputs and outputs of an FU is easily configurable (not just 2 inputs and 1 output)
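The register-file saving from moving a result directly between FUs can be counted on a small example, (RF[0] + RF[1]) * RF[2] -> RF[3]. The FU port names (`ADD.*`, `MUL.*`) and the temporary register RF[4] are hypothetical; the sketch only counts register-file accesses and does not model timing.

```python
def rf_accesses(program: list[tuple[str, str]]) -> tuple[int, int]:
    """Count register-file reads and writes in a list of (src, dst) moves."""
    reads = sum(src.startswith("RF[") for src, _ in program)
    writes = sum(dst.startswith("RF[") for _, dst in program)
    return reads, writes


# Temporary result goes through the register file.
VIA_RF = [
    ("RF[0]", "ADD.op1"), ("RF[1]", "ADD.trig"),
    ("ADD.result", "RF[4]"),   # write the temporary ...
    ("RF[4]", "MUL.op1"),      # ... and read it back
    ("RF[2]", "MUL.trig"), ("MUL.result", "RF[3]"),
]

# Result moved straight from the adder output to the multiplier input.
BYPASSED = [
    ("RF[0]", "ADD.op1"), ("RF[1]", "ADD.trig"),
    ("ADD.result", "MUL.op1"),
    ("RF[2]", "MUL.trig"), ("MUL.result", "RF[3]"),
]

if __name__ == "__main__":
    print(rf_accesses(VIA_RF), rf_accesses(BYPASSED))
```

The bypassed version saves one read, one write and the temporary register RF[4], which is where the "fewer registers and register-file ports" benefit comes from.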
TTA performance
 Better area and performance than a general-purpose RISC
 Special function unit (SFU)
  Designed and added manually
  Arbitrary latency and number of operands (thanks to the transport-triggered scheme)
  Decreases execution time but increases area
 For certain algorithms, the same cycle counts as an ASIC may be achieved
  An ASIC has a higher operating frequency, though
 TTA and its tools are currently developed at TUT
  Download: http://tce.cs.tut.fi/
  Used in course TKT-3526 Processor Design
  Interested students may do project work on TTA
Area vs. runtime trade-off
 TTA's cycle count is smaller than RISC's and close to ASIC's
 ASIC has the highest frequency
 TTA's area is between ASIC and RISC
 A specialized SFU is very useful
[Figure: RC4 exploration (memory excluded)]
[P. Hämäläinen, Euromicro DSD, 2005]
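Comparing implementations requires combining cycle count with clock frequency, since a lower cycle count can be offset by a lower frequency. The sketch below computes runtime as cycles divided by frequency and an area-time product as one possible combined figure of merit. The class name and fields are hypothetical, and the figures from the cited study are not reproduced; fill them in from synthesis and simulation reports.

```python
from dataclasses import dataclass


@dataclass
class Implementation:
    name: str
    cycles: int          # clock cycles for the workload
    f_mhz: float         # achievable clock frequency in MHz
    area_kgates: float   # logic area in kilogates, memory excluded

    @property
    def runtime_us(self) -> float:
        # f in MHz = cycles per microsecond, so runtime = cycles / f
        return self.cycles / self.f_mhz

    @property
    def area_time(self) -> float:
        # Area-time product: lower is better when both matter
        return self.area_kgates * self.runtime_us


def rank_by_area_time(impls: list[Implementation]) -> list[str]:
    """Names of the implementations ordered from best to worst area-time."""
    return [i.name for i in sorted(impls, key=lambda i: i.area_time)]
```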