Lecture 9 - Reuse
Erno Salminen
TIE-50200 Logiikkasynteesi (Logic Synthesis)
Department of Pervasive Computing
Tampere University of Technology
Spring 2014
Erno Salminen - February 2014, Department of Pervasive Computing

Outline
- Intellectual property (IP) components
  - Basics
  - Design process
  - Technical matters, not legal
- Efficient IP reuse with Kactus2 tool

Size and architecture affect reuse
- Back in the day, a single designer did everything, always from scratch
- This is no longer affordable or even possible
- More functionality in SW
- Buy and sell IP components

System size   | Example                                  | Architecture           | Reuse
Small system  | E.g. accelerator, glue logic, peripheral | Ad-hoc architecture    | Little reuse
Medium system | E.g. CPU (sub-)system                    | Bus-based architecture | Moderate reuse
Large system  | Multi-processor system                   | Tiled NoC architecture | Lots of reuse

[Figure: block diagrams of the three system sizes, built from i/f, mem, cpu, acc A, acc B and cpu subsys blocks]

SoC design size mandates reuse [ITRS]
- ITRS is a consortium of the largest semiconductor companies and research institutes, www.itrs.net
- Tens to hundreds of PEs on a single chip
- The same PE is instantiated many times
- Most PEs are reused; some are perhaps modified or reconfigured
- Moore’s law is still going strong
- Memory takes a larger proportion of the chip
[Figure: memory size, logic size and #PEs over time; increasing parallelism]

Verification takes >50% of project time
- 79% of defects are found at block level [Switzer, DesignCon, 2000]
- Reusing verified components is much better than re-inventing the wheel!

Intellectual Property (IP) components
- Reusable, pre-designed and verified components
- May be HW or SW or HW+SW
- Also called macros and cores (especially HW IP)
- Example IPs shown on the slide: Linux, VideoEncoder, NiosII
- E.g.
  microprocessor, memory, HW accelerator, SW function…
- Further example IPs shown on the slide: MicroBlaze, AES encryption, PCIexpress, embedded SRAM, embedded Flash, …
- In-house developed or 3rd party
  - From scratch or by modifying an old component
  - Buy or even download for free
- Simplify system design
  - Faster, better time estimates, less risk
- Hemani’s law: the size of a reused component grows 10x every decade

IP (2)
- Solves a general problem, e.g. CPU, MPEG decoder, PCIe
- Fits many environments
  - Perhaps configurable, e.g. data width
- Standard interfaces instead of an in-house proprietary interface
  - Handy usage from other HW and SW (clear register interface)
  - Specialized verification IPs available for standard interfaces (DDR2, USB, I2C…)
- Docs, commented code, verification suites, support scripts…
- Packaged properly
  - Sources, docs, scripts, ports, IP-XACT…
  - Executable usage examples

3 IP delivery types
1. Soft core
   - Delivered as synthesizable RTL code or C source
   - Integrator can modify it by changing the generics or the code
   - Fits many environments
2. Firm core
   - Somewhere between soft and hard
   - E.g. synthesized netlist (.vqm)
3. Hard core
   - Fully designed, placed, and routed
   - Cannot be modified, technology-specific
   - Delivered as a GDSII file (or similar) or fabricated into an FPGA chip

Proof of concept
- English ARM Ltd. is a fabless company
  - They do not make or sell silicon chips but they sell IP
  - Customers buy licenses to use the CPUs
  - Customer gets the design data of the CPU and integrates it into their own chip
  - ARM is also very strict on license misconduct…
- Extremely high volumes!
  - 10e9 ARM-based chips shipped in 2013
  - Approx. 50e9 devices with ARM sold so far
  - E.g.
    nearly all cellphones to date have used ARM-based processors

IP integration phases by FPGA vendor
1) Front-end
2) Back-end
3) Production
- Note that the integration phase used to be textual
- High-level synthesis (HLL) is not applicable above the IP level
- IP reuse has proven itself at system level
Fig: [R. Wilson, SOCs: IP is the new abstraction, Electrical Design News (EDN), Aug. 2011, http://www.edn.com/electronics-news/4368295/SOCsIP-is-the-new-abstraction]

Verification steps
1. All IPs need thorough functional verification
   - Bricaud: "First big return on investment to the reusable effort"
   - Running all functional tests may take weeks
   - Done by the IP designer before integration starts
2. In integration, concentrate on communication between IPs
   - "Test1: Foo sends to Bar which acknowledges"
   - "Test2: Cpu0 can access all memory areas"
3. Then verify different application use cases
   - Ensure that the real value of the product works

3rd party IP source examples
- http://www.design-reuse.com/
- http://opencores.org/
- Funbase by TUT

Tip: Transport Triggered Architecture (TTA)
- Easiest way to design an accelerator
- Application-specific instruction-set processor (ASIP)
  - Almost as low cycle count as ASIC
  - Still allows programmability, more flexible than HW
- ProDE: processor designer
- Easily configurable
  - # of cores (multi-threading)
  - # and type of exec.
    units
  - Connections between units
  - Many trade-offs between area and performance
- "Soft soft core": synthesizable HDL + application C code
- Download from www.tce.cs.tut.fi
- Cycle-accurate simulator
Screen caps: tce.cs.tut.fi

IP integration and interfaces
- IP must be adapted unless it natively supports the interface provided by the network
- Soft, white-box IPs allow direct modification of the source code
- Without source code, an additional wrapper (or adapter) is needed
[Figure: three integration cases, each drawn as original IP -> integration operation -> outcome: (1) attach the IP block as is; (2) modify the IP block, then attach it; (3) create a wrapper for the IP block, then attach the wrapped block. Labels: standard interface, heterogeneous interface, communication network]
Adapted by A. Rasmus from [F. R. Wagner et al., “Strategies for the integration of hardware and software IP components in embedded systems-on-chip”, Integration, the VLSI Journal, September 2004, Vol. 37, Iss. 4, pp. 223-252]

Std interface separates computation from communication
- First generation device
- More bandwidth: change the network without affecting processing
- More flexibility: keep the network and change the processors
[Figure: device generations built from DSP, processor, external I/O, audio, MPEG, decode, control, peripheral, system RAM and DSP RAM blocks, connected by a processor bus]

IP interface
- What goes in, what comes out, and when?
- Easy connectivity is critical for efficient reuse
- Interface definition includes
  1. Ports: names, types, directions
  2. Timing (data must have settled before write_en is driven high…)
  3. Usually also a memory map of the IP
- More and more IPs use some sort of standardized bus interface
  - Resembles CPU’s memory bus (addr, data, R/W, wait_req…)
  - Simplifies integrating them together
  - Avalon, AMBA, HIBI…
[Figures: interface signals; interface timing (at signal level)]

IP interface (2): memory map
- Nowadays most IPs are connected to a system-level interconnect
  - Instead of directly together with point-to-point connections
  - There are interconnection standards, such as AMBA, OCP-IP etc.
- IP is seen as a set of registers
  - Data registers: for feeding in data, for reading the results
  - Control/status regs: for configuring and monitoring
  - Together they form a memory map
- IP usually has a configurable base address, say 0x1000, and registers are offsets to that
  - 0x1000, 0x1004, 0x1008 and so on
- IPs become easily pin-compatible and the real design task is to access the registers correctly
- SW driver functions can abstract the details of register accesses

IP interface (3): comparison
a) Special interface with many application-specific signals
   - offers potentially the maximum performance
   - is hard to understand
   - tightly connected to neighbor blocks, hard to use in a different environment
   - not accessible from CPU unless a wrapper is developed
b) Standardized bus interface with memory-mapped registers
   - forces serial accesses (first write data_in and then control, then read status, and then read results), hence perhaps a little bit slower to access
   - easier to understand
   - loose coupling makes porting into another system easier
   - matches how software views the system (a set of memory locations)
- Next step in abstraction would be to standardize a set of basic registers

Verification IP (VIP)
- VIP = Readily
  available testbench for verifying a certain (standardized) component during design
- Commonly for a communication protocol
  - E.g. VIP for UART
- Separate from production IPs
- Typically consists of a bus functional model, traffic generators, protocol monitors and functional coverage blocks
  - E.g. a bus functional model mimics the interface but does not compute anything
  - Typically high-level, so it runs fast on a simulator

[Figure source: Harry Foster, Verification Horizons Blog, Mentor Graphics, http://blogs.mentor.com/verificationhorizons/blog/2011/03/30/prologue-the-2010-wilson-research-groupfunctional-verification-study/]

Designers are reusing not only logic but also testbenches
- Designers can buy testbenches (TB) as well
[Figure source: http://blogs.mentor.com/verificationhorizons/blog/2011/04/01/part-3-the-2010-wilson-research-group-functional-verification-study/slide21-2/]

Reuse is increasingly critical for success
- A smaller and smaller fraction of logic is new
  - Reuse increases all the time
  - Earn extra income by selling IP
- Designer productivity must increase manifold!
  - but people are not getting much smarter
  - plug-and-play integration

Values normalized to year 2013:

Year                                | 2013 | 2015 | 2017 | 2019 | 2021
Relative logic size (portable SoC)  | 1.00 | 1.66 | 2.63 | 4.17 | 6.53
Req. % of reused design             | 62 % | 70 % | 78 % | 86 % | 92 %
Req. productivity for new design    | 1.00 | 1.56 | 2.33 | 3.45 | 5.11
Req. productivity for reused design | 2.00 | 3.12 | 4.65 | 6.90 | 10.22

Potential problems in re-use
- In general:
  - "Not invented here" syndrome
  - Incomplete design information and docs
  - Unreadable, uncommented code
  - No supporting scripts, incomplete verification
  - Tools not supported anymore or poor inter-operability
- In-house developed components
  - Preparing for reuse req.
    some extra effort
  - The full design was never properly archived, so pieces of the design are scattered over various disks on various machines, some of which no longer exist
- 3rd party IP
  - Expensive or complex licenses?
  - Vendor lock-in (tied to a certain vendor or chip family)

Design of re-usable macro
So how is it done?

Macro and sub-block design
- The term "decomposition" is also used
[Figure: macro and sub-block design flow, Bricaud]

Macro/sub-block design
- Capture the major requirements and use cases
- Develop the functional and technical specifications
  1. functional specification describes the aspects of the sub-block that are visible to the rest of the macro
  2. technical specification describes the internals of the sub-block
  - a good technical specification allows the designer to code once and quickly
- Develop RTL
  - Design teams work simultaneously on sub-blocks
  - Split into reasonably small units
  - Only small (1-2 person) design teams needed
- Develop testbenches
  - Remember readability and ease of modification
  - One TB shows basic operation, the other tries to prove the DUV buggy

Macro/sub-block design (2)
- Develop synthesis scripts and synthesize
  - external timing constraints should be fully defined by the specification before coding begins
  - synthesis scripts must be developed early in the design process
  - synthesis should begin as soon as the RTL code passes the most basic functional tests (early synthesis gives insight into problem areas)
- Run static language check (also called “lint” tool)
  - provides an automatic method for checking the RTL for violations of coding guidelines and other kinds of errors
  - can report cyclomatic complexity, maintenance index etc.
- Measure testbench coverage (96% of lines were executed...)
- Perform power analysis

Developers spend over 2x more time reading than writing code!
- "Code lives forever": easily tens of years or more
- Lots of maintenance time goes into navigating, i.e. seeking files, function and variable declarations etc.
- Aim for simplicity and readability
- Clear code is always better than any document
[Figure: variation between 10 monitored developers. Interrupts incl. other duties, chatting with colleagues etc. Controlled variable in the study.]
[Ko et al., An Exploratory study of How Developers Seek…, IEEE Tran. SW Eng, 2006]

Coding Guidelines
- General recommendations:
  - pay attention to names! Perhaps the single most important aspect of understandability
  - be careful with units: pkt_bytes is better than packet_size, time_ms is better than vague time
  - use simple constructs, basic types, and simple clocking schemes
  - use a consistent coding style, and a consistent structure for processes and FSMs
  - use a regular partitioning scheme, with all module outputs registered and with the modules roughly of the same size
  - use constants or parameters instead of hard-coded numbers, provide comments
  - don’t sweat the small stuff (e.g. on which line the curly brackets are)
- If automatic checking (lint) is used, there should be no errors/warnings from the IP
  - Adopt such settings that clean output is achievable, i.e.
    no warnings/errors
- In our department, we adhere to http://www.cs.tut.fi/~ege/Misc/dcs_vhdl_coding_rules_es_v4_4.pdf

Sub-block integration
- Very important step
- First grading of how reusable the sub-blocks are
  - Ease of integration
  - Speed of verification
- Otherwise, similar to sub-block design
- First just two sub-blocks, then a few more and so on
- ATPG = automated test pattern generation for manufacturing test
[Figure: sub-block integration flow, Bricaud]

Productization
- Ensure that the macro is synthesizable with multiple technologies
- Simulate at gate level
- Verify formally that the netlist from synthesis is equivalent to RTL
- Document
- Ensure that all steps are repeatable

Macro Design Archive
- All the items which are needed when any change, upgrade, or modification is made to the macro
- “Just the zipped codes” is definitely not enough!
- Revision control system must be used for all files
- Five main categories
  1. Product files
  2. Verification files
  3. Documentation files
  4. System integration files
  5. Example use case with tutorial

Macro Design Archive.1-2 Src + verif
- Product files
  - synthesizable VHDL+Verilog
  - simulation and synthesis scripts and timing constraints, assuming a reference library
  - installation scripts (e.g. copying files and setting paths)
- Verification files
  - self-checking testbenches with high coverage
  - compilation and simulation setup scripts
  - sometimes also scripts for creating test data and checking results

Macro Design Archive.3 Documents
- Datasheet, flyer: basics briefly, only a few pages
- functional specification / user guide: what it does, how to use it, how to connect it
  - The key piece of documentation.
    Even more important than the technical specification
  - Consider how many users a macro has and how many develop it further. 10 to 1? 20 to 1?
- technical specification: how it does it
- verification plan: how we decided to verify it
- simulation and coverage logs: how we know it works and how confidently
- synthesis results for multiple technologies: how small and fast it is
- lint report: how well the code is written
“Unquestioned assumptions are by far the biggest time-waster for system debuggers. It’s why the word “arrgh!” was coined.” (R. Colwell)

Macro Design Archive.3 user guide
Necessary contents of user guide
1. functionality: The unit calculates foo() for the input stream…
2. assumptions: Industrial conditions… maximum clk jitter is < 100 ps…
3. directory structure: Source codes are in directory vhd… User knows if he has everything
4. recommended SW environment, including compilers and drivers: It worked at least with these…
5. config info and parameters: Generics are divided into X categories… Data_width_g sets…
6. detailed description of I/O: How to connect the signals, what’s their timing, timing diagrams
7. register map: How SW and other IPs can access the unit, regY is 32-bit read-only…
8. recommended clocking and reset: Unit uses one clock (rising edge) and active-low reset…
9. recommended system verification strategy: After integration, write value X to reg Y and see if Z happens…
10. architecture, block diagram: There are 4 sub-blocks… Mainly for developers and maintainers
11. performance: Computation takes approx. 500-520 cycles depending…, the avg.
    throughput is…
12. size / gate count: As function of generic parameters
13. power dissipation: As function of computational load assuming the XXX silicon conditions…
14. exceptions to coding/design guidelines: This is also mainly for developers tweaking the codes
15. debug strategy, including recommended debug tools: If you suspect something, please check that…
16. test structures, testability, and test coverage: Unit has 2 test inputs… and a full scan chain…
17. version history and known bugs: Major milestones are… At the moment, XX is not supported because…
Rough order of importance. E.g. interface and performance are more critical criteria in IP selection than the block diagram.

Macro Design Archive.4 Integration
4. System integration files
   - SW driver for accessing the IP from a CPU
   - test program that reads and writes some registers of the IP
   - bus functional models of other system components
   - cycle-based simulator and HW emulator models
   - recommendation of commercially available software required for HW/SW cosimulation and system integration (as appropriate for the particular macro)
   - High-level functional model: HDL, Matlab, C/C++, UML

Macro Design Archive.5 Example(s)
- Easy start is the key
  - Otherwise, potential users get frustrated and will not use the IP (unless management forces them to…)
- Users can learn the basic idea by repeating a well-prepared simulation setup
  - Very simple testbench-like example that "does something useful"
  - Since it runs OK, I must have all the files, paths, necessary tools…
  - One can copy-paste code for own projects
- Not necessarily full-blown testing here!
  - No corner cases, only the most common operation(s)
  - Not much input data, just enough to get the idea
- "Police instructions" with screen captures ensure that everyone can repeat the simulation easily
  - Annotated screen captures show what should happen during simulation

Reuse must have an easy start
- TB is not that good for introducing basics
- Written application notes are somewhat better
- Ready-made executable examples are excellent
  - System does something useful in a few minutes
  - "Hello world" is the world’s most important program
  - Allow imitation and copy-pasting, unlike documents
  - Give a feeling e.g. about time usage

Reuse made easy with Kactus2

Motivation = there’s lots of everything
- Product data management (PDM) = track and control data related to a particular product
- Lots of
  - products in a company
  - (changing) requirements
  - versions, e.g. 50 IP components per SoC, tens of tools, up to ten OS
  - people, e.g. tens to hundreds, sites across the world
  - files, e.g. 10k files per SoC, lots of legacy code
  - languages, e.g. VHDL, Verilog, C/C++, Python, asm…
- Long lifecycles, e.g. offer support for 5-10 years
  - Staff will change
- Must use a version management tool (Svn, Git…)
- Kactus2 tool simplifies management and increases productivity

Vendor-specific integration tools are great but…
- Hard to transfer a design between different vendors
[Screenshots: Altera Qsys, Xilinx Platform Studio]

IP reuse and product data management
- We must enhance integration
- We must track "what’s in there"
  - E.g.
    deployed controller in a factory
  - IPs, versions, parameters…
- IP core is the most common system-level abstraction nowadays
- IP-XACT is an IEEE standard for capturing component meta-data
  - Simplifies transfer between products, teams and companies
  - Simplifies integration
  - Captures interfaces, file sets, hierarchy, versions, configurations…
  - Machine-readable (xml)
[Screenshot: HW design view in Kactus2 tool, http://sourceforge.net/projects/kactus2/]

IP-XACT
- Rather large standard (374 pages)
- Component, e.g. NiosII
  - Files, ports, parameters
- Design, e.g. example_soc
  - Instantiates, configures and connects components together
  - E.g. Nios, audio codec, vga, PLL…
- Bus interface, e.g. Avalon
  - Collection of ports (e.g. data, addr, we, stall…)
  - Master and slave roles
  - Allows interconnecting multiple wires with a single connection, and type checking
- Vendor-independent, but supports vendor extensions
  - Make some things handier… and might cause vendor lock-in accidentally/on purpose

Kactus IP-XACT tool (TUT 2009-)
- Manage IP library
  - Create "electronic data sheets"
  - Find out where IP has been used
- Create hierarchical HW design
  - Check pin interface compliance
  - Draft IP blueprints
  - Drag-and-drop integration, user-friendly GUI
- Document SW structure and mapping
- Generate structural VHDL, docs, synthesis/simulation scripts, code templates…
- Vendor-independent
- C++/Qt, open source, GPL2
- http://funbase.cs.tut.fi

Conclusions
- Reuse everything you can
- Generality and reusability degrade a component’s performance a little
  - Much larger impact on design effort
- Reusable IPs come in 3 forms: soft, firm and hard
- Pay attention to interfaces and simple first usage
- IP package includes lots of stuff in addition to the code
- Kactus is an excellent framework for embedded systems
  - Package IP, handle IP libraries, light-weight product data
    management, hierarchical HW and SW design, design automation
- More info
  - http://funbase.cs.tut.fi/
  - http://tce.cs.tut.fi/

Extra material

Phases of IP creation and reuse
Several phases precede the actual integration (phase 6):
1. Creation: according to guidelines that simplify reuse
2. Qualification: ensures that IP has qualities expected by consumers (integrators)
3-4. Classification and search: consumers must find appropriate IP blocks from on-line catalogs where components are classified according to adequate criteria
5. Transfer: delivers all needed information for evaluation and integration, including design data, documentation, test specification
5B. Evaluation: a more accurate evaluation is usually needed before integrating the IP into a design, e.g. by instantiating the component into a testbench and simulating it
[F. Wagner et al., Strategies for the integration..., VLSI, 2004]

Macro Design Archive.5 Example(s)
- Include readme.txt
  - ASCII file, can be opened everywhere
  - Purpose of IP, file directories, instructions how to start
- Makefile/compile_all.sh/syn.sh to compile/synthesize everything
- Ready-made waveform files
- HDL that instantiates the macro
  1. With easy-to-understand components
     - Minimize the dependencies on external components (CPUs, memories, etc.)
     - Minimize the utilized languages
  2. In a realistic environment, e.g. CPU using the accelerator macro
     - Includes SW code compilation and execution

Header and license information
- All (code) files start with a header
  - File name, purpose
  - Author, date created, date updated
- Actually, all your files should start with a header (text, .tex, .pptx, .xlsx, .html etc.)
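
A header convention like the one above can be checked mechanically. The Python sketch below assumes a hypothetical layout (comment lines of the form "-- Field: value" at the very top of the file) and hypothetical field names; neither comes from the lecture, so both should be adapted to the project's own rules.

```python
# Hypothetical field names; the slide only asks for file name, purpose,
# author, creation date and update date.
REQUIRED_FIELDS = ("File", "Purpose", "Author", "Created", "Updated")


def missing_header_fields(text, comment_prefix="--", max_lines=20):
    """Return the required fields that are absent from the leading comment block."""
    found = set()
    for line in text.splitlines()[:max_lines]:
        stripped = line.strip()
        if not stripped.startswith(comment_prefix):
            break  # the header must be the first thing in the file
        body = stripped[len(comment_prefix):].strip()
        key, sep, value = body.partition(":")
        if sep and value.strip():
            found.add(key.strip().lower())
    return [name for name in REQUIRED_FIELDS if name.lower() not in found]
```

A file whose first lines are "-- File: foo.vhd", "-- Purpose: ...", and so on yields an empty list, while a file that starts directly with code yields all five field names.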
- All open source files should have license information
  - Otherwise, potential users don’t necessarily dare to use it
- Project root directory should have a file COPYING
  - For example http://www.gnu.org/licenses/lgpl2.1.txt

Open source license information
- Zillion different licenses available: GPL, LGPL, MIT, BSD…
- Consider 3 main points
  1. Does the code adhere to free/open source SW criteria?
     - http://www.gnu.org/philosophy/free-sw.html
     - http://opensource.org/docs/osd
  2. (Weak) copyleft: general method for making a program (or other work) free (libre), and requiring all modified and extended versions of the program to be free as well
     - http://en.wikipedia.org/wiki/Copyleft
  3. Non-viral or viral: will derived works inherit the same license, like being contaminated by a virus?
     - E.g. including GPL code makes the whole project GPL (viral license) but LGPL does not (non-viral)

License example (LGPL)
-- Funbase IP library Copyright (C) 2011 TUT Department of Computer Systems
--
-- This file is part of HIBI
--
-- This source file may be used and distributed without
-- restriction provided that this copyright statement is not
-- removed from the file and that any derivative work contains
-- the original copyright notice and the associated disclaimer.
--
-- This source file is free software; you can redistribute it
-- and/or modify it under the terms of the GNU Lesser General
-- Public License as published by the Free Software Foundation;
-- either version 2.1 of the License, or (at your option) any
-- later version.
--
-- This source is distributed in the hope that it will be
-- useful, but WITHOUT ANY WARRANTY; without even the implied
-- warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
-- PURPOSE. See the GNU Lesser General Public License for more
-- details.
--
-- You should have received a copy of the GNU Lesser General
-- Public License along with this source; if not, download it
-- from http://www.opencores.org/lgpl.shtml

User guide examples
- Jussi Nieminen, Traffic Generator Usage, TUT, 2010 (+ source codes)
  http://www.tkt.cs.tut.fi/research/nocbench/data/traffic_generator_20091201.zip
- Jussi Nieminen, Syntesoituva liikennegeneraattori piirinsisäisten tiedonsiirtoverkkojen testaukseen ja suorituskykyvertailuun [Synthesizable traffic generator for testing and performance comparison of on-chip communication networks], BSc thesis, Tampere University of Technology, 2010, 30 p.
  http://www.tkt.cs.tut.fi/kurssit/1570/kandityot/Kandidaatintyo_Jussi_Nieminen.pdf
- Esko Pekkarinen, Transaction Generator 2 Tutorial, TUT, 2010
  http://www.tkt.cs.tut.fi/research/nocbench/data/sctg2_tutorial.pdf

Reporting digital systems
- The result (area/power/performance/frequency/combination) is a sum of several factors
- Comparison is not possible unless they are specified
  - Process: line width, supply voltage, worst/nominal/best-case, low power/high speed process, output load, FPGA type number and LUT count
  - Detail level: RTL/layout results, VHDL/Verilog/SystemC, memories included?, I/O pads and power supplies included?, memory usage for SW
  - Power simulation: supply voltage, RTL/gate-level/transistor sim., switching activity, leakage included?, crosstalk included?, glitches included?, memories included?
  - Design method: synthesis/full-custom design, tool version
  - System parameters: data width, buffer size, pin-limited/logic-limited system?, memory interface
  - Others...
[E. Salminen, On Preparing Clear Publications, lecture slides on TKT-2410 Scientific publishing, Tampere, Finland, Nov. 2005.]

Design Verification Requirements
[Table: ITRS design verification requirements, not reproduced here]
- All values, except for the first row (productivity), are linearly increasing or decreasing trends, where the initial values in 2006 are derived from industry survey data, and the final values are estimated.
- [1] This requirement considers the SOC design productivity requirement specified in Table 20 (SOC logic Mtx per designer-year) and divides it by the percentage of design effort spent in verification, so as to obtain the amount of logic which can be verified in a year. The percentage of the effort spent in verification is a linear trend starting at 70% in 2006 and ending at 50% in 2020.
- [2] The reuse data uses the equation: new verification infrastructure + reused + 3rd party verification IP = 100%. The table reports analytical values for the new verification infrastructures and acquired 3rd party IP, while the estimated size of reused components can be derived.
- [3] That is, the number of assertions or checkers inserted in a design or referring to an aspect of the design, for each corresponding million transistors of logic generated when synthesizing that design.
ITRS 2006 update, http://www.itrs.net/Links/2006Update/FinalToPost/02_Design_2006Update.pdf

Processors core examples, 180nm CMOS
[Salminen et al., Comparison of Hardware IP Components for System-on-Chip, Tampere SoC Symposium, 2004.]
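
Footnote [1] of the Design Verification Requirements slide above describes a small calculation: design productivity divided by the share of effort spent in verification, where that share falls linearly from 70% in 2006 to 50% in 2020. A minimal Python sketch of the arithmetic follows. The function names are made up for illustration, and the design productivity input is a placeholder, because the Table 20 values are not reproduced in these slides.

```python
def verification_share(year, start_year=2006, end_year=2020,
                       start_share=0.70, end_share=0.50):
    """Linearly interpolated fraction of design effort spent in verification."""
    if not start_year <= year <= end_year:
        raise ValueError("year is outside the range covered by the trend")
    t = (year - start_year) / (end_year - start_year)
    return start_share + t * (end_share - start_share)


def verification_productivity(design_productivity, year):
    """Footnote [1]: design productivity divided by the verification share.

    design_productivity is in the unit of Table 20 (Mtx per designer-year);
    any value passed here is a placeholder, not ITRS data.
    """
    return design_productivity / verification_share(year)
```

Halfway through the trend (2013) the share is about 60%, so a placeholder design productivity of 1.0 gives roughly 1.67 in the same unit.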

HW Accelerator examples, 180nm CMOS
- In this study, the avg size was around 70-80 kilogates

Download and try
http://sourceforge.net/projects/kactus2/

Kactus screenshots: HW design
[Screenshot annotations: library of reusable IP components; graphical design area; detailed settings of selected component or whole design; component; connection; bidir pins; chip output; help window]

IP library
Library + import
- Import wizard packages new IP in a few minutes
[Screenshot annotations: filter; search; hierarchical component; design (prev. slide); subcomponent; "User selects only this in simple case"; "automatic"; contents of the top-level VHDL file; automatically detected generic parameters, which will be saved in IP-XACT XML; automatically detected ports, which will be saved in IP-XACT XML]

Component editor
[Screenshot annotations: component editor window; unique VLNV identifier; utilized fields in boldface; allowed but unutilized IP-XACT fields; filled with wizard (see prev. slide); symbol preview; bus interface groups many ports together]

Component’s filesets
[Screenshot annotations: files divided into 3 categories; VHDL files; arp3 depends on udp_ip_pkg; file status OK (found on disk, not modified); top-level component]
Anecdote of the day: the name "Kactus" is derived from the acronym "ACT" (architecture configuration tool) to get something more easily pronounceable.
System design

Map SW onto CPUs
Shows automatically all CPUs from all hierarchy levels
  Nios CPUs and DCT on DE2 board
  uBlaze on Xilinx board
  x86 on laptop
Kactus generates makefiles for each CPU according to the mapping
IP-XACT extension available only in Kactus
Callouts in the screenshot:
  CPU
  SW component
  App-independent communication library
  This CPU does nothing in this use case
  Accelerator, not used in this use case

Snippet from generated IP-XACT file

  <?xml version="1.0"...><!-- Created by Kactus2 ... >
  <spirit:component…">
  <spirit:vendor> TUT </spirit:vendor>
  <spirit:library> ip.hwp.interface </spirit:library>
  <spirit:name> udp_ip_dm9000a </spirit:name>
  <spirit:version> 1.0 </spirit:version>
  <spirit:description>Receives/transmits data from/to…
  ...
  <spirit:busInterface><spirit:name>DM9000A</spirit:name>
  <spirit:master/>
  ...
  <spirit:portMap>
  <spirit:logicalPort>
  <spirit:name>eth_chip_sel_out</spirit:name>
  ...
  <spirit:fileSet><spirit:name>HDLsources</spirit:name>
  <spirit:file><spirit:name>../vhd/arp3.vhd
  ...

Snippet from generated VHDL file

  -- File: udp_flood_example_dm9000a.vhd
  ...
  entity udp_flood_example_dm9000a is
    port (
      -- Interface: clk_in
      -- Clock input.
      clk_in_CLK : in std_logic;
      ...
      DM9000A_eth_interrupt_in : in std_logic;
      ...
  architecture kactusHierarchical of udp_flood_example_dm9000a is
    signal pll_flooderCLK : std_logic;
    signal floodertx_udpnew_tx_in : std_logic;
    ...
    pll25 : altera_de2_pll_25
      port map (
        c0 => pll_flooderCLK,
        inclk0 => clk_in_CLK
      );
    flooder : simple_udp_flood_example
      port map (
        clk => pll_flooderCLK,
        ...

Benefits of Kactus

Clear view on all products and components
  What IP do we have?
  Where has it been used?
  What are its details?
  What are the details of the product Gizmo?
Excellent GUI
  Intuitive and easy to use
  Graphical view shows the structure well
  No need to edit XML manually
Design automation
  Fewer mistakes
  Easy to try out different variants of the design
Open source

Custom processors

ASIP = Application Specific Instruction-set Processor
Extend the CPU with application (domain) specific instructions
  MAC, sum with clipping, DCT etc.
  Extension tightly coupled with the CPU pipeline
Optimize internal communication within the CPU
Remove unnecessary instructions
Otherwise configure the CPU (number of registers, data width...)
Allow using C/C++ compilation

TTA (2)

Harvard architecture
  Separate instruction and data memories
  Supports multiple data memories
C compiler and simulator are automatically configured to the new micro-architecture
Only one instruction: move
  E.g. "add r2, r3, r3" becomes
    move RF[2] -> ALU.op1
    move RF[3] -> ALU.trig
    move ALU.result -> RF[3]
The instruction word has as many fields as there are internal buses
  Resembles VLIW
  Everything is scheduled at compile time
  Larger code size than RISC

Move instruction is handy

Instructions control the internal buses, and operations happen as a side effect
Resource sharing for buses
Move a result from one FU's output directly to the next FU's input instead of going through the register file
  -> fewer registers and fewer register file ports
More freedom in code scheduling than in traditional CPUs
  A move can happen later (or earlier) if the result register (or input register) is not needed for other data
  -> fewer buses needed, supports different pipeline depths in FUs
Number of units and buses is easily configurable
Number of inputs and outputs in an FU is easily configurable (not just 2 inputs and 1 output)

TTA performance

Better area and performance than a general-purpose RISC
Special function unit (SFU)
  Designed and added manually
  Arbitrary latency and number of operands (thanks to the transport-triggered scheme)
  Decreases execution time but increases area
For certain algorithms, the same cycle counts as an ASIC may be achieved
  The ASIC has a higher operating frequency, though
Currently, TTA and its tools are developed at TUT
  Download: http://tce.cs.tut.fi/
  Used in course TKT-3526 Processor design
  Interested students may do project work on TTA

Area vs. runtime trade-off

TTA's cycle count is smaller than RISC's, close to ASIC's
ASIC has the highest frequency
TTA's area is between ASIC and RISC
A specialized SFU is very useful
Figure: RC4 exploration, memory excluded [P. Hämäläinen, Euromicro DSD, 2005]
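The move-only programming model from the TTA slides can be illustrated with a very small Python simulation. It is a sketch under simplifying assumptions, not a model of the actual TUT toolset: there is a single bus (one move per step), the adder has zero latency, and the class and function names (`AddUnit`, `run`) are invented for this example. The port names `op1`, `trig` and `result` follow the "add r2, r3, r3" example on the TTA (2) slide.

```python
class AddUnit:
    """Function unit with an operand port, a trigger port and a result port."""

    def __init__(self):
        self.op1 = 0
        self.result = 0

    def write(self, port, value):
        if port == "op1":
            # Plain operand write: nothing happens yet.
            self.op1 = value
        elif port == "trig":
            # Writing the trigger port starts the addition as a side effect.
            self.result = self.op1 + value
        else:
            raise ValueError(f"unknown port {port!r}")


def run(moves, rf):
    """Execute (source, destination) moves one at a time on register file rf."""
    alu = AddUnit()
    for (src_unit, src_arg), (dst_unit, dst_arg) in moves:
        # Read the source: a register or the ALU result port.
        value = rf[src_arg] if src_unit == "RF" else alu.result
        # Write the destination: a register or an ALU input port.
        if dst_unit == "RF":
            rf[dst_arg] = value
        else:
            alu.write(dst_arg, value)
    return rf


# "add r2, r3, r3" expressed as three moves, as on the slide.
PROGRAM = [
    (("RF", 2), ("ALU", "op1")),
    (("RF", 3), ("ALU", "trig")),
    (("ALU", "result"), ("RF", 3)),
]

if __name__ == "__main__":
    print(run(PROGRAM, [0, 0, 5, 7]))  # r3 ends up holding r2 + r3
```

Because the program only names sources and destinations, a result could also be moved straight into another unit's input port without touching the register file, which is the bypassing benefit listed on the "Move instruction is handy" slide.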