Memory Use Cases in FPGA-enabled Systems
Oct, 2014
Richard Shaw, Sr. Manager, Product Planning

2x Performance Gain at a 30% Cost Increase
“Stratix V FPGAs… increase ranking throughput in a production search infrastructure by 95% at comparable latency to a software-only solution. The added FPGA compute boards only increased power consumption by 10% and did not exceed our 30% limit in the total cost of ownership of an individual server, yielding a significant overall improvement in system efficiency”
Source: Microsoft paper, A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

FPGA: Field Programmable Gate Array
Parallel IO memory: DDR4, DDR3, QDR IV, QDR II+, RLDRAM3
Serial memory: HMC, MoSys BE

FPGA Replaces Traditional ASICs and ASSPs
Device Comparative Snapshot (four device columns; the column labels are not preserved in the text, and the fourth evidently describes an ASSP)

Configurability
  − Device 1: Via Software and Hardware
  − Device 2: Via Software and Hardware
  − Device 3: Via Software Only
  − Device 4: Via Software Only
Total Cost of Ownership for System Designs
  − Device 1: Low
  − Device 2: Low, Volume Dependent
  − Device 3: High, Volume Dependent
  − Device 4: Contextually Determined: Lowest Cost for Fixed Function Within System
Time to Market
  − Device 1: Fast
  − Device 2: Fast
  − Device 3: Slowest
  − Device 4: Slowest for New ASSP / Fastest for Established ASSP*
Design Flexibility that Lowers Risk
  − Device 1: Yes
  − Device 2: Yes
  − Device 3: No
  − Device 4: No

* If the ASSP is available, you are later than the competition to market despite fastest design time. If it’s not, you have high risk, no flexibility, no differentiation.

Altera’s Target Markets & Industries

Industrial and Automotive
  − Automation and Process Control: PLC and I/O Modules, Motion and Motor Control, Industrial Networking, Sensor/Encoder Interfaces
  − Building Control and Security: Video Surveillance, Access Control, HVAC Control
  − Automotive: Displays, Infotainment, Driver Assistance
  − Smart Energy: Smart Grid/Meter, Energy Management, Power Distribution

Communications
  − Networking: Switches, Routers
  − Wireline: Optical, Metro, Access
  − Wireless: Remote Radio Head, Basestations, Wireless LAN
  − Broadcast: Studio, Satellite, Broadcasting

Military and Aerospace
  − Intelligence: Deep Packet Inspection, Data Analysis, High Performance Computing, Acceleration, Access
  − EW/Radar: Counter-IED, Jammers, Decoys, Early Warning Radar; Airborne, Ship-Borne and Stationary Radar
  − Secure Communications: In-Line Network Encryptors; Airborne, Vehicular, Tower and Tactical Radios
  − Guidance & Control: Aircraft, Missile, Vehicle and Robot Guidance and Control, Instrumentation Clusters

Computing, Consumer, Storage, Test, and Medical
  − Computer and Storage: Servers, RAID, High Performance Computing, Flash Storage, MFP
  − Consumer: Displays, Set-Top-Boxes
  − Test: IP Video Testers, Protocol Testers
  − Medical: CT Equipment, Ultrasound

Application Case 1: Data Center
FPGAs used for search acceleration
Two dual-rank DDR3-1600 SO-DIMMs
  − DRAM to store models
  − Models loaded to FPGA M20K RAM during run-time
  − Model reload takes up to 250 µs, much slower than processing
Increased memory bandwidth needed
  − 8GB @ DDR3-1333 or 4GB single-rank @ DDR3-1600
Insufficient physical space to add additional DRAM channels
Food for thought
  − Could HMC or 2.5D DRAM be better solution in the future?
Source: Microsoft paper, A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

Application Case 2: Memory Intensive Networking
Block diagram: Front End Optics (& Processing), Packet Processing (PP), Traffic Manager (TM), Backplane Switch (FIC)

PP Function: Memories Used
  − Parsing: M20K*
  − Packet Store: M20K, DDR
  − Classification: TCAM, QDR, DDR, RLD
  − Packet Editing: M20K, QDR, RLD
  − Statistics: M20K, DDR, RLD
  − Policing: M20K, QDR, RLD
  − Forwarding: DDR

TM Function: Memories Used
  − Free List: M20K, QDR, RLD
  − Linked List: M20K, QDR, RLD
  − Queue & Buffer Management
  − nQ, dQ (head, tail ptrs): QDR, RLD
  − Congestion Mgt.: QDR, RLD
  − Scheduler: QDR, RLD

* M20K: Distributed embedded SRAM in Altera FPGA

Packet Buffering DRAM Requirement
Block diagram: Front End Optics (& Processing), Packet Processing (PP), Traffic Manager (TM), Backplane Switch (FIC)

Full Duplex   Packet Buffering   # DDR4-2400 72b     # DDR4 IO   # HMC SR15G    # HBM
Line Rate     BW* (Gb/s)         UDIMM (153 Gb/s)    Required    (1280 Gb/s)    (1024 Gb/s)
100G          572                4                   556         1              1
200G          1143               8                   1112        1              2
400G          2286               16                  2224        2              3

FPGA does not have enough IO for a 200G+ system
HMC and HBM meet the BW requirement
Note: * Assume 70% DRAM controller efficiency

Control Plane Memory Requirements
Capacity constrained
RTR constrained
Existing control plane memory is port and IO constrained for forward-looking applications

100G-400G Wireline Memory Requirements for FPGAs
[Chart: TM random transactions per second (M) and full duplex BW (Gb/sec) versus offered load (Gb/sec) at 100, 200, and 400]
Chart annotations:
  − Data plane constraint: 4x72b DDR4 @ 1200 MHz
  − Package pin constraint for control plane
  − Inflection point
  − Beyond 200G, serial HMC is recommended
FPGA IO & packaging solution will be challenged to meet system-level power & performance requirements.
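
The bandwidth figures in the Packet Buffering DRAM Requirement table can be reproduced with a short calculation. The sketch below is illustrative only: it assumes (the slides do not state this) that the required bandwidth is the line rate doubled for full duplex, doubled again because each buffered packet is written once and read once, and then divided by the 70% controller efficiency from the slide note. The function names are hypothetical.

```python
import math

# From the slide note: assume 70% DRAM controller efficiency.
DRAM_EFFICIENCY = 0.70

# Peak bandwidth per device in Gb/s, taken from the table headers.
DEVICE_PEAK_GBPS = {
    "DDR4-2400 72b UDIMM": 153,
    "HMC SR15G": 1280,
    "HBM": 1024,
}


def buffer_bandwidth_gbps(line_rate_gbps, efficiency=DRAM_EFFICIENCY):
    """Raw DRAM bandwidth needed to buffer packets at a given line rate.

    Assumption (not stated on the slide): x2 for full duplex and
    x2 for one write plus one read of every buffered packet.
    """
    sustained = line_rate_gbps * 2 * 2
    return math.ceil(sustained / efficiency)


def devices_needed(required_gbps, peak_gbps):
    """Smallest number of devices whose combined peak covers the requirement."""
    return math.ceil(required_gbps / peak_gbps)


if __name__ == "__main__":
    for rate in (100, 200, 400):
        need = buffer_bandwidth_gbps(rate)
        counts = {
            name: devices_needed(need, peak)
            for name, peak in DEVICE_PEAK_GBPS.items()
        }
        print(f"{rate}G: {need} Gb/s, {counts}")
```

Under these assumptions the calculation gives 572, 1143, and 2286 Gb/s, matching the table, and the HMC and HBM counts also agree. For DDR4 at 400G a plain ceiling gives 15 DIMMs where the table lists 16; the slides do not say why, so treat the DDR4 count from this sketch as a lower bound.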
Inflection point at 200G

Application Case 3: Flash Storage
Block diagram: x86 CPU with DDR3 controller; FPGA with DDR3 slave and flash controller; flash devices
FPGA used as bridge between flash memory and CPU
FPGA is a DDR3 slave to the x86 CPU
FPGA also implements flash controller

Breakthrough Advantage with Generation 10

Reinventing the Midrange
  − TSMC 20 nm process
  − 15% higher performance than current high-end with 40% lower midrange power
  − 5x higher customer commitment dollar value at time of launch
  − Dual-core 32-bit ARM Cortex-A9 processor

Delivering Unimaginable Performance
  − Intel 14 nm Tri-Gate process
  − 2x performance increase
  − 70% power savings
  − Quad-core 64-bit ARM Cortex-A53 processor
  − 3D-capable for integrating SRAM, DRAM, ASIC

Summary
FPGA’s flexibility and versatility enable wide usage in different industries
Altera FPGAs have broad memory technology support
  − DDR4, DDR3, QDR IV, RLDRAM 3
  − Flash, MRAM
  − HMC, MoSys BE
  − 2.5D / 3D memory
Altera well positioned to support target markets and applications

Thank You