Memory Use Cases in FPGA-enabled Systems
Oct, 2014
Richard Shaw
Sr. Manager, Product Planning
2x Performance Gain at a 30% Cost Increase
“Stratix V FPGAs… increase ranking throughput in a
production search infrastructure by 95% at comparable latency
to a software-only solution. The added FPGA compute boards
only increased power consumption by 10% and did not exceed
our 30% limit in the total cost of ownership of an individual
server, yielding a significant overall improvement in system
efficiency”
Source: Microsoft paper, A Reconfigurable Fabric for Accelerating Large-Scale
Datacenter Services
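As a rough check on the headline, the figures in the quote can be turned into efficiency ratios. The sketch below is illustrative only: the function name is hypothetical, and it treats the 30% cost limit as if it were the actual cost increase, although the quote gives it only as an upper bound.

```python
def efficiency_gain(throughput_gain, overhead):
    """Ratio of new to old throughput per unit of a resource (power or cost)."""
    return (1 + throughput_gain) / (1 + overhead)

# Figures taken from the quoted Microsoft result.
throughput_gain = 0.95  # +95% ranking throughput
power_overhead = 0.10   # +10% power consumption
cost_ceiling = 0.30     # TCO increase stayed within this limit (upper bound)

per_watt = efficiency_gain(throughput_gain, power_overhead)
# Assumption: cost rose by the full 30%, so this is a lower bound.
per_dollar_floor = efficiency_gain(throughput_gain, cost_ceiling)

print(f"throughput per watt:   {per_watt:.2f}x")
print(f"throughput per dollar: at least {per_dollar_floor:.2f}x")
```

Because the real cost increase may be below the limit, the per-dollar figure is a floor rather than an estimate.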
FPGA: Field Programmable Gate Array


Parallel IO memory: DDR4, DDR3, QDR IV, QDR II+, RLDRAM3
Serial memory: HMC, MoSys BE
FPGA Replaces Traditional ASICs and ASSPs
Device Comparative Snapshot
[Table: four device columns, listed left to right as 1 to 4; the column headings are missing from this text]

Configurability
  1. Via Software and Hardware
  2. Via Software and Hardware
  3. Via Software Only
  4. Via Software Only

Total Cost of Ownership for System Designs
  1. Low
  2. Low, Volume Dependent
  3. High, Volume Dependent
  4. Contextually Determined: Lowest Cost for Fixed Function Within System

Time to Market
  1. Fast
  2. Fast
  3. Slowest
  4. Slowest for New ASSP / Fastest for Established ASSP*

Design Flexibility that Lowers Risk
  1. Yes
  2. Yes
  3. No
  4. No

* If the ASSP is available, you are later to market than the competition despite the fastest design time. If it is not, you have high risk, no flexibility, and no differentiation.
Altera’s Target Markets & Industries
Industrial and Automotive
− Automation and Process Control: PLC and I/O Modules, Motion and Motor Control, Industrial Networking, Sensor/Encoder Interfaces
− Building Control and Security: Video Surveillance, Access Control, HVAC Control
− Automotive: Displays, Infotainment, Driver Assistance
− Smart Energy: Smart Grid/Meter, Energy Management, Power Distribution

Communications
− Networking: Switches, Routers
− Wireline: Optical Metro Access
− Wireless: Remote Radio Head, Basestations, Wireless LAN
− Broadcast: Studio, Satellite, Broadcasting

Military and Aerospace
− Intelligence: Deep Packet Inspection, Data Analysis, High Performance Computing, Acceleration, Access
− EW/Radar: Counter-IED, Jammers, Decoys, Early Warning Radar; Airborne, Ship-Borne and Stationary Radar
− Secure Communications: In-Line Network Encryptors; Airborne, Vehicular, Tower and Tactical Radios
− Guidance & Control: Aircraft, Missile, Vehicle and Robot Guidance and Control, Instrumentation Clusters

Computing, Consumer, Storage, Test, and Medical
− Computer and Storage: Servers, RAID, High Performance Computing, Flash Storage, MFP
− Consumer: Displays, Set-Top-Boxes
− Test: IP Video Testers, Protocol Testers
− Medical: CT Equipment, Ultrasound
Application Case 1: Data Center


FPGAs used for search acceleration
Two dual-rank DDR3-1600 SO-DIMMs
− 8GB @ DDR3-1333 or 4GB single-rank @ DDR3-1600
DRAM to store models
− Models loaded to FPGA M20K RAM during run-time
− Model reload takes up to 250 µs, much slower than processing
Increased memory bandwidth needed
− Insufficient physical space to add additional DRAM channels
Food for thought
− Could HMC or 2.5D DRAM be a better solution in the future?
Source: Microsoft paper, A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
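To put the capacity-versus-bandwidth trade-off on this slide in numbers, the sketch below computes peak theoretical bandwidth for the two DDR3 speed grades. It assumes 64 data bits per SO-DIMM and one memory channel per SO-DIMM; neither detail is stated on the slide, the function name is hypothetical, and sustained bandwidth will be lower than these peaks.

```python
DATA_BUS_BITS = 64  # assumption: 64 data bits per DDR3 SO-DIMM
NUM_DIMMS = 2       # from the slide; one channel per SO-DIMM is an assumption

def peak_gbytes_per_s(mega_transfers_per_s, bus_bits=DATA_BUS_BITS):
    """Peak theoretical bandwidth of one DDR channel in GB/s."""
    return mega_transfers_per_s * bus_bits / 8 / 1000

# The two operating points named on the slide, keyed by transfer rate in MT/s.
options = {"8GB @ DDR3-1333": 1333, "4GB single-rank @ DDR3-1600": 1600}

for name, rate in options.items():
    per_dimm = peak_gbytes_per_s(rate)
    total = NUM_DIMMS * per_dimm
    print(f"{name}: {per_dimm:.1f} GB/s per SO-DIMM, {total:.1f} GB/s total")
```

Under these assumptions, moving from DDR3-1333 to DDR3-1600 raises peak bandwidth by about 20% while halving capacity.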
Application Case 2: Memory Intensive Networking
[Diagram: Front End Optics (& Processing), Packet Processing (PP), Traffic Manager (TM), Backplane Switch (FIC)]

PP Function      Memories Used     TM Function                 Memories Used
Parsing          M20K*             Free List                   M20K, QDR, RLD
Packet Store     M20K, DDR         Linked List                 M20K, QDR, RLD
Classification   TCAM              Queue & Buffer Management   QDR, DDR, RLD
Packet Editing   M20K, QDR, RLD
Statistics       M20K, DDR, RLD    nQ, dQ (head, tail ptrs)    QDR, RLD
Policing         M20K, QDR, RLD    Congestion Mgt.             QDR, RLD
Forwarding       DDR               Scheduler                   QDR, RLD
* M20K: Distributed embedded SRAM in Altera FPGA
Packet Buffering DRAM Requirement
[Diagram: Front End Optics (& Processing), Packet Processing (PP), Traffic Manager (TM), Backplane Switch (FIC)]

Full Duplex   Packet Buffering   # DDR4-2400 72b     # DDR4 IO   # HMC SR-15G   # HBM
Line Rate     BW* (Gb/s)         UDIMM (153 Gb/s)    Required    (1280 Gb/s)    (1024 Gb/s)
100G          572                4                   556         1              1
200G          1143               8                   1112        1              2
400G          2286               16                  2224        2              3

FPGA does not have enough IO for 200G+ systems
HMC and HBM meet the BW requirement
Note: * Assume 70% DRAM controller efficiency
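The table values can be approximately reproduced with a few lines of arithmetic. This is a sketch under stated assumptions rather than the author's method: it assumes every packet is written to and read from the buffer once in each direction (four times the line rate) and divides by the 70% controller efficiency from the note. The function names are hypothetical, and the per-device bandwidths are the ones given in the column headers.

```python
import math

CONTROLLER_EFFICIENCY = 0.70  # from the slide's note
# Per-device bandwidth in Gb/s, taken from the table's column headers.
DEVICE_BW = {"DDR4-2400 72b UDIMM": 153, "HMC SR-15G": 1280, "HBM": 1024}

def buffering_bw(line_rate_gbps):
    """Required packet-buffer bandwidth in Gb/s, rounded up.

    Assumption: 2 directions (full duplex) x 2 accesses (write + read)
    per packet, derated by the controller efficiency.
    """
    return math.ceil(line_rate_gbps * 2 * 2 / CONTROLLER_EFFICIENCY)

def devices_needed(required_bw, device_bw):
    """Smallest whole number of devices whose combined bandwidth covers the need."""
    return math.ceil(required_bw / device_bw)

for rate in (100, 200, 400):
    bw = buffering_bw(rate)
    counts = {name: devices_needed(bw, dev) for name, dev in DEVICE_BW.items()}
    print(f"{rate}G: {bw} Gb/s -> {counts}")
```

Under these assumptions the bandwidth column and the HMC and HBM counts match the table. The plain ceiling gives 15 DDR4 UDIMMs at 400G where the table lists 16, so the slide presumably applies an extra constraint (for example, scaling the 100G configuration by four) that this sketch does not model.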
Control Plane Memory Requirements
[Figure labels: Capacity constrained; RTR constrained]
Existing control plane memory is port and IO constrained for forward-looking applications
100G-400G Wireline Memory Requirements for FPGAs
[Figure: TM random transaction rate (M trans./sec, axis 0 to 12000) and full duplex BW (Gb/sec, axis 0 to 1800) versus offered load (100, 200, 400 Gb/sec)]
Figure annotations:
− Beyond 200G, serial HMC is recommended
− Package pin constraint for control plane
− Data plane constraint: 4x72b DDR4 @ 1200 MHz
− Inflection point
FPGA IO & packaging solution will be challenged to meet system-level power & performance requirements. Inflection point at 200G.
Application Case 3: Flash Storage
[Diagram: x86 CPU (DDR3 Controller) connected to FPGA (DDR3 Slave, Flash Controller), which connects to multiple flash devices]
FPGA used as bridge between flash memory and CPU
FPGA is a DDR3 slave to the x86 CPU
FPGA also implements flash controller
Breakthrough Advantage with Generation 10




Reinventing the Midrange
− TSMC 20 nm process
− 15% higher performance than current high-end with 40% lower midrange power
− 5x higher customer commitment dollar value at time of launch
− Dual-core 32-bit ARM Cortex-A9 processor

Delivering Unimaginable Performance
− Intel 14 nm Tri-Gate process
− 2x performance increase
− 70% power savings
− Quad-core 64-bit ARM Cortex-A53 processor
− 3D-capable for integrating SRAM, DRAM, ASIC
Summary


FPGA’s flexibility and versatility enable wide usage in different industries
Altera FPGAs have broad memory technology support
− DDR4, DDR3, QDR IV, RLDRAM 3
− Flash, MRAM
− HMC, MoSys BE
− 2.5D / 3D memory
Altera is well positioned to support target markets and applications
Thank You