Tightly-Coupled FPGA Cluster with TERASIC DE5-NET boards
Custom Computing Framework for Real Applications

Kentaro Sano
Sano Laboratory, Graduate School of Information Sciences, Tohoku University
T.C.F.C, 21 Feb, 2014, Sano Lab


Why Tightly-Coupled FPGA Cluster?

Low-power and scalable custom computing with FPGAs
  - Low-power: dedicated data-paths, memory systems, networks on FPGAs
  - Scalable: low-latency HW-to-HW direct communication/synchronization via accelerator-domain network (ADN)

Testbed for development and production runs of "real" applications
  - Qsys-based hardware framework on FPGA
  - Linux driver, API, FPGA-class library for software development
  - Research on compilers, tools, and applications
  - Experience with running an actual system (troubleshooting, etc.)

[Figure: Architecture of tightly-coupled FPGA cluster. Labels: general-purpose network; intra-node network, PCI-Express (x8); inter-FPGA network (accelerator-domain network, ADN); computing node w/ FPGA boards. Each computing node is drawn as one CPU, four FPGAs, and four DRAM blocks.]


Tightly-Coupled FPGA Cluster Overview

System configuration
  - 4 x host PCs
  - 4 x FPGAs / PC
  - 4 x 10G SFP+ ports / FPGA

Implementation
  - Linux on nodes
  - Qsys framework on FPGAs

[Figure: rack layout. Node 01, Node 02, 10GbE SW, Node 03, Node 04, UPS.]

[Figure: DE5-NET PCI-Express FPGA board (Stratix V), x4 per node.]
  - FPGA: ALTERA Stratix V 5SGXEA7N2F45C2
  - 10G SFP+ A, B, C, D (Tx, Rx): SFP+ 10G Ether, 10Gbps+ each (Tx, Rx)
  - DDR3 DRAM A, B: PC3-12800 (DDR3-1600), 12.8 GB/s, x64 @ 800MHz (DDR), up to 1066MHz; each DDR3 memory 2GB as default (up to 8GB)
  - QDR II+ SRAM A, B, C, D: 18 Mbits each (20-bit addressing for 18-bit data), x18 @ 500MHz, 1GB/s for read/write
  - PCIe 3.0 x8: 8GB/s (Tx, Rx)


Front and Back

[Photos: front and back of the rack. Labels: Node 01, Node 02, 64 port 10GbE switch (10GbE SW), Node 03, Node 04, UPS.]


More Photos

[Photos: 4 FPGAs on node; temperature sensors; 10GbE SFP+ ports; status LEDs on boards; 64 port 10GbE switch.]


Hardware/Software Stack

SW, top to bottom (figure label: Developed Layers)
  - Application Software
  - FPGA class, FPGAs class
  - DMA API, 10G MAC API
  - PCI-Express & DMA Driver
  - Linux Kernel

HW (FPGA)
  - PCIe I/F
  - DMAs
  - DDR3 Ctrls
  - QDRII+ Ctrls
  - 10G MACs
  - Application Logic


Future Work

Scalable and low-power computation
  - Parallel fluid simulation with building cube method
  - Deep learning for image/video recognition
  - Molecular dynamics simulation
  - Gene info processing

Further development of framework
  - OS management of FPGA resources
  - Stream processor generator: SPGen
  - System tools
  - Partial reconfiguration support
  - FPGA-direct communication via PCIe
  - Inter FPGA communication with SATA cables
  - Remote DMA among FPGAs

[Figure: Processing Element (LBM PE) block diagram. Blocks: ST Splitter; Macro calc pipeline (MCP); Equilibrium Calc & Collision Pipelines (ECPs), labeled ECP 0, ECP 1, ECP 2, ECP 8, designed with FloPoCo; Macro, Equi., Col. Unit (MECU); Translation Unit (TLU); Delay Unit; ST Merger; Boundary Unit (BDU). Blocks are connected through ST src / ST sink ports carrying 1-word (cell attribute), 3-word, 9-word, and 10-word streams. The PE takes 10 words from memory or the previous PE, emits 10 words to memory or the next PE, and has links to / from the adjacent PE or inter-FPGA transfer unit.]
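
Several figures on the overview slide can be cross-checked with simple arithmetic. The sketch below is only an illustration: the helper names are hypothetical and not part of the cluster's software. It assumes 1 GB/s means 10^9 bytes per second and 1 Mbit means 2^20 bits. It derives the DDR3 peak bandwidth from the x64 bus at 800MHz (DDR), the QDR II+ SRAM capacity from 20-bit addressing of 18-bit data, and the system-wide FPGA and SFP+ port counts.

```python
# Cross-check of figures quoted on the overview slide.
# Helper names are illustrative assumptions, not the cluster's API.

def ddr_peak_gb_per_s(bus_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a double-data-rate bus in GB/s (10^9 bytes/s)."""
    bytes_per_transfer = bus_bits / 8
    transfers_per_s = clock_mhz * 1e6 * 2  # two transfers per clock cycle (DDR)
    return bytes_per_transfer * transfers_per_s / 1e9


def sram_capacity_mbit(address_bits: int, data_bits: int) -> float:
    """Capacity in Mbit, taking 1 Mbit as 2^20 bits."""
    return (2 ** address_bits) * data_bits / 2 ** 20


# System configuration as listed on the slide.
HOST_PCS = 4
FPGAS_PER_PC = 4
SFP_PORTS_PER_FPGA = 4

total_fpgas = HOST_PCS * FPGAS_PER_PC
total_sfp_ports = total_fpgas * SFP_PORTS_PER_FPGA

# DDR3 DRAM: x64 bus at 800MHz, double data rate.
ddr3_gb_per_s = ddr_peak_gb_per_s(bus_bits=64, clock_mhz=800)

# QDR II+ SRAM: 20-bit addressing for 18-bit data.
qdr_mbit = sram_capacity_mbit(address_bits=20, data_bits=18)

if __name__ == "__main__":
    print("FPGAs in the cluster:", total_fpgas)
    print("10G SFP+ ports in the cluster:", total_sfp_ports)
    print("DDR3 peak bandwidth per DRAM (GB/s):", ddr3_gb_per_s)
    print("QDR II+ SRAM capacity per chip (Mbit):", qdr_mbit)
```

The results agree with the slide: 12.8 GB/s per DDR3 interface and 18 Mbits per QDR II+ SRAM. The total of 64 SFP+ ports equals the port count of the 64 port 10GbE switch in the photos, although the slides do not state how the ports are cabled.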