Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips Hiroki Matsutani Michihiro Koibuchi Yutaka Yamada Jouraku Akiya Hideharu Amano Keio Univ. National Institute of Informatics Toshiba RDC Keio Univ. Keio Univ. Network-on-Chip (NoC) • Tile-based Multi-Core – Core: Execution – Router: Packet delivery • RAW 1 2 3 4 5 6 7 8 [Furtek, FPL2004] – Tree • aSoC 0 [Taylor, Micro2002] – 2D Mesh • ACM Tile (RISC, RAM, I/O) [Liang, TVLSI2004] – 2D Mesh Network-on-bChip (NoC) • Tile-based Multi-Core – Core: Execution – Router: Packet delivery • RAW [Taylor, Micro2002] – 2D Mesh • ACM [Furtek, FPL2004] – Tree • aSoC MIPS Memory [Liang, TVLSI2004] – 2D Mesh Router Network-on-Chip (NoC) ○ Advantage • Better Wiring Delay – Global wiring – Limited-length Links • Improve Modularity – Standard Network I/F × Drawback Tile (RISC, RAM, I/O) 0 1 2 3 4 5 6 7 8 • Overhead SoC is growing! NoC is one of Scalable on-chip interconnects Stream Processing ~Simulation~ • MPEG, JPEG, Viterbi – System Level Design • No Clock for execution Data Module(a) • Communication is cycle accurate Data Module(a) Clock Module(b) UTF Model UnTimed Functional Module(b) High Abstraction BCA Model Bus Cycle Accurate RTL Model Detail Design Application is divided into some Tasks based on Simulation. Stream Processing ~Map, Route~ • Shared Links – Link Congestion Throughput is degraded • Optimization (in general) – Mapping: Minimum Communication Length Strong access locality !! – Routing: Minimal Paths Physical Tile of NoC (2) (1) (2) (2) (3) (4) (1) (2) (2) (2) (4) (3) Graph Too Task shortFlow to distribute path congestion by Minimal paths. Existing Routing ~Is non-minimal path useful?~ Common feature of SAN & NoC • Packet delivery • Deadlock freedom – WH Switching Feature of SAN • Various applications, Various traffic patterns – Non-minimal paths make unstable state – Turn-Model, … Feature of NoC [Ho, HPCA2003] • Fixed application, Fixed traffic patterns – System level simulation Predictable communication Load balancing with non-minimal Flee ~Non-minimal routing strategy~ • Stream processing in NoCs – Strong access locality !! – Too short to distribute path congestions Increase # of alternative paths by introducing non-minimal paths • Partially non-minimal paths Non-minimal paths are basically inefficient… • Path establishment based on Traffic Amount – Heavy Traffic Comm. Minimal Path – Light Traffic Comm. Avoiding Congestion Flee ~Traffic pattern Analysis~ Traffic Pattern Analysis Record # time, src, dst, size 10000 (0) (1) 32 # srcdst, TotalSize Traffic Analysis Heavy! (0) (1) 8192 10000 (0) (2) 4 (1) (2) 8192 10000 (0) (3) 4 (2) (3) 8192 (0) (2) 1024 (0) (3) 1024 10010 (1) (2) 32 10010 (0) (1) 32 10010 (0) (2) 4 10010 (0) (3) 4 10020 (2) (3) 32 10020 (1) (2) 32 10030 (2) (3) 4 1. For each src-dst pair, –Totalize packet size E.g., src-dst pair(0,1) 32 + 32 64 2. Sorting in descending order … Src-dst pair with largest TotalSize is in first line –In order of TotalSize Each src-dst pair gets a path in order of Analysis Record. Flee ~Establishing Paths~ • In order of Traffic Amount: There will be several alternative paths … – Search for lowest cost path – Increase the cost of links selected Link with high cost is hotspot … # srcdst, TotalSize (0) (1) 8192 (1) (2) 8192 (2) (3) 8192 (0) (2) 1024 (0) (3) 1024 Each link has “Cost” (0) (1) (2) (3) … 解析結果 Analysis Analysis Analysis Record Record Record Paths are assigned not to disturb previously established paths Simulation Environments • Router Model – 4 ports for adj. Routers – 1 port for Core • Network Topology – 4×4 Mesh – 4×4 Torus Packet size 259 flit (2 flit header) Switching method Wormhole switching 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 16 node 2D mesh # of Virtual channels Mesh:1, Torus:2 Router Simulation time Core 1,000,000 cycle 15 Applications for Evaluation • App. Traces – Viterbi Decoder – JPEG Codec – IPsec – Uniform (0) Header Analysis (1) Huffman Decode (2) Inverse Quant. (3) I-DCT for Row (4) (5) Yuv-rgb Convert (6) MCU Mapping (7) I-DCT for Col (8) Rgb-yuv Convert (9) MCU Samping (10) I-DCT for Col (11) I-DCT for Row (12) (13) Stream Gen. (14) Huffman Code (15) Quant. Tile mapping example of JPEG Codec ( for Decoder, for Encoder) Results ~Viterbi @ 2D Mesh~ • Flee • DOR – Avg Hop count:2.52 (Dimension-Order Routing) – Avg Hop count:1.84 Y-axis: Latency [cycle] Communication in Viterbi trace includes Fork and Join. 14.2% Improved X-axis:Accepted Traffic [flit/cycle/node] Results ~Viterbi @ 2D Torus~ • Flee • DOR – Avg Hop count:1.87 (Dimension-Order Routing) – Avg Hop count:1.48 Y-axis: Latency [cycle] Communication in Viterbi trace includes Fork and Join. 22.2% Improved X-axis:Accepted Traffic [flit/cycle/node] Flee improves 22.2% of throughput with non-minimal paths. Results ~JPEG @ 2D Mesh~ • Flee • DOR – Avg Hop count:1.01 (Dimension-Order Routing) – Avg Hop count:1.00 Y-axis: Latency [cycle] In JPEG trace, data is sequentially process. No fork and join pattern. No difference X-axis:Accepted Traffic [flit/cycle/node] Communication is between neighbors No need non-minimal Results ~Effect of Traffic Analysis~ Viterbi @ 2D Mesh • Flee • Flee (Incomplete) – Known data amount – Unknown data amount Y-axis: Latency [cycle] All data transfer size is “1” Incomplete Flee: Not Improved X-axis:Accepted Traffic [flit/cycle/node] Results ~Effect of Traffic Analysis~ Viterbi @ 2D Torus • Flee • Flee (Incomplete) – Known data amount – Unknown data amount Y-axis: Latency [cycle] All data transfer size is “1” Incomplete Flee: Partially Improved X-axis:Accepted Traffic [flit/cycle/node] Communication size is key factor to improve performance. Summary ~Non-minimal routing strategy~ • Stream Processing in NoCs – Strong access locality !! – Too short to distribute path congestions Increase # of alternative paths by introducing non-minimal paths • Flee: Non-minimal routing strategy – Heavy Traffic Comm. Minimal Paths – Light Traffic Comm. Avoiding Congestions • Improve 22.2% of Throughput Thank you for your listening
© Copyright 2025 ExpyDoc