ネットワークコンピューティング論Ⅱ 平成21年度 後期 火曜 第2時限(10:40-12:10) 吉永 努(UEC) [email protected] NC論2 1 内容 • 分散・並列処理計算機における相互結合ネット ワークとその上でのメッセージ・ルーティング技 法などについて学ぶ • 資料 http://comp.is.uec.ac.jp/yoshinagalab/yoshinaga/dp2.html • http://www.gap.upv.es/slides/AppendixE.html • TA: Axida君 [email protected] NC論2 2 References • T. M. Pinkston and J. Duato: Interconnection Networks, Appendix E in Computer Architecture: A Quantitative Approach, 4th Edition, Morgan Kaufmann publishers (2006). • J. Duato, S. Yalamanchili, L. Ni: Interconnection Networks - an Engineering Approach-, IEEE CS press (1997) • 同第2版, Morgan Kaufmann publishers (2003) • 富田眞治: 並列コンピュータ、昭晃堂(1996) • W.D. Dally, B. Towles: Principles and Practices of Interconnection Networks, 3 NC論2 Morgan Kaufmann publishers (2003) What is an interconnection Network? • It is a programmable system that transports data between terminals, such as processors and memory. • It is programmable in the sense that it makes different connections at different points. • It is a system because it is composed of many components: buffers, channels, switches, and controls that works together to deliver data. NC論2 4 Interconnection Network (1/2) Interconnection Network P P P M M M Multicomputer NC論2 5 Interconnection Network (2/2) P P P Interconnection Network M M M UMA type shared memory multiprocessor It is also called dance-hall architecture. NC論2 6 Trend • Its performance is increasing with processor performance at a rate of 50% per year. • Communication is a limiting factor in the performance of many modern systems. • Buses have been unable to keep up with the bandwidth demand, and point-to-point interconnection networks are rapidly taking over. NC論2 7 Computer Classifications (%) 2008/06 2007/06 2006/06 MPP 19.6 21.4 19.6 Cluster 80.0 74.6 72.8 Others 0.4 4.0 7.6 http://www.top500.org/ share of the TOP500 June, 2008 – June, 2006 NC論2 8 Examples of MPPs processor NEC EarthSimulator Originalvector 1GHz (8GFlops) IBM BlueGene/L PPC440 Dual core 0.7GHz Topology #proc. 640 x 640 640-node x crossbar 8 -way switch = 5,120 3D torus + Tree 32 x 32 x 32 x 2-way = 65,536 (2.8GFlops) NC論2 9 Examples of clusters processor T2K Hitachi Cluster AMD Opteron QC 2.3 GHz (9.2GFlops) IBM Roadrunner PowerXCell 8i 3.2 GHz Interconnect #cores Myrinet 10G 12,288 Infiniband 122,400 (12.8GFlops) NC論2 10 Networks for Supercomputers • Earth simulator: 640 x 640 crossbar switch ( 640 x 8 ≒ 5K processors ) • ASCI Q: Quadrics network (fat-tree) ( 1024 SMPs x 4 x 3 ≒ 12K processors ) • Cray X1: modified 2D torus (~ 4K processors ) • Cray Red Storm: 3D mesh (27 x 16 x 24 ≒10K processors) • IBM Blue Gene/L: 3D torus (64 x 32 x 32 = 64K nodes) NC論2 11 Architecture vs. software memory programming UMA (SMP) shared OpenMP NUMA (MPP) distributed (not shared) MPI (message Passing Interface) NC論2 12 Network Design (1/3) • Performance: latency and throughput (bandwidth) • Scalability: #processors vs. network, memory, I/O bandwidth • Incremental expandability: small to maximum size • Partitionability: netwrok may be partitioned for several users NC論2 13 Network Design (2/3) • Simplicity: simple design, higher clock frequency, easy to use • Distance span: smaller system is preferred for noise and cable delay, etc. • Physical constraints: packaging (pin count), wiring(wire length), and maintenance (power consumption) should meet physical limitation. NC論2 14 Network Design (3/3) • Reliability: fault tolerant, reliable communication, hot swap • Expected workload: robust performance over a wade range of traffic conditions. • Cost: trade-offs between cost and performance. NC論2 15 Classifiction of Interconnection Networks • Shared-Medium Networks – Local area networks (ethernet, token ring) – Backplane bus (e.g. SUN Gigaplane) • Direct Networks (router-based) – mesh, torus, hypercube, tree, … etc. • Indirect Networks (switch-based) • Hybrid Networks NC論2 16 Shared-Medium Networks (LAN) • Arbitration that determines the mastership of the shared-medium network to resolve network access is needed. • The most well-known protocol is carrier-sense multiple access with collision detection (CSMA/CD). • Token bus and token ring pass a token from the owner which has the right to access the bus/ring and resolve nondeterministic waiting time. NC論2 17 Shared-Medium Networks (Backplane bus) • It is commonly used to interconnect processor(s) and memory modules to provide SMP (Symmetrical Memory Processor) architecture. • It is realized by printed lines on a circuit board by discrete wiring. • Gigaplane in SUN Enterprise x000 server(1996): 2.6GB/s, 256 bits data, 42 bits address, 83.8MHz clock. NC論2 18 Direct (static) Networks • Consists of a set of nodes. • Each node is directly connected to a subset of other nodes in the network. • Examples: – 2D mesh (intel Paragon), 3D mesh (MIT J-Mahine) – 2D torus (Fujitsu AP3000), 3D torus (Cray T3D, T3E) – Hypercube (CM1, CM2, nCUBE) NC論2 19 Mesh topology node 2D 3D NC論2 20 Torus topology 2D 3D (4-ary 2-cube) (3-ary 3-cube) NC論2 21 Hypercube (binary n-cube) 4D (2-ary 4-cube) NC論2 22 tree Binary tree fat tree NC論2 x tree 23 Hierarchical topology (1/2) Pyramid Hierarchical ring (Hierarchical 2D mesh) NC論2 24 Hierarchical topology (2/2) Cube-connected cycles RDT (Recursive Diagonal Torus) NC論2 25 Hypermesh (spaninng-bus hypercube) Single or multiple buses NC論2 26 Base-m n-cube (hyper-crossbar) 770 070 777 077 707 000 007 8x8 crossbar Base-8 3-cube (Toshiba Prodigy) NC論2 27 Diameter and degrees (1/2) 2D mesh #node N 2D torus N Diameter 2√N √N degree 4 4 NC論2 3D torus N binary n-cube N = 2n √N log N 6 log N 3 28 Diameter and degrees (2/2) Base-m n-cube #node N = mn N = n2n Diameter logm N degree CCC logm N 3n/2 3 NC論2 Binary tree N ring 2log N N/2 3 2 3 N 29
© Copyright 2024 ExpyDoc