An Operand Routing Network for an SFQ Reconfigurable DataPaths Processor

I. Kataeva (1), H. Akaike (1), A. Fujimaki (1), N. Yoshikawa (2), N. Takagi (3), K. Murakami (4)
(1) Dept. of Quantum Engineering, Nagoya University
(2) Dept. of Electrical and Computer Engineering, Yokohama National University
(3) Dept. of Information Engineering, Nagoya University
(4) Research Institute for Information Technology, Kyushu University


Outline

- Architectures of the Operand Routing Network
    - NDRO-based ORN
    - Crossbar-based ORN
- A crossbar switch with a multicasting function
- Experimental results
    - Crossbar switches
    - A 1-to-2 ORN prototype
- Conclusion


Reconfigurable Data-path processor

- an architecture suitable for SFQ implementation
- a two-dimensional array of Floating Point Units (FPUs) connected using an Operand Routing Network (ORN)
- the ORN is reconfigured to fit a DFG

[Figure: block diagram of the processor; labels: FPU, ORN, SB, LM, SMAC]


Requirements for the Operand Routing Network

- connections are established between FPUs in the immediate vicinity of each other
- each FPU can be connected to one or more FPUs in the next row
- the number of connections, N, is odd
- one FPU's output can be connected to either or both inputs of an FPU in the next stage

[Figure: rows of FPUs with the allowed connections]


NDRO-based Operand Routing Network

- 2×M×N NDRO cells
- "+": small number of Josephson junctions
- "-": irregular, non-pipelined structure; becomes cumbersome as the complexity increases

[Figure: FPU rows connected through NDRO cells]


Crossbar-based Operand Routing Network

- "+": scalable; pipelined; easily re-designed for any N and M
- "-": large number of Josephson junctions
- switch count: M half crossbars (½ CB) and (2×M+1)(N-1)/2 full crossbars (CB)

[Figure: FPU rows connected through CB and ½ CB switches]


Comparison of the ORN architectures

NDRO-based ORN

ORN complexity  Latency, ps  Minimum interval  Control lines  Bias current, A  Power, mW  JJs (incl. control block)
N=2, M=4        200          nf+200ps          32             0.2              0.5        1710
N=3, M=8        288          nf+288ps          96             0.7              1.75       6840
N=5, M=10       -            -                 200            -                -          -
N=9, M=32       -            -                 1152           -                -          -

The JJ counts for the NDRO-based ORN are estimates based on the design of the switch for the RDP prototype (N=3, M=4), which consisted of 3420 JJs (Iwasaki, not yet published).

Crossbar-based ORN

ORN complexity  Latency, clocks  Minimum interval  Control lines  Bias current, A  Power, mW  JJs    JJs (incl. control block)
N=2, M=4        6                nf                48             0.3              0.75       3000   4020
N=3, M=8        6                nf                100            0.63             1.575      6230   7684
N=5, M=10       10               nf                208            1.41             3.525      13930  14954
N=9, M=32       18               nf                1168           8.28             20.7       77440  99088

A crossbar switch with broadcasting function: 296 JJs.


Crossbar switch with a multicasting function, ver.1

- 788 JJs
- bias current: 87 mA
- area: 1.28 mm × 1 mm
- clock frequency: 27 GHz
- 4 pipeline stages

[Figure: schematic built from DFF, NOT, AND and NDRO cells; ports: din0, din1, cross0, cross1, bar0, bar1, clkin, clkout, dout0, dout1; control patterns 00, 01, 10, 11]


Crossbar switch with a multicasting function, ver.2

First schematic (inputs din0 and din1):
- 296 JJs
- bias current: 30 mA
- 62% reduction of the number of JJs
- 2 pipeline stages instead of 4

Second schematic (input din0 only):
- 141 JJs
- bias current: 14 mA
- 70% reduction of the number of JJs
- 2 pipeline stages instead of 4

[Figure: schematics built from DFF, NDROC, NDRO and AND cells; ports: din0, din1, cross0, cross1, bar0, bar1, clkin, clkout, dout0, dout1; control patterns 00, 01, 10, 11]


Experimental results

Circuits tested:
- crossbar switch with a multicasting function, ver.1
- ½ crossbar switch with a multicasting function, ver.1
- 1-to-2 ORN prototype

[Figure: test setup; labels: data_in, clkin_hf, clkin_lfin, clkin_lfout, ladder, circuit under test, data_out]


Cross-bar switch with a multicasting function, ver.1

- switch: 788 JJs, area 1.28 mm × 1 mm, bias current 87 mA, clock frequency 27 GHz
- total: 1484 JJs, bias current 172 mA

[Figure: panels for control patterns 00, 01, 10, 11; signals: din0, din1, bar0, bar1, cross0, cross1, clkin_lfin, clkin_hf, clkin_lfout, clkout, clkout1, dout0, dout1]
Bias margins by control pattern:

Pattern  Bias current  Lower margin, %  Upper margin, %
00       bias1         -13.450          4.213
00       bias2         -4.702           15.014
01       bias1         -13.450          7.745
01       bias2         -7.989           5.156
10       bias1         -15.217          7.745
10       bias2         -4.702           5.156
11       bias1         -13.450          4.213
11       bias2         -4.702           11.728


½ cross-bar switch with a multicasting function, ver.1

- switch: 455 JJs, area 1 mm × 0.52 mm, bias current 50 mA, clock frequency 27 GHz
- total: 1066 JJs, bias current 127 mA
- tested at frequencies from 22 to 36 GHz

[Figure: panels for control patterns 00, 01, 10; signals: din0, bar0, bar1, cross0, cross1, clkin_lfin, clkin_lfout, clkout, clkout1, dout0, dout1]

Pattern  Bias current lower margin, %  Bias current upper margin, %
00       2.402                         15.512
01       2.7                           17.797
10       1.011                         18.293


1-to-2 ORN: low frequency test

- completely functional, exhaustive test
- chip: open466, no. 4 chip F2
- total: 2031 JJs, total bias current 224.4 mA
- bias_kern0 = -14.6/5.3 %, does not depend on the pattern
- minimum margins:
    - bias_kern1 = -16.1/18.3 % for din0 -> dout11, dout12
    - bias_kern2 = -20.7/12.6 % for din0 -> dout11, dout12
- maximum margins:
    - bias_kern1 = -40.3/17.2 % for din1 -> dout01
    - bias_kern2 = -38/12.6 % for din2 -> dout02, dout12
- example: din0 -> dout01, dout02, dout12
    - control pattern: CBT0 "10", CBT1 "01", CBT2 "10"
    - bias_kern0 = -14.6/5.3 %
    - bias_kern1 = -23/17.2 %
    - bias_kern2 = -23/12.6 %

[Figure: signals: din0, clkin_lfin, bar00, bar02, bar12, cross01, cross10, cross11, clkout, clkout1, clkout2, dout01, dout02, dout11, dout12]


1-to-2 ORN: high frequency test and bias margins frequency dependence

- chip: open466, no. 4 chip F2
- example: din0 -> dout11, control pattern: CBT0 "00", CBT1 "00"
- bias_kern0 margins were frequency independent
- measured at frequencies up to 23.5 GHz

[Plot: bias_kern1 margins for din0 -> dout11 routing; upper and lower margin vs. frequency, 10.842 to 23.480 GHz; margin axis from -30 to 20 %]
[Plot: bias_kern1 margins for din0 -> dout01 routing; upper and lower margin vs. frequency, 10.842 to 20.345 GHz; margin axis from -30 to 20 %]
[Plot: bias_kern2 margins for din0 -> dout02, dout12 routing; upper and lower margin vs. frequency, 10.842 to 21.854 GHz; margin axis from -30 to 15 %]
[Figure: signals: din0, clkin_lfin, clkin_hf, clkin_lfout1, bar00, bar01, bar10, bar11, clkout, clkout1, clkout2, dout01, dout02, dout11, dout12]


Conclusion

- we have proposed two architectures of an ORN: NDRO-based and crossbar-based
- a complexity comparison has been done; the crossbar-based ORN is considered the better option due to its scalability and pipelined structure
- a new version of the crossbar switch has been designed
- two versions of a crossbar with a multicasting function have been designed for the 2.5 kA/cm2 process and successfully tested at frequencies up to 28 GHz
- a 1-to-2 ORN has been designed and successfully tested at frequencies up to 23.5 GHz
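
The arithmetic behind two sets of numbers on the slides can be checked with a short sketch. In the crossbar-based comparison table, every power value equals the bias current multiplied by 2.5 mV. That voltage is not stated anywhere on the slides; it is an assumption inferred here from the tabulated ratios. The quoted JJ reductions for ver.2 can be recomputed from the ver.1 counts of 788 JJs (crossbar) and 455 JJs (½ crossbar); pairing the 141-JJ circuit with the ½ crossbar is likewise an assumption. The function names are illustrative only.

```python
# Sketch only: recomputes figures quoted on the slides.
# ASSUMPTION: a fixed 2.5 mV bias voltage, inferred from the table, not stated in the source.

def static_power_mw(bias_current_a, bias_voltage_mv=2.5):
    """Static power in mW for a bias current in A and a bias voltage in mV."""
    return bias_current_a * bias_voltage_mv


def jj_reduction_percent(jj_before, jj_after):
    """Relative reduction of the junction count, in percent."""
    return 100.0 * (jj_before - jj_after) / jj_before


# (bias current in A, power in mW) rows of the crossbar-based ORN table.
CROSSBAR_ROWS = {
    "N=2, M=4": (0.3, 0.75),
    "N=3, M=8": (0.63, 1.575),
    "N=5, M=10": (1.41, 3.525),
    "N=9, M=32": (8.28, 20.7),
}

if __name__ == "__main__":
    for name, (current_a, power_mw) in CROSSBAR_ROWS.items():
        print(name, round(static_power_mw(current_a), 3), "mW; table:", power_mw)
    # ver.1 -> ver.2 junction counts quoted on the slides.
    print("crossbar:", round(jj_reduction_percent(788, 296), 1), "%")
    print("1/2 crossbar (assumed pairing):", round(jj_reduction_percent(455, 141), 1), "%")
```

Under these assumptions the crossbar reduction comes out near the 62% quoted on the slide, and the second one near 69%, which the slide gives as 70%.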