CREST Research Proposal: "Single …"

An Operand Routing Network for an SFQ Reconfigurable Data-Paths Processor
I. Kataeva¹, H. Akaike¹, A. Fujimaki¹,
N. Yoshikawa², N. Takagi³, K. Murakami⁴
¹Dept. of Quantum Engineering, Nagoya University
²Dept. of Electrical and Computer Engineering, Yokohama National University
³Dept. of Information Engineering, Nagoya University
⁴Research Institute for Information Technology, Kyushu University
Outline
• Architectures of the Operand Routing Network (ORN)
• NDRO-based ORN
• Crossbar-based ORN
• A crossbar switch with a multicasting function
• Experimental results
• Crossbar switches
• A 1-to-2 ORN prototype
• Conclusion
Reconfigurable Data-Path Processor
An architecture suitable for SFQ implementation:
• a two-dimensional array of floating-point units (FPUs)
• FPUs connected by an operand routing network (ORN)
• the ORN is reconfigured to fit a data-flow graph (DFG)
[Figure: rows of FPUs interconnected by ORNs; blocks labeled SB, LM and SMAC]
Requirements for the Operand Routing Network
[Figure: rows of FPUs with connections to the next row]
• connections are established between FPUs in the immediate vicinity of each other
• each FPU can be connected to one or more FPUs in the next row
• the number of connections, N, is odd
• an FPU's output can be connected to either or both inputs of an FPU in the next stage
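One way to read these requirements is as a neighborhood rule. The sketch below is a hypothetical model, not the authors' definition: it assumes M is the number of FPUs per row and that the FPU in column c may drive the N columns centered on c in the next row, clipped at the array edges. The function names are illustrative only.

```python
def allowed_targets(col: int, n: int, m: int) -> list[int]:
    """Columns in the next row that the FPU in column `col` may drive.

    Hypothetical model: the N links are centred on the FPU's own column
    (offsets -(N-1)/2 .. +(N-1)/2) and clipped at the edges of an M-wide row.
    """
    if n % 2 == 0:
        raise ValueError("this model needs an odd N")
    half = (n - 1) // 2
    return [c for c in range(col - half, col + half + 1) if 0 <= c < m]


# A link may feed input 0, input 1, or both inputs of the target FPU.
INPUT_CHOICES = ("in0", "in1", "both")


def connection_options(col: int, n: int, m: int) -> list[tuple[int, str]]:
    """Every (target column, input choice) pair available to one FPU output."""
    return [(t, inp) for t in allowed_targets(col, n, m) for inp in INPUT_CHOICES]


if __name__ == "__main__":
    print(allowed_targets(0, 3, 8))            # edge FPU: [0, 1]
    print(allowed_targets(4, 3, 8))            # interior FPU: [3, 4, 5]
    print(len(connection_options(4, 3, 8)))    # 3 targets x 3 choices = 9
```

Connecting one FPU to several FPUs in the next row corresponds to enabling more than one of these options at the same time.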
NDRO-based Operand Routing Network
[Figure: FPU rows interconnected through arrays of NDRO switch cells]
• 2×M×N NDRO cells
“+”:
  • small number of Josephson junctions
“–”:
  • irregular, non-pipelined structure
  • becomes cumbersome as the complexity increases
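For a sense of scale, the 2×M×N expression can be evaluated for the (N, M) configurations used in the comparison of the ORN architectures. This is plain arithmetic on the slide's formula; the helper name is hypothetical.

```python
def ndro_cells(n: int, m: int) -> int:
    """Number of NDRO cells in an NDRO-based ORN: 2 x M x N (slide formula)."""
    return 2 * m * n


CONFIGS = [(2, 4), (3, 8), (5, 10), (9, 32)]  # (N, M) pairs from the comparison

for n, m in CONFIGS:
    print(f"N={n}, M={m}: {ndro_cells(n, m)} NDRO cells")
```

The cell count grows only linearly in M and N; the drawback named on the slide is the irregular, non-pipelined structure rather than the number of cells.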
Crossbar-based Operand Routing Network
[Figure: FPU rows interconnected through full crossbar switches (CB) and half crossbar switches (½CB)]
“+”:
  • scalable
  • pipelined
  • easily redesigned for any N and M
“–”:
  • large number of Josephson junctions
  • requires M ½CBs and (2×M+1)(N-1)/2 CBs
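The switch counts for the crossbar-based ORN follow directly from the expressions on this slide. The minimal sketch below (hypothetical helper name) evaluates them; it rejects even N because (N-1)/2 is then not a whole number, which is consistent with the requirement that N be odd.

```python
def crossbar_counts(n: int, m: int) -> tuple[int, int]:
    """Return (half_crossbars, full_crossbars) for a crossbar-based ORN.

    Slide expressions: M half crossbars (1/2 CB) and
    (2*M + 1) * (N - 1) / 2 full crossbars (CB).
    """
    if n % 2 == 0:
        # (N - 1) / 2 is a whole number only for odd N
        raise ValueError("N must be odd")
    return m, (2 * m + 1) * (n - 1) // 2


for n, m in [(3, 8), (5, 10), (9, 32)]:
    half, full = crossbar_counts(n, m)
    print(f"N={n}, M={m}: {half} x 1/2CB, {full} x CB")
```

Because every CB is a regular, identical block, changing N or M only changes these two counts, which is what makes this architecture easy to redesign and pipeline.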
Comparison of the ORN architectures
NDRO-based ORN
ORN complexity | Latency, ps | Minimum interval | Control lines | Bias current, A | Power, mW | JJs (incl. control block)
N=2, M=4  | 200  | nf + 200 ps | 32 | 0.2 | 0.5  | 1710
N=3, M=8  | 288  | nf + 288 ps | 96 | 0.7 | 1.75 | 6840
N=5, M=10 | 200  |             |    |     |      |
N=9, M=32 | 1152 |             |    |     |      |
The JJ counts of the NDRO-based ORN in the table are estimates based on the design of the switch for the RDP prototype (N=3, M=4), which consisted of 3420 JJs (Iwasaki, unpublished).
Crossbar-based ORN
ORN complexity | Latency, clocks | Minimum interval | Control lines | Bias current, A | Power, mW | JJs   | JJs (incl. control block)
N=2, M=4       | 6               | nf               | 48            | 0.3             | 0.75      | 3000  | 4020
N=3, M=8       | 6               | nf               | 100           | 0.63            | 1.575     | 6230  | 7684
N=5, M=10      | 10              | nf               | 208           | 1.41            | 3.525     | 13930 | 14954
N=9, M=32      | 18              | nf               | 1168          | 8.28            | 20.7      | 77440 | 99088
Crossbar switch with a broadcasting function: 296 JJs
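In both tables the power column equals the bias current multiplied by 2.5 mV (for example, 0.2 A × 2.5 mV = 0.5 mW). The 2.5 mV bias voltage is inferred from these ratios, not stated on the slides; the sketch below checks every row against that assumption.

```python
import math

BIAS_VOLTAGE_MV = 2.5  # assumption: inferred from power / current in the tables


def power_mw(bias_current_a: float) -> float:
    """Static bias power: P [mW] = I [A] x V [mV]."""
    return bias_current_a * BIAS_VOLTAGE_MV


# (bias current in A, listed power in mW), copied from the two tables
TABLE_ROWS = {
    "NDRO N=2, M=4": (0.2, 0.5),
    "NDRO N=3, M=8": (0.7, 1.75),
    "Crossbar N=2, M=4": (0.3, 0.75),
    "Crossbar N=3, M=8": (0.63, 1.575),
    "Crossbar N=5, M=10": (1.41, 3.525),
    "Crossbar N=9, M=32": (8.28, 20.7),
}

for name, (current_a, listed_mw) in TABLE_ROWS.items():
    computed = power_mw(current_a)
    status = "ok" if math.isclose(computed, listed_mw, rel_tol=1e-9) else "MISMATCH"
    print(f"{name}: {computed:.3f} mW vs {listed_mw} mW -> {status}")
```

Since power tracks bias current linearly, the crossbar-based ORN's larger JJ count shows up directly as higher bias current and power.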
Crossbar switch with a multicasting function, ver.1
[Figure: block diagram with inputs din0, din1, control lines cross0, bar0, cross1, bar1, clocks clkin/clkout and outputs dout0, dout1; built from DFF, NOT, NDRO and AND cells; control patterns 00, 01, 10, 11]
• 788 JJs
• bias current – 87 mA
• area – 1.28 mm × 1 mm
• clock frequency – 27 GHz
• 4 pipeline stages
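The function of the multicasting crossbar can be summarized by a behavioral model. The control encoding below is an assumption for illustration only: each output has its own control bit choosing its "bar" input (same index) or its "cross" input (other index), so 00 passes data straight through, 11 swaps the inputs, and 01 or 10 multicasts one input to both outputs. It models function only, not the pipelined NDRO/AND/DFF implementation or its timing.

```python
def crossbar_2x2(din0: int, din1: int, pattern: str) -> tuple[int, int]:
    """Behavioral 2x2 crossbar with multicasting (hypothetical encoding).

    pattern[0] selects the source of dout0: "0" = bar (din0), "1" = cross (din1)
    pattern[1] selects the source of dout1: "0" = bar (din1), "1" = cross (din0)
    """
    if pattern not in ("00", "01", "10", "11"):
        raise ValueError(f"unknown control pattern {pattern!r}")
    dout0 = din1 if pattern[0] == "1" else din0
    dout1 = din0 if pattern[1] == "1" else din1
    return dout0, dout1


# Sweep every data and control combination, as in a functional test
for pattern in ("00", "01", "10", "11"):
    for din0 in (0, 1):
        for din1 in (0, 1):
            print(pattern, din0, din1, "->", crossbar_2x2(din0, din1, pattern))
```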
Crossbar switch with a multicasting function, ver.2
[Figure: block diagram with inputs din0, din1, control lines cross0, bar0, cross1, bar1, clocks clkin/clkout and outputs dout0, dout1; built from DFF, NDROC, NDRO and AND cells]
• 296 JJs
• bias current – 30 mA
• 62% fewer JJs than ver.1 (788 → 296)
• 2 pipeline stages instead of 4
[Figure: ½ crossbar switch, ver.2, with input din0, control lines bar0, cross0, cross1, clocks clkin/clkout; control patterns 00, 01, 10, 11]
• 141 JJs
• bias current – 14 mA
• 69% fewer JJs than ver.1 (455 → 141)
• 2 pipeline stages instead of 4
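The reductions for ver.2 can be checked against the ver.1 figures given in this deck (788 JJs and 87 mA for the crossbar, 455 JJs and 50 mA for the ½ crossbar), assuming the 141-JJ, 14 mA circuit is the ver.2 ½ crossbar. The JJ reductions come out at about 62.4% and 69.0%; the same arithmetic applied to the bias currents gives about 65.5% and 72%.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


# (ver.1, ver.2) values; the 1/2-crossbar ver.2 entries assume the 141-JJ / 14 mA circuit
COMPARISONS = {
    "crossbar, JJs": (788, 296),
    "crossbar, bias mA": (87, 30),
    "1/2 crossbar, JJs": (455, 141),
    "1/2 crossbar, bias mA": (50, 14),
}

for label, (v1, v2) in COMPARISONS.items():
    print(f"{label}: {reduction_pct(v1, v2):.1f}% reduction")
```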
Experimental results
[Figure: test setup; labels: data_in, clkin_lfin, clkin_hf, ladder, circuit under test, clkin_lfout, data_out]
Circuits tested:
• crossbar switch with a multicasting function, ver.1
• ½ crossbar switch with a multicasting function, ver.1
• 1-to-2 ORN prototype
Crossbar switch with a multicasting function, ver.1
[Figure: signals din0, din1, bar0, bar1, cross0, cross1, clkin_hf, clkin_lfin, clkin_lfout, clkout, clkout1, dout0, dout1; control patterns 00, 01, 10, 11]
• 788 JJs
• area – 1.28 mm × 1 mm
• bias current – 87 mA
• clock frequency – 27 GHz
Total:
• 1484 JJs
• bias current – 172 mA

Pattern | Bias current | Lower margin, % | Upper margin, %
00      | bias1        | -13.450         | 4.213
00      | bias2        | -4.702          | 15.014
01      | bias1        | -13.450         | 7.745
01      | bias2        | -7.989          | 5.156
10      | bias1        | -15.217         | 7.745
10      | bias2        | -4.702          | 5.156
11      | bias1        | -13.450         | 4.213
11      | bias2        | -4.702          | 11.728
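A conservative way to summarize such a table is the window in which every control pattern works: the highest lower margin and the lowest upper margin for each bias line. The sketch assumes the rows map to patterns 00, 01, 10, 11 in that order and treats bias1 and bias2 independently; it is an illustration, not necessarily how the authors combined their margins.

```python
# (lower %, upper %) per control pattern, copied from the table (pattern order assumed)
MARGINS = {
    "bias1": {"00": (-13.450, 4.213), "01": (-13.450, 7.745),
              "10": (-15.217, 7.745), "11": (-13.450, 4.213)},
    "bias2": {"00": (-4.702, 15.014), "01": (-7.989, 5.156),
              "10": (-4.702, 5.156), "11": (-4.702, 11.728)},
}


def common_window(per_pattern: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Intersect the per-pattern margin windows: (max of lowers, min of uppers)."""
    lows, highs = zip(*per_pattern.values())
    return max(lows), min(highs)


for line, per_pattern in MARGINS.items():
    lo, hi = common_window(per_pattern)
    print(f"{line}: {lo:+.3f} % / {hi:+.3f} %")
```

With these values the common windows are -13.450/+4.213 % for bias1 and -4.702/+5.156 % for bias2, so the upper margins are the limiting side for both lines.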
½ crossbar switch with a multicasting function, ver.1
[Figure: signals din0, bar0, bar1, cross0, cross1, clkin_lfin, clkin_lfout, clkout, clkout1, dout0, dout1; control patterns 00, 01, 10]
• 455 JJs
• area – 1 mm × 0.52 mm
• bias current – 50 mA
• clock frequency – 27 GHz
Total:
• 1066 JJs
• bias current – 127 mA

Pattern | Bias current lower margin, % | Bias current upper margin, %
00      | 2.402                        | 15.512
01      | 2.7                          | 17.797
10      | 1.011                        | 18.293
Tested at frequencies of 22–36 GHz
1-to-2 ORN: low-frequency test
[Figure: signals din0, bar00, bar02, bar12, cross01, cross10, cross11, clkin_lfin, clkout, clkout1, clkout2, dout01, dout11, dout02, dout12]
• completely functional (exhaustive test)
• bias_kern0 = -14.6/5.3 % (lower/upper margin), independent of the pattern
• bias_kern1 = -16.1/18.3 % for din0 -> dout11, dout12
• bias_kern2 = -20.7/12.6 % for din0 -> dout11, dout12
open466, no. 4 chip F2 example:
din0 -> dout01, dout02, dout12
control pattern: CBT0 “10”, CBT1 “01”, CBT2 “10”
minimum!
• bias_kern1 = -40.3/17.2 % for din1 -> dout01
• bias_kern2 = -38/12.6 % for din2 -> dout02, dout12
maximum!
• Total 2031 JJs
• Total bias current 224.4 mA
• bias_kern0 = -14.6/5.3 %
• bias_kern1 = -23/17.2 %
• bias_kern2 = -23/12.6 %
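An exhaustive functional test applies every combination of control patterns. Assuming, hypothetically, that the three switch controls named in the example (CBT0, CBT1, CBT2) each take one of the four 2-bit patterns independently, the test space has 4³ = 64 control settings. The generator below enumerates them; a switch's tuple can be narrowed if it accepts fewer patterns, for instance the three patterns tested on the ½ crossbar.

```python
from itertools import product

# Hypothetical: every switch accepts all four 2-bit control patterns.
LEGAL_PATTERNS = {
    "CBT0": ("00", "01", "10", "11"),
    "CBT1": ("00", "01", "10", "11"),
    "CBT2": ("00", "01", "10", "11"),
}


def all_control_settings(legal: dict[str, tuple[str, ...]]):
    """Yield every combination of control patterns, one dict per test vector."""
    names = list(legal)
    for combo in product(*(legal[name] for name in names)):
        yield dict(zip(names, combo))


settings = list(all_control_settings(LEGAL_PATTERNS))
print(len(settings))  # 4 * 4 * 4 = 64
```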
1-to-2 ORN: high-frequency test and frequency dependence of bias margins
[Figure: bias_kern1 margins for din0 -> dout11 routing; upper and lower margin (%) vs. clock frequency (GHz)]
[Figure: bias_kern1 margins for din0 -> dout01 routing; upper and lower margin (%) vs. clock frequency (GHz)]
[Figure: bias_kern2 margins for din0 -> dout02, dout12 routing; upper and lower margin (%) vs. clock frequency (GHz)]
[Figure: signals din0, bar00, bar01, bar10, bar11, clkin_hf, clkin_lfin, clkin_lfout1, clkout, clkout1, clkout2, dout01, dout11, dout02, dout12]
open466, no. 4 chip F2 example:
din0 -> dout11
control pattern: CBT0 “00”, CBT1 “00”
• bias_kern0 margins were frequency independent
• measured at frequencies up to 23.5 GHz
Conclusion
• we have proposed two different ORN architectures:
  • NDRO-based
  • crossbar-based
• a complexity comparison has been made, and the crossbar-based ORN is considered the better option owing to its scalability and pipelined structure
• a new version of the crossbar switch has been designed
• two versions of a crossbar with a multicasting function have been designed for a 2.5 kA/cm² process and successfully tested at frequencies up to 28 GHz
• a 1-to-2 ORN has been designed and successfully tested at frequencies up to 23.5 GHz