SCAs against Embedded Crypto Devices

F.-X. Standaert
UCL Crypto Group, Université catholique de Louvain
Lecture 1 - Hardware Implementations
UCL
Crypto Group
Microelectronics Laboratory
SCAs against Embedded Crypto Devices - L1
Outline
- Different types of computing devices
- Two key concepts
- Hardware performance indicators
- Implementation tradeoffs
- Technology scaling
- Design tradeoffs
- FPGAs
- Application to block ciphers
- Further readings
Different types of computing devices
- General purpose computers (e.g. microprocessors)
  - Software-programmed
- Reconfigurable devices (e.g. FPGAs)
- Application Specific Integrated Circuits (e.g. AES)
  - Hard-coded
- Tradeoff: flexibility vs. performance
Sequential logic
- 1 cycle: read from memory, operate, store to memory
- The operation delay $T_{op}$ must exceed the critical path delay $T_{ph}$ (in sec)
- Operation frequency $f_{op} = 1/T_{op}$ (in Hz)
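The timing constraint above can be illustrated with a minimal sketch. The function name, the 8 ns critical path and the 25% timing margin are hypothetical values chosen for illustration, not figures from the lecture.

```python
def max_frequency_hz(t_ph_s: float, margin: float = 0.0) -> float:
    """Bound on f_op = 1 / T_op when T_op = T_ph * (1 + margin).

    With margin = 0 this is the limit that f_op approaches but
    cannot exceed, since T_op has to stay above T_ph.
    """
    if t_ph_s <= 0:
        raise ValueError("critical path delay must be positive")
    t_op_s = t_ph_s * (1.0 + margin)
    return 1.0 / t_op_s


# Hypothetical critical path of 8 ns.
f_limit = max_frequency_hz(8e-9)          # T_op -> T_ph
f_margin = max_frequency_hz(8e-9, 0.25)   # T_op = 10 ns
print(f"{f_limit / 1e6:.1f} MHz limit, {f_margin / 1e6:.1f} MHz with margin")
```

A longer critical path (deeper logic between two registers) directly lowers the achievable clock frequency.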
Abstraction levels (for memory & operations)
Hardware performance indicators
- Hardware cost (in gates, transistors or circuit size)
- Operation frequency (in Hz)
- Data throughput (in bit/sec)
- Data latency (in clock cycles)
- Power and energy (in Watts and Joules)
  - Not equivalent, e.g.
  - Power matters for RFID devices
  - Energy matters for battery-supplied devices
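A small sketch of why power and energy are not equivalent metrics, assuming energy is simply power multiplied by the time needed for one operation. Both designs and all their numbers are made up for illustration.

```python
def energy_joules(power_w: float, time_s: float) -> float:
    """Energy spent by a design drawing power_w for time_s."""
    return power_w * time_s


# Two hypothetical designs performing the same encryption.
slow = {"power_w": 10e-6, "time_s": 1.0e-3}   # 10 uW for 1 ms
fast = {"power_w": 50e-6, "time_s": 0.1e-3}   # 50 uW for 0.1 ms

e_slow = energy_joules(**slow)
e_fast = energy_joules(**fast)
print(f"slow: {e_slow * 1e9:.1f} nJ, fast: {e_fast * 1e9:.1f} nJ")
```

Under these assumed numbers, the slow design draws less power (which would suit a power-limited RFID tag) while the fast design spends less energy per encryption (which would suit a battery).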
Implementation tradeoffs
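- $T_{ph} \propto L_D \cdot \dfrac{C_L \cdot V_{dd}}{I_{on}}$, with:
  - $L_D$ the operation logic depth (in gates)
  - $C_L$ the load capacitance (in farads)
  - $V_{dd}$ the circuit supply voltage
  - $I_{on}$ the MOSFET drain current in the ON state
“$T_{ph}$ decreases with larger $V_{dd}$” (i.e. $I_{on}$ grows faster than linearly with $V_{dd}$)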
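The dependence of the critical path delay on the supply voltage can be sketched numerically. The slide does not give a model for the ON current, so this sketch assumes the classical alpha-power law, I_on ∝ (V_dd − V_t)^α; the threshold voltage, the exponent and all other parameter values are illustrative assumptions.

```python
def relative_delay(vdd, vt=0.3, alpha=1.5, logic_depth=10, c_load=1.0):
    """T_ph up to a constant factor: L_D * C_L * V_dd / I_on (arbitrary units)."""
    if vdd <= vt:
        raise ValueError("model only valid for V_dd above the threshold voltage")
    # ASSUMPTION: alpha-power law for the ON current (not from the slides).
    i_on = (vdd - vt) ** alpha
    return logic_depth * c_load * vdd / i_on


supply_voltages = [0.6, 0.8, 1.0, 1.2]
delays = [relative_delay(v) for v in supply_voltages]
for v, d in zip(supply_voltages, delays):
    print(f"V_dd = {v:.1f} V -> relative T_ph = {d:.2f}")
```

With these assumptions the delay shrinks monotonically as the supply voltage is raised, matching the trend quoted above.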
Implementation tradeoffs (II)
- Sources of power and energy consumption
  - $P_{tot} = P_{dyn} + P_{stat}$
  - $P_{dyn} \propto N_{gates} \cdot C_L \cdot V_{dd}^2 \cdot f_{op} \cdot \alpha (1 + \beta_{sc})$
    - $\alpha$: activity factor / $\beta_{sc}$: short circuits
  - $P_{stat} \propto I_{leak} \cdot V_{dd}$
    - with $I_{leak}$ increasing with smaller $V_{dd}$
“Minimum energy best trades $P_{dyn}$ and $P_{stat}$” (here with $T_{op} = T_{ph}$)
⇒ ∃ frequency/energy tradeoff
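The existence of a minimum-energy operating point can be sketched with a toy model. It assumes that the circuit runs at T_op = T_ph, that the delay follows an alpha-power law, and, as a deliberate simplification, that the leakage current is a constant. The weights `k_dyn` and `k_leak` and every numeric value are arbitrary and only serve to show the shape of the curve.

```python
def energy_per_op(vdd, vt=0.3, alpha=1.5, k_dyn=1.0, k_leak=0.1):
    """Energy of one operation in arbitrary units, with T_op = T_ph."""
    # ASSUMPTION: alpha-power-law delay, T_ph ~ V_dd / (V_dd - V_t)^alpha.
    t_ph = vdd / (vdd - vt) ** alpha
    # Dynamic energy: P_dyn * T_op with f_op = 1 / T_op, hence ~ V_dd^2.
    e_dyn = k_dyn * vdd ** 2
    # Static energy: P_stat * T_op ~ I_leak * V_dd * T_ph.
    # SIMPLIFICATION: I_leak is held constant inside k_leak.
    e_stat = k_leak * vdd * t_ph
    return e_dyn + e_stat


# Sweep V_dd from 0.35 V to 1.20 V in 10 mV steps.
voltages = [round(0.35 + 0.01 * i, 2) for i in range(86)]
v_min = min(voltages, key=energy_per_op)
print(f"minimum energy near V_dd = {v_min:.2f} V")
```

In this toy model, lowering the supply voltage first saves dynamic energy, then costs static energy because each operation takes longer; the minimum sits between the two regimes, and reaching it means accepting a lower frequency.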
Technology scaling
- $P_{dyn}$ dominates in older technologies (down to 0.1 µm)
- $P_{stat}$ becomes significant in nanoscale devices
- Inter-device variability also increases with scaling!
Design tradeoffs
- Resource sharing, e.g. with the AES ByteSub
  - Low-cost design: 1 S-box, 16 cycles
  - Fast design: 16 S-boxes, 1 cycle
  - Low cost implies more control ⇒ less efficient
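The two design points above are the extremes of a range. As a minimal sketch (the function name is hypothetical, and the control overhead mentioned in the slide is not modeled), the number of cycles needed to substitute the 16 state bytes as a function of the number of S-boxes instantiated is:

```python
import math


def bytesub_cycles(num_sboxes: int, state_bytes: int = 16) -> int:
    """Cycles to substitute all state bytes when num_sboxes work in parallel."""
    if not 1 <= num_sboxes <= state_bytes:
        raise ValueError("expected between 1 and state_bytes S-boxes")
    return math.ceil(state_bytes / num_sboxes)


for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} S-box(es) -> {bytesub_cycles(n):2d} cycle(s)")
```

The product of S-boxes and cycles stays constant in this idealized count; in a real design the shared versions also pay for multiplexers and control logic.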
Design tradeoffs (II)
- Inner pipelining, e.g. with the AES round
  - Ideally: $f_{op} \times 2$ (usually worse in practice)
  - Latency: 11 → 22 (cycles)
  - Throughput: (128 bits / 11 cycles) · $f_{op}$ → (256 bits / 22 cycles) · $f_{op}$ ⇒ ideally ×2, since the bits per cycle are unchanged while $f_{op}$ doubles
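The throughput comparison can be checked numerically with a short sketch. The 100 MHz base frequency is a hypothetical value, and the doubling of the frequency is the ideal case stated in the slide.

```python
def throughput_bps(bits_in_flight: int, cycles: int, f_op_hz: float) -> float:
    """Bits delivered per second when bits_in_flight complete every `cycles`."""
    return bits_in_flight / cycles * f_op_hz


F_OP = 100e6  # hypothetical clock frequency of the non-pipelined round

# One 128-bit block, 11 cycles, base frequency.
base = throughput_bps(128, 11, F_OP)
# Two blocks in flight, 22 cycles, ideally doubled frequency.
pipelined = throughput_bps(256, 22, 2 * F_OP)

latency_base_s = 11 / F_OP
latency_pipelined_s = 22 / (2 * F_OP)

print(f"base: {base / 1e9:.2f} Gbit/s, pipelined: {pipelined / 1e9:.2f} Gbit/s")
```

In this ideal case the latency in seconds is unchanged (twice as many cycles, each half as long) while the throughput doubles, provided two independent blocks are available to fill the pipeline.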
Design tradeoffs (III)
- Further improvements of the throughput ($f_{op}$ fixed)
  - Parallelism (left) is less efficient than outer pipelining (right)
FPGAs
- “Sea” of programmable logic blocks
- Connected with programmable routing
- Functionality determined by configuration bits
- Different technologies
  - 0.18 µm → 45 nm
  - Several manufacturers
FPGAs (II)
FPGAs (III)
- Logic blocks
  - From 3-input Look-Up Tables...
  - ...to 8-bit Arithmetic and Logic Units
  - The granularity of the device influences both the design performance and the configuration time
- Routing blocks
  - Structured according to the interconnect length
  - Major impact on the final performance
- Embedded blocks
  - Memories, multipliers, ...
FPGAs (IV) (How to use)
- Compared to ASICs: fabrication + packaging are replaced by configuration (i.e. sending a programming file to the chip to determine the “gates” functionality)
FPGAs (V)
- Example: Xilinx logic block
Application to block ciphers
- Target FPGA 1 has logic blocks (LB1) made of:
  - Two 4-input LUTs
  - One 1-bit MUX to combine the LUTs
  - Two registers
- Target FPGA 2 has logic blocks (LB2) made of:
  - Four 6-input LUTs
  - Three 1-bit MUXes to combine the LUTs
  - Four registers
- Embedded memory, with each block (MB) made of:
  - 4096-bit RAM memories
  - Dual-ported (i.e. 2 R/W operations per cycle)
  - Configurable (4096 × 1, 2048 × 2, ...)
S-box implementations
- “Minimum memory” cost (in bits) of S1/S2? { ... }
- Cost of S1/S2 in LB1/LB2? { ... }
- Would you use the memory to implement S1/S2?
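The dimensions of S1 and S2 are not given in this text, so the following sketch does not answer the questions above. It only shows how a "minimum memory" cost could be counted for a generic S-box stored as a full lookup table, using the 8-bit to 8-bit AES S-box and a 4-bit to 4-bit S-box as stand-in examples. The function names are hypothetical; the 4096-bit block size is the one stated for the embedded memory blocks.

```python
MB_BITS = 4096  # size of one embedded memory block (MB), as stated in the slides


def table_bits(n_in: int, n_out: int) -> int:
    """Bits needed to store an n_in-bit to n_out-bit S-box as a full table."""
    return n_out * 2 ** n_in


def tables_per_block(n_in: int, n_out: int, block_bits: int = MB_BITS) -> int:
    """How many complete copies of the table fit in one memory block."""
    return block_bits // table_bits(n_in, n_out)


# Stand-in examples (NOT the S1/S2 of the exercise).
print(table_bits(8, 8), tables_per_block(8, 8))   # AES-sized S-box
print(table_bits(4, 4), tables_per_block(4, 4))   # small 4-bit S-box
```

A small table wastes most of a memory block, which is one consideration when choosing between memory blocks and LUTs.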
Block cipher design
- Consider an AES-like cipher with the following round:
  - To be implemented in FPGA 1 with S-box S2
  - With MixColumn in 256 LUTs and a logic depth of 2 LUTs
  - And the full cipher iterating 11 rounds
Block cipher design
- What is the cost of one round in LUTs?
- Design and evaluate the cost (in LUTs and regs) of:
  - A 1-round loop architecture without pipeline
  - A 1-round loop architecture with maximum pipeline
- What is the latency (in cycles) of these architectures?
- Assume $T_{LUT} = 10$ ns, what is the throughput achieved by these architectures (in bit/sec)?
  - Is this assumption realistic (physically speaking)?
- “Ideally”, what would happen if we move to a 2-round loop architecture, or a 32-bit loop architecture?
{ ... }
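One way to set up the throughput computation is sketched below. It is not the solution of the exercise: the round's logic depth depends on S-box S2 and on the round structure, neither of which is recoverable from this text, so `HYPOTHETICAL_DEPTH` is a made-up placeholder. The 128-bit block size is also an assumption. Only T_LUT = 10 ns and the 11 rounds come from the slides.

```python
T_LUT_S = 10e-9   # from the exercise
ROUNDS = 11       # from the exercise
BLOCK_BITS = 128  # ASSUMPTION: AES-sized block


def loop_throughput_bps(cycles_per_block, depth_luts, blocks_in_flight=1,
                        block_bits=BLOCK_BITS, t_lut_s=T_LUT_S):
    """Throughput when the clock period equals depth_luts * T_LUT."""
    t_op_s = depth_luts * t_lut_s
    return blocks_in_flight * block_bits / (cycles_per_block * t_op_s)


HYPOTHETICAL_DEPTH = 4  # placeholder logic depth of one round, in LUTs

# 1-round loop, no pipeline: one round per cycle, one block at a time.
no_pipe = loop_throughput_bps(ROUNDS, HYPOTHETICAL_DEPTH)
# 1-round loop, maximum pipeline: one LUT level per stage, so each round
# takes HYPOTHETICAL_DEPTH cycles and as many blocks are in flight.
max_pipe = loop_throughput_bps(ROUNDS * HYPOTHETICAL_DEPTH, 1,
                               blocks_in_flight=HYPOTHETICAL_DEPTH)

print(f"{no_pipe / 1e6:.1f} Mbit/s vs {max_pipe / 1e6:.1f} Mbit/s")
```

Under this idealized model the gain from maximum pipelining equals the logic depth of the round; register and routing delays, which the model ignores, reduce it in practice.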
Examples
- FPGA implementations of the AES Rijndael

Index  E,D?    Key Sched.   Feedback?  Device      Architecture
1.     E only  on-the-fly   no         Virtex-E    128-bit unrolled
2.     E only  on-the-fly   no         Virtex-E    128-bit loop
3.     E/D     precomputed  yes        Virtex-II   32-bit loop
4.     E/D     precomputed  yes        Spartan-II  8-bit loop
5.     E/D     precomputed  yes        Spartan-II  PicoBlaze

Index  LUTs  Regs.  Slices  RAMBs  Freq.    Throughput
1.     3516  3840   2784    100    92 MHz   11.7 Gbit/sec
2.     3846  2517   2257    0      169 MHz  2 Gbit/sec
3.     288   113    146     3      123 MHz  358 Mbit/sec
4.     -     -      124     2      67 MHz   2.2 Mbit/sec
5.     -     -      119     2      90 MHz   710 Kbit/sec
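The figures above can be compared with a simple efficiency metric. The sketch below computes throughput per slice from the reported numbers; choosing this metric is an assumption made for illustration, and it ignores the embedded RAM blocks (RAMBs), which matter a lot for design 1.

```python
# (architecture, slices, RAMBs, throughput in bit/sec), as reported in the table.
designs = [
    ("128-bit unrolled", 2784, 100, 11.7e9),
    ("128-bit loop",     2257,   0, 2.0e9),
    ("32-bit loop",       146,   3, 358e6),
    ("8-bit loop",        124,   2, 2.2e6),
    ("PicoBlaze",         119,   2, 710e3),
]


def mbps_per_slice(slices: int, throughput_bps: float) -> float:
    """Throughput/area ratio in Mbit/sec per slice (RAMBs not counted)."""
    return throughput_bps / 1e6 / slices


efficiency = {name: mbps_per_slice(s, t) for name, s, _, t in designs}
best = max(efficiency, key=efficiency.get)

for name, value in efficiency.items():
    print(f"{name:17s} {value:8.4f} Mbit/sec per slice")
```

With this metric the 32-bit loop is more efficient than the 128-bit loop, while the smallest designs trade efficiency for a very low absolute cost.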
Summarizing
- Specialized hardware implementations (ASICs, FPGAs) can be used to reach high performance
- Many different metrics exist (cost, speed, ...)
- Hardware design optimization (e.g. sharing, pipelining) depends on algorithmic features
- Technology scaling can have a high impact too!
Further readings
- International Technology Roadmap for Semiconductors, http://www.itrs.net/
- F. Rodriguez-Henriquez, N.A. Saqib, A. Diaz-Perez, C.K. Koc, Cryptographic Algorithms on Reconfigurable Hardware, Springer, 2007.
- H. Kaeslin, Digital Integrated Circuit Design, Cambridge University Press, 2008.
- J.M. Rabaey, Digital Integrated Circuits: a Design Perspective, second edition, Prentice Hall, 2003.
Thanks