SCAs against Embedded Crypto Devices
Lecture 1 - Hardware Implementations
F.-X. Standaert
UCL Crypto Group, Microelectronics Laboratory, Université catholique de Louvain

Outline
- Different types of computing devices
- Two key concepts
- Hardware performance indicators
- Implementation tradeoffs
- Technology scaling
- Design tradeoffs
- FPGAs
- Application to block ciphers
- Further readings

Different types of computing devices
- General-purpose computers (e.g., microprocessors): software-programmed
- Reconfigurable devices (e.g., FPGAs)
- Application-Specific Integrated Circuits (ASICs, e.g., a dedicated AES circuit): hard-coded
- Tradeoff: flexibility vs. performance

Sequential logic
- One cycle: read from memory, operate, store in memory
- Operation delay $T_{op}$ (in s), at least equal to the critical-path delay $T_{ph}$: $T_{op} \geq T_{ph}$
- Operation frequency $f_{op} = 1/T_{op}$ (in Hz)

Abstraction levels (for memory & operations)
[Figure: abstraction levels for memory and operations]

Hardware performance indicators
- Hardware cost (in gates, transistors or circuit size)
- Operation frequency (in Hz)
- Data throughput (in bit/s)
- Data latency (in clock cycles)
- Power and energy (in Watts and Joules), which are not equivalent, e.g.:
  - power matters for RFID devices;
  - energy matters for battery-supplied devices.

Implementation tradeoffs
- $T_{ph} \propto LD \cdot \frac{C_L \cdot V_{dd}}{I_{on}}$, with:
  - $LD$ the logic depth of the operation (in gates)
  - $C_L$ the load capacitance (in Farad)
  - $V_{dd}$ the circuit supply voltage
  - $I_{on}$ the MOSFET drain current in the ON state
- “$T_{ph}$ decreases with larger $V_{dd}$”

Implementation tradeoffs (II)
- Sources of power and energy consumption: $P_{tot} = P_{dyn} + P_{stat}$
  - $P_{dyn} \propto N_{gates} \cdot C_L \cdot V_{dd}^2 \cdot f_{op} \cdot \alpha \cdot (1 + \beta_{sc})$, with $\alpha$ the activity factor and $\beta_{sc}$ the short-circuit contribution
  - $P_{stat} \propto I_{leak} \cdot V_{dd}$, with $I_{leak}$ increasing with smaller $V_{dd}$
- “Minimum energy best trades $P_{dyn}$ and $P_{stat}$” (here with $T_{op} = T_{ph}$)
- ⇒ there exists a frequency/energy tradeoff

Technology scaling
- $P_{dyn}$ dominates in older technologies (down to 0.1 µm)
- $P_{stat}$ becomes significant in nanoscale devices
- Inter-device variability also increases with scaling!

Design tradeoffs
- Resource sharing, e.g., with the AES ByteSub:
  - Low-cost design: 1 S-box, 16 cycles
  - Fast design: 16 S-boxes, 1 cycle
  - Low cost implies more control ⇒ less efficient

Design tradeoffs (II)
- Inner pipelining, e.g., with the AES round:
  - Ideally: $f_{op} \times 2$ (usually worse in practice)
  - Latency: 11 → 22 cycles
  - Throughput?
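A minimal Python sketch of the throughput comparison follows. The 11-cycle and 22-cycle latencies come from the slide. The 100 MHz base frequency (`F_OP`) and the non-ideal 1.7× frequency gain are hypothetical values chosen only for illustration, and `throughput_bps` is an illustrative helper.

```python
def throughput_bps(bits_in_flight: int, cycles_per_block: int, f_op_hz: float) -> float:
    """Bits completed every `cycles_per_block` cycles, at clock frequency f_op_hz."""
    return bits_in_flight / cycles_per_block * f_op_hz


F_OP = 100e6  # HYPOTHETICAL clock frequency of the unpipelined round (Hz)

# No inner pipeline: one 128-bit block leaves the loop every 11 cycles.
base = throughput_bps(128, 11, F_OP)

# Two-stage inner pipeline: two 128-bit blocks are interleaved in the loop,
# each needs 22 cycles, and the critical path is (ideally) halved.
piped = throughput_bps(256, 22, 2 * F_OP)

# Same pipeline with a HYPOTHETICAL, non-ideal frequency gain of 1.7
# (register overheads and unbalanced stages can prevent a full x2).
realistic = throughput_bps(256, 22, 1.7 * F_OP)

for name, tp in (("unpipelined", base), ("ideal pipeline", piped), ("non-ideal pipeline", realistic)):
    print(f"{name}: {tp / 1e9:.3f} Gbit/s ({tp / base:.2f}x)")
```

With the ideal frequency doubling, the two interleaved blocks exactly compensate for the doubled latency. Any shortfall in the frequency gain reduces the speed-up proportionally.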
Without pipeline: (128 bits / 11 cycles) · $f_{op}$; with pipeline: (256 bits / 22 cycles) · $f'_{op}$, since two blocks are processed concurrently and ideally $f'_{op} = 2 f_{op}$ ⇒ throughput ideally ×2.

Design tradeoffs (III)
- Further improvements of the throughput ($f_{op}$ fixed)
- Parallelism (left) is less efficient than outer pipelining (right)

FPGAs
- “Sea” of programmable logic blocks
- Connected with programmable routing
- Functionality determined by configuration bits
- Different technologies:
  - 0.18 µm → 45 nm
  - Several manufacturers

FPGAs (II)
[Figure]

FPGAs (III)
- Logic blocks
  - From 3-input look-up tables... to 8-bit arithmetic and logic units
  - The granularity of the device influences both the design performance and the configuration time
- Routing blocks
  - Structured according to the interconnect length
  - Major impact on final performance
- Embedded blocks
  - Memories, multipliers, ...

FPGAs (IV): how to use them
- Compared to ASICs, fabrication and packaging are replaced by configuration (i.e.,
  sending a programming file to the chip to determine the functionality of its “gates”)

FPGAs (V)
- Example: Xilinx logic block
[Figure: Xilinx logic block]

Application to block ciphers
- Target FPGA 1 has logic blocks (LB1) made of:
  - two 4-input LUTs
  - one 1-bit MUX to combine the LUTs
  - two registers
- Target FPGA 2 has logic blocks (LB2) made of:
  - four 6-input LUTs
  - three 1-bit MUXes to combine the LUTs
  - four registers
- Embedded memory, with each block (MB) made of:
  - a 4096-bit RAM
  - dual-ported (i.e., 2 read/write operations per cycle)
  - configurable (4096 × 1, 2048 × 2, ...)

S-box implementations
- “Minimum memory” cost (in bits) of S1/S2? { ... }
- Cost of S1/S2 in LB1/LB2? { ... }
- Would you use the embedded memory to implement S1/S2?

Block cipher design
- Consider an AES-like cipher with the following round: [Figure: cipher round]
  - to be implemented in FPGA 1 with S-box S2
  - with MixColumn in 256 LUTs and a logic depth of 2 LUTs
  - and the full cipher iterating 11 rounds

Block cipher design (II)
- What is the cost of one round in LUTs?
- Design and evaluate the cost (in LUTs and registers) of:
  - a 1-round loop architecture without pipeline
  - a 1-round loop architecture with maximum pipeline
- What is the latency (in cycles) of these architectures?
- Assuming $T_{LUT}$ = 10 ns, what throughput (in bit/s) do these architectures achieve?
  - Is this assumption realistic (physically speaking)?
- “Ideally”, what would happen if we moved to a 2-round loop architecture, or to a 32-bit loop architecture?
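The Python sketch below shows one way to organize the arithmetic behind these questions. S1 and S2 are not specified in this text, so the sketch assumes a hypothetical AES-like configuration: a 128-bit state, 16 S-boxes with 8 input and 8 output bits, and key addition ignored. The given figures are $T_{LUT}$ = 10 ns, 11 rounds, MixColumn in 256 LUTs with depth 2, and 4096-bit dual-ported RAM blocks. The LUT count uses Shannon expansion with every 2:1 multiplexer mapped to its own LUT. This is a naive upper bound, not an optimal mapping, and all function names are illustrative.

```python
from __future__ import annotations

import math

# Figures given on the slides.
T_LUT = 10e-9                 # delay of one LUT level (s)
ROUNDS = 11                   # rounds iterated by the full cipher
MC_LUTS, MC_DEPTH = 256, 2    # MixColumn: cost (LUTs) and logic depth (LUTs)
MB_BITS, MB_PORTS = 4096, 2   # embedded RAM block: size (bits), ports

# HYPOTHETICAL parameters: S1/S2 are not specified here, so we assume an
# AES-like 128-bit state and 16 S-boxes with 8 input and 8 output bits.
BLOCK_BITS, N_SBOX, SBOX_IN, SBOX_OUT = 128, 16, 8, 8


def min_memory_bits(n_in: int, n_out: int) -> int:
    """Size of a look-up table with n_in address bits and n_out data bits."""
    return (2 ** n_in) * n_out


def ram_blocks(n_sbox: int, table_bits: int) -> int:
    """RAM blocks if each block stores one table and serves one look-up per port
    (assumes a suitable width configuration exists for the table)."""
    if table_bits > MB_BITS:
        raise ValueError("table does not fit in one RAM block")
    return math.ceil(n_sbox / MB_PORTS)


def shannon_luts(n_in: int, k: int) -> tuple[int, int]:
    """(LUTs, LUT depth) for ONE output bit of an n_in-input function built from
    k-input LUTs by Shannon expansion, each 2:1 multiplexer of the expansion
    tree costing one extra LUT: a naive upper bound, not an optimal mapping."""
    if n_in <= k:
        return 1, 1
    leaves = 2 ** (n_in - k)           # sub-functions of the k low-order inputs
    return leaves + (leaves - 1), 1 + (n_in - k)


def loop_architecture(round_luts: int, round_depth: int, pipelined: bool) -> dict:
    """Ideal 1-round loop architecture: no routing delay, no input multiplexer,
    no key schedule, no key addition."""
    stages = round_depth if pipelined else 1            # register levels per round
    f_op = 1 / (T_LUT * round_depth / stages)           # period = LUT levels between registers
    latency = ROUNDS * stages                           # cycles per block
    throughput = stages * BLOCK_BITS * f_op / latency   # 'stages' blocks in flight
    # Registers: lower bound assuming every register layer is BLOCK_BITS wide
    # (layers inside the S-box expansion tree are actually wider).
    return {"luts": round_luts, "regs": stages * BLOCK_BITS,
            "latency_cycles": latency, "throughput_bps": throughput}


# FPGA 1: 4-input LUTs.
luts_per_bit, sbox_depth = shannon_luts(SBOX_IN, k=4)
round_luts = N_SBOX * SBOX_OUT * luts_per_bit + MC_LUTS
round_depth = sbox_depth + MC_DEPTH
table = min_memory_bits(SBOX_IN, SBOX_OUT)

print(f"S-box table: {table} bits, RAM blocks for all S-boxes: {ram_blocks(N_SBOX, table)}")
for label, pipelined in (("no pipeline", False), ("max pipeline", True)):
    arch = loop_architecture(round_luts, round_depth, pipelined)
    print(f"{label}: {arch['luts']} LUTs, {arch['regs']} regs, "
          f"{arch['latency_cycles']} cycles, {arch['throughput_bps'] / 1e6:.1f} Mbit/s")
```

The model deliberately ignores routing delays and register overheads, so its throughputs are only upper bounds. The dedicated MUX of LB1 could also absorb one multiplexer level of each expansion tree, lowering the LUT count below this naive estimate.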
{ ... }

Examples
- FPGA implementations of the AES (Rijndael)

Index  E/D?    Key sched.   Feedback?  Device      Architecture
1.     E only  on-the-fly   no         Virtex-E    128-bit unrolled
2.     E only  on-the-fly   no         Virtex-E    128-bit loop
3.     E/D     precomputed  yes        Virtex-II   32-bit loop
4.     E/D     precomputed  yes        Spartan-II  8-bit loop
5.     E/D     precomputed  yes        Spartan-II  PicoBlaze

Index  LUTs  Regs  Slices  RAMBs  Freq.    Throughput
1.     3516  3840  2784    100    92 MHz   11.7 Gbit/s
2.     3846  2517  2257    0      169 MHz  2 Gbit/s
3.     288   113   146     3      123 MHz  358 Mbit/s
4.     -     -     124     2      67 MHz   2.2 Mbit/s
5.     -     -     119     2      90 MHz   710 kbit/s

Summarizing
- Specialized hardware implementations (ASICs, FPGAs) can be used to reach high performance
- Many different metrics exist (cost, speed, ...)
- Hardware design optimization (e.g., sharing, pipelining) depends on algorithmic features
- Technology scaling can have a high impact too!

Further readings
- International Technology Roadmap for Semiconductors, http://www.itrs.net/
- F. Rodriguez-Henriquez, F. Saqib, N.A. Diaz, C.K. Koc, Cryptographic Algorithms on Reconfigurable Hardware, Springer, 2007.
- H. Kaeslin, Digital Integrated Circuit Design, Cambridge University Press, 2008.
- J.M. Rabaey, Digital Integrated Circuits: A Design Perspective, 2nd edition, Prentice Hall, 2003.

Thanks