Designing with Memory
Prof. Stephen A. Edwards
Columbia University
Spring 2014

Using Memory

Basic Memory Model

[Figure: a memory block with Address, Data In, Write, and Clock inputs and a Data Out output, above a timing diagram. Address: A0, A1, A1. Data In: D1. Data Out: D0, old D1, D1. Three builds of the slide highlight "Read A0", "Write A1", and "Read A1" in turn.]

Memory Is Fundamentally a Bottleneck

Plenty of bits, but you can only see a small window each clock cycle
Using memory = scheduling memory accesses
Software hides this from you: sequential programs naturally schedule accesses
You must schedule memory accesses in a hardware design

Modeling Synchronous Memory in SystemVerilog

    module memory(
      input  logic       clk,
      input  logic       write,      // Write enable
      input  logic [3:0] address,    // 4-bit address
      input  logic [7:0] data_in,    // 8-bit input bus
      output logic [7:0] data_out);  // 8-bit output bus

      logic [7:0] mem [15:0];        // The memory array: 16 8-bit bytes

      always_ff @(posedge clk) begin // Clocked
        if (write)
          mem[address] <= data_in;   // Write to array when asked
        data_out <= mem[address];    // Always read (old) value from array
      end
    endmodule

M10K Blocks in the Cyclone V

10 kilobits (10240 bits) per block
Dual ported: two addresses, write enable signals
Data busses can be 1–20 bits wide
Our Cyclone V 5CSXFC6 has 557 of these blocks (696 KB)

Memory in Quartus: the Megafunction Wizard

Memory: Single- or Dual-Ported

Memory: Select Port Widths

Memory: One or Two Clocks

Memory: Output Ports Need Not Be Registered

Memory: Wizard-Generated Verilog Module

This generates the following SystemVerilog module:

    module memory (
      // Port A:
      input  logic [12:0] address_a, // 8192 1-bit words
      input  logic        clock_a,
      input  logic [0:0]  data_a,
      input  logic        wren_a,    // Write enable
      output logic [0:0]  q_a,

      // Port B:
      input  logic [8:0]  address_b, // 512 16-bit words
      input  logic        clock_b,
      input  logic [15:0] data_b,
      input  logic        wren_b,    // Write enable
      output logic [15:0] q_b);

Instantiate like any module; Quartus treats it specially

Two Ways to Ask for Memory

1. Use the Megafunction Wizard
   + Warns you in advance about resource usage
   − Awkward to change
2. Let Quartus infer memory from your code
   + Better integrated with your code
   − Easy to inadvertently ask for garbage

The Perils of Memory Inference

    module twoport(
      input  logic        clk,
      input  logic [8:0]  aa, ab,
      input  logic [19:0] da, db,
      input  logic        wa, wb,
      output logic [19:0] qa, qb);

      logic [19:0] mem [511:0];

      always_ff @(posedge clk) begin
        if (wa) mem[aa] <= da;
        qa <= mem[aa];
        if (wb) mem[ab] <= db;
        qb <= mem[ab];
      end
    endmodule

Failure: Exploded! Synthesized to an 854-page schematic with 10280 registers (no M10K blocks)

Page 1 looked like this:

[Figure: page 1 of the synthesized schematic]

The Perils of Memory Inference: Failure

    module twoport2(
      input  logic        clk,
      input  logic [8:0]  aa, ab,
      input  logic [19:0] da, db,
      input  logic        wa, wb,
      output logic [19:0] qa, qb);

      logic [19:0] mem [511:0];

      always_ff @(posedge clk) begin
        if (wa) mem[aa] <= da;
        qa <= mem[aa];
      end

      always_ff @(posedge clk) begin
        if (wb) mem[ab] <= db;
        qb <= mem[ab];
      end
    endmodule

Still didn’t work: RAM logic “mem” is uninferred due to unsupported read-during-write behavior

The Perils of Memory Inference: Finally!
    module twoport3(
      input  logic        clk,
      input  logic [8:0]  aa, ab,
      input  logic [19:0] da, db,
      input  logic        wa, wb,
      output logic [19:0] qa, qb);

      logic [19:0] mem [511:0];

      always_ff @(posedge clk) begin
        if (wa) begin
          mem[aa] <= da;
          qa <= da;
        end else qa <= mem[aa];
      end

      always_ff @(posedge clk) begin
        if (wb) begin
          mem[ab] <= db;
          qb <= db;
        end else qb <= mem[ab];
      end
    endmodule

Took this structure from a template: Edit→Insert Template→Verilog HDL→Full Designs→RAMs and ROMs→True Dual-Port RAM (single clock)

[Figure: synthesized schematic. A single SYNC_RAM block "mem" with two ports (DATAIN, WADDR, RADDR, WE, CLK0 and PORTBDATAIN, PORTBWADDR, PORTBRADDR, PORTBWE, PORTBCLK0) drives DATAOUT[19..0] and PORTBDATAOUT[19..0] into the output registers for qa[19..0] and qb[19..0].]

The Perils of Memory Inference

    module twoport4(
      input  logic        clk,
      input  logic [8:0]  ra, wa,
      input  logic        write,
      input  logic [19:0] d,
      output logic [19:0] q);

      logic [19:0] mem [511:0];

      always_ff @(posedge clk) begin
        if (write) mem[wa] <= d;
        q <= mem[ra];
      end
    endmodule

Also works: separate read and write addresses

[Figure: synthesized schematic. A single SYNC_RAM block "mem" with DATAIN, RADDR, WADDR, WE, and CLK0 inputs drives DATAOUT[19..0] into the output register for q[19..0].]

Conclusion: Inference is fine for single port or one read and one write port. Use the Megafunction Wizard for anything else.

Implementing Memory

Early Memories

Williams Tube CRT-based random access memory, 1946. Used on the Manchester Mark I. 2048 bits.
Mercury acoustic delay line. Used in the EDSAC, 1947. 32 × 17 bits
Magnetic core memory, 1952. IBM.
Magnetic drum memory. 1950s & 60s.
Secondary storage.

Modern Memory Choices

Family     Programmed       Persistence
Mask ROM   at fabrication   ∞
PROM       once             ∞
EPROM      1000s, UV        10 years
FLASH      1000s, block     10 years
EEPROM     1000s, byte      10 years
NVRAM      ∞                5 years
SRAM       ∞                while powered
DRAM       ∞                64 ms

Implementing ROMs

[Figure: a 2-to-4 decoder driven by A1 and A0 selects one of Wordlines 0 to 3; Bitlines 2, 1, and 0 drive outputs D2, D1, and D0. Legend: 0, 0/1, Z: “not connected”. Successive builds of the slide step through the array.]

Add.   Data
00     011
01     110
10     100
11     010

Mask ROM Die Photo

A Floating Gate MOSFET

Cross section of a NOR FLASH transistor. Kawai et al., ISSCC 2008 (Renesas)

Floating Gate n-channel MOSFET

[Figure: cross section showing the control gate, SiO2, floating gate, drain, source, and channel, in four builds:]

Floating gate uncharged; Control gate at 0V: Off
Floating gate uncharged; Control gate positive: On
Floating gate negative; Control gate at 0V: Off
Floating gate negative; Control gate positive: Off

EPROMs and FLASH use Floating-Gate MOSFETs

Static Random-Access Memory Cell

[Figure: SRAM cell with a word line and two bit lines.]

Layout of a 6T SRAM Cell

Weste and Harris. Introduction to CMOS VLSI Design. Addison-Wesley, 2010.
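
The address/data table on the "Implementing ROMs" slides can be written out directly in SystemVerilog. The following is a minimal sketch, not code from the original deck: the names rom4x3, rom4x3_tb, a, d, and wordline are illustrative, and it assumes the leftmost data bit in the table is D2. It keeps the decoder-then-wordline structure of the figure rather than collapsing everything into one lookup.

```systemverilog
// Sketch of the 4-word, 3-bit ROM from the slides' address/data table.
// All names here are illustrative assumptions, not from the original deck.
module rom4x3(
  input  logic [1:0] a,   // {A1, A0}
  output logic [2:0] d);  // {D2, D1, D0}, assuming the table lists D2 first

  logic [3:0] wordline;   // One-hot output of the 2-to-4 decoder

  always_comb begin
    // 2-to-4 decoder: exactly one wordline is active for a known address
    wordline = 4'b0001 << a;

    // The active wordline selects which stored word reaches the bitlines
    unique case (wordline)
      4'b0001: d = 3'b011;  // Address 00
      4'b0010: d = 3'b110;  // Address 01
      4'b0100: d = 3'b100;  // Address 10
      4'b1000: d = 3'b010;  // Address 11
      default: d = 'x;      // Only reached when the address is unknown
    endcase
  end
endmodule

// Self-checking testbench: compares every address against the table
module rom4x3_tb;
  logic [1:0] a;
  logic [2:0] d;

  rom4x3 dut(.a(a), .d(d));

  initial begin
    a = 2'b00; #1 assert (d == 3'b011) else $error("address 00");
    a = 2'b01; #1 assert (d == 3'b110) else $error("address 01");
    a = 2'b10; #1 assert (d == 3'b100) else $error("address 10");
    a = 2'b11; #1 assert (d == 3'b010) else $error("address 11");
    $finish;
  end
endmodule
```

A synthesis tool is free to flatten the decoder and case statement into plain logic; the intermediate wordline signal is there only to mirror the figure.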
Intel’s 2102 SRAM, 1024 × 1 bit, 1972

2102 Block Diagram

SRAM Timing

[Figure: 6264 8K × 8 SRAM with address inputs A12 to A0, data pins D7 to D0, and control inputs CS1, CS2, WE, and OE. The timing diagram shows CS1, CS2, WE, OE, Addr, and Data for a write of data 1 at address 1 followed by a read of data 2 at address 2.]

6264 SRAM Block Diagram

[Figure: CY6264-1 block diagram with an input buffer, a 256 x 32 x 8 array, a column decoder, and power-down logic; signals shown include A1 to A8, I/O0 to I/O7, CE1, CE2, WE, and OE.]

Toshiba TC55V16256J 256K × 16

[Figure: 256K × 16 SRAM with address inputs A17 to A0, data pins D15 to D0, and control inputs UB, LB, WE, OE, and CE.]

Dynamic RAM Cell

[Figure: DRAM cell connected to a row line and a column line.]

Ancient (c. 1982) DRAM: 4164 64K × 1

[Figure: 4164 64K × 1 DRAM with address inputs A7 to A0, Din, WE, CAS, RAS, and Dout.]

Basic DRAM read and write cycles

[Figure: timing diagram of RAS, CAS, Addr, WE, Din, and Dout. Each cycle presents a row address and then a column address; the first cycle is a write and the second is a read.]

Page Mode DRAM read cycle

[Figure: timing diagram of RAS, CAS, Addr, WE, Din, and Dout. One row address is followed by three column addresses, giving three reads.]

Samsung 8M × 16 SDRAM

[Figure: 8M × 16 SDRAM with pins CLK, CKE, CS, RAS, CAS, WE, UDQM, LDQM, BA1, BA0, A11 to A0, and DQ15 to DQ0. The block diagram shows four arrays (8M x 4 / 4M x 8 / 2M x 16) with row and column decoders, row and column buffers, sense amps, bank select, a refresh counter, an address register, a data input register, an output buffer, I/O control, a timing register, and a programming register (latency and burst length).]

SDRAM: Control Signals

RAS  CAS  WE   Action
 1    1    1   NOP
 0    0    0   Load mode register
 0    1    1   Active (select row)
 1    0    1   Read (select column, start burst)
 1    0    0   Write (select column, start burst)
 1    1    0   Terminate Burst
 0    1    0   Precharge (deselect row)
 0    0    1   Auto Refresh

Mode register: selects 1/2/4/8-word bursts, CAS latency, burst on write

SDRAM: Timing with 2-word bursts

[Figure: timing diagram of Clk, RAS, CAS, WE, Addr, BA, and DQ across the operations Load, Active, Write, Read, and Refresh. Addr carries Op, R, C, C; BA carries B, B, B; DQ carries W, W, R, R.]
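
The "SDRAM: Control Signals" table maps the three command lines to eight actions, which is easy to capture as a decoder. This is a hedged sketch rather than anything from the original deck: the package sdram_pkg, the type sdram_cmd_t, the CMD_ names, and the module sdram_cmd_decode are all illustrative, and it assumes the chip is already selected and that ras, cas, and we carry the pin levels exactly as listed in the table.

```systemverilog
// Sketch: name the eight SDRAM commands from the control-signal table.
// Package, type, and module names are illustrative assumptions.
package sdram_pkg;
  typedef enum logic [2:0] {
    CMD_NOP,
    CMD_LOAD_MODE,
    CMD_ACTIVE,
    CMD_READ,
    CMD_WRITE,
    CMD_BURST_TERMINATE,
    CMD_PRECHARGE,
    CMD_AUTO_REFRESH
  } sdram_cmd_t;
endpackage

module sdram_cmd_decode
  import sdram_pkg::*;
  (input  logic       ras, cas, we,  // Pin levels as listed in the table
   output sdram_cmd_t cmd);          // Assumes the chip is selected

  always_comb
    unique case ({ras, cas, we})
      3'b111: cmd = CMD_NOP;
      3'b000: cmd = CMD_LOAD_MODE;        // Load mode register
      3'b011: cmd = CMD_ACTIVE;           // Select row
      3'b101: cmd = CMD_READ;             // Select column, start burst
      3'b100: cmd = CMD_WRITE;            // Select column, start burst
      3'b110: cmd = CMD_BURST_TERMINATE;
      3'b010: cmd = CMD_PRECHARGE;        // Deselect row
      3'b001: cmd = CMD_AUTO_REFRESH;
    endcase
endmodule
```

A decoder like this is mostly useful in a simulation monitor that labels each clock cycle of a timing diagram; a controller would run the same table in the other direction, driving the three lines from its state machine.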