Designing with Memory

Prof. Stephen A. Edwards
Columbia University
Spring 2014
Using Memory
Basic Memory Model
[Figure: a Memory block with inputs Clock, Address, Data In, and Write, and output Data Out.]
[Timing diagram: Address carries A0, A1, A1 while the memory performs Read A0, Write A1 (Data In = D1), Read A1; Data Out shows D0, old D1, D1.]
Memory Is Fundamentally a Bottleneck
Plenty of bits, but you can only see a small window each clock cycle
Using memory = scheduling memory accesses
Software hides this from you: sequential programs naturally schedule accesses
You must schedule memory accesses in a hardware design
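To make "scheduling memory accesses" concrete, here is a minimal sketch of a circuit that adds up all 16 bytes of a small synchronous memory, reading one byte per clock cycle. It assumes the memory returns the addressed byte on data_out one clock after the address is presented. The module name sum16 and the signals reset, start, and done are hypothetical, chosen only for this illustration.

```systemverilog
// Hypothetical sketch: add up all 16 bytes of a synchronous memory.
// Assumes the memory returns mem[address] on data_out one clock after
// address is presented. Module and signal names are illustrative.
module sum16(
  input  logic        clk,
  input  logic        reset,     // synchronous reset
  input  logic        start,     // pulse for one cycle to begin
  input  logic [7:0]  data_out,  // read data coming back from the memory
  output logic [3:0]  address,   // read address sent to the memory
  output logic [11:0] sum,       // 16 * 255 = 4080 fits in 12 bits
  output logic        done);

  logic reading;  // an address is being presented this cycle
  logic valid;    // data_out holds requested data this cycle

  always_ff @(posedge clk) begin
    if (reset) begin
      reading <= 1'b0;
      valid   <= 1'b0;
      done    <= 1'b0;
      sum     <= 12'd0;
      address <= 4'd0;
    end else if (start) begin
      reading <= 1'b1;
      valid   <= 1'b0;
      done    <= 1'b0;
      sum     <= 12'd0;
      address <= 4'd0;
    end else begin
      valid <= reading;                  // data lags the address by one cycle
      if (reading) begin
        if (address == 4'd15) reading <= 1'b0;
        else                  address <= address + 4'd1;
      end
      if (valid) begin
        sum <= sum + data_out;           // consume one byte per cycle
        if (!reading) done <= 1'b1;      // the last byte has just arrived
      end
    end
  end
endmodule
```

The memory is the bottleneck here: with a single read port, the sum cannot finish in fewer than 16 reads, so the design is mostly a schedule of which address to present in which cycle.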
Modeling Synchronous Memory in SystemVerilog
module memory(
  input  logic       clk,
  input  logic       write,      // Write enable
  input  logic [3:0] address,    // 4-bit address
  input  logic [7:0] data_in,    // 8-bit input bus
  output logic [7:0] data_out);  // 8-bit output bus

  logic [7:0] mem [15:0];        // The memory array: 16 8-bit bytes

  always_ff @(posedge clk)       // Clocked
    begin
      if (write)
        mem[address] <= data_in; // Write to array when asked
      data_out <= mem[address];  // Always read (old) value from array
    end
endmodule
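The same model can be written with its widths as parameters. This is only a sketch: the module name memory_param and the parameter names ADDR_BITS and DATA_BITS are hypothetical, and the defaults reproduce the 16 × 8 memory on this slide.

```systemverilog
// Sketch: the slide's memory model with the widths pulled out as
// parameters. memory_param, ADDR_BITS and DATA_BITS are hypothetical
// names; the defaults give the same 16 x 8 memory.
module memory_param #(
  parameter ADDR_BITS = 4,
  parameter DATA_BITS = 8)(
  input  logic                 clk,
  input  logic                 write,     // Write enable
  input  logic [ADDR_BITS-1:0] address,
  input  logic [DATA_BITS-1:0] data_in,
  output logic [DATA_BITS-1:0] data_out);

  logic [DATA_BITS-1:0] mem [2**ADDR_BITS-1:0];

  always_ff @(posedge clk) begin
    if (write)
      mem[address] <= data_in;
    data_out <= mem[address];  // old contents when a write happens
  end
endmodule
```

An instance such as memory_param #(.ADDR_BITS(9), .DATA_BITS(20)) would describe a 512 × 20 memory with the same read and write behavior.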
M10K Blocks in the Cyclone V
10 kilobits (10240 bits) per block
Dual ported: two addresses, write enable signals
Data busses can be 1–20 bits wide
Our Cyclone V 5CSXFC6 has 557 of these blocks (696 KB)
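As a quick check of the capacity figure, taking 1 KB as 1024 bytes:

```latex
\[
557 \times 10240\ \text{bits} = 5\,703\,680\ \text{bits}
  = 712\,960\ \text{bytes}
  = 696.25 \times 1024\ \text{bytes} \approx 696\ \text{KB}
\]
\[
512\ \text{words} \times 20\ \text{bits} = 10240\ \text{bits}
\]
```

The second line shows that a 512-word memory with 20-bit data holds exactly as many bits as one block.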
Memory in Quartus: the Megafunction Wizard
Memory: Single- or Dual-Ported
Memory: Select Port Widths
Memory: One or Two Clocks
Memory: Output Ports Need Not Be Registered
Memory: Wizard-Generated Verilog Module
This generates the following SystemVerilog module:
module memory (
  // Port A:
  input  logic [12:0] address_a,  // 8192 1-bit words
  input  logic        clock_a,
  input  logic [0:0]  data_a,
  input  logic        wren_a,     // Write enable
  output logic [0:0]  q_a,
  // Port B:
  input  logic [8:0]  address_b,  // 512 16-bit words
  input  logic        clock_b,
  input  logic [15:0] data_b,
  input  logic        wren_b,     // Write enable
  output logic [15:0] q_b);
Instantiate it like any other module; Quartus treats it specially
Two Ways to Ask for Memory
1. Use the Megafunction Wizard
+ Warns you in advance about resource usage
− Awkward to change
2. Let Quartus infer memory from your code
+ Better integrated with your code
− Easy to inadvertently ask for garbage
The Perils of Memory Inference
module twoport(
  input  logic        clk,
  input  logic [8:0]  aa, ab,
  input  logic [19:0] da, db,
  input  logic        wa, wb,
  output logic [19:0] qa, qb);

  logic [19:0] mem [511:0];

  always_ff @(posedge clk) begin
    if (wa) mem[aa] <= da;
    qa <= mem[aa];
    if (wb) mem[ab] <= db;
    qb <= mem[ab];
  end
endmodule
Failure: Exploded! Synthesized to an 854-page schematic with 10280 registers (no M10K blocks)
Page 1 looked like this:
[Figure: page 1 of the synthesized schematic]
The Perils of Memory Inference
module twoport2(
  input  logic        clk,
  input  logic [8:0]  aa, ab,
  input  logic [19:0] da, db,
  input  logic        wa, wb,
  output logic [19:0] qa, qb);

  logic [19:0] mem [511:0];

  always_ff @(posedge clk) begin
    if (wa) mem[aa] <= da;
    qa <= mem[aa];
  end

  always_ff @(posedge clk) begin
    if (wb) mem[ab] <= db;
    qb <= mem[ab];
  end
endmodule
Failure. Still didn’t work: RAM logic “mem” is uninferred due to unsupported read-during-write behavior
The Perils of Memory Inference
Finally!
module twoport3(
  input  logic        clk,
  input  logic [8:0]  aa, ab,
  input  logic [19:0] da, db,
  input  logic        wa, wb,
  output logic [19:0] qa, qb);

  logic [19:0] mem [511:0];

  always_ff @(posedge clk) begin
    if (wa) begin
      mem[aa] <= da;
      qa <= da;
    end else qa <= mem[aa];
  end

  always_ff @(posedge clk) begin
    if (wb) begin
      mem[ab] <= db;
      qb <= db;
    end else qb <= mem[ab];
  end
endmodule
Took this structure from a template: Edit→Insert Template→Verilog HDL→Full Designs→RAMs and ROMs→True Dual-Port RAM (single clock)
[Figure: synthesized schematic: one SYNC_RAM block “mem” with two ports, driven by clk, aa, ab, da, db, wa, and wb, feeding the output registers for qa and qb.]
The Perils of Memory Inference
module twoport4(
  input  logic        clk,
  input  logic [8:0]  ra, wa,
  input  logic        write,
  input  logic [19:0] d,
  output logic [19:0] q);

  logic [19:0] mem [511:0];

  always_ff @(posedge clk) begin
    if (write) mem[wa] <= d;
    q <= mem[ra];
  end
endmodule
Also works: separate read and write addresses
[Figure: synthesized schematic: one SYNC_RAM block “mem” with inputs clk, d, ra, wa, and write, feeding the output register for q.]
Conclusion:
Inference is fine for single port or one read and one write port.
Use the Megafunction Wizard for anything else.
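For the single-port case, a minimal sketch in the same style as twoport3 might look like the following. The module name oneport and its port names are hypothetical, and whether a given Quartus version maps it onto an M10K block is an assumption to confirm in the compilation report.

```systemverilog
// Sketch of a single-port RAM written in the style of twoport3: on a
// write, the output register takes the new data instead of the old
// contents. The name oneport is hypothetical.
module oneport(
  input  logic        clk,
  input  logic [8:0]  a,   // address: 512 words
  input  logic [19:0] d,   // data in
  input  logic        w,   // write enable
  output logic [19:0] q);  // data out, registered

  logic [19:0] mem [511:0];

  always_ff @(posedge clk) begin
    if (w) begin
      mem[a] <= d;
      q <= d;              // read-during-write returns the new data
    end else q <= mem[a];
  end
endmodule
```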
Implementing Memory
Early Memories
Williams Tube CRT-based random access memory, 1946.
Used on the Manchester Mark I. 2048 bits.
Early Memories
Mercury acoustic delay line.
Used in the EDSAC, 1947.
32 × 17 bits
Early Memories
Magnetic core memory, 1952. IBM.
Early Memories
Magnetic drum memory. 1950s & 60s. Secondary storage.
Modern Memory Choices
Family     Programmed       Persistence
Mask ROM   at fabrication   ∞
PROM       once             ∞
EPROM      1000s, UV        10 years
FLASH      1000s, block     10 years
EEPROM     1000s, byte      10 years
NVRAM      ∞                5 years
SRAM       ∞                while powered
DRAM       ∞                64 ms
Implementing ROMs
[Figure: a 2-to-4 decoder on address inputs A1 and A0 drives Wordlines 0 to 3; Bitlines 2, 1, and 0 deliver D2, D1, and D0. Legend: 0, 0/1, Z: “not connected”.]
Add.  Data
00    011
01    110
10    100
11    010
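The Add./Data table on this slide can also be written as combinational logic. This is a minimal sketch: the module name rom4x3 and its port names are hypothetical, and it says nothing about how a synthesis tool would implement it.

```systemverilog
// Sketch of the 4 x 3 ROM from the Add./Data table as a lookup.
// The name rom4x3 is hypothetical.
module rom4x3(
  input  logic [1:0] a,   // {A1, A0}
  output logic [2:0] d);  // {D2, D1, D0}

  always_comb
    case (a)
      2'b00:   d = 3'b011;
      2'b01:   d = 3'b110;
      2'b10:   d = 3'b100;
      2'b11:   d = 3'b010;
      default: d = 3'bxxx;  // only reached in simulation for unknown inputs
    endcase
endmodule
```

Each case arm plays the role of one wordline: exactly one is selected by the address, and it determines the value on every bitline.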
Mask ROM Die Photo
A Floating Gate MOSFET
Cross section of a NOR FLASH transistor. Kawai et al., ISSCC 2008 (Renesas)
Floating Gate n-channel MOSFET
[Figure: cross section showing the Control Gate, SiO2, Floating Gate, Source, Channel, and Drain, with the charge distribution in each of four cases.]
Floating gate uncharged; Control gate at 0V: Off
Floating gate uncharged; Control gate positive: On
Floating gate negative; Control gate at 0V: Off
Floating gate negative; Control gate positive: Off
EPROMs and FLASH use Floating-Gate MOSFETs
Static Random-Access Memory Cell
[Figure: cell schematic with one word line and two bit lines.]
Layout of a 6T SRAM Cell
Weste and Harris. Introduction to CMOS VLSI Design. Addison-Wesley, 2010.
Intel’s 2102 SRAM, 1024 × 1 bit, 1972
2102 Block Diagram
SRAM Timing
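[Figure: 6264 8K × 8 SRAM with address inputs A12 to A0, control inputs CS1, CS2, WE, and OE, and data pins D7 to D0.]
[Timing diagram: CS1, CS2, WE, OE, Addr, and Data. Addr carries address 1, then address 2; Data carries “write 1”, then “read 2”.]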
6264 SRAM Block Diagram
[Figure: CY6264 block diagram: address inputs, input buffer, 256 x 32 x 8 array, column decoder, and power-down logic, with data pins I/O0 to I/O7 and controls CE1, CE2, WE, and OE.]
Toshiba TC55V16256J 256K × 16
[Figure: 256K × 16 SRAM with address inputs A17 to A0, data pins D15 to D0, and controls UB, LB, WE, OE, and CE.]
Dynamic RAM Cell
[Figure: cell schematic with a Row line and a Column line.]
Ancient (c. 1982) DRAM: 4164 64K × 1
[Figure: 4164 64K × 1 DRAM with address inputs A7 to A0, data input Din, controls WE, CAS, and RAS, and data output Dout.]
Basic DRAM read and write cycles
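[Timing diagram: RAS, CAS, Addr, WE, Din, and Dout for one write cycle and one read cycle. In each cycle Addr carries the row, then the column; Din carries the data to write and Dout returns the data read.]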
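Because the 4164 has only eight address pins for 64K locations, each access sends the address in two halves, first the row and then the column. Below is a minimal sketch of the multiplexer a controller would need. It assumes the upper byte of the address is used as the row and the lower byte as the column, which is a choice made for this example; the names dram_addr_mux, addr, and col_phase are hypothetical.

```systemverilog
// Sketch: multiplex a 16-bit address onto the eight DRAM address pins.
// Assumption: the upper byte is the row and the lower byte is the
// column. Module and signal names are hypothetical.
module dram_addr_mux(
  input  logic [15:0] addr,       // full address of one of 64K bits
  input  logic        col_phase,  // 0: present the row, 1: present the column
  output logic [7:0]  a);         // to DRAM pins A7..A0

  assign a = col_phase ? addr[7:0] : addr[15:8];
endmodule
```

A controller would hold col_phase low while it asserts RAS, then raise it before asserting CAS.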
Page Mode DRAM read cycle
[Timing diagram: RAS, CAS, Addr, WE, Din, and Dout. Addr carries one row followed by three columns; Dout returns three reads.]
Samsung 8M × 16 SDRAM
[Figure: 8M × 16 SDRAM pinout: CLK, CKE, CS, RAS, CAS, WE, UDQM, LDQM, BA1, BA0, A11 to A0, and DQ15 to DQ0.]
[Figure: block diagram: timing register (CLK, CKE, CS, RAS, CAS, WE, L(U)DQM), programming register (latency & burst length), address register, refresh counter, bank select, row buffer and row decoder, column buffer and column decoder, four memory banks (8M x 4 / 4M x 8 / 2M x 16) with sense amps, data input register, I/O control, and output buffer driving DQi.]
SDRAM: Control Signals
RAS  CAS  WE   Action
1    1    1    NOP
0    0    0    Load mode register
0    1    1    Active (select row)
1    0    1    Read (select column, start burst)
1    0    0    Write (select column, start burst)
1    1    0    Terminate Burst
0    1    0    Precharge (deselect row)
0    0    1    Auto Refresh
Mode register: selects 1/2/4/8-word bursts, CAS latency, burst on write
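One way a controller might encode these commands is an enumeration whose values are the RAS, CAS, and WE levels themselves, so driving the pins is a single assignment. This is a sketch only: the names sdram_pkg, sdram_cmd_t, and sdram_cmd_pins are hypothetical, the levels are copied from the slide's table, and a real part also has CS, CKE, and the address and bank pins, which are not modeled here.

```systemverilog
// Sketch: name each SDRAM command by its {RAS, CAS, WE} levels as
// listed on the slide. All identifiers here are hypothetical.
package sdram_pkg;
  typedef enum logic [2:0] {
    LOAD_MODE    = 3'b000,  // Load mode register
    AUTO_REFRESH = 3'b001,  // Auto Refresh
    PRECHARGE    = 3'b010,  // Precharge (deselect row)
    ACTIVE       = 3'b011,  // Active (select row)
    WRITE        = 3'b100,  // Write (select column, start burst)
    READ         = 3'b101,  // Read (select column, start burst)
    BURST_TERM   = 3'b110,  // Terminate Burst
    NOP          = 3'b111   // NOP
  } sdram_cmd_t;
endpackage

module sdram_cmd_pins(
  input  sdram_pkg::sdram_cmd_t cmd,
  output logic ras, cas, we);

  // The enum value is the pin pattern, most significant bit first.
  assign {ras, cas, we} = cmd;
endmodule
```

With this encoding, a controller's state machine can select a named command each cycle and leave the pin levels to the one assignment in sdram_cmd_pins.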
SDRAM: Timing with 2-word bursts
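[Timing diagram: Clk, RAS, CAS, WE, Addr, BA, and DQ across the commands Load, Active, Write, Read, and Refresh. Addr carries Op, R, C, C; BA carries B, B, B; DQ carries W, W for the write burst and R, R for the read burst.]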