Lecture 11: Beta ISA, Assembly

The First Computer Bug
Harvard Mark II logbook, Sep 9 1947
•  Beta ISA
•  Universality
•  Assembly Language
6.004 Computation Structures
Today’s handouts:
•  Lecture slides
•  Beta ISA
L11: Beta ISA & Assembly, Slide #1
Beta ISA: Storage
CPU State
•  PC
•  General registers: r0, r1, r2, ..., r31, each a 32-bit “word” (4 bytes, bits 31..0)
–  r31 hardwired to 0 (always reads 000000....0)
Main Memory
•  32-bit “words”
•  Up to 2^32 bytes (4 GB of memory) = 2^30 4-byte words
•  Each memory word is 32 bits wide, but for historical reasons the β
uses byte memory addresses. Since each word contains four 8-bit bytes,
addresses of consecutive words differ by 4.
Why separate registers and main memory?
Tradeoff: Size vs speed and energy
Beta ISA: Instructions
•  Three types of instructions:
–  ALU: Perform operations on general registers
–  Branches: Conditionally change the program counter
–  Loads and stores: Move data between general registers and
main memory
•  All instructions have a fixed length: 32 bits (4 bytes)
Beta ALU Instructions
Format:
OPCODE (6 bits) | rc (5 bits) | ra (5 bits) | rb (5 bits) | unused (11 bits)
Example coded instruction: ADD
100000 00011 00001 00010 00000000000
•  OPCODE = 100000, encodes ADD
•  rc = 3, encodes R3 as destination
•  ra = 1, rb = 2, encodes R1 and R2 as source locations
•  The low 11 bits are unused
32-bit hex: 0x80611000
We prefer to write a symbolic representation:
ADD(r1,r2,r3)
ADD(ra,rb,rc): Reg[rc] ← Reg[ra] + Reg[rb]
“Add the contents of ra to the contents of rb; store the result in rc”
Similar instructions for other ALU operations:
•  arithmetic: ADD, SUB, MUL, DIV
•  compare: CMPEQ, CMPLT, CMPLE
•  boolean: AND, OR, XOR, XNOR
•  shift: SHL, SHR, SAR
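The field layout above can be checked with a few lines of Python. This is only a sketch: the helper name `encode_op` is made up for illustration, and the field positions are inferred from the ADD example on this slide (6-bit opcode, three 5-bit register fields, 11 unused bits).

```python
def encode_op(opcode, ra, rb, rc):
    """Pack a register-register instruction: OPCODE | rc | ra | rb | unused."""
    for reg in (ra, rb, rc):
        if not 0 <= reg <= 31:
            raise ValueError(f"register index out of range: {reg}")
    # 6-bit opcode in bits 31..26, then rc, ra, rb as 5-bit fields.
    # The low 11 bits are unused and left as zero.
    return (opcode << 26) | (rc << 21) | (ra << 16) | (rb << 11)


ADD = 0b100000  # opcode taken from the slide

word = encode_op(ADD, ra=1, rb=2, rc=3)  # ADD(r1, r2, r3)
print(f"{word:032b}")
print(f"0x{word:08X}")  # 0x80611000, matching the slide
```

Note that the symbolic form lists the operands as (ra, rb, rc), while the encoded word stores rc first.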
Beta ALU Instructions with Constant
Format:
OPCODE (6 bits) | rc (5 bits) | ra (5 bits) | 16-bit signed constant
Example instruction: ADDC adds register contents and constant:
110000 00011 00001 1111111111111101
•  OPCODE = 110000, encoding ADDC
•  rc = 3, encoding R3 as destination
•  ra = 1, encoding R1 as first operand
•  constant field, encoding -3 as second operand (sign-extended!)
Symbolic version:
ADDC(r1,-3,r3)
ADDC(ra,const,rc): Reg[rc] ← Reg[ra] + sext(const)
“Add the contents of ra to const; store the result in rc”
Similar instructions for other ALU operations:
•  arithmetic: ADDC, SUBC, MULC, DIVC
•  compare: CMPEQC, CMPLTC, CMPLEC
•  boolean: ANDC, ORC, XORC, XNORC
•  shift: SHLC, SHRC, SARC
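As a rough illustration of the constant field, the sketch below packs ADDC(r1,-3,r3) and then recovers the constant by sign extension. The helper names `encode_opc` and `sext16` are hypothetical; the opcode and field order come from the example on this slide.

```python
def encode_opc(opcode, ra, const, rc):
    """Pack a register-constant instruction: OPCODE | rc | ra | 16-bit constant."""
    if not -32768 <= const <= 32767:
        raise ValueError("constant does not fit in 16 signed bits")
    # Masking with 0xFFFF stores the two's-complement form of a negative constant.
    return (opcode << 26) | (rc << 21) | (ra << 16) | (const & 0xFFFF)


def sext16(field):
    """Sign-extend a 16-bit field to a Python int, as sext(const) does."""
    return field - 0x10000 if field & 0x8000 else field


ADDC = 0b110000  # opcode taken from the slide

word = encode_opc(ADDC, ra=1, const=-3, rc=3)  # ADDC(r1, -3, r3)
print(f"{word:032b}")        # 11000000011000011111111111111101
print(sext16(word & 0xFFFF))  # -3
```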
Why Have Instructions with Constants?
•  Many programs use small constants frequently
–  e.g., our factorial example: 0, 1, -1
–  Tradeoff:
•  When used, they save registers and instructions
•  More opcodes → more complex control logic and datapath
[Figure: percentage of operations that use a constant operand. Source: Hennessy & Patterson]
Can We Solve Factorial With ALU Instructions?
•  No! Recall high-level FSM:
[FSM diagram: start (a ← 1, b ← N) → loop (a ← a * b, b ← b - 1); loop repeats while b != 0 and goes to done when b == 0; done (a ← a, b ← b)]
•  Factorial needs to loop
•  So far we can only encode sequences of operations
on registers
•  Need a way to change the PC based on data values!
Beta Branch Instructions
The Beta’s branch instructions provide a way to conditionally
change the PC to point to a nearby location...
... and, optionally, to remember (in rc) where we came from
(useful for procedure calls).
Format:
OPCODE | rc | ra | 16-bit signed constant
“offset” is a SIGNED CONSTANT encoded as part of the instruction!
BEQ(ra,offset,rc): Branch if equal
    NPC ← PC + 4
    Reg[rc] ← NPC
    if (Reg[ra] == 0) PC ← NPC + 4*offset
    else PC ← NPC
BNE(ra,offset,rc): Branch if not equal
    NPC ← PC + 4
    Reg[rc] ← NPC
    if (Reg[ra] != 0) PC ← NPC + 4*offset
    else PC ← NPC
offset = (<addr of target> - <addr of BNE/BEQ>)/4 - 1
       = up to 32767 instructions before/after BNE/BEQ
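The offset formula and the PC update can be cross-checked against each other. The following sketch uses hypothetical helper names (`branch_offset`, `next_pc`) and assumes byte addresses that are multiples of 4, as on the slides.

```python
def branch_offset(target_addr, branch_addr):
    """offset = (target - branch)/4 - 1, i.e. words counted from the NEXT instruction."""
    delta = target_addr - (branch_addr + 4)
    if delta % 4 != 0:
        raise ValueError("addresses must be word-aligned")
    offset = delta // 4
    if not -32768 <= offset <= 32767:
        raise ValueError("target too far away for a 16-bit signed offset")
    return offset


def next_pc(pc, offset, taken):
    """PC update shared by BEQ and BNE; 'taken' is the result of the register test."""
    npc = pc + 4
    return npc + 4 * offset if taken else npc


# A branch at address 16 whose target is at address 8:
off = branch_offset(target_addr=8, branch_addr=16)
print(off)                             # -3
print(next_pc(16, off, taken=True))    # 8
print(next_pc(16, off, taken=False))   # 20
```

The "- 1" in the formula exists because the hardware adds the offset to NPC (PC + 4), not to the address of the branch itself.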
Can We Solve Factorial Now?
int a = 1;
int b = N;
while (b != 0) {
  a = a * b;
  b = b - 1;
}

// Assume r1 = N
ADDC(r31, 1, r0)     // r0 = 1
L: MUL(r0, r1, r0)   // r0 = r0 * r1
SUBC(r1, 1, r1)      // r1 = r1 - 1
BNE(r1, L, r31)      // if r1 != 0, run MUL next
// at this point, r0 = N!
•  Remember control FSM for our simple programmable datapath?
[FSM diagram: loop mul → loop sub → loop bne; from loop bne, z == 0 goes back to loop mul and z == 1 goes to done]
•  Control FSM states → instructions!
–  Not the case in general
–  Happens here because datapath is similar to basic von Neumann datapath
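To see the four-instruction program run, here is a toy interpreter. It is a sketch, not the Beta: it only knows the four opcodes used above, represents instructions as tuples of the form (op, ra, rb-or-constant, rc), assumes the program starts at address 0, and assumes N ≥ 1 (the assembly loop decrements before testing, so N = 0 would wrap around).

```python
MASK = 0xFFFFFFFF  # registers hold 32-bit values


def run_factorial(n):
    reg = [0] * 32
    reg[1] = n                      # "Assume r1 = N"
    program = [
        ("ADDC", 31, 1, 0),         # addr 0:     r0 = 1
        ("MUL", 0, 1, 0),           # addr 4 (L): r0 = r0 * r1
        ("SUBC", 1, 1, 1),          # addr 8:     r1 = r1 - 1
        ("BNE", 1, -3, 31),         # addr 12:    offset = (4 - 12)/4 - 1 = -3
    ]
    pc = 0
    while pc // 4 < len(program):
        op, a, b, c = program[pc // 4]
        npc = pc + 4
        if op == "ADDC":
            result = reg[a] + b
        elif op == "SUBC":
            result = reg[a] - b
        elif op == "MUL":
            result = reg[a] * reg[b]
        else:                       # BNE: result is the return address
            result = npc
            if reg[a] != 0:
                npc = npc + 4 * b
        if c != 31:                 # r31 is hardwired to 0
            reg[c] = result & MASK
        pc = npc
    return reg[0]


print(run_factorial(5))   # 120
```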
Beta Load and Store Instructions
Loads and stores move data between general registers and
main memory
Format:
OPCODE | rc | ra | 16-bit signed constant
address = Reg[ra] + sext(const)
LD(ra,const,rc): Reg[rc] ← Mem[Reg[ra] + sext(const)]
Load into rc the contents of the memory location
whose address is const plus the contents of ra
ST(rc,const,ra): Mem[Reg[ra] + sext(const)] ← Reg[rc]
Store the contents of rc into the memory location whose address
is const plus the contents of ra
BYTE ADDRESSES, but only 32-bit word accesses to word-aligned
addresses are supported. Low two address bits are ignored
Tradeoff (vs allowing unaligned accesses): Simple implementation,
but harder to use
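A small model makes the addressing rule concrete. In this sketch memory is a Python dict keyed by word-aligned byte address, and the helpers `effective_address`, `ld` and `st` are illustrative names only. Because `const` is passed as an ordinary Python int, it is already "sign-extended".

```python
MASK = 0xFFFFFFFF


def effective_address(reg, ra, const):
    """Reg[ra] + sext(const), with the low two address bits ignored."""
    return (reg[ra] + const) & MASK & ~0x3


def ld(reg, mem, ra, const, rc):
    """LD(ra,const,rc): Reg[rc] <- Mem[Reg[ra] + sext(const)]."""
    if rc != 31:  # r31 is hardwired to 0
        reg[rc] = mem.get(effective_address(reg, ra, const), 0)


def st(reg, mem, rc, const, ra):
    """ST(rc,const,ra): Mem[Reg[ra] + sext(const)] <- Reg[rc]."""
    mem[effective_address(reg, ra, const)] = reg[rc]


reg, mem = [0] * 32, {}
reg[2] = 0x100
reg[5] = 0xDEADBEEF
st(reg, mem, rc=5, const=8, ra=2)    # writes the word at byte address 0x108
ld(reg, mem, ra=2, const=10, rc=6)   # 0x10A is unaligned; it reads 0x108
print(hex(reg[6]))                   # 0xdeadbeef
```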
Beta ISA Summary
•  Storage:
–  Processor: 32 registers (r31 hardwired to 0) and PC
–  Main memory: Up to 4 GB, 32-bit words, 32-bit byte
addresses, 4-byte-aligned accesses
•  Instruction formats:
OPCODE | rc | ra | rb | unused                    (32 bits)
OPCODE | rc | ra | 16-bit signed constant         (32 bits)
•  Instruction types:
–  ALU: Two input registers, or register and constant
–  Branches
–  Loads and stores
Universality
•  Recall: We say a set of Boolean gates is universal if
we can implement any Boolean function with them
•  What problems can we solve with a von Neumann
computer? (e.g., the Beta)
–  Everything that FSMs can solve?
–  Every problem?
–  Does it depend on the ISA?
Computability
•  Possible reasons why we can’t solve a problem:
–  It is uncomputable: Can’t write an algorithm that solves it in
a finite number of steps (an effective method)
–  It is computable, but not enough memory
–  It is computable, and we have enough memory, but machine
can’t implement the algorithm
•  We can eliminate the memory restriction by defining
a hypothetical machine with infinite memory
–  Why study something we can’t build?
–  We’ll get a more precise answer about what real machines
can compute
Turing Machines (Alan Turing, 1937)
•  Mathematical model of a device with infinite memory
•  FSM augmented with infinite tape
•  Each position in the tape stores
a symbol from a finite alphabet
[Diagram: an FSM reading and writing a tape that holds 0 1 1 1 0 1 1 0 0 …]
•  Each cycle, the FSM can:
–  Read symbol at current position
–  Write another symbol
–  Move tape to the left or right by one position
•  Tape provides unbounded memory and input
–  Why not an infinite RAM?
•  Example: Turing machine that multiplies two
arbitrarily long integers
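A Turing machine is simple enough to simulate in a few lines. The sketch below is an illustration under stated assumptions: it moves the head rather than the tape (which is equivalent), uses "_" as the blank symbol, and runs a made-up machine that increments a binary number rather than the multiplier mentioned above. The transition table maps (state, symbol) to (symbol to write, move, next state).

```python
def run_tm(transitions, tape, state, blank="_", max_steps=10_000):
    """Run an FSM over an unbounded tape until it reaches the 'halt' state."""
    cells = dict(enumerate(tape))   # sparse tape: position -> symbol
    head, steps = 0, 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("step limit reached")
        symbol = cells.get(head, blank)                      # read
        write, move, state = transitions[(state, symbol)]
        cells[head] = write                                  # write
        head += 1 if move == "R" else -1                     # move one position
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Binary increment: scan right to the end, then propagate a carry leftwards.
INCREMENT = {
    ("right", "0"): ("0", "R", "right"),
    ("right", "1"): ("1", "R", "right"),
    ("right", "_"): ("_", "L", "carry"),
    ("carry", "1"): ("0", "L", "carry"),
    ("carry", "0"): ("1", "L", "halt"),
    ("carry", "_"): ("1", "L", "halt"),
}

print(run_tm(INCREMENT, "1011", "right"))  # 1100
print(run_tm(INCREMENT, "111", "right"))   # 1000
```

The dict-based tape grows on demand in both directions, which stands in for the unbounded tape; the FSM itself stays finite.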
Church-Turing Thesis
•  Uncomputable functions: There are well-defined discrete
functions that a Turing machine cannot compute
–  No algorithm can compute f(x) for arbitrary x in finite number of
steps
–  Not that we don’t know algorithm - can prove no algorithm exists
–  Corollary: Finite memory is not the only limiting factor on
whether we can solve a problem
•  Church-Turing Thesis: Every discrete function computable
by any realizable machine can be computed by some
Turing machine
–  Unproved, but universally accepted
–  Practical consequence: Turing machines are an upper bound on
what any implementable machine can do
•  May seem obvious, since a TM is just an FSM with an infinite tape…
•  But FSMs are not the only thing we can implement in the real world!
Universal Turing Machine
•  Turing also described a Universal Turing machine
(UTM) that can simulate all Turing machines
–  Tape has data and description of a TM (i.e., its FSM) as input
–  UTM simulates the TM operating on input data
[Diagram: UTM operating on a tape that holds the TM description followed by the input data (… 0 1 1 1 0 0 0 1 0 1 1 0 1 1 1 0 1 1 0 0 …), and producing the output data]
•  There are very simple UTM implementations (FSMs
with few states and simple logic)
•  UTM seems similar to a von Neumann computer...
–  Computer interprets instructions
–  UTM interprets the description of the TM
Turing Completeness
•  Formally, a computer is Turing complete (or Turing
universal) iff it can simulate any Turing machine
–  Can’t have infinite memory, so unimplementable
•  Colloquially, we say that a computer or programming
language is Turing complete iff we can write a program
on it that simulates a Universal Turing machine
–  We can do that with finite memory, even if we can only run
that simulated UTM on tapes of up to a certain size
–  Most ISAs and programming languages are Turing complete
•  Practical consequence: A Turing complete computer,
given enough memory and time, can compute any
computable function
Turing Completeness: Practical Implications
•  Turing completeness gives a formal framework for
computability
–  The cheapest phone CPU and the fastest supercomputer
can solve the same problems, given enough memory and
time…
–  But with vast differences in performance and cost
•  An ISA needs very little to be Turing complete
–  Sufficient (but not necessary): Branching and some
arithmetic
•  e.g., BEQ and SUB in Beta
–  If you change the ISA, it is easy to avoid breaking Turing
completeness
Programming Languages
32-bit (4-byte) ADD instruction:
100000 00100 00010 00011 00000000000
opcode | rc | ra | rb | (unused)
Means, to BETA,
Reg[4] ← Reg[2] + Reg[3]
We’d rather write (Assembly, today)
ADD(R2, R3, R4)
or better yet (High-Level Language, next week)
a = b + c;
Assembly Language
[Diagram: Source text file (symbolic representation of stream of bytes) → Assembler → Binary machine language (stream of bytes to be loaded into memory: 01101101 11000110 00101111 10110001 .....)]
•  Abstracts bit-level representation of instructions and
addresses
•  We’ll learn UASM, built into BSIM
•  Main elements:
–  Values
–  Symbols
–  Labels (symbols for addresses)
–  Macros
Example UASM Source File
N = 12
ADDC(r31, N, r1)        // r1 = N
ADDC(r31, 1, r0)        // r0 = 1
loop: MUL(r0, r1, r0)   // r0 = r0 * r1
SUBC(r1, 1, r1)         // r1 = r1 - 1
BNE(r1, loop, r31)      // if r1 != 0, NextPC=loop
•  Comments after //, ignored by assembler
•  Symbols are symbolic representations of a constant
value (they are NOT variables!)
•  Labels are symbols for addresses
•  Macros expand into sequences of bytes
–  Most frequently, macros are instructions
–  We can use them for other purposes
How Does It Get Assembled?
Input file:
N = 12
ADDC(r31, N, r1)
ADDC(r31, 1, r0)
loop: MUL(r0, r1, r0)
SUBC(r1, 1, r1)
BNE(r1, loop, r31)
•  Load predefined symbols into a symbol table
•  Read input line by line
–  Add symbols to symbol table as they are defined
–  Expand macros, translating symbols to values first
Output file:
110000 00001 11111 00000000 00001100
110000 00000 11111 00000000 00000001
100010 00000 00000 00001 00000000000
…
Symbol table:
Symbol   Value
r0       0
r1       1
…
r31      31
N        12
loop     8
Registers are Predefined Symbols
•  r0 = 0, …, r31 = 31
•  Treated like normal symbols:
ADDC(r31, N, r1)
    Substitute symbols with their values:
ADDC(31, 12, 1)
    Expand macro:
110000 00001 11111 00000000 00001100
•  No “type checking” if you use the wrong opcode…
ADDC(r31, r12, r1) becomes ADDC(31, 12, 1): Reg[1] ← Reg[31] + 12
ADD(r31, N, r1) becomes ADD(31, 12, 1): Reg[1] ← Reg[31] + Reg[12]
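The substitution step can be mimicked with a dictionary of symbols. This is a sketch with hypothetical helper names (`value`, `ADDC`, `ADD` as Python functions taking strings); it shows that once symbols are replaced by numbers, the assembler cannot tell a register name from a constant.

```python
# Predefined symbols r0..r31, plus the user-defined symbol N.
symbols = {f"r{i}": i for i in range(32)}
symbols["N"] = 12


def value(token):
    """Replace a symbol by its value; anything else is parsed as a number."""
    return symbols[token] if token in symbols else int(token, 0)


def ADDC(ra, const, rc):
    ra, const, rc = value(ra), value(const), value(rc)
    return (0b110000 << 26) | (rc << 21) | (ra << 16) | (const & 0xFFFF)


def ADD(ra, rb, rc):
    ra, rb, rc = value(ra), value(rb), value(rc)
    return (0b100000 << 26) | (rc << 21) | (ra << 16) | (rb << 11)


print(f"{ADDC('r31', 'N', 'r1'):032b}")
# r12 and N both have the value 12, so these pairs assemble to identical words:
print(ADDC("r31", "r12", "r1") == ADDC("r31", "N", "r1"))   # True
print(ADD("r31", "N", "r1") == ADD("r31", "r12", "r1"))     # True
```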
Labels and Offsets
Input file:
N = 12
ADDC(r31, N, r1)
ADDC(r31, 1, r0)
loop: MUL(r0, r1, r0)
SUBC(r1, 1, r1)
BNE(r1, loop, r31)
•  Labels get translated to the address where they appear
•  BEQ/BNE macros compute offset automatically
•  Labels hide addresses!
Output file:
110000 00001 11111 00000000 00001100
110000 00000 11111 00000000 00000001
100010 00000 00000 00001 00000000000
110001 00001 00001 00000000 00000001
011101 11111 00001 11111111 11111101
offset = (label - <addr of BNE/BEQ>)/4 - 1 = (8 - 16)/4 - 1 = -3
Symbol table:
Symbol   Value
r0       0
r1       1
…
r31      31
N        12
loop     8
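Putting the pieces together, a toy assembler for this one program fits in a short function. It is a sketch only: it handles just the four opcodes used here, assumes assembly starts at address 0, and is not how UASM itself is implemented. Labels and symbols are collected in a first pass, and macros are expanded in a second.

```python
OPCODES = {"ADDC": 0b110000, "SUBC": 0b110001, "MUL": 0b100010, "BNE": 0b011101}


def assemble(lines):
    symbols = {f"r{i}": i for i in range(32)}   # predefined symbols
    instrs, addr = [], 0
    for line in lines:                          # pass 1: symbols and labels
        line = line.split("//")[0].strip()
        if not line:
            continue
        if "=" in line:
            name, expr = line.split("=")
            symbols[name.strip()] = int(expr.strip(), 0)
            continue
        if ":" in line:
            label, line = line.split(":")
            symbols[label.strip()] = addr       # label = address where it appears
            line = line.strip()
        instrs.append((addr, line))
        addr += 4                               # every instruction is 4 bytes
    words = []
    for addr, line in instrs:                   # pass 2: expand macros
        name = line[:line.index("(")]
        args = line[line.index("(") + 1:line.rindex(")")].split(",")
        a, b, c = (symbols[t.strip()] if t.strip() in symbols else int(t.strip(), 0)
                   for t in args)
        word = (OPCODES[name] << 26) | (c << 21) | (a << 16)
        if name == "MUL":
            word |= b << 11                     # b is rb
        elif name == "BNE":
            word |= ((b - addr) // 4 - 1) & 0xFFFF   # b is the label address
        else:
            word |= b & 0xFFFF                  # b is the constant
        words.append(word)
    return symbols, words


source = ["N = 12", "ADDC(r31, N, r1)", "ADDC(r31, 1, r0)",
          "loop: MUL(r0, r1, r0)", "SUBC(r1, 1, r1)", "BNE(r1, loop, r31)"]
symbols, words = assemble(source)
print(symbols["loop"])                  # 8
for w in words:
    print(f"{w:032b}")
```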
Pseudoinstructions
•  Convenience macros that expand to one or more real instructions
•  Extend set of operations without adding instructions to the ISA
// Convenience macros so we don’t have to use R31
.macro LD(CC,RC) LD(R31,CC,RC)
.macro ST(RA,CC) ST(RA,CC,R31)
.macro BEQ(RA,LABEL) BEQ(RA,LABEL,R31)
.macro BNE(RA,LABEL) BNE(RA,LABEL,R31)

.macro MOVE(RA,RC) ADD(RA,R31,RC)        // Reg[RC] <- Reg[RA]
.macro CMOVE(CC,RC) ADDC(R31,CC,RC)      // Reg[RC] <- CC
.macro COM(RA,RC) XORC(RA,-1,RC)         // Reg[RC] <- ~Reg[RA]
.macro NEG(RB,RC) SUB(R31,RB,RC)         // Reg[RC] <- -Reg[RB]
.macro NOP() ADD(R31,R31,R31)            // do nothing
.macro BR(LABEL) BEQ(R31,LABEL)          // always branch
.macro BR(LABEL,RC) BEQ(R31,LABEL,RC)    // always branch
.macro CALL(LABEL) BEQ(R31,LABEL,LP)     // call subroutine
.macro BF(RA,LABEL,RC) BEQ(RA,LABEL,RC)  // 0 is false
.macro BF(RA,LABEL) BEQ(RA,LABEL)
.macro BT(RA,LABEL,RC) BNE(RA,LABEL,RC)  // 1 is true
.macro BT(RA,LABEL) BNE(RA,LABEL)

// Multi-instruction sequences
.macro PUSH(RA) ADDC(SP,4,SP) ST(RA,-4,SP)
.macro POP(RA) LD(SP,-4,RA) ADDC(SP,-4,SP)
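The two multi-instruction macros are easiest to follow by tracing them. The sketch below models PUSH and POP on a register list and a dict-based memory; the function names are illustrative, and the register number used for SP (29) is an assumption of this example since the slides do not state it.

```python
SP = 29  # assumed register number for the stack pointer (not given on the slides)


def push(reg, mem, ra):
    """PUSH(RA): ADDC(SP,4,SP) then ST(RA,-4,SP)."""
    reg[SP] = reg[SP] + 4          # the stack grows towards higher addresses
    mem[reg[SP] - 4] = reg[ra]     # store just below the new SP


def pop(reg, mem, ra):
    """POP(RA): LD(SP,-4,RA) then ADDC(SP,-4,SP)."""
    reg[ra] = mem[reg[SP] - 4]     # read the most recently pushed word
    reg[SP] = reg[SP] - 4


reg, mem = [0] * 32, {}
reg[SP] = 0x1000
reg[1], reg[2] = 7, 9
push(reg, mem, 1)
push(reg, mem, 2)
pop(reg, mem, 3)                   # r3 gets 9 (last in, first out)
pop(reg, mem, 4)                   # r4 gets 7
print(reg[3], reg[4], hex(reg[SP]))   # 9 7 0x1000
```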
Factorial with Pseudoinstructions
Before:
N = 12
ADDC(r31, N, r1)
ADDC(r31, 1, r0)
loop: MUL(r0, r1, r0)
SUBC(r1, 1, r1)
BNE(r1, loop, r31)

After:
N = 12
CMOVE(N, r1)
CMOVE(1, r0)
loop: MUL(r0, r1, r0)
SUBC(r1, 1, r1)
BT(r1, loop)
Raw Data
•  LONG dumps 32-bit value
–  Variables
–  Constants > 16 bits
N:     LONG(12)
factN: LONG(0xdeadbeef)
LD(N, r1)
CMOVE(1, r0)
loop: MUL(r0, r1, r0)
SUBC(r1, 1, r1)
BT(r1, loop)
ST(r0, factN)
Symbol table:
Symbol   Value
…
N        0
factN    4
LD(N, r1) expands to LD(r31, N, r1), then LD(31, 0, 1): Reg[1] ← Mem[Reg[31] + 0]
Expressions and Layout
•  Values can be written as expressions
–  The assembler evaluates these; they are not translated to instructions!
A = 7 + 3 * 0x0cc41
B = A - 3
•  The “.” (period) symbol means the next byte address to be
filled
–  Can read or write to it
–  Useful to control data layout or leave empty space (e.g., for arrays)
. = 0x100            // Assemble into 0x100
LONG(0xdeadbeef)
k = .                // Symbol “k” is 0x104
LONG(0x00dec0de)
. = .+16             // Skip 16 bytes
LONG(0xc0ffeeee)
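The effect of “.” on layout can be traced with a tiny model of the location counter. This is a sketch: the statement kinds ("org", "skip", "symbol", "long") are names invented for this example, standing in for `. = value`, `. = . + n`, `name = .` and `LONG(value)`.

```python
def layout(statements):
    """Track the location counter '.' and record where each 32-bit value lands."""
    dot = 0
    symbols, memory = {}, {}
    for kind, arg in statements:
        if kind == "org":          # . = arg
            dot = arg
        elif kind == "skip":       # . = . + arg
            dot += arg
        elif kind == "symbol":     # arg = .
            symbols[arg] = dot
        elif kind == "long":       # LONG(arg): emit 4 bytes, advance by 4
            memory[dot] = arg & 0xFFFFFFFF
            dot += 4
    return symbols, memory


symbols, memory = layout([
    ("org", 0x100),
    ("long", 0xDEADBEEF),
    ("symbol", "k"),
    ("long", 0x00DEC0DE),
    ("skip", 16),
    ("long", 0xC0FFEEEE),
])
print(hex(symbols["k"]))                    # 0x104
print([hex(a) for a in sorted(memory)])     # ['0x100', '0x104', '0x118']
```

The last value lands at 0x118 because “.” is 0x108 after the second LONG and the skip adds 16 bytes.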
Summary: Assembly Language
•  Low-level language, symbolic representation of
sequence of bytes. Abstracts:
–  Bit-level representation of instructions
–  Addresses
•  Elements: Values, symbols, labels, macros
•  Values can be constants or expressions
•  Symbols are symbolic representations of values
•  Labels are symbols for addresses
•  Macros are expanded to byte sequences:
–  Instructions
–  Pseudoinstructions (translate to 1+ real instructions)
–  Raw data
•  Can control where to assemble with “.” symbol