Lecture 11: Beta ISA, Assembly

6.004 Computation Structures, L11: Beta ISA & Assembly

[Figure: The First Computer Bug. Harvard Mark II logbook, Sep 9 1947]

• Beta ISA
• Universality
• Assembly Language

Today’s handouts:
• Lecture slides
• Beta ISA

Beta ISA: Storage

CPU State:
• PC
• General registers r0, r1, r2, ..., r31, each holding a 32-bit “word” (4 bytes)
• r31 is hardwired to 0

Main Memory:
• 32-bit “words”
• Up to 2^32 bytes (4 GB of memory), i.e., 2^30 4-byte words

Each memory word is 32 bits wide, but for historical reasons the β uses byte memory addresses. Since each word contains four 8-bit bytes, addresses of consecutive words differ by 4.

Why separate registers and main memory? Tradeoff: size vs. speed and energy.

Beta ISA: Instructions

• Three types of instructions:
  – ALU: Perform operations on general registers
  – Branches: Conditionally change the program counter
  – Loads and stores: Move data between general registers and main memory
• All instructions have a fixed length: 32 bits (4 bytes)

Beta ALU Instructions

Format: OPCODE (6 bits) | rc (5 bits) | ra (5 bits) | rb (5 bits) | unused (11 bits)

Example coded instruction: ADD

    100000 00011 00001 00010 00000000000

• OPCODE = 100000, encodes ADD
• rc = 3, encodes R3 as destination
• ra = 1, rb = 2, encode R1 and R2 as source locations
• The low 11 bits are unused

32-bit hex: 0x80611000

We prefer to write a symbolic representation: ADD(r1,r2,r3)

ADD(ra,rb,rc): Reg[rc] ← Reg[ra] + Reg[rb]
“Add the contents of ra to the contents of rb; store the result in rc”

Similar instructions for other ALU operations:
  arithmetic: ADD, SUB, MUL, DIV
  compare: CMPEQ, CMPLT, CMPLE
  boolean: AND, OR, XOR, XNOR
  shift: SHL, SHR, SRA

Beta ALU Instructions with Constant

Format: OPCODE (6 bits) | rc (5 bits) | ra (5 bits) | 16-bit signed constant

Example instruction: ADDC adds register contents and constant:

    110000 00011 00001 1111111111111101

• OPCODE = 110000, encoding ADDC
• rc = 3, encoding R3 as destination
• ra = 1, encoding R1 as first operand
• Constant field, encoding -3 as second operand (sign-extended!)

Symbolic version: ADDC(r1,-3,r3)

ADDC(ra,const,rc): Reg[rc] ← Reg[ra] + sext(const)
“Add the contents of ra to const; store the result in rc”

Similar instructions for other ALU operations:
  arithmetic: ADDC, SUBC, MULC, DIVC
  compare: CMPEQC, CMPLTC, CMPLEC
  boolean: ANDC, ORC, XORC, XNORC
  shift: SHLC, SHRC, SRAC

Why Have Instructions with Constants?

• Many programs use small constants frequently
  – e.g., our factorial example: 0, 1, -1
  – Tradeoff:
    • When used, they save registers and instructions
    • More opcodes → more complex control logic and datapath

[Figure: Percentage of operations that use a constant operand (Hennessy & Patterson)]

Can We Solve Factorial With ALU Instructions?

• No! Recall high-level FSM:
    start: a ← 1, b ← N
    loop:  a ← a * b, b ← b - 1 (stay in loop while b != 0)
    done:  a ← a, b ← b (entered when b == 0)
• Factorial needs to loop
• So far we can only encode sequences of operations on registers
• Need a way to change the PC based on data values!

Beta Branch Instructions

The Beta’s branch instructions provide a way to conditionally change the PC to point to a nearby location and, optionally, to remember (in Rc) where we came from (useful for procedure calls).

Format: OPCODE | rc | ra | 16-bit signed constant

“offset” is a SIGNED CONSTANT encoded as part of the instruction!
BEQ(ra,offset,rc): Branch if equal
    NPC ← PC + 4
    Reg[rc] ← NPC
    if (Reg[ra] == 0) PC ← NPC + 4*offset
    else PC ← NPC

BNE(ra,offset,rc): Branch if not equal
    NPC ← PC + 4
    Reg[rc] ← NPC
    if (Reg[ra] != 0) PC ← NPC + 4*offset
    else PC ← NPC

offset = (<addr of target> - <addr of BNE/BEQ>)/4 - 1
The offset is a 16-bit signed value, so the target can be up to 32767 instructions after (or 32768 before) the instruction that follows the BNE/BEQ.

Can We Solve Factorial Now?

    int a = 1;
    int b = N;
    while (b != 0) {
      a = a * b;
      b = b - 1;
    }

    // Assume r1 = N
       ADDC(r31, 1, r0)    // r0 = 1
    L: MUL(r0, r1, r0)     // r0 = r0 * r1
       SUBC(r1, 1, r1)     // r1 = r1 - 1
       BNE(r1, L, r31)     // if r1 != 0, run MUL next
    // at this point, r0 = N!

• Remember control FSM for our simple programmable datapath?
    loop mul → loop sub → loop bne
    from loop bne: z == 0 goes back to loop mul, z == 1 goes to done
• Control FSM states → instructions!
  – Not the case in general
  – Happens here because datapath is similar to basic von Neumann datapath

Beta Load and Store Instructions

Loads and stores move data between general registers and main memory.

Format: OPCODE | rc | ra | 16-bit signed constant (used to form the address)

LD(ra,const,rc): Reg[rc] ← Mem[Reg[ra] + sext(const)]
Fetch into rc the contents of the memory location whose address is const plus the contents of ra

ST(rc,const,ra): Mem[Reg[ra] + sext(const)] ← Reg[rc]
Store the contents of rc into the memory location whose address is const plus the contents of ra

BYTE ADDRESSES, but only 32-bit word accesses to word-aligned addresses are supported.
The low two address bits are ignored.
Tradeoff (vs. allowing unaligned accesses): simple implementation, but harder to use.

Beta ISA Summary

• Storage:
  – Processor: 32 registers (r31 hardwired to 0) and PC
  – Main memory: Up to 4 GB, 32-bit words, 32-bit byte addresses, 4-byte-aligned accesses
• Instruction formats (32 bits each):
  – OPCODE | rc | ra | rb | unused
  – OPCODE | rc | ra | 16-bit signed constant
• Instruction types:
  – ALU: Two input registers, or register and constant
  – Branches
  – Loads and stores

Universality

• Recall: We say a set of Boolean gates is universal if we can implement any Boolean function with them
• What problems can we solve with a von Neumann computer? (e.g., the Beta)
  – Everything that FSMs can solve?
  – Every problem?
  – Does it depend on the ISA?

Computability

• Possible reasons why we can’t solve a problem:
  – It is uncomputable: Can’t write an algorithm that solves it in a finite number of steps (an effective method)
  – It is computable, but not enough memory
  – It is computable, and we have enough memory, but machine can’t implement the algorithm
• We can eliminate the memory restriction by defining a hypothetical machine with infinite memory
  – Why study something we can’t build?
  – We’ll get a more precise answer about what real machines can compute

Turing Machines (Alan Turing, 1937)

• Mathematical model of a device with infinite memory
• FSM augmented with infinite tape
• Each position in the tape stores a symbol from a finite alphabet
    … 0 1 1 1 0 1 1 0 0 …
• Each cycle, the FSM can:
  – Read symbol at current position
  – Write another symbol
  – Move tape to the left or right by one position
• Tape provides unbounded memory and input
  – Why not an infinite RAM?
• Example: Turing machine that multiplies two arbitrarily long integers

Church-Turing Thesis

• Uncomputable functions: There are well-defined discrete functions that a Turing machine cannot compute
  – No algorithm can compute f(x) for arbitrary x in a finite number of steps
  – It is not that we don’t know an algorithm: we can prove that no algorithm exists
  – Corollary: Finite memory is not the only limiting factor on whether we can solve a problem
• Church-Turing Thesis: Every discrete function computable by any realizable machine can be computed by some Turing machine
  – Unproved, but universally accepted
  – Practical consequence: Turing machines are an upper bound on what any implementable machine can do
• May seem obvious, since a TM is just an FSM with an infinite tape…
• But FSMs are not the only thing we can implement in the real world!

Universal Turing Machine

• Turing also described a Universal Turing machine (UTM) that can simulate all Turing machines
  – Tape has data and description of a TM (i.e., its FSM) as input
  – UTM simulates the TM operating on input data

[Figure: UTM whose tape holds the TM description, the input data, and the output data]

• There are very simple UTM implementations (FSMs with few states and simple logic)
• UTM seems similar to a von Neumann computer...
  – Computer interprets instructions
  – UTM interprets the description of the TM

Turing Completeness

• Formally, a computer is Turing complete (or Turing universal) iff it can simulate any Turing machine
  – Can’t have infinite memory, so unimplementable
• Colloquially, we say that a computer or programming language is Turing complete iff we can write a program on it that simulates a Universal Turing machine
  – We can do that with finite memory, even if we can only run that simulated UTM on tapes of up to a certain size
  – Most ISAs and programming languages are Turing complete
• Practical consequence: A Turing complete computer, given enough memory and time, can compute any computable function

Turing Completeness: Practical Implications

• Turing completeness gives a formal framework for computability
  – The cheapest phone CPU and the fastest supercomputer can solve the same problems, given enough memory and time…
  – But with vast differences in performance and cost
• An ISA needs very little to be Turing complete
  – Sufficient (but not necessary): Branching and some arithmetic
    • e.g., BEQ and SUB in Beta
  – If you change the ISA, it is easy to avoid breaking Turing completeness

Programming Languages

32-bit (4-byte) ADD instruction:

    100000 00100 00010 00011 00000000000
    opcode rc    ra    rb    (unused)

Means, to BETA: Reg[4] ← Reg[2] + Reg[3]

We’d rather write
    ADD(R2, R3, R4)      (Assembly; today)
or better yet
    a = b + c;           (High-Level Language; next week)

Assembly Language

Symbolic representation of stream of bytes (source text file) → Assembler →
    01101101 11000110 00101111 10110001 .....
Stream of bytes to be loaded into memory (binary machine language)

• Abstracts bit-level representation of instructions and addresses
• We’ll learn UASM, built into BSIM
• Main elements:
  – Values
  – Symbols
  – Labels (symbols for addresses)
  – Macros

Example UASM Source File

    N = 12
          ADDC(r31, N, r1)     // r1 = N
          ADDC(r31, 1, r0)     // r0 = 1
    loop: MUL(r0, r1, r0)      // r0 = r0 * r1
          SUBC(r1, 1, r1)      // r1 = r1 - 1
          BNE(r1, loop, r31)   // if r1 != 0, NextPC=loop

• Comments after //, ignored by assembler
• Symbols are symbolic representations of a constant value (they are NOT variables!)
• Labels are symbols for addresses
• Macros expand into sequences of bytes
  – Most frequently, macros are instructions
  – We can use them for other purposes

How Does It Get Assembled?

Input file: the example source file above.

• Load predefined symbols into a symbol table
• Read input line by line
  – Add symbols to symbol table as they are defined
  – Expand macros, translating symbols to values first

Output file:
    110000 00001 11111 00000000 00001100
    110000 00000 11111 00000000 00000001
    100010 00000 00000 00001 00000000000
    …

Symbol table:
    Symbol   Value
    r0       0
    r1       1
    …
    r31      31
    N        12
    loop     8

Registers are Predefined Symbols

• r0 = 0, …, r31 = 31
• Treated like normal symbols:
    ADDC(r31, N, r1)
      substitute symbols with their values →
    ADDC(31, 12, 1)
      expand macro →
    110000 00001 11111 00000000 00001100
• No “type checking” if you use the wrong opcode…
    ADDC(r31, r12, r1) → ADDC(31, 12, 1) → Reg[1] ← Reg[31] + 12
    ADD(r31, N, r1)    → ADD(31, 12, 1)  → Reg[1] ← Reg[31] + Reg[12]

Labels and Offsets

Input file: the same factorial source as above.

• Labels get translated to the address where they appear
• BEQ/BNE macros compute offset automatically
• Labels hide addresses!

Output file (one instruction per line, starting at address 0):
    110000 00001 11111 00000000 00001100      // address 0:  ADDC(r31, N, r1)
    110000 00000 11111 00000000 00000001      // address 4:  ADDC(r31, 1, r0)
    100010 00000 00000 00001 00000000000      // address 8:  MUL(r0, r1, r0), label loop
    110001 00001 00001 00000000 00000001      // address 12: SUBC(r1, 1, r1)
    011101 11111 00001 11111111 11111101      // address 16: BNE(r1, loop, r31)

offset = (label - <addr of BNE/BEQ>)/4 - 1 = (8 - 16)/4 - 1 = -3

Symbol table: as before (r0 = 0, …, r31 = 31, N = 12, loop = 8)

Pseudoinstructions

• Convenience macros that expand to one or more real instructions
• Extend set of operations without adding instructions to the ISA

    // Convenience macros so we don’t have to use R31
    .macro LD(CC,RC)        LD(R31,CC,RC)
    .macro ST(RA,CC)        ST(RA,CC,R31)
    .macro BEQ(RA,LABEL)    BEQ(RA,LABEL,R31)
    .macro BNE(RA,LABEL)    BNE(RA,LABEL,R31)

    .macro MOVE(RA,RC)      ADD(RA,R31,RC)      // Reg[RC] <- Reg[RA]
    .macro CMOVE(CC,RC)     ADDC(R31,CC,RC)     // Reg[RC] <- CC
    .macro COM(RA,RC)       XORC(RA,-1,RC)      // Reg[RC] <- ~Reg[RA]
    .macro NEG(RB,RC)       SUB(R31,RB,RC)      // Reg[RC] <- -Reg[RB]
    .macro NOP()            ADD(R31,R31,R31)    // do nothing

    .macro BR(LABEL)        BEQ(R31,LABEL)      // always branch
    .macro BR(LABEL,RC)     BEQ(R31,LABEL,RC)   // always branch
    .macro CALL(LABEL)      BEQ(R31,LABEL,LP)   // call subroutine
    .macro BF(RA,LABEL,RC)  BEQ(RA,LABEL,RC)    // 0 is false
    .macro BF(RA,LABEL)     BEQ(RA,LABEL)
    .macro BT(RA,LABEL,RC)  BNE(RA,LABEL,RC)    // 1 is true
    .macro BT(RA,LABEL)     BNE(RA,LABEL)

    // Multi-instruction sequences
    .macro PUSH(RA)         ADDC(SP,4,SP) ST(RA,-4,SP)
    .macro POP(RA)          LD(SP,-4,RA) ADDC(SP,-4,SP)

Factorial with Pseudoinstructions

Before:
    N = 12
          ADDC(r31, N, r1)
          ADDC(r31, 1, r0)
    loop: MUL(r0, r1, r0)
          SUBC(r1, 1, r1)
          BNE(r1, loop, r31)

After:
    N = 12
          CMOVE(N, r1)
          CMOVE(1, r0)
    loop: MUL(r0, r1, r0)
          SUBC(r1, 1, r1)
          BT(r1, loop)

Raw Data

• LONG dumps 32-bit value
  – Variables
  – Constants > 16 bits

    N:     LONG(12)
    factN: LONG(0xdeadbeef)

          LD(N, r1)
          CMOVE(1, r0)
    loop: MUL(r0, r1, r0)
          SUBC(r1, 1, r1)
          BT(r1, loop)
          ST(r0, factN)

Symbol table:
    Symbol   Value
    …
    N        0
    factN    4

Here N is a label, so LD(N, r1) expands as:
    LD(N, r1) → LD(r31, N, r1) → LD(31, 0, 1) → Reg[1] ← Mem[Reg[31] + 0]

Expressions and Layout

• Values can be written as expressions
  – The assembler evaluates these; they are not translated to instructions!

    A = 7 + 3 * 0x0cc41
    B = A - 3

• The “.” (period) symbol means the next byte address to be filled
  – Can read or write to it
  – Useful to control data layout or leave empty space (e.g., for arrays)

    . = 0x100            // Assemble into 0x100
    LONG(0xdeadbeef)
    k = .                // Symbol “k” is 0x104
    LONG(0x00dec0de)
    . = .+16             // Skip 16 bytes
    LONG(0xc0ffeeee)

Summary: Assembly Language

• Low-level language, symbolic representation of sequence of bytes. Abstracts:
  – Bit-level representation of instructions
  – Addresses
• Elements: Values, symbols, labels, macros
• Values can be constants or expressions
• Symbols are symbolic representations of values
• Labels are symbols for addresses
• Macros are expanded to byte sequences:
  – Instructions
  – Pseudoinstructions (translate to 1+ real instructions)
  – Raw data
• Can control where to assemble with “.” symbol
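
The bit-level encodings in this lecture can be checked with a short script. The following is a minimal Python sketch, not part of UASM or BSIM: the helper names (encode_alu, encode_const, branch_offset, assemble_factorial) are hypothetical, and the opcode values are simply the ones that appear in the lecture's example encodings. It packs the two instruction formats into 32-bit words and computes the BNE offset for the factorial loop, assuming the program is assembled starting at address 0.

```python
# Minimal sketch of the two Beta instruction formats (helper names are hypothetical).
# Opcode values are taken from the encodings shown in the lecture.
OPCODES = {
    "ADD": 0b100000, "MUL": 0b100010,
    "ADDC": 0b110000, "SUBC": 0b110001,
    "BNE": 0b011101,
}

def encode_alu(op, ra, rb, rc):
    """Register format: OPCODE | rc | ra | rb | 11 unused bits."""
    return (OPCODES[op] << 26) | (rc << 21) | (ra << 16) | (rb << 11)

def encode_const(op, ra, const, rc):
    """Constant format: OPCODE | rc | ra | 16-bit two's-complement constant."""
    if not -32768 <= const <= 32767:
        raise ValueError("constant does not fit in 16 signed bits")
    return (OPCODES[op] << 26) | (rc << 21) | (ra << 16) | (const & 0xFFFF)

def branch_offset(target_addr, branch_addr):
    """offset = (target - branch)/4 - 1, counted in instructions."""
    return (target_addr - branch_addr) // 4 - 1

def assemble_factorial(n):
    """Hand-assemble the lecture's factorial loop, assuming it starts at address 0."""
    r0, r1, r31 = 0, 1, 31
    loop_addr = 8   # the MUL is the third instruction
    bne_addr = 16   # the BNE is the fifth instruction
    return [
        encode_const("ADDC", r31, n, r1),   # r1 = N
        encode_const("ADDC", r31, 1, r0),   # r0 = 1
        encode_alu("MUL", r0, r1, r0),      # r0 = r0 * r1
        encode_const("SUBC", r1, 1, r1),    # r1 = r1 - 1
        encode_const("BNE", r1, branch_offset(loop_addr, bne_addr), r31),
    ]

if __name__ == "__main__":
    print(hex(encode_alu("ADD", 1, 2, 3)))   # the ADD(r1,r2,r3) example
    for word in assemble_factorial(12):
        print(f"{word:032b}")
```

A real assembler would discover loop_addr and bne_addr by tracking the current address while reading the file (that is what the symbol table and the “.” symbol are for); they are hard-coded here only to keep the sketch short.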