Why should I learn computer architecture?

[Figure: Intel Nehalem microarchitecture block diagram, showing the front end (instruction cache, branch prediction, decoders), the out-of-order engine (register allocation table, reorder buffer, reservation station), execution ports 0-5, the memory order buffer, the L1/L2/L3 caches, and the uncore (Quick Path Interconnect, DDR3 memory controller). GT/s: gigatransfers per second.]

Reason #1: It’s fun
- Moore’s Law means the field is always radically changing
- Where else do you get an exponentially larger number of Legos to play with every year?
- New application domains lead to totally new designs
  - GPUs
  - Phones
  - Data Center
  - Wearable
  - Implantable processors
  - Quantum, Biological, etc.
- CS ideas are increasingly applicable to hardware design
- Making things faster, smaller, more energy efficient is a rush

Reason #2: Performance Matters

Google Data Center: 100,000 computers + your code + 2X faster = 50,000 computers saved = 1 MW of electricity saved
Mobile Devices: 1 phone + your code + 2X faster = “fast enough” = runs on 4 M more iPads

How can you speed up code if you don’t know how a computer works?
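The data center arithmetic in Reason #2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the workload is assumed to speed up uniformly and scale perfectly across machines, and the 20 W per machine figure is not given on the slide; it is simply what the slide's own numbers imply (1 MW / 50,000 machines).

```python
import math


def machines_needed(machines: int, speedup: float) -> int:
    """Machines needed for the same throughput after a uniform speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return math.ceil(machines / speedup)


def machines_saved(machines: int, speedup: float) -> int:
    """Machines freed up by making the code `speedup` times faster."""
    return machines - machines_needed(machines, speedup)


def power_saved_mw(machines: int, speedup: float, watts_per_machine: float) -> float:
    """Power saved in megawatts, given an assumed per-machine draw in watts."""
    return machines_saved(machines, speedup) * watts_per_machine / 1e6


if __name__ == "__main__":
    # Slide figures: 100,000 computers, code made 2X faster.
    print(machines_saved(100_000, 2.0))
    # Assumption: 20 W per machine, inferred from the slide's 1 MW claim.
    print(power_saved_mw(100_000, 2.0, 20.0))
```

Changing `speedup` shows the diminishing returns: each further doubling frees only half as many machines as the previous one.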
Reason #3: Great computer scientists know the whole stack.

Bill Gates, Mark Zuckerberg, Richard Stallman, Jeff Dean (Google), Guido (Python), Houston (Dropbox), Torvalds (Linux), Limor Fried (Adafruit), Alan Turing, Larry Page (Google)

Which of these people didn’t know how a computer works?

Reason #4: Employers want employees who are generalists and know the whole stack.
Who knows what problems you might end up having to innovate on?

What we will learn in this class
- Basic architecture:
  - Instruction Sets
  - Performance Analysis
  - Pipelining
  - Caches
  - Virtual Memory
  - In-order processors
  - How to build all of the above yourself
- Advanced Topics:
  - Multicore
  - Data centers
  - Mobile Processors
  - GPUs
  - Out-of-order Processors
  - How x86 / ARM / NVidia combine all of the above