Language Processors - CS 434/534 Home Page

Course Overview
PART I: overview material
1
2
3
Introduction
Language processors (tombstone diagrams, bootstrapping)
Architecture of a compiler
PART II: inside a compiler
4
5
6
7
Syntax analysis
Contextual analysis
Runtime organization
Code generation
PART III: conclusion
8
9
Interpretation
Review
Language processors (Chapter 2)
1
Chapter 2 Language Processors
RECAP
– Different levels of programming languages
low-level <–> high-level
– different language processors to bridge the semantic gap
GOAL this lecture:
– A high level understanding of what language processors are.
• what do they do?
• what kinds are there?
Topics:
– translators (compilers, assemblers, ...)
– tombstone diagrams
– bootstrapping
Language processors (Chapter 2)
2
Compilers and other translators
Examples:
Chinese => English
Java => JVM byte codes
Scheme => C
C => Scheme
x86 Assembly Language => x86 binary codes
Other non-traditional examples:
disassembler, decompiler (e.g. JVM => Java)
Language processors (Chapter 2)
3
Assemblers versus Compilers
An assembler and a compiler both translate a “more high
level” language into a “more low-level” one.
More high level
Scheme
Scheme (to C) compiler
C
x86 assembly
x86 executable binary
C compiler
x86 assembler
More low level
Language processors (Chapter 2)
4
Assemblers versus Compilers
Q: What is the difference between a compiler and an
assembler?
Assembler:
• very direct mapping (almost 1 to 1 mapping) from
the source language onto the target language.
Compiler:
• more complex “conceptual” mapping from a HL
language to a LL language. 1 construct in the HL
language => many instructions in the LL language
Language processors (Chapter 2)
5
Terminology
Q: Which programming languages play a role in this picture?
input
source program
Translator
is expressed in the
source language
output
object program
is expressed in the
target language
is expressed in the
implementation language
Language processors (Chapter 2)
6
Tombstone Diagrams
What are they?
– diagrams consist of a set of “puzzle pieces” we can use to
reason about language processors and programs
– different kinds of pieces
– combination rules (not all diagrams are “well formed”)
Program P implemented in L
P
L
Machine implemented in hardware
M
Language processors (Chapter 2)
Translator implemented in L
S --> T
L
Language interpreter in L
M
L
7
Tombstone diagrams: Combination rules
P
M
M
P
L
M
OK!
P
S
P
T
S --> T
M
OK!
M OK!
OK!
WRONG!
Language processors (Chapter 2)
P
L
WRONG!
S --> T
M
8
Compilation
Example: Compilation of C programs on an x86 machine
Tetris
C
C --> x86
x86
x86
Language processors (Chapter 2)
Tetris
x86
Tetris
x86
x86
9
Cross compilation
Example: A C “cross compiler” from x86 to PPC
A cross compiler is a compiler which runs on one machine (the host
machine) but emits code for another machine (the target machine).
Tetris
C
C --> PPC
x86
x86
Tetris
PPC
download
Tetris
PPC
PPC
Host ≠ Target
Q: Are cross compilers useful? Why would/could we use them?
Language processors (Chapter 2)
10
Two Stage Compilation
A two-stage translator is a composition of two translators. The
output of the first translator is provided as input to the second
translator.
Tetris
Tetris
Tetris
Java Java-->JVM JVM JVM-->x86 x86
x86
x86
x86
x86
Language processors (Chapter 2)
11
Compiling a Compiler
Observation: A compiler is a program!
Therefore it can be provided as input to a language processor.
Example: compiling a compiler.
Java-->x86
Java-->x86
C++ C++-->x86 x86
x86
x86
Language processors (Chapter 2)
12
Interpreters
An interpreter is a language processor implemented in software, i.e.
as a program.
Terminology: abstract (or virtual) machine versus real machine
Example: The Java Virtual Machine
Tetris
JVM
JVM
x86
x86
Q: Why are abstract machines useful?
Language processors (Chapter 2)
13
Interpreters
Q: Why are abstract machines useful?
1) Abstract machines provide better platform independence
Tetris
JVM
JVM
x86
x86
Language processors (Chapter 2)
Tetris
JVM
JVM
PPC
PPC
14
Interpreters
Q: Why are abstract machines useful?
2) Abstract machines are useful for testing and debugging.
Example: Testing the “Ultima” processor using hardware emulation
P
Ultima
Ultima
x86
x86

P
Ultima
Ultima
Functional equivalence
Note: we don’t have to implement Ultima emulator directly in x86;
we can use a high-level language and compile it to x86.
Language processors (Chapter 2)
15
Interpreters versus Compilers
Q: What are the tradeoffs between compilation and interpretation?
Compilers typically offer more advantages when
– programs are deployed in a production setting
– programs are “repetitive”
– the instructions of the programming language are complex
Interpreters typically are a better choice when
– we are in a development/testing/debugging stage
– programs are run once and then discarded
– the instructions of the language are simple
Language processors (Chapter 2)
16
Interpretive Compilers
Why?
A tradeoff between fast(er) compilation and a reasonable runtime
performance.
How?
Use an “intermediate language”
• more high-level than machine code => easier to compile to
• more low-level than source language => easier to implement as
an interpreter
Example: A “Java Development Kit” for machine M
Java-->JVM
M
Language processors (Chapter 2)
JVM
M
17
Interpretive Compilers
Example: Here is how to use a “Java Development Kit” to run a Java
program P
P
P
Java Java-->JVM JVM
M
M
Language processors (Chapter 2)
P
JVM
JVM
M
M
18
Portable Compilers
Example: Two different “Java Development Kits”
Kit 1:
Java-->JVM
M
JVM
M
Java-->JVM
JVM
JVM
M
Kit 2:
Q: Which one is “more portable”?
Language processors (Chapter 2)
19
Portable Compilers
In the previous example we have seen that portability is
not an “all or nothing” kind of deal.
It is useful to talk about a “degree of portability” as the
percentage of code that needs to be re-written when
moving to a dissimilar machine.
In practice 100% portability is not possible.
Language processors (Chapter 2)
20
Example: a “portable” compiler kit
Portable Compiler Kit:
Java-->JVM
Java
Java-->JVM
JVM
JVM
Java
Q: Suppose we want to run this kit on some machine M. How could
we go about realizing that goal? (with the least amount of effort)
Language processors (Chapter 2)
21
Example: a “portable” compiler kit
Java-->JVM
Java
Java-->JVM
JVM
JVM
Java
Q: Suppose we want to run this kit on our some machine M. How
could we go about realizing that goal? (with the least amount of
effort)
JVM
Java
reimplement
Language processors (Chapter 2)
JVM
C++
C++-->M
M
M
JVM
M
22
Example: a “portable” compiler kit
This is what we have now:
Java-->JVM
Java
Java-->JVM
JVM
JVM
Java
JVM
M
Now, how do we run our Tetris program?
Tetris
Tetris
Java Java-->JVM JVM
JVM
JVM
M
M
Language processors (Chapter 2)
Tetris
JVM
JVM
M
M
23
Bootstrapping
Remember our “portable compiler kit”:
Java-->JVM
Java
Java-->JVM
JVM
JVM
Java
JVM
M
We haven’t used this yet!
Java-->JVM
Java
Same language!
Q: What can we do with a compiler written in
itself? Is that useful at all?
Language processors (Chapter 2)
24
Bootstrapping
Java-->JVM
Java
Same language!
Q: What can we do with a compiler written in
itself? Is that useful at all?
• By implementing the compiler in (a subset of) its own language, we
become less dependent on the target platform => more portable
implementation.
• But… “chicken and egg problem”? How can we get around that?
=> BOOTSTRAPPING: requires some work to make the first “egg”.
There are many possible variations on how to bootstrap a compiler
written in its own language.
Language processors (Chapter 2)
25
Bootstrapping an Interpretive Compiler to Generate
M code
Our “portable compiler kit”:
Java-->JVM
Java
Java-->JVM
JVM
JVM
Java
JVM
M
Goal: we want to get a “completely native” Java compiler on machine M
P
P
Java Java-->M
M
M
M
Language processors (Chapter 2)
26
Bootstrapping an Interpretive Compiler to Generate
M code
Idea: we will build a two-stage Java --> M compiler.
P
P
Java Java-->JVM JVM
M
M
M
We will make this by
compiling
Java-->JVM
JVM
Language processors (Chapter 2)
JVM-->M
M
M
P
M
To get this we implement
JVM-->M
Java
and compile it
27
Bootstrapping an Interpretive Compiler to Generate
M code
Step 1: implement
JVM-->M
Java
Step 2: compile it
JVM-->M
JVM-->M
Java Java-->JVM JVM
JVM
JVM
M
M
Step 3: compile this
Language processors (Chapter 2)
28
Bootstrapping an Interpretive Compiler to Generate
M code
Step 3: “Self compile” the JVM (in JVM) compiler
JVM-->M
JVM-->M
JVM JVM-->M
M
JVM
JVM
M
M
This is the second
stage of our
compiler!
Step 4: use this to compile the Java compiler
Language processors (Chapter 2)
29
Bootstrapping an Interpretive Compiler to Generate
M code
Step 4: Compile the Java-->JVM compiler into machine code
Java-->JVM
Java-->JVM
JVM JVM-->M
M
M
M
The first stage of
our compiler!
We are DONE!
Language processors (Chapter 2)
30
Full Bootstrap
A full bootstrap is necessary when we are building a new compiler
from scratch.
Example:
We want to implement an Ada compiler for machine M. We don’t
currently have access to any Ada compiler (not on M, nor on any other
machine).
Idea: Ada is very large, we will implement the compiler in a subset of
Ada and bootstrap it from a subset of Ada compiler in another
language. (e.g. C)
v1
Step 1a: Build a compiler for Ada-S
Ada-S-->M
in another language
C
Language processors (Chapter 2)
31
Full Bootstrap
Step 1a: Build a compiler (v1) for Ada-S in another language.
v1
Ada-S-->M
C
Step 1b: Compile v1 compiler on M
v1
v1
Ada-S-->M
Ada-S-->M
C-->M
C
M
M
This compiler can be used for
M
bootstrapping on machine M but we
do not want to rely on it permanently!
Language processors (Chapter 2)
32
Full Bootstrap
Step 2a: Implement v2 of Ada-S compiler in Ada-S
v2
Ada-S-->M
Q: Is it hard to rewrite the compiler in Ada-S?
Ada-S
Step 2b: Compile v2 compiler with v1 compiler
v2
v2
v1 Ada-S-->M
Ada-S-->M
M
Ada-S Ada-S-->M
M
We are now no longer dependent
M
on the availability of a C compiler!
Language processors (Chapter 2)
33
Full Bootstrap
Step 3a: Build a full Ada compiler in Ada-S
v3
Ada-->M
Ada-S
Step 3b: Compile with v2 compiler
v3
v3
v2
Ada-->M
Ada-->M
M
Ada-S Ada-S-->M
M
M
From this point on we can maintain the compiler in Ada.
Subsequent versions v4,v5,... of the compiler in Ada and compile
each with the the previous version.
Language processors (Chapter 2)
34
Half Bootstrap
We discussed full bootstrap which is required when we have no
access to a compiler for our language at all.
Q: What if we have access to a compiler for our language on a
different machine HM (host machine) but want to develop one for
TM (target machine)?
We have:
Ada-->HM
HM
We want:
Ada-->HM
Ada
Ada-->TM
TM
Idea: We can use cross compilation from HM to TM to bootstrap
the TM compiler.
Language processors (Chapter 2)
35
Half Bootstrap
Idea: We can use cross compilation from HM to TM to bootstrap
the TM compiler.
Step 1: Implement Ada-->TM compiler in Ada
Ada-->TM
Ada
Step 2: Compile on HM
Ada-->TM
Ada-->TM
Ada Ada-->HM HM
HM
HM
Language processors (Chapter 2)
Cross compiler:
running on HM but
emits TM code
36
Half Bootstrap
Step 3: Cross compile our Ada-->TM compiler.
Ada-->TM
Ada-->TM
Ada Ada-->TM TM
HM
HM
DONE!
From now on we can develop subsequent versions of the compiler
completely on TM
Language processors (Chapter 2)
37
Bootstrapping to Improve Efficiency
The efficiency of programs and compilers:
Efficiency of programs:
- memory usage
- runtime
Efficiency of compilers:
- Efficiency of the compiler itself
- Efficiency of the emitted code
Idea: We start from a simple compiler (generating inefficient code)
and develop more sophisticated version of it. We can then use
bootstrapping to improve performance of the compiler.
Language processors (Chapter 2)
38
Bootstrapping to Improve Efficiency
We have:
Step 1
Ada-->Mslow Ada-->Mslow
Mslow
Ada
We implement:
Ada-->Mfast
Ada
Ada-->Mfast
Ada-->Mfast
Ada Ada-->Mslow Mslow
Mslow
M
Step 2 Ada-->M
Ada-->Mfast
fast
Ada Ada-->Mfast Mfast
Mslow
Fast compiler that
emits fast code!
M
Language processors (Chapter 2)
39