Programming Language Syntax

Chapter 3: Language Translation Issues
• Programming language Syntax
– Key criteria concerning syntax
– Basic syntactic concepts
– Overall Program-Subprogram structure
• Stages in Translation
– Analysis of the source program
– Synthesis of the object program
– Bootstrapping
What is Syntax
The syntax of a programming
language describes the structure
of programs without any
consideration of their meaning.
Key criteria concerning syntax
Readability – a program is considered readable if
the algorithm and data are apparent by
inspection.
Writeability – ease of writing the program.
Verifiability – ability to prove program
correctness (very difficult issue)
Translatability – ease of translating the program
into executable form.
Lack of ambiguity – the syntax should make it
easy to avoid ambiguous structures.
Basic syntactic concepts
• Character set – the alphabet of the
language. Several different character sets
are used: ASCII, EBCDIC, Unicode
• Identifiers – strings of letters and digits,
usually beginning with a letter
• Operator symbols – for example +, -, *, /
• Keywords or Reserved Words – used as a
fixed part of the syntax of a statement
• Noise words – optional words inserted into
statements to improve readability
• Comments – used to improve readability
and for documentation purposes. Comments
are usually enclosed by special markers
• Blanks – rules vary from language to
language. Usually only significant in literal
strings
• Delimiters – used to denote the
beginning and the end of syntactic
constructs
• Expressions – functions that access data
objects in a program and return a value
• Statements – these are the sentences of
the language, they describe a task to be
performed
Overall Program-Subprogram Structure
Separate subprogram definitions:
separate compilation, linked at load time, e.g.
C/C++
Separate data definitions: the general
approach in OOP.
Nested subprogram definitions:
subprogram definitions appear as declarations
within the main program or other
subprograms, e.g. Pascal
Separate interface definitions:
e.g. C/C++ header files
Data descriptions separated from
executable statements: a centralized data
division contains all data declarations, e.g.
COBOL
Unseparated subprogram definitions: no
syntactic distinction between main program
statements and subprogram statements,
e.g. BASIC
Stages in Translation
• Analysis of the source program
• Synthesis of the object program
• Bootstrapping
Analysis of the source program
Lexical analysis (scanning) – identifying the
tokens of the programming language: keywords,
identifiers, constants and other symbols
In the program
void main()
{ printf("Hello World\n"); }
the tokens are
void, main, (, ), {, printf, (, "Hello World\n", ), ;, }
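The scanning step can be sketched in C roughly as follows. This is a minimal illustration, not a full C lexer: the function name scan, the limits MAX_TOKENS and MAX_TOKEN_LEN, and the simplified token rules (words, digit runs, string literals, one-character symbols) are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 32      /* hypothetical limit for this sketch */
#define MAX_TOKEN_LEN 64   /* hypothetical limit for this sketch */

/* Splits src into tokens and copies them into out.
   Returns the number of tokens found (at most MAX_TOKENS). */
size_t scan(const char *src, char out[][MAX_TOKEN_LEN])
{
    assert(src != NULL && out != NULL);
    size_t count = 0;
    const char *p = src;

    while (*p != '\0' && count < MAX_TOKENS) {
        if (isspace((unsigned char)*p)) {   /* blanks only separate tokens */
            p++;
            continue;
        }

        const char *start = p;
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* keyword or identifier: a letter followed by letters or digits */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
        } else if (isdigit((unsigned char)*p)) {
            /* integer constant: a run of digits */
            while (isdigit((unsigned char)*p))
                p++;
        } else if (*p == '"') {
            /* string literal: runs to the closing quote;
               a backslash escapes the next character */
            p++;
            while (*p != '\0' && *p != '"') {
                if (*p == '\\' && p[1] != '\0')
                    p++;
                p++;
            }
            if (*p == '"')
                p++;
        } else {
            p++;   /* any other character is a one-character symbol */
        }

        size_t len = (size_t)(p - start);
        if (len >= MAX_TOKEN_LEN)
            len = MAX_TOKEN_LEN - 1;   /* truncate overly long tokens */
        memcpy(out[count], start, len);
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Called on the two-line program above, scan should yield the same eleven tokens listed on the slide. Note that blanks matter only inside the string literal, which matches the rule given earlier for blanks.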
Syntactic and semantic analysis
Syntactic analysis (parsing) – determining
the structure of the program, as defined by the
language grammar.
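As an illustration of parsing, here is a small recursive-descent parser sketch in C. The grammar (single-digit operands, + - * / and parentheses) and all names (Parser, parse, parse_expr and so on) are assumptions chosen for this example; the slides do not prescribe a parsing method. The parser writes the expression in postfix form, which makes the recovered structure visible.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parser state: the input position and the output buffer. */
typedef struct {
    const char *in;   /* next unread input character */
    char *out;        /* output buffer for the postfix form */
    size_t len;       /* characters written so far */
    size_t cap;       /* capacity of out, including the terminator */
    int ok;           /* cleared on the first error */
} Parser;

static void emit(Parser *p, char c)
{
    if (p->len + 1 < p->cap)
        p->out[p->len++] = c;
    else
        p->ok = 0;    /* output does not fit */
}

static void skip_blanks(Parser *p)
{
    while (isspace((unsigned char)*p->in))
        p->in++;
}

static void parse_expr(Parser *p);

/* factor -> digit | "(" expr ")" */
static void parse_factor(Parser *p)
{
    skip_blanks(p);
    if (isdigit((unsigned char)*p->in)) {
        emit(p, *p->in);
        p->in++;
    } else if (*p->in == '(') {
        p->in++;
        parse_expr(p);
        skip_blanks(p);
        if (*p->in == ')')
            p->in++;
        else
            p->ok = 0;   /* missing closing parenthesis */
    } else {
        p->ok = 0;       /* operand expected */
    }
}

/* term -> factor { ("*" | "/") factor } */
static void parse_term(Parser *p)
{
    parse_factor(p);
    skip_blanks(p);
    while (p->ok && (*p->in == '*' || *p->in == '/')) {
        char op = *p->in++;
        parse_factor(p);
        emit(p, op);     /* operator follows both of its operands */
        skip_blanks(p);
    }
}

/* expr -> term { ("+" | "-") term } */
static void parse_expr(Parser *p)
{
    parse_term(p);
    skip_blanks(p);
    while (p->ok && (*p->in == '+' || *p->in == '-')) {
        char op = *p->in++;
        parse_term(p);
        emit(p, op);
        skip_blanks(p);
    }
}

/* Returns 1 and writes the postfix form to out if src matches the
   grammar; returns 0 on a syntax error or if out is too small. */
int parse(const char *src, char *out, size_t cap)
{
    assert(src != NULL && out != NULL);
    if (cap == 0)
        return 0;
    Parser p = { src, out, 0, cap, 1 };
    parse_expr(&p);
    skip_blanks(&p);
    if (*p.in != '\0')
        p.ok = 0;        /* trailing input that the grammar does not allow */
    out[p.len] = '\0';
    return p.ok;
}
```

Each grammar rule becomes one function, so the call structure mirrors the grammar. With this sketch, "1 + 2 * 3" is written out as 123*+, showing that the multiplication is grouped before the addition.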
Semantic analysis – assigning meaning to the
syntactic structures
Example:
int variable1;
meaning: storage is reserved for variable1 (typically
4 bytes; the size of int depends on the implementation),
and a specific set of operations may be used with variable1.
Basic semantic tasks
The semantic analysis builds the bridge between
analysis and synthesis.
Basic semantic tasks:
• Symbol-table maintenance
• Insertion of implicit information
• Error detection
• Macro processing
Result : an internal representation, suitable to be
used for code optimization and code generation.
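Two of these tasks, symbol-table maintenance and error detection, can be sketched in C as below. This is only an assumed minimal design: the names Symbol, SymbolTable, declare and lookup, the fixed-size array, and the two-type Type enum are invented for the example and do not come from any particular compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64    /* hypothetical limit for this sketch */
#define MAX_NAME_LEN 32   /* hypothetical limit for this sketch */

typedef enum { TYPE_INT, TYPE_DOUBLE } Type;

typedef struct {
    char name[MAX_NAME_LEN];
    Type type;
    size_t size;          /* storage the variable needs, in bytes */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    size_t count;
} SymbolTable;

/* Returns the entry for name, or NULL if name was never declared. */
const Symbol *lookup(const SymbolTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return &t->entries[i];
    return NULL;
}

/* Records a declaration such as "int variable1;".
   Returns 0 on success, or -1 if the name is already declared,
   is too long, or the table is full (error detection). */
int declare(SymbolTable *t, const char *name, Type type)
{
    assert(t != NULL && name != NULL);
    if (t->count == MAX_SYMBOLS
        || strlen(name) >= MAX_NAME_LEN
        || lookup(t, name) != NULL)
        return -1;

    Symbol *s = &t->entries[t->count++];
    strcpy(s->name, name);
    s->type = type;
    /* the declaration's meaning includes how much storage to reserve */
    s->size = (type == TYPE_INT) ? sizeof(int) : sizeof(double);
    return 0;
}
```

A linear search keeps the sketch short; a production compiler would more likely use a hash table and handle nested scopes.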
Synthesis of the object program
Three main steps:
Optimization – improving the intermediate code,
for example by removing redundant statements.
Code generation – generating assembler
commands with relative memory addresses for the
separate program modules; the result is the object
code of the program.
Linking and loading – resolving the addresses;
the result is the executable code of the program.
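The code-generation step can be illustrated with a deliberately naive C sketch that turns intermediate statements of the form dst = lhs + rhs (or dst = lhs) into LOAD_R / ADD_R / STORE_R instructions. The Quad structure and the generate function are hypothetical names for this example, and the sketch uses symbolic names in place of the relative memory addresses a real code generator would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One intermediate-code statement: dst = lhs + rhs,
   or the plain copy dst = lhs when rhs is NULL. */
typedef struct {
    const char *dst;
    const char *lhs;
    const char *rhs;
} Quad;

/* Writes naive assembler for n statements into buf.
   Returns the number of characters written (without the
   terminator), or -1 if buf is too small. */
int generate(const Quad *code, size_t n, char *buf, size_t cap)
{
    assert(buf != NULL && (code != NULL || n == 0));
    if (cap == 0)
        return -1;
    buf[0] = '\0';

    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        int w;
        if (code[i].rhs != NULL)
            w = snprintf(buf + used, cap - used,
                         "LOAD_R %s\nADD_R %s\nSTORE_R %s\n",
                         code[i].lhs, code[i].rhs, code[i].dst);
        else
            w = snprintf(buf + used, cap - used,
                         "LOAD_R %s\nSTORE_R %s\n",
                         code[i].lhs, code[i].dst);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;   /* output was truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

Because every statement is translated in isolation, the output contains loads of values that are already in the register. That redundancy is what the optimization step is meant to remove.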
Optimization example
Intermediate code:
Temp1 = B + C
Temp2 = Temp1 + D
A = Temp2
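Assembler code, not optimized:
LOAD_R B
ADD_R C
STORE_R Temp1
LOAD_R Temp1
ADD_R D
STORE_R Temp2
LOAD_R Temp2
STORE_R A
Some of these statements can be removed: LOAD_R Temp1 and
LOAD_R Temp2 reload a value that is already in the register,
and the stores to Temp1 and Temp2 are also unnecessary if the
temporaries are not used again.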
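A simple way to remove such statements is a peephole pass that looks at neighbouring instructions. The C sketch below drops only the redundant loads; it assumes a single-register machine as in the example, and the names Instr and remove_redundant_loads are made up for this illustration. Removing the stores to the temporaries would additionally require knowing that the temporaries are never read later, which this sketch does not check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *op;        /* "LOAD_R", "ADD_R" or "STORE_R" */
    const char *operand;   /* symbolic memory location */
} Instr;

/* Removes every LOAD_R X whose preceding kept instruction is
   STORE_R X: the register still holds X, so the load is redundant.
   Compacts code in place and returns the new instruction count. */
size_t remove_redundant_loads(Instr *code, size_t n)
{
    assert(code != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int redundant = kept > 0
            && strcmp(code[i].op, "LOAD_R") == 0
            && strcmp(code[kept - 1].op, "STORE_R") == 0
            && strcmp(code[i].operand, code[kept - 1].operand) == 0;
        if (!redundant)
            code[kept++] = code[i];
    }
    return kept;
}
```

Applied to the eight instructions above, this pass should leave six: LOAD_R B, ADD_R C, STORE_R Temp1, ADD_R D, STORE_R Temp2, STORE_R A.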
Bootstrapping
The compiler for a given language can be
written in the same language. To get the first
version running, two things are needed:
• a program that translates some internal
representation into assembler code
• a manual translation: the programmer re-writes the
compiler by hand into the internal representation,
following the algorithm that is encoded in the compiler.
From there on, the internal representation is
translated into assembler and then into machine
language.