SYNOPSIS
Compiler Design
has been an active topic of research and development since the mid -1950s.
Since that time, both researchers and practitioners have improved and supplanted
them, repeatedly, with more effective ones. Compilers are tools that generate
efficient mappings from programs to machines. The language design continues
to change, the target architecture continues to change, and the programs
become even more ambitious in their scale and complexity. Compiler design
spans several areas like word processors, pretty printers, structure editors
and silicon compilers. This project aims at developing an efficient compiler
for subset of Java grammar. The target language is the Java Byte Code, able
to run on any standard JVM.
Java is the Language of the Internet. The Java Programming Language enables
the source language to be interpreted to an intermediate machine independent
format called the Java Bytecode Format that is able to run on Java Virtual
Machine (JVM). This intermediate format enables the class file to run on
any machine architecture and operating system provided that the Java virtual
machine supports that operating system. In this way, Java makes ‘Write Once,
Run anywhere’ possible. The source files are interpreted to class file and
could be run on any machine. The grammar of Java programming language is
published by Sun Microsystems. There are at least 10 – 25 popular java compilers
around the globe. The javac compiler provided by Sun Microsystems is the
most standard and trusted Java compiler.
This effort is aimed at generating a subset of Java compiler. The grammar
of the thus developed compiler is the trimmed version of the original Java
grammar. The resulting grammar includes all the essential parts of the
Java grammar. But it ignores those features that are added merely for comfort
of the user. This academic project primarily focuses on the efficiency of
the compilation process and aims at getting a strong knowledge base on Compiler
design, Java language specification and Java virtual machine specification.
The final output of this compiler is a class file. This class file is able
to run on any Java virtual machine.
This project is implemented in five phases. These phases are implemented
in such a way that each phase is independent on its own logic but depend
on the output of the previous phase. The phases are Grammar generation phase,
Lexical analysis, Parsing phase, Abstract Syntax tree construction, Symbol
Table generation, Semantic Checker, and Byte Code generator.. The input
is given in the form of a file. A jar file is created as the installation
component of this project.
In the Lexical analysis phase, the input source program is read as character.
They are then compared with the fixed list of keywords. If the input matches
with the keyword, a token is formed and returned. Else the word is considered
as an identifier. Errors are notified if the input does not match with
the specified grammar.
In the Parsing phase, the non – left recursive grammar is used. A descent
recursive parser has been developed. So each non-terminal is assigned a
function which consists of recursive calls to other non – terminals. Indefinite
loops are avoided by non – left recursive grammar. A new token is obtained
from the lexical analysis phase when a terminal is encountered. When a mismatching
token is obtained from the Lexical analysis phase, error is indicated along
with the line number. The output of the (initial) parsing is a simple message
on successful completion or notification of errors during failure.
In the Abstract Syntax Tree construction phase, each type of accepted sentences
of the specified grammar is assigned a class. The punctuation rules of
the parsing phase is avoided. The class variables are other classes like
Terminals, etc. A Syntax tree is formed so that each node is a valid sentence
(of the specified grammar) whose components are its child nodes. This phase
is integrated with the parsing phase. So, in the parsing phase, each (non
– terminal) function returns its corresponding class. Now, the output of
the parser phase is the abstract syntax tree. This abstract syntax tree
is parsed in the later phases for context checking and byte code generation.
In the Symbol Table Generation method, Le – Blanc Cook symbol table method
is applied. It uses two hash tables and one scope stack. The second hash
table is embedded into each row of the first table. The key values are the
name of the identifier, for the first table, and the scope value, for the
second table. During the lookup, the scope stack is analyzed and the nearest
scope to top of the stack is assumed to be the current scope and appropriate
value from the second table is returned. It returns false if the variable
is not found or during a name clash. Unique ID is used for each scope.
In the semantic analysis phase, a set of contextual constraints is fixed.
These constraints are checked while walking through the nodes of the Syntax
tree. During the traversal the identifiers the type values are inserted,
checked and validated. In the code generation phase, a class file is formed.
This class file generation uses the file jvm.jar, written by Joshua Engel,
and the file is available under GNU Library General Public License. The
class file format is byte code format as specified in the Java virtual
machine specification. This class file runs in any JVM. The implementation
language is Java. Thus the compiler construction is completed.