THEORETICAL INVESTIGATION
2.1 COMPILER DESIGN
The principles and techniques of compiler writing are so pervasive that
the ideas of compiler design will be used many times in the career of a computer
scientist. Compiler writing spans programming languages, machine architecture,
language theory, algorithms and software engineering. But a few basic compiler-writing
techniques can be used to construct translators for a wide variety of languages
and machines. Compiler Design has been an active topic of research and development
since the mid -1950s. Since that time, both researchers and practitioners
have improved and supplanted them, repeatedly, with more effective ones.
Compilers are tools that generate efficient mappings from programs to machines.
The language design continues to change, the target architecture continues
to change, and the programs become even more ambitious in their scale and
complexity. This project aims at developing an efficient compiler for subset
of Java grammar. The target language is the Java Byte Code, able to run on
any standard JVM.
A compiler
is a program that reads a program written in one language the source language
– and translates it into an equivalent program in another language – the
target language. As an important part of this translation process, the compiler
reports to its user the presence of errors in the source program.
FIGURE 1– FUNCTION OF A COMPILER
There
are thousands of source languages and equally number of varied target languages.
But the important categories are of two types of target language. The first
type is results in the creation of execution file. This execution file
is machine dependent. This file could not be executed in any other machine
architecture except the one in which it is compiled. The C, C++ compilers
belong to this category.
The second type results in some intermediate format that could be ported
to any machine architecture. The concept of portability plays an important
role in these types of compilers. This type of compilers is generally called
as the interpreters. With the growing capability of Internet and intranet
communication system, the portable files are very much necessary for areas
like e-business, etc. The Java compiler belongs to this category. The java
compiler produces a class file. Separate Java Virtual machines are available
for almost all kind of machine architecture. So this class file could be
sent through any network and be executed on the JVM of that particular machine.
Separate compilation, for each machine, is avoided by using this technique.
FIGURE 2 – COMPARISON C AND JAVA COMPILERS
Compilers are sometimes classified as single-pass, multi-pass, load-and-go,
debugging, or optimizing, depending on how they have been constructed or
on what function they are supposed to perform. Despite this apparent complexity,
the basic tasks that any compiler must perform are essentially the same.
There lies the difference between the compiler designer and compiler user.
The compiler user is unaware of all the inner details of the functioning
of the compiler. This project aims at developing strong knowledge over the
compiler designing techniques.
Our knowledge
about how to organize and write compilers has increased vastly since the
first compilers started to appear in the early 1950’s. We have since developed
systematic techniques for handling many of the important tasks that occur
during compilation. Good implementation languages, programming environments
and software tools have also been developed. In our project we implement
the compiler using the Java language. The grammar for which we develop the
compiler is the trimmed version of the Java grammar. The Java grammar has
vast features like inheritance, encapsulation, etc. It would e difficult to
implement all of them in a one-semester project. So a simple version of Java
grammar is chosen. But at the same time it is made sure that all the important
features of the language and compiler design are included as the parts of
the project.
There are two
parts to compilation: analysis and synthesis. These two parts are implemented
as several phases in the project. Compilers become a part of the several
tools. To mention a few, Structure editors, Pretty printers, Static checkers,
Interpreters. Other areas are Test formatters, Silicon compilers and query
interpreters. Our compiler is implemented in 5 different phases. And finally
a byte coded class file is created and could be executed in any standard
JVM. All these details are treated in a detailed fashion in the following
sections.
2.2 RECURSIVE DESCENT PARSING
Recursive descent parsing is a top-down method of syntax analysis in which
we execute a set of recursive procedures to process the input. A procedure
is associated with each non terminal of the grammar. There is a special
kind of RDP called as the predictive parsing in which the look ahead symbol
unambiguously determines the procedure selected for each non terminal. The
sequence of procedures called in processing the input implicitly defines a
parse tree for the input.
It is a type
of top – down parsing. This top – down parsing can be viewed as an attempt
to find a leftmost derivation for an input string. Equivalently it can be
viewed as an attempt to construct a parse tree for the input starting from
the root and creating the nodes of the parse tree in preorder. In the case
of predictive parsing, no backtracking is required.
To construct a predictive parser,
we must know, given the current input symbol ‘a’ and the non terminal A
to be expanded, which one of the alternatives of production
A → α 1 | α 2 | ………| α n is the unique alternative that derives a
string beginning with a. That is, the proper alternative must be detectable
by looking at only the first symbol it derives. Flow-of-control constructs
in most programming languages, with their distinguishing keywords, are usually
detectable in this way.
The RDP could
also be implemented by using the stack instead of recursive function calls.
But to avoid complexity, we apply the recursive function calls. Also this
RDP is very much helpful in applying the error recovery schemes. To name
a few, Panic mode, Phrase level mode, Error Productions and Global Corrections.
But due to the time constraints we do not involve in the error recovery schemes.
We maintain the error management to the level of indication of errors with
appropriate error messages and approximate location of the error. So for
our grammar also we have developed a recursive descent type of the parser.
The implementation details of the parser are given in the syntax analysis
phase.