Project Documentation JCOMP Subset of Java Compiler

Compiler Design has been an active topic of research and development since the mid -1950s. Since that time, both researchers and practitioners have improved and supplanted them, repeatedly, with more effective ones. Compilers are tools that generate efficient mappings from programs to machines. The language design continues to change, the target architecture continues to change, and the programs become even more ambitious in their scale and complexity. Compiler design spans several areas like word processors, pretty printers, structure editors and silicon compilers. This project aims at developing an efficient compiler for subset of Java grammar. The target language is the Java Byte Code, able to run on any standard JVM.

Java is the Language of the Internet. The Java Programming Language enables the source language to be interpreted to an intermediate machine independent format called the Java Bytecode Format that is able to run on Java Virtual Machine (JVM). This intermediate format enables the class file to run on any machine architecture and operating system provided that the Java virtual machine supports that operating system. In this way, Java makes ‘Write Once, Run anywhere’ possible. The source files are interpreted to class file and could be run on any machine. The grammar of Java programming language is published by Sun Microsystems. There are at least 10 – 25 popular java compilers around the globe. The javac compiler provided by Sun Microsystems is the most standard and trusted Java compiler.

This effort is aimed at generating a subset of Java compiler. The grammar of the thus developed compiler is the trimmed version of the original Java grammar. The resulting grammar includes all the essential parts of the Java grammar. But it ignores those features that are added merely for comfort of the user. This academic project primarily focuses on the efficiency of the compilation process and aims at getting a strong knowledge base on Compiler design, Java language specification and Java virtual machine specification. The final output of this compiler is a class file. This class file is able to run on any Java virtual machine.

This project is implemented in five phases. These phases are implemented in such a way that each phase is independent on its own logic but depend on the output of the previous phase. The phases are Grammar generation phase, Lexical analysis, Parsing phase, Abstract Syntax tree construction, Symbol Table generation, Semantic Checker, and Byte Code generator.. The input is given in the form of a file. A jar file is created as the installation component of this project.

In the Lexical analysis phase, the input source program is read as character. They are then compared with the fixed list of keywords. If the input matches with the keyword, a token is formed and returned. Else the word is considered as an identifier. Errors are notified if the input does not match with the specified grammar.

In the Parsing phase, the non – left recursive grammar is used. A descent recursive parser has been developed. So each non-terminal is assigned a function which consists of recursive calls to other non – terminals. Indefinite loops are avoided by non – left recursive grammar. A new token is obtained from the lexical analysis phase when a terminal is encountered. When a mismatching token is obtained from the Lexical analysis phase, error is indicated along with the line number. The output of the (initial) parsing is a simple message on successful completion or notification of errors during failure.

In the Abstract Syntax Tree construction phase, each type of accepted sentences of the specified grammar is assigned a class. The punctuation rules of the parsing phase is avoided. The class variables are other classes like Terminals, etc. A Syntax tree is formed so that each node is a valid sentence (of the specified grammar) whose components are its child nodes. This phase is integrated with the parsing phase. So, in the parsing phase, each (non – terminal) function returns its corresponding class. Now, the output of the parser phase is the abstract syntax tree. This abstract syntax tree is parsed in the later phases for context checking and byte code generation.

In the Symbol Table Generation method, Le – Blanc Cook symbol table method is applied. It uses two hash tables and one scope stack. The second hash table is embedded into each row of the first table. The key values are the name of the identifier, for the first table, and the scope value, for the second table. During the lookup, the scope stack is analyzed and the nearest scope to top of the stack is assumed to be the current scope and appropriate value from the second table is returned. It returns false if the variable is not found or during a name clash. Unique ID is used for each scope.

In the semantic analysis phase, a set of contextual constraints is fixed. These constraints are checked while walking through the nodes of the Syntax tree. During the traversal the identifiers the type values are inserted, checked and validated. In the code generation phase, a class file is formed. This class file generation uses the file jvm.jar, written by Joshua Engel, and the file is available under GNU Library General Public License. The class file format is byte code format as specified in the Java virtual machine specification. This class file runs in any JVM. The implementation language is Java. Thus the compiler construction is completed.