Chapter 4
The Java Virtual Machine

Java source code files are compiled into .class byte code files. The .class file will often be available for a class but the corresponding .java source code file will not be. In these cases, with a little effort, it’s possible to derive an astounding amount of information from the .class file alone.

Reading Compiled Files

Does the program in Listing 4-1 look familiar? I guarantee you’ve seen it before, probably many times.

Listing 4-1 Mystery code, Version 1

   CAFEBABE0003002D002008001D07001E07000E07001C0700160A0003000
   9090004000A0A0005000B0C000C00150C0014001B0C001A001F01000770
   72696E746C6E01000D436F6E7374616E7456616C75650100136A6176612F
   696F2F5072696E7453747265616D01000A457863657074696F6E7301000F
   4C696E654E756D6265725461626C6501000A536F7572636546696C65010
   00E4C6F63616C5661726961626C6573010004436F64650100036F7574010
   015284C6A6176612F6C616E672F537472696E673B29560100106A6176612
   F6C616E672F4F626A6563740100046D61696E01000F48656C6C6F576F726
   C642E6A617661010016285B4C6A6176612F6C616E672F537472696E673B2
   9560100063C696E69743E0100154C6A6176612F696F2F5072696E7453747265616D3B010
   0106A6176612F6C616E672F53797374656D01000C48656C6C6F
   20576F726C642101000A48656C6C6F576F726C6401000328295600000002000500000000
   000200090017001900010013000000250002000100000009B200071201B60006B1000000
   0100100000000A00
   0200000005000800030001001A001F000100130000001D00010001000000052AB70008B1
   00000001
   00100000000600010000000100010011000000020018

No? What if I write it like Listing 4-2?

Listing 4-2 Mystery code, Version 2

    ??æ___-_ __”__-____”___
   ___          ___

   ___@sr_____________println__
   ConstantValue___java/io/PrintStream__
   Exceptions___LineNumberTable__
   SourceFile__LocalVariables___Code___out___(Ljava/lang/String;)V___java/lang/Object___mai
   n___HelloWorld.java___([Ljava/lang/String;)V___<init>___Ljava/io/PrintStream;___java/lang/S
   ystem__Hello World!__
   HelloWorld___()V_____________     ___________%_______
   =____?__±_________
   ______________________”________*?__±__________________________

That’s a little better. You can guess that this has something to do with Java because the word Java and various Java keywords seem to show up. There’s also the string “Hello World” repeated a couple of times. This code isn’t very long, so just maybe this is a hello world program. Then again, maybe not. Let’s look at this same program another way in Listing 4-3.

Listing 4-3 Mystery code, Version 3

   Compiled from HelloWorld.java
   class HelloWorld extends java.lang.Object
      /* ACC_SUPER bit set */
   {
      public static void main();
      HelloWorld();

   Method void main()
      0 getstatic #7 <Field java.lang.System.out Ljava/io/PrintStream;>
      3 ldc #1 <String "Hello World!">
      5 invokevirtual #8 <Method java.io.PrintStream.println(Ljava/lang/String;)V>
      8 return

   Method HelloWorld()
      0 aload_0
      1 invokespecial #6 <Method java.lang.Object.<init>()V>
      4 return

   }

Now we’re getting somewhere. This is obviously a class called HelloWorld. It extends java.lang.Object. The class has two methods. The main() method is public static and void and takes an array of strings as arguments. The constructor HelloWorld() takes no arguments; the listing shows no access modifier on it, so it is not public.

However, what are all those funky lines like these?

      0 getstatic #7 <Field java.lang.System.out Ljava/io/PrintStream;>

and

      5 invokevirtual #8 <Method java.io.PrintStream.println(Ljava/lang/String;)V>

That doesn’t look like Java!

Finally, let’s look at the same program one more way in Listing 4-4.

Listing 4-4 Mystery code, Version 4

   class HelloWorld {

    public static void main (String args[]) {

     System.out.println(“Hello World!”);

    }

   }

Listing 4-4 is obviously the classic Hello World program in Java, although it seems not nearly as complex as the last example. Believe it or not, all four of these programs are the same, just viewed differently.

Listing 4-1 is pure hexadecimal and comes from the file HelloWorld.class. This is what you’d see by looking at the file with a disk editor such as Norton Disk Editor. Listing 4-5 is a simple application that you can use to read files as hexadecimal digits.

Listing 4-5 HexReader

   import java.awt.*;
   import java.io.*;

   public class HexReader extends Frame {

    TextArea output = new TextArea();
    Button OpenFile = new Button("Open File");

    public static void main (String[] args) {

     HexReader h = new HexReader();
     Toolkit t = Toolkit.getDefaultToolkit();
     Dimension d = t.getScreenSize();

     h.init();
     h.resize(d.width/2, d.height/2);
     h.move(d.width/4, d.height/4);
     h.show();

    }

    public HexReader() {

     super("HexReader");

    }

    public void init() {

     Panel p = new Panel();
     p.setLayout(new FlowLayout());
     p.add(OpenFile);
     add("South", p);
     add("Center", output);
    }

    public boolean action(Event e, Object what) {

     if (e.target == OpenFile) {
      File f = chooseFile();
      printFile(f);
      return true;
     }

     return false;

    }

    public File chooseFile() {

     FileDialog fd = new FileDialog(new Frame(),
     "Please choose a file:", FileDialog.LOAD);
     fd.show();

     return new File(fd.getDirectory(), fd.getFile());

    }

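    public void printFile(File f) {

     try {
      output.setText("");
      // readFully() keeps reading until the buffer is full;
      // a single read() call may return fewer bytes than requested.
      DataInputStream din = new DataInputStream(new FileInputStream(f));
      byte[] buffer = new byte[(int) f.length()];
      try {
       din.readFully(buffer);
      }
      finally {
       din.close();
      }
      output.setText(hexprint(buffer));
     }
     catch (IOException e) {
      // Report the problem instead of silently ignoring it
      output.setText("Could not read " + f + ": " + e);
     }

    }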

    public String hexprint(byte[] b) {

     StringBuffer sb = new StringBuffer(b.length * 2);
     for (int i = 0; i < b.length; i++) {
      sb.append(BitsToChar(b[i] >> 4));
      sb.append(BitsToChar(b[i] & 0x0000000F));
     }
     return sb.toString();
    }

    public char BitsToChar(int bits) {

     int j = bits & 0x0000000F;
     switch (j) {
      case 0: return '0';
      case 1: return '1';
      case 2: return '2';
      case 3: return '3';
      case 4: return '4';
      case 5: return '5';
      case 6: return '6';
      case 7: return '7';
      case 8: return '8';
      case 9: return '9';
      case 10: return 'A';
      case 11: return 'B';
      case 12: return 'C';
      case 13: return 'D';
      case 14: return 'E';
      case 15: return 'F';
      default:
       throw new IllegalArgumentException(j +
       " is not a valid value for a hexadecimal digit.");

     }

    }

   }

The main() method initializes the Frame shown in Figure 4-1. It has a TextArea field called output where the actual hex data appears and a single button with the label “Open File.” When the users click the Open File button, they see a file dialog box in which they can choose a file. The program passes the chosen file to the printFile() method. The printFile() method opens the file, connects an input stream to it so that the contents can be read, and reads the contents into a byte array called buffer. Then the buffer is passed to the hexprint() method to get a hexadecimal string that is displayed in the output TextArea.


Figure 4-1  The HexReader application.

Our main interest here is in the hexprint() method, so let’s take a closer look at it. The argument to hexprint() is a byte array, b. It returns a string that contains a hexadecimal printout of those bytes. Each byte in the array is read in order and converted to two hexadecimal digits.

To convert a byte to two hex digits, hexprint() splits it into its high four bits (b[i] >> 4) and its low four bits (b[i] & 0x0000000F). The result of each of these calculations is an int. For a negative byte, the >> shift copies the sign bit into the upper bits of that int, but this does no harm because BitsToChar() begins by masking its argument with 0x0000000F. BitsToChar() is otherwise little more than a switch statement that converts a single int between 0 and 15 to a hexadecimal digit between 0 and F. Because of the mask, the default case, which throws an IllegalArgumentException, is only a safeguard and should never be reached. IllegalArgumentException is a RuntimeException, so you don’t need to catch or declare it. Each of the chars is appended to the temporary StringBuffer sb. Finally sb.toString() is returned.
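The same nibble-splitting idea can be written more compactly with a lookup table in place of the switch statement. The following is only a sketch; the names HexSketch, DIGITS, and toHex are mine and do not appear in Listing 4-5.

```java
// Sketch only: HexSketch, DIGITS, and toHex are illustrative names.
class HexSketch {

    // Position n in this array holds the hexadecimal digit for the value n
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder(b.length * 2);
        for (int i = 0; i < b.length; i++) {
            // Masking with 0xFF first discards the sign extension,
            // so the shift leaves only the high four bits
            sb.append(DIGITS[(b[i] & 0xFF) >>> 4]);
            // The low four bits
            sb.append(DIGITS[b[i] & 0x0F]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] magic = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE };
        System.out.println(toHex(magic));
    }
}
```

Because the index is always masked down to a value between 0 and 15, the array lookup can never go out of bounds, which is why this version needs no exception.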

A raw hex dump of a file is not very informative, although you can learn a little from it. All Java .class files should begin with the 4-byte magic number 0xCAFEBABE, which is -889,275,714 in decimal when read as a signed 32-bit int. If you don’t see this number at the beginning of the file, then you know it’s not a valid Java byte code file, even if the file name ends in .class.

Bytes four and five of a .class file (the two bytes immediately following CAFEBABE) show the minor version of the compiler that produced this file. The two bytes after that show the major version of the compiler. In this example, the minor version is 0x0003 (3), and the major version is 0x002D (45). When a Java virtual machine reads a .class file, it checks to see if it understands that version of the format. A virtual machine can generally read all of the minor versions in a major version, but if the major version changes, a new virtual machine is required. Some virtual machines may also understand older major versions, but they should not attempt to read files with newer major versions. The .class file format is actually more stable than the language and the API. Both Java 1.0.2 and Java 1.1 use the 45.3 .class file format.
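To make the layout of these first eight bytes concrete, here is a small sketch that pulls the magic number and the two version numbers out of a byte array. The class name HeaderSketch and the method readVersion are hypothetical names of my own; the eight bytes in main() are the first eight bytes of Listing 4-1.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch only: HeaderSketch and readVersion are illustrative names.
class HeaderSketch {

    // Returns { minor, major } after checking the magic number
    static int[] readVersion(byte[] classBytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
            int magic = in.readInt();             // bytes 0-3
            if (magic != 0xCAFEBABE) {
                throw new ClassFormatError("Bad magic number: 0x" + Integer.toHexString(magic));
            }
            int minor = in.readUnsignedShort();   // bytes 4-5
            int major = in.readUnsignedShort();   // bytes 6-7
            return new int[] { minor, major };
        }
        catch (IOException e) {
            // Fewer than eight bytes were available
            throw new ClassFormatError("Truncated header");
        }
    }

    public static void main(String[] args) {
        byte[] header = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                          0x00, 0x03, 0x00, 0x2D };
        int[] v = readVersion(header);
        System.out.println("class file format " + v[1] + "." + v[0]);
    }
}
```

Note that the int literal 0xCAFEBABE is itself the negative number -889,275,714, so comparing it against the result of readInt() works without any casts.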

The remaining digits all have meanings, but pulling them out by hand is excruciatingly painful. In fact, even in total disaster situations (such as when your hard disk has crashed, taking with it three months of un-backed-up, mission-critical .class files while the corresponding .java files are completely lost), you would probably not decode the digits by hand. You would copy the byte codes out of the file by hand and manually enter them into another computer where you could decompile them.

The next variant, Listing 4-2, was obtained by forcing the file open in a text processor (specifically BBEdit). The printout here has thrown away a few characters, such as the ASCII page-break character, that would have completely screwed up the formatting of this book. This looks awful, but it’s a quick-and-dirty way to get a look at the String constants in a file.

I remember a stock trader in the early days of PCs who didn’t like one of the messages that his program gave him, so he opened up the DOS .exe file in WordPerfect, searched for the offending string, replaced it, and saved the file. Amazingly, the program worked with his “user modification.” I do not recommend this as general practice. There’s not much else to be learned here, so let’s move on.

Listing 4-3 is composed of disassembled byte code. This is much more useful than Listing 4-2. You get to see the name of the class, all imported classes, and all methods and fields. With a little effort, you can learn to read the byte codes as you would read someone else’s source code. This listing was produced with the JDK’s javap program with the -c command line flag — that is:

   % javap -c HelloWorld

I’ll develop a different byte code disassembler later in this chapter and the next.

Why would you want to do this when you can look at the .java source code file instead? The short answer is that you’ll almost never do this instead of looking at the .java file. However, it’s not uncommon to want to investigate the code of a class for which you do not have original source code. You’ve probably become accustomed to using your Web browser’s View Source command to find out how someone did a neat HTML trick. With the techniques and tools you’ll develop in this chapter, you’ll have an effective View Source equivalent for Java .class files.

Let’s begin by looking at the source and byte codes for the main() method of the HelloWorld class.

   Source code:

      public static void main (String args[]) {
         System.out.println("Hello World!");
      }

   Byte code:

      Method void main()
         0 getstatic #7 <Field java.lang.System.out Ljava/io/PrintStream;>
         3 ldc #1 <String "Hello World!">
         5 invokevirtual #8 <Method java.io.PrintStream.println(Ljava/lang/String;)V>
         8 return

Lining them up side-by-side, you can see that these four lines

      0 getstatic #7 <Field java.lang.System.out Ljava/io/PrintStream;>
      3 ldc #1 <String "Hello World!">
      5 invokevirtual #8 <Method java.io.PrintStream.println(Ljava/lang/String;)V>
      8 return

are probably somehow equivalent to the single source code line:

   System.out.println(“Hello World!”);

See if you can figure out how. The numbers on the left of each line start counting at zero. They’re indices into the byte codes for this method. This is just a series of bytes in a particular place in memory.

The first byte is an instruction: getstatic. The argument to this instruction is #7. This refers to the seventh entry in the constant pool for this class. It so happens that the seventh entry in this particular pool is java.lang.System.out, a field of type java.io.PrintStream. You know that System.out is a static field in the System class of type PrintStream, so it seems logical to interpret getstatic as a command to retrieve the value of a static field. In this case, it retrieves a reference to the PrintStream object stored in System.out and places it on the stack.

The next instruction is ldc, which stands for “load constant.” It has one argument: the integer constant #1. The 1 tells it which constant to load from the constant pool. In this case, it loads the first constant in the pool, which happens to be the string “Hello World!” Because “Hello World!” is the only constant literal in HelloWorld.java, it’s not surprising that it’s the first one in the pool. A reference to this string is placed on the top of the stack.

The next instruction is invokevirtual. This instruction calls instance methods. In this case, it calls the eighth entry in the constant pool, the method println() of the java.io.PrintStream class. The arguments for println() are taken from the stack. In this case, the top of the stack has a reference to the String object “Hello World!”. The object whose method it should call is one level deeper in the stack. That’s the reference to System.out placed there by getstatic.

The last instruction is return. There was no return statement in main(), but Java puts one here anyway. Writing return is optional in void methods. The compiler is smart enough to add a blank return for you if your void method requires one. However, in a non-void method, you have to return explicitly, because the compiler, although it knows you have to return, does not know what value you want to return.
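If the stack traffic is hard to picture, the following sketch mimics the three working instructions with an explicit stack object. It is only an analogy under my own assumptions: the real operand stack belongs to the virtual machine and is not a java.util collection, and the names StackSketch and run are invented for this illustration.

```java
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: an analogy for the operand stack, not how the VM is implemented.
class StackSketch {

    // Mimics getstatic, ldc, and invokevirtual; returns how many items remain on the stack
    static int run(PrintStream target) {
        Deque<Object> stack = new ArrayDeque<Object>();

        stack.push(target);              // getstatic: push the reference held in the static field
        stack.push("Hello World!");      // ldc: push a reference to the string constant

        // invokevirtual: the argument is on top, the receiving object one level deeper
        String argument = (String) stack.pop();
        PrintStream receiver = (PrintStream) stack.pop();
        receiver.println(argument);

        return stack.size();             // nothing is left when the method returns
    }

    public static void main(String[] args) {
        run(System.out);
    }
}
```

The order of the two pops is the point of the example: the argument comes off first, and the object that receives the call comes off second.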

The next method is the constructor HelloWorld(). The .java source code file did not include a constructor. However, the compiler puts a default constructor that takes no arguments in the byte code anyway.

   Method HelloWorld()
     0 aload_0
     1 invokespecial #6 <Method java.lang.Object.<init>()V>
     4 return

   }

The aload_0 instruction is a one-byte form of aload, which loads a reference from a local variable. In this case, it loads the zeroth local variable. In constructors and other instance methods, local variable zero holds this, the reference to the object on which the method was invoked. This reference becomes the target of the next instruction. The next instruction, invokespecial, is used here to call the superclass’s constructor, java.lang.Object(), from the subclass’s constructor. Finally, the return instruction transfers control back to the calling method.

What’s most interesting about this method is that none of it is in the Java source code. All Java classes have a constructor that takes no arguments if there are no other explicit constructors. Furthermore, all constructors call their superclass’s constructor before they do anything else, even if there isn’t an explicit super() call in the first line of the subclass’s constructor.
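You can watch both rules at work from ordinary source code. In the sketch below (all class names are invented for the illustration), Child declares no constructor at all and Grandchild declares one that never mentions super(), yet the Parent constructor still runs first.

```java
// Sketch only: every name here is invented for the illustration.
class ConstructorSketch {

    static final StringBuilder log = new StringBuilder();

    static class Parent {
        Parent() { log.append("Parent;"); }
    }

    // No constructor is written here, so the compiler supplies
    // a no-argument constructor that simply calls super()
    static class Child extends Parent {
    }

    // No explicit super() call, so the compiler inserts one
    // before the first statement of the constructor body
    static class Grandchild extends Child {
        Grandchild() { log.append("Grandchild;"); }
    }

    // Creates a Grandchild and reports the order in which constructors ran
    static String trace() {
        log.setLength(0);
        new Grandchild();
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

Running javap -c on the compiled Child class should show the same aload_0, invokespecial, return pattern as the HelloWorld() constructor above.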

In the remainder of this chapter, you’ll explore the Java .class file format to see how you change raw hexadecimal bytes such as those in Listing 4-1 into something more intelligible like the byte code in Listing 4-3.

Reading Class Files

A Java .class file has 16 parts. Eleven of the parts always occupy the same number of bytes. For example, the magic number 0xCAFEBABE is always four bytes, never two bytes and never eight bytes. Five of the parts are of varying length. For example, longer methods must have more byte codes than shorter methods. Table 4-1 lists the parts of every Java .class file in order. It folds each of the five two-byte counts into the variable-length table that the count describes, so it has 11 rows rather than 16. These parts always occur in exactly this order. The first step to disassembling a Java .class file is to break it up into these parts.

Table 4-1 The 16 parts of a .class file

Field            Width (bytes)   Meaning

magic            4               This identifies the .class file format. It should be 0xCAFEBABE. If it’s anything else, you’re dealing either with a file that isn’t a .class file or with a format more recent than this book.
minor version    2               The minor version of the compiler
major version    2               The major version of the compiler
constant pool    variable        The first two bytes give the number of entries in the constant pool. Then, as many bytes as are necessary to fill that many entries are read. The constant pool is a table of constant values used by this class.
access flags     2               These bit flags tell you whether the class is public, final, abstract, an interface, and a few other things.
this class       2               This tells you which entry in the constant pool holds this class’s class info.
superclass       2               If this is zero, then this class is java.lang.Object itself, the only class with no superclass. Otherwise, this is an index into the constant pool for the superclass class info.
interfaces       variable        The interface table holds two-byte indices into the constant pool table, one for each interface that this class implements. The first two bytes give the number of entries in the interface table. Therefore, after reading the first two bytes, you have to read twice as many bytes as the number stored in the first two bytes.
fields           variable        The fields table includes one field info structure for each field in the class.
methods          variable        The method table contains the byte codes for each method in the class, the return type of the method, and the types of each argument to the method.
attributes       variable        The attributes of the class
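The interface table is the simplest of the variable-length parts, so it makes a convenient illustration of the count-then-entries pattern. The following is a minimal sketch, assuming the table has already been isolated in a byte array; InterfaceTableSketch and readInterfaceIndices are hypothetical names and are not part of the disassembler developed below.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch only: InterfaceTableSketch and readInterfaceIndices are illustrative names.
class InterfaceTableSketch {

    // Reads a two-byte count followed by that many two-byte constant pool indices
    static int[] readInterfaceIndices(byte[] table) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(table));
            int count = in.readUnsignedShort();
            int[] indices = new int[count];
            for (int i = 0; i < count; i++) {
                indices[i] = in.readUnsignedShort();
            }
            return indices;
        }
        catch (IOException e) {
            // The array ended before all promised entries were read
            throw new ClassFormatError("Truncated interface table");
        }
    }

    public static void main(String[] args) {
        // A made-up table: two interfaces, at constant pool entries 5 and 9
        byte[] table = { 0, 2, 0, 5, 0, 9 };
        int[] indices = readInterfaceIndices(table);
        System.out.println(indices.length + " interfaces");
    }
}
```

A class that implements no interfaces still has an interface table; it consists of just the two count bytes, both zero.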


Listing 4-6 is a skeleton of a program that will disassemble byte code files. It reads the name of a file from the command line, opens a FileInputStream to the file, chains a DataInputStream to that FileInputStream, and then proceeds to read bytes out of the file. Or at least it will as soon as the skeleton is filled out. It would be simple enough to add a graphical interface to this program, as I did with Listing 4-5, but let’s leave that as an exercise for you to explore.

Listing 4-6 Disassembler skeleton

   import java.io.*;
   import java.awt.FileDialog;
   import java.awt.Frame;

   public class Disassembler {

    DataInputStream theInput;
    PrintStream theOutput;

    public static void main (String[] args) {

     try {
      Disassembler d = new Disassembler();
      d.disassemble();

      }
      catch (Exception e) {
       System.err.println(e);
       e.printStackTrace();
      }

     }

     public Disassembler (String theFile, OutputStream os) throws IOException {

      this(new File(theFile), os);

     }

     public Disassembler (File theFile, OutputStream os) throws IOException {

      FileInputStream fis = new FileInputStream(theFile);
      theInput = new DataInputStream(fis);
      theOutput = new PrintStream(os);

     }

     public Disassembler (OutputStream os) throws IOException {

      this(chooseFile(), os);

     }

     public Disassembler () throws IOException {

      this(chooseFile(), System.out);

     }

     public static File chooseFile() {

      FileDialog fd = new FileDialog(new Frame(),
       "Please choose a file:", FileDialog.LOAD);
      fd.show();

      return new File(fd.getDirectory(), fd.getFile());

     }

     public void disassemble() throws IOException {

      try {
       readMagic();
       readMinorVersion();
       readMajorVersion();
       readConstantPool();
       readAccessFlags();
       readClass();
       readSuperclass();
       readInterfaces();
       readFields();
       readMethods();
       readAttributes();
       // Output the file
       writeImports();
       writeAccess();
       writeClassName();
       writeSuperclass();
       writeInterfaces();
       writeFields();
       writeMethods();
       theOutput.println("}");
       theOutput.println("\n/*\n" + thePool + "\n*/");

      }
      catch (ClassFormatError e) {
       System.err.println(e);
       return;
      }

     }

     int magic;

     void readMagic() throws IOException {

     }

     void readMinorVersion() throws IOException {

     }

     void readMajorVersion() throws IOException {

     }

     void readConstantPool() throws IOException {

     }

     void readAccessFlags() throws IOException {

     }

     void readClass() throws IOException {

     }

     void readSuperclass() throws IOException {

     }

     void readInterfaces() throws IOException {

     }

     void readFields() throws IOException {

     }

     void readMethods() throws IOException {

     }

     void readAttributes() throws IOException {

     }

     public void writeAccess() {

     }

     public void writeClassName() {

     }

     public void writeSuperclass() {

     }

     public void writeImports() {

     }

     public void writeInterfaces() {

     }

     public void writeFields() {

     }

     public void writeMethods() {

     }

     public String getExceptions(MethodInfo mi) {

     }

     public String getCode(MethodInfo mi) {

     }

     public String getReturnType(MethodInfo mi) {

     }

     public String getArguments(MethodInfo mi) {

     }

     public String decodeDescriptor(String d) {

     }

   }

The key method of this program is disassemble(). This is the method that actually reads the bytes. It does this by calling the 11 methods — readMagic(), readMinorVersion(), readMajorVersion(), readConstantPool(), readAccessFlags(), readClass(), readSuperclass(), readInterfaces(), readFields(), readMethods(), and readAttributes() — in that order. You have to call them in that order because, except for the first few parts, the parts don’t start on any particular byte. For example, to find where the seventh part begins, you have to pick up where the sixth ended, and so on.

Once all the pieces have been read, you write them back out with writeImports(), writeAccess(), writeClassName(), writeSuperclass(), writeInterfaces(), writeFields(), and writeMethods(). You must write these in the order in which they normally appear in a .java source code file, not in the order in which they were read in the byte code file. Indeed, some of these parts, like the import statements, are not specifically included in the compiled file but can be deduced from it.

In the event that there’s a problem with a .class file, the program throws a java.lang.ClassFormatError. Once this program encounters an invalid file, it prints out an error message and stops executing.

It is rather unusual to catch an error rather than an Exception. You can’t normally recover from a ClassFormatError because it means that one of the classes that the program needs will not be available. However, because you’re not loading the class, but just parsing it, you have a little more leeway. This might get you into trouble if a ClassFormatError that you did not create yourself bubbles up during the file parsing. This is rather unlikely, but a somewhat more robust solution would provide a means to distinguish between the ClassFormatErrors that indicate a problem with the file being parsed and the ClassFormatErrors thrown by the Java VM when it fails to load a requested class.
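One way to draw that distinction, sketched here under my own assumptions, is to throw a subclass of ClassFormatError from the parsing code and catch only that subclass. The name ParsedFileFormatError is hypothetical; the listings in this chapter throw plain ClassFormatErrors.

```java
// Sketch only: ParsedFileFormatError is a hypothetical class, not part of the JDK.
class ParseErrorSketch {

    // Thrown only by the parser, never by the virtual machine
    static class ParsedFileFormatError extends ClassFormatError {
        ParsedFileFormatError(String message) {
            super(message);
        }
    }

    static void checkMagic(int magic) {
        if (magic != 0xCAFEBABE) {
            throw new ParsedFileFormatError("Incorrect magic number: 0x"
             + Integer.toHexString(magic));
        }
    }

    // Tells the two sources of ClassFormatError apart
    static String classify(ClassFormatError e) {
        if (e instanceof ParsedFileFormatError) return "bad input file";
        return "VM failed to load a class";
    }

    static String tryMagic(int magic) {
        try {
            checkMagic(magic);
            return "ok";
        }
        catch (ParsedFileFormatError e) {
            // Errors thrown by the VM itself are not caught here and keep propagating
            return classify(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryMagic(0xCAFEBABE));
        System.out.println(tryMagic(0));
    }
}
```

With this arrangement, disassemble() could catch ParsedFileFormatError alone, so a genuine loading failure elsewhere in the program would no longer be mistaken for a bad input file.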

The following sections explain each part in detail and fill in the code needed to make these methods work. The next chapter, Chapter 5, expands the readMethods() method to provide a better analysis of the code inside method bodies.

Magic number

All Java .class files are supposed to begin with the four-byte magic number 0xCAFEBABE, that is -889,275,714 in decimal. If you don’t see this at the beginning of the file, then you know that it’s not a valid Java byte code file, even if the file name ends in .class. In this event, the disassembler should throw a ClassFormatError and bail out. The easiest way to read the number is with the java.io.DataInputStream.readInt() method. Listing 4-7 shows the filled-out readMagic() method.

Listing 4-7 A method that reads the magic number and verifies that it is 0xCAFEBABE

   int magic;

   void readMagic() throws IOException {

     magic = theInput.readInt();
     if (magic != 0xCAFEBABE) {
      throw new ClassFormatError("Incorrect Magic Number: "
       + magic);
     }

   }

Magic numbers are pure byte code phenomena. They do not appear anywhere in the .java source code file. Therefore, there’s no corresponding writeMagic() method.

Minor version

Bytes four and five of a .class file (the two bytes immediately following CAFEBABE) are the minor version of the compiler that produced this file. This is an unsigned 2-byte int (like a char) and can have a value between 0 and 65,535. This book documents minor version 3. The easiest way to read an unsigned 2-byte int is with the java.io.DataInputStream.readUnsignedShort() method.


Note:  It’s important to realize that although readUnsignedShort() reads a short, it returns an int. A normal signed Java short can’t hold values up to 65,535 as an unsigned short can.
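The difference is easy to see by reading the same two bytes both ways. This sketch uses names of my own (UnsignedShortSketch, readBothWays) and wraps a byte array in a DataInputStream rather than reading from a file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch only: UnsignedShortSketch and readBothWays are illustrative names.
class UnsignedShortSketch {

    // Returns { value from readShort(), value from readUnsignedShort() }
    static int[] readBothWays(byte high, byte low) {
        byte[] two = { high, low };
        try {
            short signed = new DataInputStream(new ByteArrayInputStream(two)).readShort();
            int unsigned = new DataInputStream(new ByteArrayInputStream(two)).readUnsignedShort();
            return new int[] { signed, unsigned };
        }
        catch (IOException e) {
            // Cannot happen with a two-byte array, but the compiler requires the handler
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String[] args) {
        int[] r = readBothWays((byte) 0xFF, (byte) 0xFF);
        System.out.println("readShort: " + r[0] + ", readUnsignedShort: " + r[1]);
    }
}
```

For the bytes 0xFF 0xFF, readShort() produces -1 while readUnsignedShort() produces 65,535. For small values such as the minor version 3, the two methods agree.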


Listing 4-8 is the fleshed-out readMinorVersion() method.

Listing 4-8 The readMinorVersion() method

   int minor_version;

   void readMinorVersion() throws IOException {

     minor_version = theInput.readUnsignedShort();
     if (minor_version != 3) {
      throw new ClassFormatError("Minor Version not 3");
     }

   }

Like magic numbers, minor and major versions are part of only the byte code, not the source code. Therefore, this method checks only that the minor version is what it’s expected to be. It doesn’t need to be saved because it doesn’t have any effect on what comes after it.

Major version

The next two bytes are the major version of the compiler. Like the minor version, the major version is a 2-byte, unsigned integer. The major version you expect is 45. The parser is unlikely to be able to read anything different, so if it finds any other value the program throws a ClassFormatError and bails out. Listing 4-9 fleshes out this method:

Listing 4-9 The readMajorVersion() method

   int major_version;

   void readMajorVersion() throws IOException {

    major_version = theInput.readUnsignedShort();
    if (major_version != 45) {
     throw new ClassFormatError("Major Version not 45");
    }
   }

Constant pool

The constant pool is a data structure that stores all the constants in a program, not just literals like 1.0 or 72, but also class structures, method references, and the names and types of variables. The disassembler needs to make frequent reference back to this data structure when parsing later parts of the file. Many other entries in the .class file simply refer back to constants stored in the constant pool. It is therefore necessary to create a data structure to hold the constants for later reference.

The largest difficulty in this endeavor is that the constant pool has to hold values of 11 different types. To make matters worse, six of those types are reference types and five are primitive types. Because this is relatively complex, I’m going to push all the details into a new class called ConstantPool. The Disassembler class will simply call the ConstantPool() constructor, as shown in Listing 4-10. The ConstantPool() constructor will read the data out of theInput and parse it, and a new ConstantPool object will be stored in the field thePool.

Listing 4-10 The readConstantPool() method, which calls the ConstantPool() constructor

   ConstantPool thePool;

   void readConstantPool() throws IOException {

    thePool = new ConstantPool(theInput);

   }

The exact size of the constant pool depends on what’s inside it. The first two bytes of this part of the file are an unsigned short specifying the number of entries in the constant pool. This number must be greater than zero. Entry zero of the constant pool is reserved for the virtual machine’s use and is not stored in the file. Therefore, there is actually one less than this number of entries to be read, and they are numbered starting from 1. Furthermore, entries of different types can have different sizes, so the count alone does not tell you how many bytes the pool occupies.

Listing 4-11 is the ConstantPool class. This class does two things: first, it reads the constant pool from the file; second, it responds to requests for items from the constant pool.

The ConstantPool class is implemented as an array of PoolEntry objects. Listing 4-12 is the PoolEntry class. A PoolEntry object can hold one item that has one of the 11 different types and classes that can be stored in the constant pool. The ConstantPool constructor first reads two bytes from the InputStream as an unsigned short. This specifies the number of entries in the ConstantPool so that it can decide how large to make the PoolEntry array. Then, it passes InputStream to the PoolEntry() constructor enough times to fill the array. The PoolEntry() constructor determines the type of that entry in the pool and reads the right number of bytes for that type.

To read a particular entry from the constant pool, you call the properly typed read method of ConstantPool — for example, readDouble(int i) to get a double constant from the pool. These methods retrieve the right PoolEntry from the array and then call that entry’s matching read method.

All information about the type of a PoolEntry is stored in the PoolEntry itself. The user, however, will generally need to know the type of the entry being requested. If the user requests the wrong type from a PoolEntry, then the PoolEntry will throw a ClassFormatError.
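A typed read method of this kind might look roughly like the following sketch. It is not the PoolEntry class of Listing 4-12; TypedEntrySketch and its helper methods are names I made up, and only the integer and float types are handled.

```java
// Sketch only: TypedEntrySketch is an illustrative stand-in, not Listing 4-12.
class TypedEntrySketch {

    static final int C_INTEGER = 3;   // tag values used by the .class format
    static final int C_FLOAT = 4;

    final int tag;
    final byte[] data;                // the raw bytes that followed the tag

    TypedEntrySketch(int tag, byte[] data) {
        this.tag = tag;
        this.data = data;
    }

    int readInteger() {
        requireTag(C_INTEGER, "Integer");
        return toInt();
    }

    float readFloat() {
        requireTag(C_FLOAT, "Float");
        // The four bytes are an IEEE 754 bit pattern, not a numeric value
        return Float.intBitsToFloat(toInt());
    }

    // Refuses requests for a type the entry does not hold
    private void requireTag(int expected, String name) {
        if (tag != expected) {
            throw new ClassFormatError("Entry is not of type " + name + " (tag is " + tag + ")");
        }
    }

    // Assembles four big-endian bytes into an int
    private int toInt() {
        return ((data[0] & 0xFF) << 24) | ((data[1] & 0xFF) << 16)
             | ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
    }

    public static void main(String[] args) {
        TypedEntrySketch e = new TypedEntrySketch(C_INTEGER, new byte[] { 0, 0, 0, 72 });
        System.out.println(e.readInteger());
    }
}
```

Calling readFloat() on the entry created in main() would throw a ClassFormatError, because the tag says the entry holds an integer.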


Note:  This is not the only way that I could have structured this program. Another possibility would have been to make PoolEntry an abstract class with an abstract read method. There would be subclasses for double, float, ClassInfo, and the other types.
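For comparison, here is a rough sketch of what that alternative design could look like. I did not develop this version in the book’s listings; the names Entry, IntegerEntry, and Utf8Entry are invented, and only two of the 11 types are shown.

```java
// Sketch only: an alternative design with invented names, showing two of the 11 types.
class AbstractEntrySketch {

    static abstract class Entry {
        // Each subclass returns its own decoded value
        abstract Object read();
    }

    static class IntegerEntry extends Entry {
        private final int value;
        IntegerEntry(int value) { this.value = value; }
        Object read() { return Integer.valueOf(value); }
    }

    static class Utf8Entry extends Entry {
        private final String value;
        Utf8Entry(String value) { this.value = value; }
        Object read() { return value; }
    }

    // Prints a pool without ever inspecting a type tag
    static String describe(Entry[] pool) {
        StringBuilder sb = new StringBuilder();
        // Entry zero is reserved, so start at 1
        for (int i = 1; i < pool.length; i++) {
            sb.append(i).append(": ").append(pool[i].read()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry[] pool = { null, new Utf8Entry("println"), new IntegerEntry(72) };
        System.out.print(describe(pool));
    }
}
```

The trade-off is that a single read() method must return Object, so callers that need a particular type have to cast the result, whereas the typed read methods of the single-class design return the right type directly.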


Listing 4-11 The ConstantPool class

   import java.io.*;

   public class ConstantPool {

    PoolEntry[] thePool;

    public ConstantPool(DataInputStream dis) throws IOException {
     int length = dis.readUnsignedShort();
     thePool = new PoolEntry[length];
     for (int i = 1; i < length; i++) {
      thePool[i] = new PoolEntry(dis);
      // Doubles and longs take two pool entries
      // see Java VM Spec., p. 98
      if (thePool[i].tag == PoolEntry.cDouble || thePool[i].tag == PoolEntry.cLong)
       i++;
     }

    }

    public PoolEntry read(int i) {
     return thePool[i];
    }

    public String readUTF8(int i) {
     return thePool[i].readUTF8();
    }

    public int readInteger(int i) {
     return thePool[i].readInteger();
    }

    public float readFloat(int i) {
     return thePool[i].readFloat();
    }

    public double readDouble(int i) {
     return thePool[i].readDouble();
    }

    public ClassInfo readClassInfo(int i) {
     return thePool[i].readClassInfo();
    }

    public RefInfo readMethodRef(int i) {
     return thePool[i].readMethodRef();
    }

    public RefInfo readInterfaceMethodRef(int i) {
     return thePool[i].readInterfaceMethodRef();
    }

    public NameAndType readNameAndType(int i) {
     return thePool[i].readNameAndType();
    }

    public int howMany() {
     return thePool.length;
    }

    public String toString() {
     String result = "";
     for (int i = 1; i < thePool.length; i++) {
      result += i + ":  " + thePool[i].toString() + "\n";
      // Doubles and longs take two pool entries
      // see Java VM Spec., p. 98
      if (thePool[i].tag == PoolEntry.cDouble || thePool[i].tag == PoolEntry.cLong) i++;
     }

     return result;

    }

   }

The PoolEntry class begins with 11 constants to represent the 11 types that may appear in the constant pool. That is, every constant pool entry is preceded by one unsigned byte that signals its type.

The PoolEntry() constructor reads this tag to determine how many bytes it should read. It reads four bytes for integer, float, FieldRef, MethodRef, NameAndType, and InterfaceMethodRef types. It reads eight bytes for long and double types. ClassInfo and String types take two bytes. Finally, the UTF8 type requires a variable number of bytes, so first you must read one more unsigned short to learn how many bytes are in the UTF8 structure. Once you know how many bytes you need to read, reading them is almost trivial. Just use the read(byte[] b) method of the DataInputStream.


Note:  I decided to store the data as a byte array that will be converted to the appropriate type when requested. I could have performed the conversion immediately in the constructor and stored the converted values rather than the raw bytes. However, this would require many excess fields for each PoolEntry object. For example, if a PoolEntry object is a float, then the UTF8, integer, long, double, and all other fields would be empty. This seems excessively wasteful. However, if you anticipate repeatedly requesting the same entry from the constant pool, then you might want to trade off the extra memory in exchange for reduced CPU time.


Listing 4-12 The PoolEntry class

   import java.io.*;

   public class PoolEntry {

    public final static int cUTF8 = 1;
    public final static int cInteger = 3;
    public final static int cFloat = 4;
    public final static int cLong = 5;
    public final static int cDouble = 6;
    public final static int cClassInfo = 7;
    public final static int cString = 8;
    public final static int cFieldRef = 9;
    public final static int cMethodRef = 10;
    public final static int cInterfaceMethodRef = 11;
    public final static int cNameAndType = 12;

    int tag;
    byte[] data;

    public PoolEntry(DataInputStream dis) throws IOException {

     tag = dis.readUnsignedByte();
     int bytesToRead;
     switch (tag) {
      case cLong:
      case cDouble:
       bytesToRead = 8;
       break;
      case cInteger:
      case cFloat:

      case cFieldRef:
      case cMethodRef:
      case cNameAndType:
      case cInterfaceMethodRef:
       bytesToRead = 4;

       break;

      case cClassInfo:
      case cString:
       bytesToRead = 2;
       break;
      case cUTF8:
       bytesToRead = dis.readUnsignedShort();
       break;
      default:
       throw new ClassFormatError("Unrecognized Constant Type " + tag);
     }

     data = new byte[bytesToRead];

     int check = dis.read(data);
     if (check != data.length) {
      throw new ClassFormatError("Not enough data to fill array");
     }

    }

    public String readUTF8() {
     if (tag != cUTF8) {
      throw new ClassFormatError
      ("This is not a UTF8 string ");
     }
     try {
      // first put length of string back in string
      int len = data.length;
      byte[] newdata = new byte[len+2];
      newdata[0] = (byte) (len >> 8);
      newdata[1] = (byte) len;
      System.arraycopy(data, 0, newdata, 2, data.length);
      ByteArrayInputStream bis = new ByteArrayInputStream(newdata);
      DataInputStream dis = new DataInputStream(bis);
      return dis.readUTF();
     }
     catch (IOException e) {
      throw new ClassFormatError(e + " Bad UTF8 string");
     }

    }

    public int readInteger() {
     if (tag != cInteger) {
      throw new ClassFormatError
      ("This is not an integer.");
     }
     // mask each byte with 0xFF so that negative bytes are not sign-extended
     return (data[0] & 0xFF) << 24 | (data[1] & 0xFF) << 16
      | (data[2] & 0xFF) << 8 | (data[3] & 0xFF);

    }

    public long readLong() {
     if (tag != cLong) {
      throw new ClassFormatError
      ("This is not a long.");
     }
     // widen to long before shifting; an int shift by 32 or more wraps around
     return (data[0] & 0xFFL) << 56 | (data[1] & 0xFFL) << 48
      | (data[2] & 0xFFL) << 40 | (data[3] & 0xFFL) << 32
      | (data[4] & 0xFFL) << 24 | (data[5] & 0xFFL) << 16
      | (data[6] & 0xFFL) << 8 | (data[7] & 0xFFL);

    }

    public float readFloat() {
     if (tag != cFloat) {
      throw new ClassFormatError
      ("This is not a float");
     }
     int bits = (data[0] & 0xFF) << 24 | (data[1] & 0xFF) << 16
      | (data[2] & 0xFF) << 8 | (data[3] & 0xFF);
     return Float.intBitsToFloat(bits);

    }

    public double readDouble() {
     if (tag != cDouble) {
      throw new ClassFormatError
      ("This is not a double");
     }
     long bits = (data[0] & 0xFFL) << 56 | (data[1] & 0xFFL) << 48
      | (data[2] & 0xFFL) << 40 | (data[3] & 0xFFL) << 32
      | (data[4] & 0xFFL) << 24 | (data[5] & 0xFFL) << 16
      | (data[6] & 0xFFL) << 8 | (data[7] & 0xFFL);
     return Double.longBitsToDouble(bits);

    }

    public ClassInfo readClassInfo() {
     if (tag != cClassInfo) {
      throw new ClassFormatError
      ("This is not a ClassInfoStructure");
     }
     // indices are unsigned, so mask each byte before combining
     return new ClassInfo(tag, (data[0] & 0xFF) << 8 | (data[1] & 0xFF));

    }

    public RefInfo readFieldRef() {
     if (tag != cFieldRef) {
      throw new ClassFormatError
      ("This is not a FieldRefStructure");
     }
     return new RefInfo(tag, (data[0] & 0xFF) << 8 | (data[1] & 0xFF),
     (data[2] & 0xFF) << 8 | (data[3] & 0xFF));

    }

    public RefInfo readMethodRef() {
     if (tag != cMethodRef) {
      throw new ClassFormatError
      ("This is not a methodRef");
     }
     return new RefInfo(tag, (data[0] & 0xFF) << 8 | (data[1] & 0xFF),
     (data[2] & 0xFF) << 8 | (data[3] & 0xFF));

    }

    public RefInfo readInterfaceMethodRef() {
     if (tag != cInterfaceMethodRef) {
      throw new ClassFormatError
      ("This is not an InterfaceMethodRef");
     }
     return new RefInfo(tag, (data[0] & 0xFF) << 8 | (data[1] & 0xFF),
     (data[2] & 0xFF) << 8 | (data[3] & 0xFF));

    }

    public NameAndType readNameAndType() {
     if (tag != cNameAndType) {
      throw new ClassFormatError
      ("This is not a Name and Type structure");
     }
     return new NameAndType(tag, (data[0] & 0xFF) << 8 | (data[1] & 0xFF),
     (data[2] & 0xFF) << 8 | (data[3] & 0xFF));

    }

    public int readString() {
     if (tag != cString) {
      throw new ClassFormatError
      ("This is not a String");
     }
     return (data[0] & 0xFF) << 8 | (data[1] & 0xFF);
    }

    public int tag() {
     return tag;
    }

    public String toString() {

     switch (tag) {
      case cLong:
       return "long               " + String.valueOf(readLong());
      case cDouble:
       return "double             " + String.valueOf(readDouble());
      case cInteger:
       return "int                " + String.valueOf(readInteger());
      case cFloat:
       return "float              " + String.valueOf(readFloat());
      case cFieldRef:
       return "FieldRef           " + String.valueOf(readFieldRef());
      case cMethodRef:
       return "MethodRef          " + String.valueOf(readMethodRef());
      case cNameAndType:
       return "NameAndType        " + String.valueOf(readNameAndType());
      case cInterfaceMethodRef:
       return "InterfaceMethodRef " + String.valueOf(readInterfaceMethodRef());
      case cClassInfo:
       return "ClassInfo          " + String.valueOf(readClassInfo());
      case cString:
       return "String             " + String.valueOf(readString());
      case cUTF8:
       return "UTF8               " + readUTF8();
      default:
       throw new ClassFormatError("Unrecognized Constant Type");
     }

    }

   }

The PoolEntry class has 11 methods to return values. Each of these methods first checks to make sure that the type requested is in fact the type of this object. If the type doesn’t match, then a ClassFormatError is thrown. Once it verifies the type, it converts the data array into a primitive type or object of the appropriate type. In five cases, the return value is an instance of one of three small helper classes.
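The conversion from raw bytes deserves a closer look, because Java bytes are signed: a byte such as 0x80 becomes -128 when promoted to int, and its sign bits would overwrite the higher bytes when the pieces are combined. Masking each byte with 0xFF first avoids this. The sketch below shows the idea in isolation; the BigEndianDemo class and its method names are hypothetical and are not part of the Disassembler.

```java
class BigEndianDemo {

    // Combines two bytes into an unsigned 16-bit value, such as a pool index.
    static int toUnsignedShort(byte hi, byte lo) {
        return (hi & 0xFF) << 8 | (lo & 0xFF);
    }

    // Combines four bytes, most significant first, into a 32-bit int.
    static int toInt(byte[] b) {
        return (b[0] & 0xFF) << 24 | (b[1] & 0xFF) << 16
             | (b[2] & 0xFF) << 8 | (b[3] & 0xFF);
    }

    // Combines eight bytes, most significant first, into a 64-bit long.
    static long toLong(byte[] b) {
        long result = 0;
        for (int i = 0; i < 8; i++) {
            // 0xFFL widens to long before the value is merged in
            result = result << 8 | (b[i] & 0xFFL);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toUnsignedShort((byte) 0x00, (byte) 0x80)); // prints 128
        byte[] one = {0x3F, (byte) 0xF0, 0, 0, 0, 0, 0, 0};
        System.out.println(Double.longBitsToDouble(toLong(one)));      // prints 1.0
    }
}
```

Without the mask, the first call would produce -128 instead of 128, which is not a valid index into the constant pool.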

The ClassInfo class holds a tag and an index into the constant pool for the name of the class. It appears in Listing 4-13.

Listing 4-13 The ClassInfo class

   public class ClassInfo {

    int nameIndex;
    int tag;

    public ClassInfo(int t, int n) {
     tag = t;
     nameIndex = n;
    }

    public int nameIndex() {
     return nameIndex;
    }

   }

The RefInfo class, shown in Listing 4-14, holds indices for the class and the NameAndType in the constant pool. This is used for method references, field references, and interface method references.

Listing 4-14 The RefInfo class

   public class RefInfo {

    int classIndex;
    int nameAndTypeIndex;
    int tag;

    public RefInfo(int t, int c, int n) {
     tag = t;
     classIndex = c;
     nameAndTypeIndex = n;
    }

    public int classIndex() {
     return classIndex;
    }

    public int nameAndTypeIndex() {
     return nameAndTypeIndex;
    }

   }

Finally, the NameAndType class shown in Listing 4-15 holds indices into the constant pool for a name and a descriptor.

Listing 4-15 The NameAndType class

   public class NameAndType {

    int nameIndex;
    int descriptorIndex;
    int tag;

    public NameAndType(int t, int c, int n) {
     tag = t;
     nameIndex = c;
     descriptorIndex = n;
    }

    public int nameIndex() {
     return nameIndex;
    }

    public int descriptorIndex() {
     return descriptorIndex;
    }

   }

Access flags

The access flags listed in Table 4-2 are stored in the .class file as a 2-byte bit mask. The bits are numbered here from the most significant (bit 0) to the least significant (bit 15). Bit 15 (the ones bit) is set if the class is public. Bit 11 (the sixteens bit) is set if the class is final. Bit 10 is set if invokespecial needs to treat the class specially. (Don’t worry too much about that. I explain what that means in the next chapter.) Bit 6 is set if the class is an interface. Bit 5 is set if the class is abstract. The remaining bits are not yet used.

Table 4-2 Access flags

Bit   Mask     Meaning if set

0     0x8000   Reserved for future use.
1     0x4000   Reserved for future use.
2     0x2000   Reserved for future use.
3     0x1000   Reserved for future use.
4     0x0800   Reserved for future use.
5     0x0400   This is an abstract class or interface.
6     0x0200   This is an interface.
7     0x0100   Reserved for future use.
8     0x0080   Reserved for future use.
9     0x0040   Reserved for future use.
10    0x0020   This is treated specially by invokespecial.
11    0x0010   This class is final.
12    0x0008   Reserved for future use.
13    0x0004   Reserved for future use.
14    0x0002   Reserved for future use.
15    0x0001   This class or interface is public.


These flags are not independent of each other. If bit 6 is set (this is an interface), then bit 5 must also be set, because all interfaces are abstract. Similarly, a class cannot have both bits 11 and 5 set, because a final class can’t be abstract.

The unused bits in the access flags are reserved for future use. For now, you should ignore them when parsing the file. Listing 4-16 provides the filled-out code to read the access flags. Listing 4-16 also introduces several new boolean fields to allow later methods to know the values of these flags.

Listing 4-16 Reading the access flags

   short access_flags;
   boolean isPublic;
   boolean isFinal;
   boolean isInterface;
   boolean isAbstract;
   boolean isSpecial;

   void readAccessFlags() throws IOException {

    access_flags = theInput.readShort();
    isPublic    = (access_flags & 0x0001) == 0 ? false : true;
    isFinal     = (access_flags & 0x0010) == 0 ? false : true;
    isSpecial   = (access_flags & 0x0020) == 0 ? false : true;
    isInterface = (access_flags & 0x0200) == 0 ? false : true;
    isAbstract  = (access_flags & 0x0400) == 0 ? false : true;
    if (isAbstract && isFinal) {
     throw new ClassFormatError("This class is abstract and final!");
    }
    if (isInterface && !isAbstract) {
     throw new ClassFormatError("This interface is not abstract!");
    }
    if (isFinal && isInterface) {
     throw new ClassFormatError("This interface is final!");
    }

   }

There are a few things to note about this code. First, Java will not treat an integer as a boolean, so the masked value must be compared explicitly with zero (here with == and ?:) to produce one. In a language like C or C++, you would simply take zero to mean false.
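If you prefer, the same test can be written more compactly with a bare != 0 comparison, which already yields a boolean. The following sketch decodes a class’s access flags that way, using the masks from Table 4-2. The AccessFlagsDemo class, its constant names, and the isSet() and describe() methods are hypothetical names for this illustration only.

```java
class AccessFlagsDemo {

    static final int PUBLIC    = 0x0001;
    static final int FINAL     = 0x0010;
    static final int SPECIAL   = 0x0020; // the invokespecial flag
    static final int INTERFACE = 0x0200;
    static final int ABSTRACT  = 0x0400;

    // True if the bit selected by mask is set in flags.
    static boolean isSet(int flags, int mask) {
        return (flags & mask) != 0;
    }

    // Renders the flags roughly as they would appear in source code.
    static String describe(int flags) {
        StringBuilder sb = new StringBuilder();
        if (isSet(flags, PUBLIC)) sb.append("public ");
        if (isSet(flags, FINAL)) sb.append("final ");
        if (isSet(flags, ABSTRACT)) sb.append("abstract ");
        sb.append(isSet(flags, INTERFACE) ? "interface" : "class");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0021)); // prints public class
        System.out.println(describe(0x0601)); // prints public abstract interface
    }
}
```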

The next thing to ask yourself is whether the final if clause is really necessary. Given that this code will throw an error if a class is abstract and final or if a class is an interface and not abstract, can it possibly reach the test for being both final and an interface?
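One way to settle the question is simply to try every combination. With three boolean flags there are only eight cases, so the sketch below enumerates them and reports whether any case passes the first two tests and is still caught by the third. The FlagCheckDemo class and the thirdCheckReachable() method are hypothetical names used only for this experiment.

```java
class FlagCheckDemo {

    // Returns true if some combination of flags passes the first two
    // tests of readAccessFlags() and is then caught by the third.
    static boolean thirdCheckReachable() {
        boolean[] values = {false, true};
        for (boolean isAbstract : values) {
            for (boolean isFinal : values) {
                for (boolean isInterface : values) {
                    if (isAbstract && isFinal) continue;      // first test throws
                    if (isInterface && !isAbstract) continue; // second test throws
                    if (isFinal && isInterface) return true;  // third test would fire
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(thirdCheckReachable()); // prints false
    }
}
```

The result follows from the logic as well: a final interface that survives the second test must be abstract, and an abstract final class has already been rejected by the first.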

thisClass

Next is a 2-byte unsigned short that is an index into the constant pool. At that index in the constant pool, you should find a ClassInfo structure. This ClassInfo structure represents the current class or interface that you’re parsing. Listing 4-17 reads this index and stores the ClassInfo structure it references in a new field: thisClass. Notice how we have to refer back to the constant pool at this point.

Listing 4-17 readClass()

   ClassInfo thisClass;

   void readClass() throws IOException {
    int index = theInput.readUnsignedShort();
    thisClass = thePool.readClassInfo(index);
   }

Superclass

Immediately following the index of this class, you’ll find the index into the constant pool for the ClassInfo structure of this class’s superclass (Listing 4-18). Reading this value is almost identical to the previous method. However, if this class does not have a superclass (that is, if this is java.lang.Object, the only class without a superclass), then the index into the constant pool will be zero. You therefore have to watch out for this special case. If the index is zero, then you should set superclass to null.

Listing 4-18 readSuperclass()

   ClassInfo superclass;

   void readSuperclass() throws IOException {
    int index = theInput.readUnsignedShort();
    if (index == 0) {
     superclass = null;
    }
    else {
     superclass = thePool.readClassInfo(index);
    }
   }

Interfaces

A single class can implement multiple interfaces. First, an unsigned short tells you how many interfaces this class implements (possibly zero). Exactly that many unsigned short indices into the constant pool follow. Each index points to a ClassInfo structure for the implemented interface. Listing 4-19 is the fleshed-out readInterfaces() method. The interfaces are read, resolved, and stored in a new field array called interfaces.

Listing 4-19 The readInterfaces() method

   ClassInfo[] interfaces;

   void readInterfaces() throws IOException {
    interfaces = new ClassInfo[theInput.readUnsignedShort()];
    for (int i = 0; i < interfaces.length; i++) {
     // each entry in the file is an index into the constant pool
     interfaces[i] = thePool.readClassInfo(theInput.readUnsignedShort());
    }
   }

Attributes

The last thing you read from a .class file is the class’s attributes, which come after its fields and methods. However, each field and method also has its own attributes table. Therefore, you should develop the class needed to read attributes before you need it. The same AttributeInfo class reads the attributes of the fields, the methods, and the class itself.

An attribute table consists of a specified number of attribute_info structures (see Listing 4-20). Each attribute_info structure consists of one unsigned short that is the name index for this attribute. It’s an index into the constant pool. Next, there’s a 4-byte unsigned int that gives you the length of the attribute’s data. Finally, there’s an array of data.

Listing 4-20 AttributeInfo

   import java.io.*;

   public class AttributeInfo {

    int nameIndex;
    byte[] data;

    public AttributeInfo(DataInputStream dis) throws IOException {
     nameIndex = dis.readUnsignedShort();
     data = new byte[dis.readInt()];
     int bytesRead = dis.read(data);
     if (bytesRead != data.length) {
      throw new ClassFormatError("Insufficient bytes in attribute");
     }

    }

   }
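To see the attribute_info layout at work, the following sketch writes attributes into a byte array and reads them back. Because every attribute carries its own length, a reader can step over attributes it does not understand and still find the next one. The AttributeDemo class and its encode() and dataLengths() methods are hypothetical names for this example, and it uses readFully() so that a short read cannot go unnoticed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class AttributeDemo {

    // Serializes one attribute_info: name index, 4-byte length, then the data.
    static byte[] encode(int nameIndex, byte[] data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(nameIndex);
            out.writeInt(data.length);
            out.write(data);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads count attributes from table and returns each one's data length.
    static int[] dataLengths(byte[] table, int count) {
        try {
            DataInputStream dis = new DataInputStream(new ByteArrayInputStream(table));
            int[] lengths = new int[count];
            for (int i = 0; i < count; i++) {
                dis.readUnsignedShort();               // name index, ignored here
                byte[] data = new byte[dis.readInt()]; // length of this attribute
                dis.readFully(data);                   // consume exactly that much
                lengths[i] = data.length;
            }
            return lengths;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```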

Listing 4-21 is a filled-in readAttributes() method for the Disassembler class. An array of AttributeInfo structures holds the different attributes.

Listing 4-21 readAttributes()

   AttributeInfo[] attributes;

    void readAttributes() throws IOException {
     attributes = new AttributeInfo[theInput.readUnsignedShort()];
     for (int i = 0; i < attributes.length; i++) {
      attributes[i] = new AttributeInfo(theInput);
     }
    }

Fields

After you’ve read the interfaces, you next read the class’s fields. Some classes have no fields. For example, the HelloWorld program has only a method. An unsigned short tells you how many fields there are in the class. Then you read that many FieldInfo structures from the file. A FieldInfo structure is composed of five items.

The first unsigned short is the access flags for the field. These tell you whether the field is public, private, protected, static, final, volatile, and/or transient. Table 4-3 lists the bit masks for each of these modifiers. As usual, the bit mask values are chosen so that the bitwise operators can easily pick out individual values. Note that not all of the possible combinations of flags are allowed. For example, a field cannot be both public and private. Each flag is exactly equivalent to a Java keyword, which may modify a field.

Table 4-3 Field access flags

Flag        Bit mask

public      0x0001
private     0x0002
protected   0x0004
static      0x0008
final       0x0010
volatile    0x0040
transient   0x0080

The 2-byte unsigned short immediately following the access flags is the name index—that is, an index into the constant pool that provides the field name’s location.

Next comes the descriptor index, another 2-byte unsigned short index into the constant pool. This points to a UTF8 structure, which represents a field descriptor.

Next comes the attributes table for this field. You read this by first reading an unsigned short count and then passing the DataInputStream to the AttributeInfo constructor once for each attribute. Listing 4-22 is the full FieldInfo class.

Listing 4-22 The FieldInfo class

   import java.io.*;

   public class FieldInfo {

    int accessflags;
    int nameIndex;
    int descriptorIndex;
    AttributeInfo[] attributes;

    public final static int cPublic = 0x0001;
    public final static int cPrivate = 0x0002;
    public final static int cProtected = 0x0004;
    public final static int cStatic = 0x0008;
    public final static int cFinal = 0x0010;
    public final static int cVolatile = 0x0040;
    public final static int cTransient = 0x0080;

    public FieldInfo( DataInputStream dis) throws IOException {
     accessflags = dis.readUnsignedShort();
     nameIndex = dis.readUnsignedShort();
     descriptorIndex = dis.readUnsignedShort();
     attributes = new AttributeInfo[dis.readUnsignedShort()];
     for (int i = 0; i < attributes.length; i++) {
      attributes[i] = new AttributeInfo(dis);
     }

    }

    public int nameIndex() {
     return nameIndex;
    }

    public int descriptorIndex() {
     return descriptorIndex;
    }

    public boolean isPublic() {
     return (accessflags & cPublic) != 0;
    }

    public boolean isPrivate() {
     return (accessflags & cPrivate) != 0;
    }

    public boolean isProtected() {
     return (accessflags & cProtected) != 0;
    }

    public boolean isStatic() {
     return (accessflags & cStatic) != 0;
    }

    public boolean isVolatile() {
     return (accessflags & cVolatile) != 0;
    }

    public boolean isTransient() {
     return (accessflags & cTransient) != 0;
    }

    public boolean isFinal() {
     return (accessflags & cFinal) != 0;
    }

   }
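The individual flag tests in FieldInfo can be combined to rebuild the modifier list that appeared in the source code. The sketch below does this directly from the bit mask, using the values in Table 4-3. The FieldModifiersDemo class and its modifiers() method are hypothetical names, and the order in which the keywords are emitted is simply the order of the table, not something the .class file records.

```java
class FieldModifiersDemo {

    private static final String[] NAMES = {
        "public", "private", "protected", "static", "final", "volatile", "transient"
    };
    private static final int[] MASKS = {
        0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0040, 0x0080
    };

    // Returns the Java keywords for every flag that is set,
    // each followed by a single space.
    static String modifiers(int accessFlags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                sb.append(NAMES[i]).append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x0019 = public | static | final
        System.out.println(modifiers(0x0019) + "int MAX;"); // prints public static final int MAX;
    }
}
```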

Here’s the fleshed-out readFields() method for the Disassembler class. It’s quite simple, because all the work goes on inside the FieldInfo class.

   FieldInfo[] fields;

    void readFields() throws IOException {
     fields = new FieldInfo[theInput.readUnsignedShort()];
     for (int i = 0; i < fields.length; i++) {
      fields[i] = new FieldInfo(theInput);
     }
    }

Methods

The methods table is similar to the fields table. First, there’s an unsigned short to tell you how many methods there are. Then there’s an array of method_info structures. As with the FieldInfo structure, this program keeps all the intelligence inside the MethodInfo constructor. Listing 4-23 is the fleshed-out readMethods() method for the Disassembler class.

Listing 4-23 readMethods()

   MethodInfo[] methods;

   void readMethods() throws IOException {
    methods = new MethodInfo[theInput.readUnsignedShort()];
    for (int i = 0; i < methods.length; i++) {
     methods[i] = new MethodInfo(theInput);
    }
   }

The MethodInfo structure is almost identical to a FieldInfo structure. In fact, the only difference is in the permitted values for the access flags and the meaning of the attributes. Listing 4-24 is the MethodInfo class.

Listing 4-24 The MethodInfo class

   import java.io.*;

   public class MethodInfo {
    int accessflags;
    int nameIndex;
    int descriptorIndex;
    AttributeInfo[] attributes;

    public final static int cPublic = 0x0001;
    public final static int cPrivate = 0x0002;
    public final static int cProtected = 0x0004;
    public final static int cStatic = 0x0008;
    public final static int cFinal = 0x0010;
    public final static int cSynchronized = 0x0020;
    public final static int cNative = 0x0100;
    public final static int cAbstract = 0x0400;

    public MethodInfo (DataInputStream dis) throws IOException {
     accessflags = dis.readUnsignedShort();
     nameIndex = dis.readUnsignedShort();
     descriptorIndex = dis.readUnsignedShort();
     attributes = new AttributeInfo[dis.readUnsignedShort()];
     for (int i = 0; i < attributes.length; i++) {
      attributes[i] = new AttributeInfo(dis);
     }

    }

    public int nameIndex() {
     return nameIndex;
    }

    public int descriptorIndex() {
     return descriptorIndex;
    }

    public boolean isPublic() {
     return (accessflags & cPublic) != 0;
    }

    public boolean isPrivate() {
     return (accessflags & cPrivate) != 0;
    }

    public boolean isProtected() {
     return (accessflags & cProtected) != 0;
    }

    public boolean isStatic() {
     return (accessflags & cStatic) != 0;
    }

    public boolean isSynchronized() {
     return (accessflags & cSynchronized) != 0;
    }

    public boolean isNative() {
     return (accessflags & cNative) != 0;
    }

    public boolean isAbstract() {
     return (accessflags & cAbstract) != 0;
    }

    public AttributeInfo[] getAttributes() {
     return attributes;
    }

    public String toString() {
     return "NameIndex: " + nameIndex + ";\tDescriptorIndex: " + descriptorIndex;
    }

   }

Putting It All Together

Now that the entire .class file has been read into memory and parsed, it can be output as more-or-less-legible source code. You do not need to output items in the order in which they appeared in the .class file. For example, the first thing output will be any import statements in the file. Then you’ll produce the access specifiers for the class and then the class name itself, followed by any interfaces that the class implements. Next come the fields, and then the methods. Along the way, you’ll add in necessary syntax (such as semicolons and keywords) that is normally present in source code but is not included in byte code.

To do this, several more methods of the Disassembler class need to be filled out:

   writeImports();
   writeAccess();
   writeClassName();
   writeInterfaces();
   writeFields();
   writeMethods();

Each of these methods will parse the data structures read in the first part of this chapter to collect the needed information.

Import statements

There’s no one place in a .class file where all the import statements are stored. To determine which import statements were in the source code, you have to list all the classes in the constant pool. You might choose to output one import statement for each class, or you might be somewhat more selective. In this example, I have chosen not to produce import statements for the class itself or any classes in java.lang. This makes the disassembled source code more similar to what you actually write in programs. If you wanted to, you could include import statements only for entire packages (for example, import java.util.*) rather than for individual classes. However, I find it convenient to be able to see exactly what classes a particular class references.

To find the classes, you loop through the constant pool and check each entry to see if it’s a ClassInfo structure. It’s important to remember that the zeroth entry in the constant pool is not included in the .class file. When a ClassInfo structure is found, you use its nameIndex() method to get the class’s name as a UTF8 structure from the constant pool. Each name thus retrieved is tested to be sure that it’s not the name of this class and that it’s not a class from java.lang. Assuming neither of these is the case, an import statement for the class is printed. Listing 4-25 demonstrates the writeImports() method.

Listing 4-25 The writeImports() method

   public void writeImports() {

    PoolEntry pe = null;
    String thisname = thePool.readUTF8(thisClass.nameIndex()).replace('/', '.');
    // recall that there’s nothing in the zeroth pool entry
    for (int i = 1; i < thePool.howMany(); i++) {
     pe = thePool.read(i);
     if (pe.tag() == PoolEntry.cClassInfo) {
      ClassInfo ci = pe.readClassInfo();
      String name = thePool.readUTF8(ci.nameIndex());
      name = name.replace('/', '.');
      if (!name.startsWith("java.lang.") && !name.equals(thisname)) {
       theOutput.println("import " + name + ";");
      }
     }
    }

    theOutput.println();

   }
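If you would rather see one import per package, the class names can be collapsed before they are printed. The following sketch shows one way to do that, under the same rules as above: java.lang is skipped, and so are classes in the default package, which need no import. The PackageImportsDemo class and its packageImports() method are hypothetical names and are not part of the Disassembler. The input is a list of names in the internal, slash-separated form found in the constant pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PackageImportsDemo {

    // Turns internal names such as "java/io/PrintStream" into sorted,
    // duplicate-free wildcard imports such as "import java.io.*;".
    static List<String> packageImports(List<String> internalNames) {
        Set<String> packages = new TreeSet<>();
        for (String internal : internalNames) {
            String name = internal.replace('/', '.');
            int lastDot = name.lastIndexOf('.');
            if (lastDot < 0) continue;             // default package
            String pkg = name.substring(0, lastDot);
            if (pkg.equals("java.lang")) continue; // imported implicitly
            packages.add(pkg);
        }
        List<String> result = new ArrayList<>();
        for (String pkg : packages) {
            result.add("import " + pkg + ".*;");
        }
        return result;
    }
}
```

The price of the shorter list is the one mentioned earlier: you can no longer see exactly which classes are referenced.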

Access specifiers

The writeAccess() method looks at the access specifiers for the class and prints them in Java form. Listing 4-26 has the code.

Listing 4-26 The writeAccess() method

   public void writeAccess() {

    if (isPublic) theOutput.print("public ");
    if (isFinal) theOutput.print("final ");
    if (isAbstract) theOutput.print("abstract ");
    if (isInterface) theOutput.print("interface ");
    else theOutput.print("class ");

   }

Note that if a .class file is not an interface, then it must represent a class. Note also that one access flag, isSpecial, has no equivalent in Java source code. It exists only for the use of the compiler and the virtual machine.

Class and superclass

The next thing you want to know is the name of the class. You can easily retrieve this from the thisClass field, which points to the name of the class in UTF8 format in the constant pool (see Listing 4-27).

Listing 4-27 The writeClassName() method

   public void writeClassName() {

    String name = thePool.readUTF8(thisClass.nameIndex());
    theOutput.print(name + " ");
   }

Next, you want to find out which class this class extends (see Listing 4-28). You have to watch out for the special case of java.lang.Object, which has no superclass. Otherwise, this is very similar to the previous method.

Listing 4-28 The writeSuperclass() method

   public void writeSuperclass() {

    if (superclass.nameIndex() != 0) {
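     // the constant pool stores java/util/Vector; Java source expects java.util.Vector
     String name = thePool.readUTF8(superclass.nameIndex()).replace(‘/’, ‘.’);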
     theOutput.print(“extends “ + name + “ “);
    }

   }

Interfaces

The interfaces are similar except that there may be more than one of them. When you’re finished outputting all the interfaces, open the class with an opening brace. The writeInterfaces() method is shown in Listing 4-29.

Listing 4-29 The writeInterfaces() method

   public void writeInterfaces() {

    if (interfaces.length > 0) {
     String name = thePool.readUTF8(interfaces[0].nameIndex()).replace(‘/’, ‘.’);
     theOutput.print(“implements “ + name);
     for (int i=1; i < interfaces.length; i++) {
      name = thePool.readUTF8(interfaces[i].nameIndex()).replace(‘/’, ‘.’);
      theOutput.print(“, “ + name);
     }
     theOutput.print(“ “);
    }
    theOutput.println(“{“);
   }

I’ve chosen to put the access specifiers, the class name, the class that this extends, all interfaces that this class implements, and the opening brace on a single line of the file. This produces output that looks like:

   public final class myVector extends java.util.Vector implements java.io.Serializable {

Feel free to adjust this to match your preferences. For example, some people prefer to write each of these on separate lines.

   public final class myVector
    extends java.util.Vector
    implements java.io.Serializable
    {

Both versions produce identical byte code, so when you’re working backward from the byte code, there’s no way to distinguish the two cases.
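
To see the whole single-line format in one place, here is a small sketch that assembles the declaration from its parts. It assumes the names have already been read from the constant pool and arrive in slash-separated form; the ClassHeader class and its build() method are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the Disassembler class.
final class ClassHeader {

    // modifiers is text such as "public final class " (with its trailing space);
    // superName may be null for a class that has no superclass.
    static String build(String modifiers, String name, String superName, List<String> interfaces) {
        StringBuilder sb = new StringBuilder(modifiers).append(name);
        if (superName != null) {
            sb.append(" extends ").append(superName.replace('/', '.'));
        }
        if (!interfaces.isEmpty()) {
            // String.join puts the commas between the names, never after the last one
            sb.append(" implements ").append(String.join(", ", interfaces).replace('/', '.'));
        }
        return sb.append(" {").toString();
    }

    public static void main(String[] args) {
        System.out.println(build("public final class ", "myVector",
            "java/util/Vector", Arrays.asList("java/io/Serializable")));
    }
}
```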

Fields

Only two parts of the file are left: the fields and the methods. Let’s look at the fields first. It’s not at all uncommon for a class to have many fields. You therefore need to loop through all the fields with a for loop. Inside the loop, you check the access specifiers, the name, and the type of each field.

To read the name of the field, you simply read the UTF8 structure in the constant pool at the field’s name index. The type of the field requires more effort. Although it is stored as a UTF8 string in the constant pool at the FieldInfo’s descriptorIndex, the UTF8 string needs to be decoded first. Primitive types like int or char are encoded as single letters. For example, an int is the capital letter I. Table 4-4 lists the encodings for the primitive types.

Table 4-4 Primitive type encodings

   Code   Type
   B      byte
   C      char
   D      double
   F      float
   I      int
   J      long
   S      short
   Z      boolean

Class types are encoded as the capital letter L, followed by the fully qualified class name, followed by a semicolon. Furthermore, for historical reasons, the periods in the fully qualified class name change to forward slashes. Therefore, inside the constant pool, the String class is written as Ljava/lang/String;, the Object class is written as Ljava/lang/Object;, the Vector class is written Ljava/util/Vector;, and so on. Converting this into the format you expect is easy. Just trim the first and last characters of the string with the substring() method and use the replace() method to change the slashes to periods like this:

   String s = “Ljava/lang/String;”;
   String r = s.substring(1, s.length() - 1);
   r = r.replace(‘/’, ‘.’);

The final types you need to deal with are the array types. These are encoded by prefixing the descriptor of the element type with left brackets ([), one for each dimension of the array. Thus, a double[] array is encoded as [D. A String[][] array is encoded as [[Ljava/lang/String;. To decode array types, you first count the number of left brackets and then recursively call the decodeDescriptor() method on what remains. Listing 4-30 shows the complete decodeDescriptor() method. It takes a single argument, the string to be decoded, and returns the decoded string.

Listing 4-30 The decodeDescriptor() method

   public String decodeDescriptor(String d) {
    if (d.equals(“B”)) return “byte”;
    else if (d.equals(“C”)) return “char”;
    else if (d.equals(“D”)) return “double”;
    else if (d.equals(“F”)) return “float”;
    else if (d.equals(“I”)) return “int”;
    else if (d.equals(“J”)) return “long”;
    else if (d.equals(“S”)) return “short”;
    else if (d.equals(“Z”)) return “boolean”;
    else if (d.equals(“V”)) return “void”;  // appears only as a return type
    else if (d.startsWith(“L”)) {  // object
     String r = d.substring(1, d.length() - 1);
     r = r.replace(‘/’, ‘.’);
     return r;
    }
    else if (d.startsWith(“[“)) { // array
     int dimensions = d.lastIndexOf(‘[‘) + 1;
     String type = decodeDescriptor(d.substring(dimensions));
     for (int i=0; i < dimensions; i++) {
      type += “[]”;
     }
     return type;
    }
    else {
     throw new ClassFormatError(“Unrecognized Type: “ + d);
    }

   }
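
If you want to experiment with descriptors outside the Disassembler, the following self-contained sketch decodes the same encodings. It is a variation rather than a copy of Listing 4-30: it counts the brackets in a loop instead of recursing, and it throws IllegalArgumentException rather than ClassFormatError. The FieldDescriptors class name is hypothetical.

```java
// Hypothetical stand-alone variant of decodeDescriptor(), for experimentation only.
final class FieldDescriptors {

    static String decode(String d) {
        // One leading bracket per array dimension
        int dims = 0;
        while (dims < d.length() && d.charAt(dims) == '[') dims++;
        String base = d.substring(dims);

        String type;
        switch (base) {
            case "B": type = "byte"; break;
            case "C": type = "char"; break;
            case "D": type = "double"; break;
            case "F": type = "float"; break;
            case "I": type = "int"; break;
            case "J": type = "long"; break;
            case "S": type = "short"; break;
            case "Z": type = "boolean"; break;
            default:
                // Object types look like Ljava/lang/String;
                if (base.length() > 2 && base.startsWith("L") && base.endsWith(";")) {
                    type = base.substring(1, base.length() - 1).replace('/', '.');
                } else {
                    throw new IllegalArgumentException("Unrecognized type: " + d);
                }
        }

        StringBuilder sb = new StringBuilder(type);
        for (int i = 0; i < dims; i++) sb.append("[]");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("I"));                    // int
        System.out.println(decode("[D"));                   // double[]
        System.out.println(decode("[[Ljava/lang/String;")); // java.lang.String[][]
    }
}
```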

Now that you have a method to decode descriptors, it’s easy to finish the writeFields() method. Listing 4-31 demonstrates.

Listing 4-31 The writeFields() method

   public void writeFields() {

    for (int i = 0; i < fields.length; i++) {
     // indent two spaces
     theOutput.print(“  “);

     // print the access specifiers
     if (fields[i].isPublic()) theOutput.print(“public “);
     if (fields[i].isPrivate()) theOutput.print(“private “);
     if (fields[i].isProtected()) theOutput.print(“protected “);
     if (fields[i].isStatic()) theOutput.print(“static “);
     if (fields[i].isVolatile()) theOutput.print(“volatile “);
     if (fields[i].isTransient()) theOutput.print(“transient “);
     if (fields[i].isFinal()) theOutput.print(“final “);

     //print the type
     String descriptor = thePool.readUTF8(fields[i].descriptorIndex());
     theOutput.print(decodeDescriptor(descriptor) + “ “);

     //print the name
     theOutput.print(thePool.readUTF8(fields[i].nameIndex()));

     theOutput.println(“;”);

    }

   }


Note:  I debated whether to include the code to read a field info structure and convert it into a string in the FieldInfo class or in the Disassembler class. Although it would make somewhat more sense to encapsulate the code in the FieldInfo class, it can be decoded only if each FieldInfo object carries a reference to its constant pool.


Methods

The final piece of the disassembly puzzle is decoding the methods. As with the fields, this will take place inside a loop, because almost all classes have multiple methods. Every method has six parts that you must decode: the access specifiers, the return type, the name, the argument list, the exception list, and the byte codes. Here’s a skeleton for the writeMethods() method:

   public void writeMethods() {

    for (int i = 0; i < methods.length; i++) {

     theOutput.println();
     theOutput.print(“  “);
     // access specifiers

     //print the return type

     //print the name of the method

     //argument list

     //exceptions

     // method body

    }

   }

Method access specifiers

The access specifiers are quite simple to read with the methods of the MethodInfo class. Here’s the code:

   if (methods[i].isPublic()) theOutput.print(“public “);
   if (methods[i].isPrivate()) theOutput.print(“private “);
   if (methods[i].isProtected()) theOutput.print(“protected “);
   if (methods[i].isStatic()) theOutput.print(“static “);
   if (methods[i].isNative()) theOutput.print(“native “);
   if (methods[i].isSynchronized()) theOutput.print(“synchronized “);
   if (methods[i].isAbstract()) theOutput.print(“abstract “);

Method arguments and return type

The arguments and return type are considerably harder to get at. The method descriptor contains a complete list of all of a method’s arguments and its return value. These are encoded much like the field type descriptor, except that there can be more than one at a time. The arguments appear in parentheses and the return value follows that. For example, a method with the signature

   public static void main(String[] args)

has the descriptor ([Ljava/lang/String;)V. This indicates that the method takes a one-dimensional array of java.lang.String objects and returns void. A method that takes two doubles as arguments and returns a double would have a descriptor of (DD)D. The disassembler program uses two separate methods to parse the method descriptor. The getReturnType() method gets the return type, and the getArguments() method handles the arguments.

The getReturnType() method (see Listing 4-32) reads the descriptor and passes everything after the closing parenthesis to the decodeDescriptor() method. This is the same decodeDescriptor() method used to get the type of a field. For example, if the method descriptor is (DD)D, then the string “D” is passed to decodeDescriptor(). If the method descriptor is ([Ljava/lang/String;)V, then the string “V” is passed to decodeDescriptor(), so decodeDescriptor() has to map V to void.

Listing 4-32 The getReturnType() method

   public String getReturnType(MethodInfo mi) {
    String descriptor = thePool.readUTF8(mi.descriptorIndex());
    String d = descriptor.substring(descriptor.indexOf(‘)’) + 1);
    return decodeDescriptor(d);
   }

The getArguments() method is more complex because it needs to parse several arguments at a time. Furthermore, there are no convenient separators between the types. Finally, to make matters even worse, different types can have different sizes in the method descriptor strings. Primitive and void types are always one character wide, but array and object types have undetermined sizes.

Therefore, you must consider the character to decide what to do with it. If the character is one for a primitive data type, then you should pass that character (after converting it to a string) to the decodeDescriptor() method. However, if that character is an L, then you need to read up to the next semicolon and pass that string to decodeDescriptor(). Finally, if a character is a left bracket, then you must read as many brackets as follow and then read a type that may be a single character (that is, a primitive data type) or an object type. In essence, you need to embed the method inside itself to properly handle array types.

Listing 4-33 is the getArguments() method. This uses the variable a to keep track of the number of arguments that have been processed (so that you can tell where commas are needed in the argument list). It uses the variable i to tell which character in the descriptor begins the next type. This method would be much simpler if the descriptor had a constant-width format that allowed a and i to be kept in sync.

Listing 4-33 The getArguments() method

   public String getArguments(MethodInfo mi) {

    String descriptor = thePool.readUTF8(mi.descriptorIndex());
    String params = descriptor.substring(1,descriptor.indexOf(“)”));
    String result = “”;
    try {
     int i = 0;
     int a = 0; // number of arguments
     while (i < params.length()) {
      char c = params.charAt(i);
      switch (c) {
       case ‘[‘:
        if (a++ != 0) result += “, “;
        int dimensions = 0;
        while (params.charAt(i) == ‘[‘) {
         i++;
         dimensions++;
        }
        char t = params.charAt(i);
        String type;
        if (t == ‘L’) {
         type = decodeDescriptor(params.substring(i, params.indexOf(“;”, i) + 1));
         i = params.indexOf(“;”, i) + 1;
        }
        else {
         type = decodeDescriptor(String.valueOf(t));
         i++;
        }
        for (int j=0; j < dimensions; j++) {
         type += “[]”;
        }
        result += type;
        break;
       case ‘L’:
        if (a++ != 0) result += “, “;
        String o = params.substring(i+1, params.indexOf(‘;’, i));
        result += o.replace(‘/’, ‘.’);
        i =  params.indexOf(‘;’, i) + 1;
        break;
       case ‘B’:
       case ‘C’:
       case ‘D’:
       case ‘F’:
       case ‘I’:
       case ‘J’:
       case ‘S’:
       case ‘Z’:
        if (a++ != 0) result += “, “;
        result += decodeDescriptor(String.valueOf(c));
        i++;
        break;
       case ‘V’:
        i++;
        break;
       default:
        throw new ClassFormatError(“Bad Parameter String: “ + params + “ “ + c);
       }
      }
     }
     catch (StringIndexOutOfBoundsException e) {
     }

     return result;

    }
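
A different way to organize the same parsing is to split the descriptor into its individual type descriptors first and decode each piece afterward. The sketch below shows only the splitting step and is an alternative to Listing 4-33, not a replacement for it. The MethodDescriptors class and its two methods are hypothetical names, and each string it returns is still in encoded form, ready to be handed to a method like decodeDescriptor().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Disassembler class.
final class MethodDescriptors {

    // Splits the part between the parentheses into one descriptor per argument.
    static List<String> splitParameters(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        List<String> parts = new ArrayList<>();
        int i = 1;
        while (i < close) {
            int start = i;
            while (i < close && descriptor.charAt(i) == '[') i++;   // array dimensions
            if (i >= close) {
                throw new IllegalArgumentException("Array without a type: " + descriptor);
            }
            if (descriptor.charAt(i) == 'L') {                      // object type runs to the semicolon
                int end = descriptor.indexOf(';', i);
                if (end < 0 || end > close) {
                    throw new IllegalArgumentException("Unterminated class name: " + descriptor);
                }
                i = end;
            }
            i++;                                                    // step past the last character of this type
            parts.add(descriptor.substring(start, i));
        }
        return parts;
    }

    // Everything after the closing parenthesis is the return type.
    static String returnDescriptor(String descriptor) {
        return descriptor.substring(descriptor.indexOf(')') + 1);
    }

    public static void main(String[] args) {
        System.out.println(splitParameters("(DD)D"));                   // [D, D]
        System.out.println(splitParameters("([Ljava/lang/String;)V"));  // [[Ljava/lang/String;]
        System.out.println(returnDescriptor("([Ljava/lang/String;)V")); // V
    }
}
```

Because every piece is a complete field descriptor, the array and object cases no longer need special handling inside the loop that builds the argument list.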

Exceptions thrown by a method

The .class file also tells you which checked exceptions a method can throw. A checked exception is one that you must catch or declare in a throws clause. The exceptions declared in the throws clause of a method are an attribute of the method. The ExceptionsAttribute class, Listing 4-34, holds an array of indices into the constant pool, each of which points to a ClassInfo structure. The ClassInfo structure represents the class of the exception that’s thrown.

Listing 4-34 The ExceptionsAttribute class

   import java.io.*;

   public class ExceptionsAttribute {

    int nameIndex;
    int[] exceptions;

    public ExceptionsAttribute(AttributeInfo ai) throws IOException {
     nameIndex = ai.nameIndex();
     ByteArrayInputStream bis = new ByteArrayInputStream(ai.data);
     DataInputStream dis = new DataInputStream(bis);
     exceptions = new int[dis.readUnsignedShort()];
     for (int i = 0; i < exceptions.length; i++) {
      exceptions[i] = dis.readUnsignedShort();
     }
    }

    public int nameIndex() {
     return nameIndex;
    }

    public int howMany() {
     return exceptions.length;
    }

    public int getIndex(int i) {
     return exceptions[i];
    }

    public String toString() {
     return String.valueOf(nameIndex);
    }

   }
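
To make the byte layout concrete, the sketch below parses a hand-built attribute body: a two-byte count followed by that many two-byte constant pool indices, all unsigned and big-endian. It uses java.nio.ByteBuffer instead of the streams in Listing 4-34, purely to keep the example short. The ExceptionIndexes class is a hypothetical name, and the sample bytes are invented for the demonstration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration; Listing 4-34 is the class the Disassembler uses.
final class ExceptionIndexes {

    // data is the body of an Exceptions attribute, without its name index and length.
    static int[] parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);       // big-endian by default, like a .class file
        int count = buf.getShort() & 0xFFFF;          // mask to read the short as unsigned
        int[] indices = new int[count];
        for (int i = 0; i < count; i++) {
            indices[i] = buf.getShort() & 0xFFFF;
        }
        return indices;
    }

    public static void main(String[] args) {
        // Invented sample: two exceptions, at constant pool entries 28 and 300
        byte[] body = {0x00, 0x02, 0x00, 0x1C, 0x01, 0x2C};
        System.out.println(Arrays.toString(parse(body)));   // [28, 300]
    }
}
```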


Secret:  In the Disassembler class, the getExceptions() method returns a throws clause for a particular method. The exceptions, if any, are stored in an attribute of the method with the name “Exceptions.” This attribute does not necessarily exist. Methods that declare no exceptions will not have an Exceptions attribute.



Secret:  This differs from what appears in the Java Virtual Machine Specification. According to that document, “There must be exactly one Exceptions attribute in each method info structure.” However, current Java compilers do not write an Exceptions attribute in the method_info structure unless the method actually has a throws clause. Listing 4-35 demonstrates the getExceptions() method.


Listing 4-35 The getExceptions() method

   public String getExceptions(MethodInfo mi) {

    ExceptionsAttribute theExceptions=null;
    String result = “”;

    // find the exceptions attribute
    AttributeInfo[] mAttributes = mi.getAttributes();
    for (int i = 0; i < mAttributes.length; i++) {
     String name = thePool.readUTF8(mAttributes[i].nameIndex());
     if (name.equals(“Exceptions”)) {
      try {
       theExceptions = new ExceptionsAttribute(mAttributes[i]);
      }
      catch (IOException e) {
      }
      break;
     }
    }
    if (theExceptions != null) {
     for (int i = 0; i < theExceptions.howMany(); i++) {
      if (i == 0) result += “ throws “;
      else result += “, “;
      ClassInfo ci = thePool.readClassInfo(theExceptions.getIndex(i));
      result += thePool.readUTF8(ci.nameIndex()).replace(‘/’, ‘.’);
     }
    }

    return result;

   }

The method body

The one piece left is the code inside the methods. This is the one piece of a Java .class file that you can’t easily map back to the source code. That’s because the Java source language in which you write programs is compiled to the much lower level byte code.

In this chapter, I only show you where the bytes of the byte code are stored so that you can output them in a disassembly. The next chapter, however, discusses what those byte codes mean, how you can read and understand them, and how you can work backward from the byte codes to Java source code.

The byte codes for each method are stored in a Code attribute for the method. The Code attribute has many different fields, but most of them are used only when interpreting code. In this chapter, you see only the actual byte codes.

The constructor has more information to parse than you need immediately. The toString() method converts the signed bytes in the code array to integers between zero and 255. Listing 4-36 shows this CodeAttribute class.

Listing 4-36 The CodeAttribute class

   import java.io.*;

   public class CodeAttribute {

    int nameIndex;
    int maxStack;
    int maxLocals;
    byte[] code;
    int startpc;
    int endpc;
    int handlerpc;
    ExceptionTable[] exceptions;
    AttributeInfo[] attributes;

    public  CodeAttribute(AttributeInfo ai) throws IOException {
     nameIndex = ai.nameIndex();
     ByteArrayInputStream bis = new ByteArrayInputStream(ai.data);
     DataInputStream dis = new DataInputStream(bis);
     maxStack = dis.readUnsignedShort();
     maxLocals = dis.readUnsignedShort();
     code = new byte[dis.readInt()];
     dis.readFully(code);  // read() is allowed to stop before the array is full
     exceptions = new ExceptionTable[dis.readUnsignedShort()];
     for (int i = 0; i < exceptions.length; i++) {
      exceptions[i] = new ExceptionTable(dis.readUnsignedShort(),
      dis.readUnsignedShort(), dis.readUnsignedShort(),
      dis.readUnsignedShort());
     }
     attributes = new AttributeInfo[dis.readUnsignedShort()];
     for (int i = 0; i < attributes.length; i++) {
       attributes[i] = new AttributeInfo(dis);
     }
    }
    public int nameIndex() {
     return nameIndex;
    }

     // just print the code array
     public String toString() {
      String result = “”;
      for (int i = 0; i < code.length; i++) {
       int thisByte;
       thisByte = code[i] < 0 ? 256 + code[i] : code[i];
       result += “    “ + thisByte + “\n”;
      }
      return result;
     }

    }
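
The conversion in toString() deserves a closer look, because Java bytes are signed and byte codes above 127 would otherwise print as negative numbers. The sketch below shows an equivalent way to do the conversion, masking with 0xFF, and prints each value in both decimal and hexadecimal. The UnsignedBytes class and its methods are hypothetical names; the sample array is the nine-byte sequence B2 00 07 12 01 B6 00 06 B1 that you can find in Listing 4-1.

```java
// Hypothetical helper for illustration; not part of the CodeAttribute class.
final class UnsignedBytes {

    // Same result as: b < 0 ? 256 + b : b
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // One line per byte: the decimal value, then two hexadecimal digits.
    static String dump(byte[] code) {
        StringBuilder sb = new StringBuilder();
        for (byte b : code) {
            int value = toUnsigned(b);
            String hex = Integer.toHexString(value).toUpperCase();
            if (hex.length() < 2) hex = "0" + hex;
            sb.append(value).append(" (0x").append(hex).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] code = {(byte) 0xB2, 0x00, 0x07, 0x12, 0x01,
                       (byte) 0xB6, 0x00, 0x06, (byte) 0xB1};
        System.out.print(dump(code));   // first line: 178 (0xB2)
    }
}
```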

This class makes reference to another class called ExceptionTable. Listing 4-37 shows this class. It provides information to the virtual machine about where exception handlers begin and end. You won’t actually need it until the next chapter. However, this information is included in the .class file, so you have to read it now.

Listing 4-37 The ExceptionTable class

   public class ExceptionTable {

    int start_pc;
    int end_pc;
    int handler_pc;
    int catch_type;

    public ExceptionTable (int start_pc, int end_pc, int handler_pc, int catch_type) {

     this.start_pc = start_pc;
     this.end_pc =  end_pc;
     this.handler_pc = handler_pc;
     this.catch_type = catch_type;

    }

   }

The getCode() method in the Disassembler class is particularly simple. It just needs to find the Code attribute of the method and call its toString() method. Listing 4-38 demonstrates.

Listing 4-38 The getCode() method

   public String getCode(MethodInfo mi) {

    CodeAttribute theCode = null;

     // find the Code attribute
     AttributeInfo[] mAttributes = mi.getAttributes();
     for (int i = 0; i < mAttributes.length; i++) {
      String name = thePool.readUTF8(mAttributes[i].nameIndex());
      if (name.equals(“Code”)) {
       try{
        theCode = new CodeAttribute(mAttributes[i]);
       }
       catch (IOException e) {

       }
       break;
      }
     }
     if (theCode != null) {
      return theCode.toString();
     }
      return “”;

    }

Legal Issues

Many software companies want to tell you that it is illegal to disassemble, decompile, or reverse-engineer code. This is flatly wrong. The courts in the United States have decided more than once that this is permissible. (Laws outside the United States may be different. Consult a local attorney if this is a matter of concern.) Because the sort of reverse engineering described here is permitted by law, many companies try to prevent it through copyright, patent, or licensing restrictions.

Copyright protects the expression of an idea, not the idea itself. Copyright does not prevent you from reusing an idea. Thus, if you discover a neat algorithm by investigating the byte codes for SuperDuperApplet.class, just because SuperDuperApplet is copyrighted does not mean you cannot reuse the algorithm in your own programs. Although it is illegal to copy the byte code verbatim and paste it into your own files, it is perfectly legal to rewrite and recompile the algorithm.

A patent is a more serious level of protection. Software patents protect ideas, not merely the expression of ideas. If an organization or individual owns a patent on an algorithm — RSA encryption, for example — then, you are legally required to license the patent from the patent owner before using the algorithm in your own software.

Finally, many companies attempt to protect ideas through licensing. For example, the license for Developer Release 1 of Natural Intelligence’s Roaster states that the licensee may not “reverse-engineer, decompile, disassemble, modify, translate, make any attempt to discover the source code of the Software, or create derivative works from the Software.” This is fairly standard boilerplate in software licensing agreements. Interestingly, a similar clause is not part of the license agreement from Sun for Java 1.0.2. To the best of my knowledge, no one has tested this sort of clause in court, and I cannot offer an educated opinion as to whether it is enforceable. Those aspects of shrink-wrap licenses that people have tested in court tend to relate to matters already covered under copyright law (such as the making of additional copies), so even the validity of shrink-wrap licenses in general is in doubt.

To make matters even more confusing, the laws in 49 of the 50 states are often slightly different from each other. (The laws in Louisiana are wildly different. I am not familiar with laws relating to this in Puerto Rico, Washington, D.C., or other non-state territories in the United States, but I doubt they’re as different as Louisiana’s.) There is something called the Uniform Commercial Code, UCC for short, which is an effort to get the laws of 49 states to conform to each other. (Louisiana law is really just too different to be included.) As of late 1996, the UCC is being revised. The commercial software industry is trying to have provisions written into the new UCC that would increase the validity and enforceability of shrink-wrap licenses. It remains to be seen what will happen.

One problem with laws like the UCC is that they have a very hard time keeping up with the fast-changing software industry and the Internet. Even if the new UCC does clarify the status of shrink-wrap licenses, it probably will not address the fact that most software downloaded from the Internet does not have shrink-wrap! Some packages like Sun’s JDK display a splash screen with the license agreement the first time that a user launches the software with buttons for the user to accept or reject the agreement. To my knowledge, no one has tested such splash screen licenses in court. Even if they are held to be valid, what about splash screens that provide only an “Agree” button and no “Disagree” button? What if programmer A agrees to the license, but later programmer B starts using that computer and never sees the license? (For that matter, this applies to shrink-wrapped software, too.)

Furthermore, HTTP servers allow .class files to be downloaded from many sites with no license of any sort. My suspicion is that nothing more than copyright law protects these .class files, but I would not be surprised to see a software company dispute this in court.

As you can see, these issues are quite complex. There are few easy answers. However, on a practical note, nobody is going to know or care if you disassemble a file to satisfy your personal curiosity or expand your knowledge. On rare occasions, companies have gotten perturbed and called out their lawyers when a competitor released a product that could read their file formats. They also tend to be annoyed when a writer publicly reveals information that they’d rather keep private. However, even in these cases there’s relatively little they can do besides write threatening letters.

This discussion has been necessarily brief. Table 4-5 lists some more resources on the Internet for investigating these intellectual property matters.

Table 4-5 Some intellectual property resources on the Internet

Resource: The Oppedahl & Larson Patent Law Web Server
Location: http://www.patents.com/
Comments: This law firm has many useful FAQ lists and other information about intellectual property written from a carefully legal perspective.

Resource: The Union for the Public Domain
Location: http://www.public-domain.org/
Comments: The Union for the Public Domain is fighting the efforts of the copyright industry and the Clinton-Gore administration to allow the copyrighting of facts.

Resource: The Yahoo Intellectual Property Page
Location: http://www.yahoo.com/Government/Law/Intellectual_Property/
Comments: As usual, Yahoo has a good collection of links to all sorts of resources.

Resource: The Electronic Frontier Foundation’s “Intellectual Property Online: Patent, Trademark, Copyright” Archive
Location: http://www.eff.org/pub/Intellectual_property/
Comments: The Electronic Frontier Foundation is primarily concerned with ensuring that freedoms taken for granted in the print world aren’t trampled on the Internet.

Resource: The League for Programming Freedom
Location: http://www.lpf.org/
Comments: The League for Programming Freedom endeavors to make sure programmers aren’t unduly burdened by software patents and bad laws.

Resource: The International Federation of Library Associations and Institutions Copyright and Intellectual Property Resources page
Location: http://www.nlc-bnc.ca/ifla/II/cpyright.htm
Comments: This page has links to many excellent articles about intellectual property on- and off-line.

Accessing Class Files

So far in this chapter I’ve assumed that you’re working with a single .class file. In practice, that’s not always true. Sometimes it takes a little work to get a .class file to disassemble. First, you may have to do a little work to retrieve the .class file from a remote Web server. Second, .class files are often distributed as parts of larger zip or jar archives. However, it’s not hard to extract the necessary .class file from an archive.

Copying .class files

Most Web browsers play Java applets when they encounter them. They do not save them onto your hard drive in an easily retrievable form. Downloaded .class files may or may not be present in your browser’s cache. However, whether or not a browser caches a .class file on disk, you can use the following trick to download a copy of the file you want.

Let’s suppose you’ve seen a cool applet on a Web site at http://www.idgbooks.com/example.html and you want to learn how the programmer wrote it. Of course you’ll need a copy of the applet’s .class files. But how do you get them? This will take a little work, but it’s really not hard. Here are the steps.

1.  Use your Web browser’s View Source command to see the HTML for the Web page. You’re looking for an <APPLET> tag like this:

   <applet code=CoolApplet.class width=200 height=200>
   </applet>

2.  Write a very simple Web page that includes an HREF link to the file you want to look at. For example,

   <A HREF="http://www.idgbooks.com/CoolApplet.class">Download Me</A>

3.  Load the page with the HREF link to the file into your Web browser. Then, use the Save this Link As command in the pop-up menu to save the file on your hard drive, as shown in Figure 4-2. That’s it. You should now have a clean copy of the .class file to work with.


Figure 4-2  The Save this Link As menu command in the Macintosh version of Netscape.
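The same retrieval can be scripted instead of done by hand. The following is only a minimal sketch under stated assumptions: the class name ClassFetcher and its two methods are hypothetical, it is written for a modern JDK rather than Java 1.1, and it assumes the applet tag has no CODEBASE attribute, so that the code value resolves relative to the URL of the page itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
class ClassFetcher {

    // Resolve the applet's code attribute against the URL of the page that
    // contains it. Assumes the applet tag has no CODEBASE attribute.
    static URI classLocation(String pageUrl, String code) {
        return URI.create(pageUrl).resolve(code);
    }

    // Copy the bytes at the given location into a local file, unchanged.
    static void download(URI location, Path target) throws IOException {
        try (InputStream in = location.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Usage: java ClassFetcher [pageUrl [code [outputFile]]]
    public static void main(String[] args) throws IOException {
        String page = args.length > 0 ? args[0] : "http://www.idgbooks.com/example.html";
        String code = args.length > 1 ? args[1] : "CoolApplet.class";
        URI location = classLocation(page, code);
        System.out.println(location);
        // Only touch the network when an output file is named.
        if (args.length > 2) {
            download(location, Paths.get(args[2]));
        }
    }
}
```

Resolving the name against the page URL mirrors what you do by hand in step 2: the file name from the applet tag replaces the last component of the page's address.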

Zip files

Most VMs include their class libraries in the form of uncompressed zip archives called classes.zip. If you want to poke around in the innards of the class library, the first thing you need to do is unzip this file.

Theoretically, it shouldn’t matter whether a package hierarchy is or is not zipped, as long as your CLASSPATH is set up properly. In practice, that’s not always true. Before dearchiving someone else’s file for experimentation, you should always copy it to a directory that’s not in your CLASSPATH. It’s best not to work on the original copies.

As soon as the Sysops had installed JDK 1.1 on sunsite.unc.edu, I copied the classes.zip file to a test directory of my home directory and then unzipped it to start poking around. In other words,

   % mkdir ~/test
   % cp /usr/local/java-1.1/lib/classes.zip ~/test
   % cd ~/test
   % unzip classes.zip

There are many tools available for unzipping .zip files. Unzip is the dearchiver of choice for zip files on UNIX. StuffIt Expander works well on the Macintosh. Although PKZip is the original zip program, it cannot handle the long filenames that Java requires. Therefore, on Windows platforms, you should use WinZip instead.
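If you only want to see what an archive such as classes.zip contains before unpacking it, you can also read it from Java with the java.util.zip package. The following is a minimal sketch, not a replacement for a real unzip tool: the class name ZipLister and its entryNames method are hypothetical, and the code assumes a modern JDK (try-with-resources and generics did not exist in Java 1.1).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper that lists the entries of a zip archive.
class ZipLister {

    // Return the name of every entry in the archive, in the order the
    // archive's directory lists them.
    static List<String> entryNames(String zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Usage: java ZipLister classes.zip
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ZipLister archive.zip");
            return;
        }
        for (String name : entryNames(args[0])) {
            System.out.println(name);
        }
    }
}
```

Because the archive is only read, never written or unpacked, this is safe to run against the original copy of a file.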

Jar files

Java 1.1 introduced Jar files. Jar is a rough contraction of “Java archive.” Jar files can contain all the .class files, image files, sound files, and other files needed to run an applet. By placing all these different files into a single file, a Web browser can download them with only a single request to the Web server. Depending on the server’s load and network conditions, this can save from a few milliseconds to several minutes of time. Furthermore, a Jar file can compress its contents so the savings can be even larger.

Jar files are included on Web pages with applet tags that look like this:

   <applet code=CoolApplet.class width=200 height=200>
     <param name=archives value="jars/coolapplet.jar">
   </applet>

You can download a Jar file to your hard drive exactly the same way you’d download a .class file. Once you have the Jar file on your local hard drive, you need to dearchive it to retrieve the individual parts. The java.util.zip package includes classes that can parse and handle Jar files. Sun’s JDK 1.1 for Windows and Solaris includes a command line jar program based on this package that you can use to pack and unpack Jar files. An equivalent program will likely be available for the Macintosh by the time this book hits store shelves.

The jar command line syntax (see Table 4-6) is deliberately similar to the classic UNIX tar command. Options are passed as one-character flags that follow the word jar on the command line. Archiving versus dearchiving is chosen through the c (create) or x (extract) flag, not via jar and unjar commands as a PKZip user might expect.

Table 4-6 Command line options for jar

   Option   Purpose
   c        Creates a new jar archive on System.out.
   t        Lists the table of contents of the jar file on System.out.
   x        Extracts from System.in.
   f        The second argument specifies the jar file to process or create.
   v        Produces verbose output on System.err.


For example, to archive all files in the current directory you would type

   C:\> jar cf allfiles.jar *

To archive just some files and directories, specify them by name on the command line like this:

   C:\> jar cf allfiles.jar CoolApplet.class Helper.class audio images

Directories are archived recursively; in other words, their immediate contents and the contents of any sub-directories are archived.
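To make the recursion concrete, here is a minimal sketch of how a directory tree can be walked into an archive with the java.util.zip package. It is not how the jar tool itself is implemented: the class name RecursiveArchiver and its add method are hypothetical, the code assumes a modern JDK, and unlike the jar tool it does not add a manifest.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not the jar tool's own implementation.
class RecursiveArchiver {

    // Add a file, or a directory and everything beneath it, to the archive.
    // Entry names are relative to base and always use forward slashes.
    static void add(Path base, Path item, ZipOutputStream out) throws IOException {
        if (Files.isDirectory(item)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(item)) {
                for (Path child : children) {
                    add(base, child, out); // recurse into sub-directories
                }
            }
        } else {
            String name = base.relativize(item).toString().replace('\\', '/');
            out.putNextEntry(new ZipEntry(name));
            Files.copy(item, out);
            out.closeEntry();
        }
    }

    // Usage: java RecursiveArchiver allfiles.jar CoolApplet.class audio images
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java RecursiveArchiver archive item...");
            return;
        }
        Path base = Paths.get("").toAbsolutePath(); // current directory
        try (OutputStream file = Files.newOutputStream(Paths.get(args[0]));
             ZipOutputStream out = new ZipOutputStream(file)) {
            for (int i = 1; i < args.length; i++) {
                add(base, base.resolve(args[i]), out);
            }
        }
    }
}
```

Each regular file becomes one entry whose name records its path below the starting directory, which is what lets the package hierarchy be rebuilt when the archive is unpacked.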

To dearchive the file, you would type the following:

   C:\> jar xf allfiles.jar
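If no jar program is available on your platform, the java.util.zip package can do the unpacking. The following is a minimal sketch rather than a full replacement for jar: the class name ArchiveExtractor and its extract method are hypothetical, and the code assumes a modern JDK. Because you are often unpacking someone else's archive, it refuses any entry whose name would place a file outside the target directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not the jar tool's own implementation.
class ArchiveExtractor {

    // Unpack every entry read from the stream into targetDir.
    static void extract(InputStream archive, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path dest = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside targetDir.
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry outside target: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    // Reads up to the end of the current entry only.
                    Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Usage: java ArchiveExtractor allfiles.jar outputDirectory
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java ArchiveExtractor archive targetDir");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            extract(in, Paths.get(args[1]));
        }
    }
}
```

Extracting into a named directory, rather than the current one, also makes it easy to keep the unpacked classes out of your CLASSPATH.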

Summary

In this chapter, you learn about the format of Java .class files and how to read them. In particular, you learn the following:

•  You can view the same program in different formats: pure hexadecimal bytes, ASCII text, disassembled byte code, and Java source code. The first three formats are available in the .class file. The last is the form of the .java source code file.

•  How a .class file is organized and how to split it into its component parts.

•  How to work backward from the compiled .class byte code file to an approximation of the source that generated it. This task will continue in the next chapter.

•  That there are legal issues involved in doing this. Although copyright doesn’t prevent reverse engineering any more than it prevents you from reading a book you’ve bought, it may prevent you from copying what you’ve learned verbatim. Patents may provide more serious restrictions.

•  How to retrieve a .class file, wherever it resides, whether on a Web site, in a zip archive, or in a jar file.

In the next chapter, you learn how to decode and understand method bodies, instead of just printing them as streams of bytes.

 
