Sanjoy Chatterjee

Hosted by www.Geocities.ws

Java Tech Tips

Tech Tips on Java (ref. java.sun.com)

Topics :

Reflection

Performance -- using Object to represent disparate types

Faster I/O using JDK 1.1 classes

Command dispatching

Global variables

Performance tip: Garbage Collection and setting to null

Reflection

An important Java language feature is reflection, a feature also known as "introspection." Reflection is the ability to query a Java class about its properties, and to operate on methods and fields by name for a given object instance.

You can use reflection to set object fields or invoke particular object methods by name. For example, given an object instance "obj," and a method name "f5" specified at program execution time, the method can be invoked on the instance.

To see how this works, look at this simple example:


import java.lang.reflect.*;

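public class DumpMethods {
  public static void main(String args[])
  {
    if (args.length != 1) {
      System.err.println("usage: java DumpMethods classname");
      System.exit(1);
    }
    try {
      // Load the named class, for example "java.util.Stack"
      Class<?> c = Class.forName(args[0]);
      // Methods declared directly in this class (inherited ones are excluded)
      Method m[] = c.getDeclaredMethods();
      for (int i = 0; i < m.length; i++)
        System.out.println(m[i].toString());
    }
    catch (ClassNotFoundException e) {
      System.err.println(e);
    }
  }
}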

For an invocation of:


        java DumpMethods java.util.Stack

the output is:


public java.lang.Object java.util.Stack.push(java.lang.Object)
public synchronized java.lang.Object java.util.Stack.pop()
public synchronized java.lang.Object java.util.Stack.peek()
public boolean java.util.Stack.empty()
public synchronized int java.util.Stack.search(java.lang.Object)

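That is, the methods declared in class java.util.Stack are listed, along with their modifiers and fully qualified return and parameter types. The order of the lines may vary, because getDeclaredMethods returns the methods in no particular order.

This program loads the specified class using Class.forName, and then calls getDeclaredMethods to retrieve the methods declared in the class itself (inherited methods are not included). Each element of the returned array is an instance of java.lang.reflect.Method, which represents a single class method.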
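DumpMethods only queries a class. The other use described earlier, invoking a method or setting a field by name on an object instance, can be sketched with Method.invoke and Field.setInt. This is a minimal sketch: the Target class, its f5 method and count field, and the helpers callByName and setIntField are hypothetical names chosen for illustration, and ReflectiveOperationException requires Java 7 or later.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class InvokeByName {
    // Hypothetical class to operate on; any class with public members would do.
    public static class Target {
        public int count = 0;

        public String f5(String s) {
            return "f5 called with " + s;
        }
    }

    // Looks up a public method taking one String argument by name, then invokes it on obj.
    static Object callByName(Object obj, String name, String arg)
            throws ReflectiveOperationException {
        Method m = obj.getClass().getMethod(name, String.class);
        return m.invoke(obj, arg);
    }

    // Looks up a public int field by name and sets its value on obj.
    static void setIntField(Object obj, String name, int value)
            throws ReflectiveOperationException {
        Field f = obj.getClass().getField(name);
        f.setInt(obj, value);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object obj = new Target();
        // In a real program the name could come from user input or a config file.
        String methodName = "f5";
        System.out.println(callByName(obj, methodName, "hello"));  // prints "f5 called with hello"
        setIntField(obj, "count", 37);
        System.out.println(((Target) obj).count);                  // prints 37
    }
}
```

If the name does not match a public method with a single String parameter, getMethod throws NoSuchMethodException, so code that takes names from users should be prepared to handle it.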

The reflection feature may not seem like much at first, but languages such as C, C++, and Fortran offer nothing comparable as a standard language feature: the names and properties of functions are generally gone by the time the program is executed. In technical terms, these other languages use "early binding," whereas the Java language uses "late binding."

Performance -- using Object to represent disparate types

This tip is a little tricky, but it recently came up in an actual application, and illustrates how Java language features are used to efficiently represent a large data structure.

The application is one where a very large tree structure, consuming millions of bytes, is built up. Some of the nodes in the tree reference child nodes (non-terminals), while others are leaf nodes (terminals) and have no children, but contain String information. The application involves parsing a large Java program and representing it internally via a tree.

One simple approach to this problem is to define a Node class such as the following:


        public class Node {
                private int type;
                private Node child[];
                private String info;
        }

If the node is a leaf node, then info is used. Otherwise, child refers to the children of the node, and child.length gives the number of children.

This approach works pretty well, but uses a lot of memory. Only one of child and info is used at any one time, so the other field is wasted. In addition, child is an array, which carries its own overhead, for example, storing the array length for subscript checking. For certain large inputs, the parser program runs out of memory.

The first refinement of this approach is to collapse child and info:


        public class Node {
                private int type;
                private Object info;
        }

In this scheme, info can refer either to a String, for a leaf node, or to an array of child nodes. This works because Object is the root of the Java class hierarchy, so that, for example, the following:


        class A {}

implicitly means:


        class A extends Object {}

An instance of a subclass of Object, such as String, can be assigned to an Object reference. An array of Nodes can likewise be assigned to an Object. The instanceof operator can be used to determine the actual type of an Object reference.
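As a small illustration of the last point, the following sketch assigns first a String and then an array of nodes to the same Object reference, and uses instanceof to tell them apart. The nested Node class and the describe method are hypothetical stand-ins, not the parser's actual code.

```java
public class ObjectRefDemo {
    // Minimal stand-in for the Node class discussed in the text.
    static class Node {}

    // Uses instanceof to discover what an Object reference actually refers to.
    static String describe(Object info) {
        if (info instanceof String) {
            return "leaf: " + info;
        } else if (info instanceof Node[]) {
            return "children: " + ((Node[]) info).length;
        } else {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        Object info = "x + y";                       // a String is an Object
        System.out.println(describe(info));          // prints "leaf: x + y"
        info = new Node[] {new Node(), new Node()};  // so is an array of Nodes
        System.out.println(describe(info));          // prints "children: 2"
    }
}
```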

In the parser application, using Object to represent both data types is still not good enough, because the tree still takes up too much memory. So a further change was made. Measurement showed that about 95 percent of the time, the child array contained a single Node element. So one-child cases can be represented directly, with info referring to the child node itself rather than to a one-element array of child nodes.

This representation is complicated, and it's useful to define a method for encapsulating the abstraction as in the following example:


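public class Node {
  private int type;
  private Object info;

  // constructors, other methods here ...

  // gets the i-th child reference, or null for a leaf node
  public Node getChild(int i)
  {
    if (info instanceof String)         // leaf node: no children
      return null;
    else if (info instanceof Node) {    // exactly one child, stored directly
      if (i != 0)
        throw new IndexOutOfBoundsException("child index " + i);
      return (Node)info;
    }
    else                                // two or more children, stored in an array
      return ((Node[])info)[i];
  }
}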

getChild returns the i-th child, or null for leaf nodes. If there is exactly one child, then info is of type Node, referencing that child. If there is more than one child, info is of type Node[], and a cast to Node[] is done, followed by a retrieval and return of the child reference.

In the parser application, this change was enough to tip the scales, so that the application no longer ran out of memory. The internal representation in this example is tricky, but it can be hidden behind methods such as getChild. In general, it's wise to avoid tricky coding, but useful to know how to do it when the need arises.

The example also illustrates the utility of using one Object reference to represent several different data types. In C/C++ similar techniques would use void* pointers or unions.
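Putting the pieces together, here is a minimal sketch of a complete node class using this representation. It fills in parts the original elides, so the names are assumptions: the class name CompactNode, the two constructors (the second uses Java 5 varargs), and the numChildren and getText helpers are all hypothetical. The key design choice is in the interior-node constructor, which stores a lone child directly and allocates an array only for zero or two-plus children.

```java
public class CompactNode {
    private int type;
    // A String (leaf), a single CompactNode (one child), or a CompactNode[] (otherwise).
    private Object info;

    // Leaf node constructor.
    public CompactNode(int type, String text) {
        this.type = type;
        this.info = text;
    }

    // Interior node constructor: one child is stored directly, several as an array.
    public CompactNode(int type, CompactNode... children) {
        this.type = type;
        if (children.length == 1) {
            this.info = children[0];
        } else {
            this.info = children.clone();
        }
    }

    public int getType() {
        return type;
    }

    public int numChildren() {
        if (info instanceof String) return 0;
        if (info instanceof CompactNode) return 1;
        return ((CompactNode[]) info).length;
    }

    // Returns the i-th child, or null for a leaf node.
    public CompactNode getChild(int i) {
        if (info instanceof String) return null;
        if (info instanceof CompactNode) {
            if (i != 0) throw new IndexOutOfBoundsException("child index " + i);
            return (CompactNode) info;
        }
        return ((CompactNode[]) info)[i];
    }

    // Returns the leaf text, or null for an interior node.
    public String getText() {
        return (info instanceof String) ? (String) info : null;
    }
}
```

For example, new CompactNode(1, leaf) allocates no array at all, while new CompactNode(2, left, right) stores a two-element array.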

Please see Tech Tips: January 20, 1998 for a followup on this topic.

Faster I/O using JDK 1.1 classes

With JDK 1.0, a standard way of reading and writing text lines was to use the DataInputStream class and the readLine method. JDK 1.1 offers a faster way of performing line-oriented text I/O using the classes FileReader, BufferedReader, FileWriter, and BufferedWriter.

This program gives a simple example of how these classes are used.


import java.io.*;

public class ReadWrite {
  public static void main(String args[])
  {
    if (args.length != 2) {
      System.err.println("usage: java ReadWrite infile outfile");
      System.exit(1);
    }

    String in_file = args[0];
    String out_file = args[1];

    try {
      // Wrap the file reader and writer in buffered versions so that
      // characters are transferred in large blocks
      FileReader reader = new FileReader(in_file);
      BufferedReader buf_reader =
                       new BufferedReader(reader);
      FileWriter writer = new FileWriter(out_file);
      BufferedWriter buf_writer =
                       new BufferedWriter(writer);
      String ln = null;
      // readLine returns null at end of file; the returned line excludes
      // its terminator, so newLine() writes a line separator back out
      while ((ln = buf_reader.readLine()) != null) {
        buf_writer.write(ln);
        buf_writer.newLine();
      }
      buf_reader.close();
      buf_writer.close();   // also flushes any buffered output
    }
    catch (IOException e) {
      System.err.println(e);
      System.exit(1);
    }
  }
}

The program copies its input file to the output file. It runs about twice as fast as the JDK 1.0 method of copying. One major reason for the speedup is that the buffered classes eliminate much of the method call overhead of reading the underlying characters.

Note: this approach is oriented toward text lines represented as Strings. Faster but more primitive byte-oriented I/O is also available, through classes such as FileInputStream.
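For comparison, a byte-oriented copy using FileInputStream and FileOutputStream might look like the sketch below. It treats the file as raw bytes, so it performs no character decoding and no line-terminator translation. The class name ByteCopy and the 8192-byte buffer size are arbitrary choices for illustration, and the try-with-resources statement requires Java 7 or later.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ByteCopy {
    // Copies inFile to outFile byte for byte, reading into a fixed-size buffer.
    public static void copy(String inFile, String outFile) throws IOException {
        // try-with-resources closes both streams even if an exception is thrown
        try (FileInputStream in = new FileInputStream(inFile);
             FileOutputStream out = new FileOutputStream(outFile)) {
            byte[] buf = new byte[8192];
            int n;
            // read returns the number of bytes read, or -1 at end of file
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ByteCopy infile outfile");
            System.exit(1);
        }
        copy(args[0], args[1]);
    }
}
```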

Command dispatching

In C and C++, programmers often use a set of global functions and function pointers to represent a series of commands or operations and then dispatch to them via a table. For example:


   #include <stdio.h>
   
   void func1() {printf("func1\n");}
   
   void func2() {printf("func2\n");}
   
   typedef void (*fp)(void);
   
   fp funcs[] = {&func1, &func2, 0};
   
   int main()
   {
      int i = 0;
      
      while (funcs[i])
         funcs[i++]();
      return 0;
   }

Java, though, does things differently. It doesn't use pointers and doesn't have global functions. Instead, you create objects whose classes all implement a common interface, and then call, through a reference of that interface type, a method that every such object is known to implement.

How does the technique work in practice? Consider this example:


interface Action {
  public void doit();
}

class func1 implements Action {
  public void doit() {System.out.println("func1");}
}

class func2 implements Action {
  public void doit() {System.out.println("func2");}
}

public class call {
  private static Action alist[] = {
    new func1(),
    new func2()
  };

  public static void main(String args[])
  {
    for (int i = 0; i < alist.length; i++)
      alist[i].doit();
  }

}

In this example, an interface Action is defined, and any class that implements the interface must define a method doit (this notion, by the way, is basic to the meaning of interfaces). Objects of the class can be referred to with a reference to the implemented interface name type, so that a reference of type Action can refer to an object of a class that implements Action.

In C++ programming, Action would be a base class, with func1 and func2 as derived classes, and objects would be manipulated through a base class pointer. In Java, an Action reference supports the same kind of manipulation, even though Action is an interface that func1 and func2 implement rather than a superclass they extend.

The Java technique given here sometimes goes by the name "method wrappers," and is quite different from the C function-pointer approach. As always, there are tradeoffs as to which approach works best in a given situation.
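The C example dispatches through a table indexed by position. When commands are selected by name instead, one common Java counterpart is a map from command names to Action objects, sketched below. The CommandTable class and its register and dispatch methods are hypothetical, and registering lambdas for a one-method interface requires Java 8 or later; with earlier JDKs you would register instances of classes such as func1 and func2.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandTable {
    // Same role as the Action interface in the text: one method every command implements.
    interface Action {
        void doit();
    }

    private final Map<String, Action> table = new HashMap<>();

    public void register(String name, Action a) {
        table.put(name, a);
    }

    // Looks up a command by name and runs it; returns false if the name is unknown.
    public boolean dispatch(String name) {
        Action a = table.get(name);
        if (a == null) {
            return false;
        }
        a.doit();
        return true;
    }

    public static void main(String[] args) {
        CommandTable commands = new CommandTable();
        commands.register("func1", () -> System.out.println("func1"));
        commands.register("func2", () -> System.out.println("func2"));
        commands.dispatch("func2");                       // prints "func2"
        commands.dispatch("func1");                       // prints "func1"
        System.out.println(commands.dispatch("nosuch"));  // prints "false"
    }
}
```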

Global variables


If you've programmed with languages such as C or C++ or Pascal, you've probably used global variables and functions. For example, the following simple code:


   int x = 0;

   void f()
   {
      x = 37;
   }

in C/C++ establishes a global variable "x," and the function f sets its value to 37.

Java does not have such variables and functions, however. Instead, you wrap the code in class declarations. The Java equivalent of the code given above is:


   public class Globals {
      public static int x = 0;

      public static void f() {x = 37;}
   }

And you refer to "x" with:


   Globals.x = 59;

Note that the "static" keyword is used in declaring these class members. It makes them class variables and class methods, that is, members that belong to the class as a whole rather than to any particular object instance. You could say that the class structure in this example is most valuable for packaging purposes, rather than as a basis for object-oriented design.
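To make the distinction between class variables and instance variables concrete, the following hypothetical Counter class keeps one static field shared by the whole program and one ordinary field per object.

```java
public class Counter {
    // Class variable: a single copy shared by all Counter objects.
    public static int created = 0;
    // Instance variable: each Counter object has its own copy.
    public int id;

    public Counter() {
        created++;
        id = created;
    }

    public static void main(String[] args) {
        Counter a = new Counter();
        Counter b = new Counter();
        System.out.println(Counter.created);     // prints 2
        System.out.println(a.id + " " + b.id);   // prints "1 2"
    }
}
```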

A common idiom is to use a Java class to group related constants as in the following example:


public class TextConstants {
  public static final int WIDTH = 80;
  public static final int HEIGHT = 25;
  public static final int POINTSIZE = 12;
  public static final String FONT = "monospaced";
}

You can refer to individual constants with expressions such as:


   int i = TextConstants.POINTSIZE;

Declaring a private constructor

There's one more useful tip that applies to this approach. If you're using a class simply for packaging purposes, it really doesn't make sense to create a new instance of the class. Instead, you can declare a private constructor:


   private TextConstants() {}

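A constructor must execute whenever an instance of the class is created. Because this constructor is private, code outside the class cannot invoke it, so other classes cannot create TextConstants objects. And because a constructor is now declared explicitly, the compiler no longer generates a default constructor for the class.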
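Putting the constants class and the private constructor together might look like the following sketch. Here TextConstants is declared package-private (not public) only so that it can share a source file with the hypothetical UseConstants driver class, and it is also declared final, which rules out subclassing. The commented-out line shows the instantiation that the private constructor prevents.

```java
// A class used purely to package related constants.
final class TextConstants {
    public static final int WIDTH = 80;
    public static final int HEIGHT = 25;
    public static final int POINTSIZE = 12;
    public static final String FONT = "monospaced";

    // Private constructor: code outside this class cannot create instances.
    private TextConstants() {}
}

public class UseConstants {
    public static void main(String[] args) {
        int i = TextConstants.POINTSIZE;
        System.out.println(TextConstants.FONT + " " + i);  // prints "monospaced 12"
        // TextConstants t = new TextConstants();  // would not compile: constructor is private
    }
}
```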

Performance tip: Garbage Collection and setting to null


Java uses garbage collection, or reclaiming no-longer-used storage, rather than requiring you to explicitly manage storage. Garbage collection is automatic, but sometimes there are ways to help it out. Imagine a case where you're managing a stack of Object references:


public class Stack {
  private static final int MAXLEN = 10;
  private Object stk[] = new Object[MAXLEN];
  private int stkp = -1;

  public void push(Object p) {stk[++stkp] = p;}

  public Object pop() {return stk[stkp--];}
}

Now consider a case where the stack has two elements on it, and you pop one of them. At this point stk[0] will have a valid element in it, and stk[1] will still hold the element just popped. That is, stk[1] will have a reference to an Object, which could be a reference to anything, including a large data structure of many thousands of bytes. In such a case, this data structure cannot be garbage collected as long as the Stack itself remains reachable, even though the program may no longer be using it.

To remedy this problem, you can rewrite pop like so:


     public Object pop()
     {
         Object p = stk[stkp];
         stk[stkp--] = null;
         return p;
     }

You haven't nullified the Object itself, just a reference to it that's no longer valid. The Stack object itself may have a long lifetime, and rewriting the pop method in this way helps ensure that garbage collection gets done in a timely fashion.
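A fuller version of the stack, with the nulling pop and simple overflow and underflow checks, might look like this sketch. The class name BoundedStack, the exceptions it throws, and the liveSlots helper (which exists only to make the effect of nulling visible) are assumptions, not part of the original example.

```java
public class BoundedStack {
    private static final int MAXLEN = 10;
    private final Object[] stk = new Object[MAXLEN];
    private int stkp = -1;  // index of the top element; -1 means empty

    public void push(Object p) {
        if (stkp == MAXLEN - 1) {
            throw new IllegalStateException("stack full");
        }
        stk[++stkp] = p;
    }

    public Object pop() {
        if (stkp < 0) {
            throw new IllegalStateException("stack empty");
        }
        Object p = stk[stkp];
        // Clear the stale slot so the array no longer keeps the popped object alive.
        stk[stkp--] = null;
        return p;
    }

    public boolean isEmpty() {
        return stkp < 0;
    }

    // Counts non-null slots in the backing array; for demonstration only.
    int liveSlots() {
        int n = 0;
        for (Object o : stk) {
            if (o != null) n++;
        }
        return n;
    }
}
```

After two pushes and one pop, liveSlots() reports 1: the popped element is no longer referenced from the array, so once the caller drops its own reference, the object becomes eligible for garbage collection.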
