Chapter 3
Classes, Strings, and Arrays

The last chapter explored Java’s primitive data types. This chapter explores Java’s reference data types. A primitive data type is one whose value is stored directly in memory. A reference data type stores only a reference to the place where the actual data can be found. There are two reference data types: objects and arrays. Objects and arrays are normally explained very abstractly and at a very high level. It’s my goal in this chapter to explain them very concretely and at a very low level. By understanding the low-level structure you can make sure you’re that working with Java rather than against it and substantially speed up your programs.

The Heap

The heap is a large block of memory that Java uses to store objects and arrays. Memory in the heap can be allocated discontiguously. When a new object or array is created, the space comes from somewhere in the heap. Exactly where isn’t important, or even defined. When an object or array is garbage-collected, the memory that it occupied in the heap is freed. That is, the memory is marked as unused and made available for reuse by other objects.

An object has two parts: its fields and its methods. Each field requires memory to hold a value appropriate to its type. Each method requires memory to hold its arguments and return values and code. However, the memory for the method is needed only when the method is invoked. Furthermore, methods are the same for each instance of the class. Methods therefore are allocated on an as-needed basis in an area of memory called the stack.

Consider the following 3DPoint class:

     public class 3DPoint {

      double x;
      double y;
      double z;

      //  various methods...
     }

This class has three double fields. Each double occupies eight bytes. Therefore, each instance of this class needs 24 bytes of memory in the heap. If there is one 3DPoint object in existence, then exactly 24 bytes of heap memory are needed. If there are two 3DPoint objects in existence, then 48 bytes of heap memory are needed. If there are three 3DPoint objects, then 72 bytes of heap memory are needed, and so on.

Arrays are similar. To determine how much heap memory that an array requires, multiply the length of the array by the width of the data type stored in the array. A float array of length 10 thus needs 40 bytes of heap memory; a char array of length 10 needs 20 bytes of heap memory; and a byte array of length 10 needs 10 bytes of heap memory.

When a new object or array is created, the necessary amount of space is set aside for it in the heap. The new operator returns a reference to the block of memory in the heap where the object or array is stored. The virtual machine is responsible for managing the heap and making sure that the same block of memory is not used for two different objects or arrays at the same time.

The exact size of the heap is system-dependent. However, the heap is finite on all systems. In some Java implementations, the heap can grow if more space is needed. On others the size of the heap is fixed when the virtual machine starts up. Nonetheless, the heap is definitely smaller than the memory (physical or virtual) available on the host computer. If the heap fills up, the runtime system throws an OutOfMemoryError.

Garbage collection attempts to prevent this from happening by purging objects and arrays from the heap when they’re no longer necessary. Exactly how the garbage collector decides what can and cannot be purged from the heap is one of the topics in Chapter 6. For now, all you need to know is that the garbage collector is quite reliable and won’t purge anything that you might actually need to use.

Objects of different types require different amounts of memory. The more fields that an object has, the more memory that it needs in the heap. Objects can contain other objects as fields. For example, consider this class:

     public class GridPoint {

      Integer i1;
      Integer i2

      //  various methods...
     }

The GridPoint class contains two Integer objects. A GridPoint object does not store the Integer objects themselves in its own block of memory; it stores only references to the Integer objects. References take up four bytes. Therefore, the GridPoint object needs eight bytes of heap memory, regardless of how much heap memory an Integer object requires. Of course, the total memory used by a program will include the memory used by all of the GridPoints, all of the Integers, and all of the other objects stored in the heap.

Pointers, Handles, and References

There’s a lot of confusion about whether Java does or does not have pointers. If you’ve never programmed in a pointer-based language like C or Pascal, then you will probably never need to understand pointers. You can rest assured that Java lets you do everything that you normally use pointers to do, especially with respect to data structures. However, if you’re accustomed to a pointer-based language like C, then you probably need to be convinced of this statement.

What is a pointer?

A pointer is the address of a particular byte of a computer’s memory. For example, a computer with eight megabytes of memory has 8 * 1024 * 1024 = 8,388,608 bytes of memory. Therefore, the valid pointers on this system begin at zero and count up to 8,388,607. The first byte of memory has the address zero. The last byte of memory has the address 8,388,607. With a pointer, you can inspect the contents of any byte or group of bytes. Similarly, you can write any value you like at any point in memory. For example, in C, to write the int 768 in the four bytes starting with byte 4,324,682, you would write

     int n = 4324682;
     int* m = (int*) n;
     *m = 768;

No check is performed to make sure that it makes sense to put the value 768 at memory location 4324682. If you put the wrong value in the wrong place, it can crash your program, your machine, or worse.

These sorts of bugs are common in C programs. Java has eliminated pointers in order to prevent them. Furthermore, pointers open up many security holes, because they allow any program more or less unrestricted access to all parts of the system.

What is a handle?

A handle is a pointer to a pointer. That is, a handle points to a location in memory where the address of the actual data can be found. The advantage of handles over raw pointers is that an object can be moved in memory and the pointer to it updated while the handle for the program remains valid. This has significant advantages for keeping memory clear and defragmented.

For example, after a program has run for some time, many objects will have been constructed and garbage collected. This can make a heap very fragmented, as shown in Figure 3-1. Each block is a word. The gray blocks are words in use. The white blocks are free words. Of course, real heaps have many more words than this, but this is sufficient for a demonstration.


Figure 3-1  A fragmented heap.

Suppose, with the heap in this state, that you need four words for an object. There is plenty of space in the heap, but it’s fragmented. There is no one place where you can get four words of contiguous memory. To make space for the new object, you have to move some of the allocated blocks around in memory. However, this can cause problems if the running program has pointers straight into the heap. For example, consider Figure 3-2. This is the same heap, with object variables shown as ovals. The arrows are pointers into the heap. Each object has at least one pointer (to its own data), and some have multiple pointers if they themselves contain references to other objects. Furthermore, one object may be pointed to from several different places. This interconnected web of pointers makes it very difficult to move objects in the heap, because you have to update all of the different pointers that can exist in hundreds of different objects, methods, and threads.


Figure 3-2  A heap with pointers.

Now look at what happens if you just willy-nilly compact the heap by moving all the data down to the bottom. Figure 3-3 shows the result. Now there is space for a four-word object. However, many — perhaps most — of the pointers are broken. Some now point to the wrong object. Others point to nowhere in particular. The VM can try to identify every reference to each moved object in the running program and update it with the new address of its data, but there can be thousands of these, and the operation can be extremely time-consuming in a large program.


Figure 3-3  A compacted heap.

How can the references be arranged in such a way that they don’t break when the heap is defragmented? One way to look at the problem is that references point to areas of different sizes in the heap. If you could somehow arrange it so that every object needed exactly the same amount of space in the heap, then fragmentation would not be a problem. As long as there was any free space at all, it could be used.

Of course, different objects do take different amounts of space, but references always take four bytes (one word). The solution is to insert an extra block of references between the references in your source code and the heap. When an object is moved in the heap, only one link needs to be updated: the one between the offset table and the data in the heap. The many more pointers to the offset table do not need to be updated. Furthermore, it’s relatively easy to find the pointers in the offset table that need to be updated. The VM does not need to search the entire memory space of the running program looking for anything that might be a pointer.

Figure 3-4 shows this scheme. To find an object’s data, you follow the first arrow into the offset table. Then you follow the second arrow out to the actual data in the heap.


Figure 3-4  A fragmented heap with handles.

At first glance this appears more complicated than the method in Figure 3-2. However, consider what happens when the heap is compacted. Figure 3-5 shows the result. The object pointers don’t need to be changed. Only one pointer needs to be adjusted for each object, not one pointer for each reference, as in the previous case. Because there’s a one-to-one relationship between filled entries in the offset table and objects in the heap, once you’ve adjusted the pointer from the offset table to the object, you’ll never have to adjust another pointer to the same object later. If you’re moving only one object in the heap, you can stop looking as soon as you find the pointer to it in the offset table.


Figure 3-5  A compacted heap with handles.

There are many optimizations that can be made to this scheme. For example, each object in the heap can contain the index of its pointer in the offset table, so when the memory manger needs to move it, the memory manager can adjust the pointer in constant time.


Secret:  This all happens behind the scenes, so you normally don’t need to worry about it. Sun’s virtual machines use handles, but this isn’t absolutely necessary. Microsoft’s VM implements references as pointers, not doubly indirected handles.


Of course, double indirection is useful not only in virtual machines. This scheme, or variants of it, can be used in situations where moving objects in the heap is very expensive but moving objects in the offset table is cheap. For example, if the heap is actually a file on disk but the offset table is in memory, then you can reorganize the structure of a file by changing the offset table. Variations on this scheme are used in most relational databases.

What is a reference?

Reference is strictly a Java term. There are no references in C or Pascal. A reference is an abstract identifier for a block of memory in the heap. Furthermore, a reference has a type like string or double[]. At the level of the non-virtual host machine, references may be implemented as handles, pointers, or something else entirely. However, references are not pointers; they are not handles; they are merely a means of identifying a particular block of memory in the heap.

How exactly the virtual machine implements references at the level of machine code is VM-dependent and completely hidden from the programmer in any case. Most VMs — including Sun’s — use handles, not pointers. Microsoft’s VM uses pointers rather than handles. Other schemes are possible.

Ninety percent of the time, you can ignore the difference between a reference to an object and the object itself. However, there is always that annoying 10 percent of the time when the difference becomes important. This 10 percent occurs mostly when passing arguments to methods.

Passing arguments to methods

There are two ways to pass an argument to a method: by value and by reference. The difference is in what happens to the variable passed in the calling method as a result of what’s done to it in the called method. For example, consider this code fragment:

     int a = 7;
     changeVariable(a);
     System.out.println(a);

Now suppose the changeVariable() method looks like this:

     public void changeVariable(int a) {
      a = 10;
     }

What value gets printed — 7 or 10? If the argument a is passed by value, then a copy of variable a’s value is used by the changeVariable() method. The changeVariable() method never gets access to the original variable a in the calling method. It has a different variable, also named a. Therefore, the calling method prints the value 7.

On the other hand, if a is passed by reference, then the changeVariable() method does not get a copy of the variable named a. It gets the real thing. The name a in the calling method and the name a in the changeVariable() method refer to the same variable. Therefore, System.out.println() prints 10.

Note that the names aren’t important here. If changeVariable() were written using i or some name other than a for its argument, the result would be the same. What makes the difference is whether the variable is passed by reference or by value. If a variable is passed by reference, it can change in the calling method. If it’s passed by value, it cannot. Java passes all arguments by value, not by reference.

Because object and array variables in Java are references to the object or array, it can appear as if an object is passed by reference if you modify only the fields of the object or array, but do not change the reference itself. For example, consider this program:

     import java.awt.Point;

     class changePoint {

      public static void main(String args[]) {

       Point p1 = new Point(0, 0);
       changePoint(p1);
       System.out.println(p1);

      }

      static void changePoint(Point p) {

       p.x = 38;
       p.y = 97;

      }

     }

This program prints

     java.awt.Point[x=38,y=97]

The point has therefore been changed. However, the reference, which is what was really passed, has not been changed. To see that, consider the following program:

     import java.awt.Point;

     class dontChangePoint {

      public static void main(String args[]) {

       Point p1 = new Point(0, 0);
       dontChangePoint(p1);
       System.out.println(p1);

      }

      static void dontChangePoint(Point p) {

       p = new Point(38, 97);

      }

     }

It prints the following:

     java.awt.Point[x=0,y=0]

Once and for all: Does Java Have Pointers?

The issue of whether Java really has pointers seems to generate countless flame wars on Usenet, almost as many as are generated by Star Trek trivia. Some of this confusion is a result of incorrect or incomplete knowledge. Even more of the confusion is a result of using the same word to mean two different things.

In this book, I use the word pointer to mean the address of a byte of memory on the computer. This is the definition of pointer used by assembly language, C, and C++ programmers. Java has no equivalent for this kind of pointer.

Some programmers, particularly those accustomed to pointerless languages like Fortran 77 and Basic, use the word pointer in a very abstract sense, in which it is just about anything that gives a reference to or points to a block of data. Thus, an array that contains indexes of entries in another array is often said to be an array of pointers, although C programmers would not recognize it as such. In this sense, Java does have pointers.

Some programmers also claim that Java has pointers because the virtual machine may use pointers in the same sense as used by assembly language programmers to implement Java’s references. In fact, some implementations of Java, particularly Microsoft’s, do use pointers in exactly this fashion. However, the .class file verifier severely restricts what you can do with these pointers. In particular, you cannot use them as freely as you can in C or assembly language. You cannot perform arithmetic on them. You cannot convert them to and from numeric data types like int. Furthermore, the virtual machine need not implement references as pointers. Sun’s virtual machines use handles instead. Others may use something completely different, like numeric indexes into a large, static array.

The bottom line is that Java doesn’t have pointers in the sense that 95 percent of the people who talk about pointers mean. It’s best just to use the term reference, and try to stay out of flame wars when possible.

In this example, a copy of the reference p1 was passed to the dontChangePoint() method. A new Point object was then assigned to that copy. This, however, did not change the old reference in the main() method. In the previous example, the reference p in the changePoint() method and p1 in the main() method both referred to the same object. In this example, p and p1 refer to different objects after the new Point is assigned to p.

Special references

There are three special references in Java source code: null, this, and super. The meaning of these references generally depends on their context.

The null reference

The null reference is an invalid reference. It has no type, and thus may be assigned to a variable of any reference type. When a reference variable is declared but not constructed, it is initially equal to null.

The this reference

The special reference this always refers to the current object. For example, the statement

     int j = this.x;

sets the variable j equal to the x field of this object. Using the this reference is normally optional. Using “int j = x” would work equally well. However, on occasion, a variable declared inside a method can shadow a field. This is most common in constructors. For example, consider this elaboration of the 3Dpoint class:

     public class 3DPoint {

      double x;
      double y;
      double z;

      public 3DPoint (double x, double y, double z) {

       this.x = x;
       this.y = y;
       this.z = z;

      }

      //  other methods...

     }

The three arguments to the constructor — x, y, and z — shadow the fields of the same name. Inside the 3DPoint constructor, x, y, and z no longer refer to the fields of the object but rather to the arguments to the method. However, this.x, this.y, and this.z still refer to the fields x, y, and z.

The this keyword can also be used to call a different constructor in the current class. With this usage, this is not, strictly speaking, a reference. The this keyword can be used this way only in the first statement of another constructor. For example, to call the 3DPoint (double x, double y, double z) constructor from the noargs 3DPoint() constructor, you would write

     public 3DPoint () {

      this(0, 0, 0);

     }

This technique is especially common in polymorphic constructors. Arguments not passed to one constructor are filled in with default values as a call to another constructor.

The super reference

The this reference refers to methods and fields of the current object. The super reference refers to the methods and fields of the immediate superclass. You need to use the super prefix only if a field or method in the subclass has the same name as a field or method in the superclass.

For example, the java.awt.Component class has a handleEvent() method, so its subclasses do, too. Specifically, java.awt.Container has a handleEvent() method; java.awt.Frame() has a handleEvent() method; and any subclass of java.awt.Frame that you write has a handleEvent() method. Now let’s suppose that you want to write a subclass of Frame that allows the window (that is, the Frame) to be closed. One way to do this is to override handleEvent() in your subclass of Frame so that it handles WINDOW_DESTROY events. That method might look like this:

     public boolean handleEvent(Event e) {

      if (e.id == Event.WINDOW_DESTROY) {
       hide();
       return true;
      }
      else return false;

     }

This method will close the window (by calling the Frame’s hide() method), but it doesn’t handle any other events. It completely misses mouse clicks, key presses, action events, and more. This would normally be handled by the handleEvent() method of the Component class, but we’ve shadowed that method with our own handleEvent(). Once we’ve finished our custom processing of the WINDOW_DESTROY event, we want to pass all other events to the handleEvent() method of java.awt.Component. The super keyword acts like a reference to that class that lets us do that. Instead of writing “else return false;”, write “else return super.handleEvent(e);”. This calls the handleEvent() method in the superclass rather than the handleEvent() method in this class. Here’s the complete method:

     public boolean handleEvent(Event e) {

      if (e.id == Event.WINDOW_DESTROY) {
       hide();
       return true;
      }
      else return super.handleEvent(e);

     }

Using the super keyword like this finds the nearest method with a matching signature. In this case, it’s the handleEvent() method in java.awt.Component. If there were a handleEvent() method in java.awt.Frame or java.awt.Container, that would be called instead.

Like the this keyword, super also has a non-reference meaning inside a constructor. If you use super() as the first statement in a constructor, it calls the matching constructor in the immediate superclass.

If you do not include an explicit call to super() as the first statement in your constructor, then the compiler will insert such a call into the byte code. The compiler always chooses the noargs super() constructor if you don’t explicitly choose a different one. This can lead to annoying bugs that prevent you from instantiating a subclass if the superclass doesn’t have a public or protected noargs constructor. For example, consider this incorrect Java program:

     public class superclass {

      public superclass(int i) {

      }

     }

     class subclass extends superclass {

      subclass() {

      }

     }

If you try to compile this program, you get the error message “No constructor matching superclass() found in class superclass: superclass.java line 12”. This is a common problem for novices. What you should do to fix this is completely un-obvious, because you never actually called the superclass() constructor that you’re being warned doesn’t exist. The solution is either add a noargs constructor to the superclass or to call the superclass(int i) constructor in the first line of the subclass. For example,

     class subclass extends superclass {

      subclass() {
        super(0);
      }

     }

The Class Class

Introductory texts and classes about object-oriented programming spend a lot of time trying to explain the difference between objects and classes. If there is a single thing that separates people who understand object-oriented programming from the people who merely know a few buzzwords, it’s the proper use of the words object and class. It’s drilled into students that they are two different things, as different as the recipe for a cake and the cake itself. However, now that you are a more advanced student of object-oriented programming, I can tell you the truth. Classes really are objects, at least in Java.

A Java .class file contains the byte code for a particular class. When the Java VM loads a class, a ClassLoader object reads the byte codes and uses them to instantiate a new object of type java.lang.Class — in other words, a Class object. A Class object has methods that are useful for deducing information about the class at runtime.

There are two primary ways that your program can bootstrap a reference to a Class object: a ClassLoader object can load the class from bytes, or your program can call the static method Class.forName(String s) to load the class given its name. For example:

     Class threadClass = Class.forName("java.lang.Thread");

The name of the class must be the fully qualified name, including the entire package. For example, you must write java.lang.Thread and not just Thread. This is true regardless of the import statements in the program or whether the class is in the java.lang package.

Once you have a Class object, you can use the newInstance() method to create instances of the class. Its signature is

     public Object newInstance() throws InstantiationException,
      IllegalAccessException

For example:

     try {
      Object o = threadClass.newInstance();
     }
     catch (InstantiationException e) {
     }
     catch (IllegalAccessException e) {
     }

The object is returned without any type information, though. In other words, it’s a raw java.lang.Object, not a java.lang.Thread. You can cast the created object to the appropriate type like this:

     Thread t = (Thread) o;

You often see these three steps combined on one line like this:

     Thread t =(Thread) Class.forName("java.lang.Thread").newInstance();

You need to know what the object will be before you can cast it. There’s no convenient way to determine the most specific type of an object created at runtime. If you know roughly what sort of objects are likely to be created, then you can check the possibilities with the instanceof operator and respond accordingly. For example,

     if (o instanceof Thread) {
      Thread t = (Thread) o;
      // ... work with the Thread
     }
     else if (o instanceof Applet) {
      Applet a = (Applet) o;
      // work with the Applet
     }

In general, though, it’s much easier if you know your objects’ types at compile time.

Aside from the type change, the newInstance() method behaves exactly like using the new operator with the noargs constructor for the class. The following lines produce identical byte code:

     Thread t =(Thread) Class.getClass("java.lang.Thread").
     newInstance(); Thread t = new Thread();

If a class doesn’t have a noargs constructor, you can’t instantiate it with the newInstance() method. For example java.lang.Integer has two constructors:

     public Integer(int  value)
     public Integer(String  s)

It does not have a constructor with the signature

     public Integer()

Therefore, you should not write

     Integer i =(Integer) Class.getClass("java.lang.Integer")
      .newInstance();

If you try this, a java.lang.NoSuchMethodError will be thrown at runtime. Integers and other classes that do not have noargs constructors must be instantiated with the new operator.

Once you have a Class object, there are several other methods to help you determine runtime type information.

The getName() method returns the full package and class name of the class of this object. Its signature is

     public String getName()

This is primarily useful for debugging. For example,

     public void printname(Object o) {
      System.out.println("I got an object of class " + o.getname());
     }

The name returned is always the most specific type possible, never a superclass or an interface.

The getSuperclass() method returns a Class object representing the class of the immediate superclass of this object. Its signature is

     public Class getSuperclass()

You can use this to walk the class hierarchy of an object. For example, given a Class object c,

     while ((c = c.getSuperclass()) != null) {
      System.out.println("extends " + c.getName());
     }

However, java.lang.Object does not have a superclass, so if the Class object is of type java.lang.Object, then null is returned. Thus, the last name printed by this loop will always be java.lang.Object because that’s the ultimate superclass of all Java objects. Also, null is returned if the object is an interface.

Interfaces are also loaded into the VM as objects of type Class. You can test whether a Class object in fact represents an interface with the isInterface() method. It returns true if the Class object in question represents an interface and false if it doesn’t. Its signature is

     public boolean isInterface()

The getInterfaces() method returns an array containing Class objects that represent interfaces. Its signature is

     public Class[] getInterfaces()

If the Class object represents a class, then the array contains Class objects representing all interfaces implemented by the class. However, if the Class object represents an interface, then the array contains objects representing all the interfaces extended by this interface. This array may be of length zero if the Class object neither implements nor extends any interfaces.

The one remaining piece of information about a Class object is the ClassLoader that was used to load the class from bytes on disk or on the network. The getClassLoader() method returns the ClassLoader object which loaded this class. It returns null if the class was not created by a ClassLoader. Its signature is

     public ClassLoader getClassLoader()

Finally, there’s the usual toString() method for creating a string representation of the Class object. The string is the word class or interface followed by the full, package-qualified name of the class. Some examples are “class java.lang.Integer” and “interface java.io.Serializable.”

Listing 3-1 is a program that demonstrates most of these methods. The static printRTTI(Object o) method is the heart of it. This method checks all the possible type information about the object that it has passed and prints it on System.out. This can be very useful for debugging. The main() method tests the program on two objects: an Integer and a DataOutputStream. The three methods printHierarchy(), printInterfaces(), and printClassLoader() break up the code to make it a little more legible. Each one handles a particular aspect of the runtime type.

Listing 3-1 RTTI

     import java.io.*;

          public class RTTI {

           public static void main(String args[]) {
            printRTTI(new Integer(7));
            printRTTI(new DataOutputStream(System.out));
           }

            public static void printRTTI(Object o) {

            if (o == null) {
             System.out.println("This object is null");
             return;
            }
            Class c = o.getClass();
            printHierarchy(c);
            printInterfaces(c);
            printClassLoader(c);

           }

           static void printHierarchy(Class c) {
            System.out.println(c.getName());
            while ((c = c.getSuperclass()) != null) {
      System.out.println("extends " + c.getName());
            }

           }

           static void printInterfaces(Class c) {

            Class[] ci = c.getInterfaces();
            if (ci.length > 0) {
             if (c.isInterface()) {
              for (int i = 0; i < ci.length; i++) {
               System.out.println("extends" + ci[i].getName());
              }
             }
             else {
              System.out.println("implements ");
              for (int i = 0; i < ci.length; i++) {
               System.out.println(ci[i].getName() + ",");
              }
             }
            }

           }

           static void printClassLoader(Class c) {

            ClassLoader cl = c.getClassLoader();
            if (cl != null) System.out.println("This object was
             loaded by " + cl);
           }
     }

The Object Class

Now that you know that classes are objects, I’ll confuse you a little more by telling you that objects are classes, too. As with the Class class, there is, however, a difference between the big O Object (which is a class) and the little o object (which is an object).

The java.lang.Object class is the common superclass for all Java classes. All classes eventually extend java.lang.Object. Thus, all classes have access to the methods of java.lang.Object. The primary purpose of java.lang.Object is to provide several useful methods that the programmer can count on all classes having. These are clone(), equals(), finalize(), getClass(), hashCode(), notify(), notifyAll(), toString(), and wait(). Furthermore there’s a single constructor, Object(), but you’ll rarely (if ever) call it directly.

What these methods have in common is that they represent internal details of Java objects, not anything external to Java like a window, a mouse, a motorcycle, a supernova, or anything else that exists outside the virtual machine. Objects normally represent real world entities. However, the java.lang.Object class is something of a meta-class — that is, a class that represents objects. The clone() method copies objects. The equals() method compares objects. The getClass() method returns the class of an object. The hashCode() method computes a unique integer for an object. The toString() method creates a string representation of an object. The notify(), notifyAll(), and the wait() methods interface between objects and threads. What all these methods have in common is that they treat objects as computer-based abstractions, not as real world things. Let’s take a closer look at these methods.

Cloning

The clone() method makes a bitwise copy of an object in memory and returns the copy. In other words, it creates a new instance of the object’s class and copies the values of each field of the object into the new object. It copies all fields, whether they’re public, private, protected, or package protected.

Not all objects can be cloned. In fact, by default an object may not be cloned. Only objects that implement the java.lang.Cloneable interface may be cloned. The Cloneable interface does not actually declare any methods. It just tells the clone method in java.lang.Object that it’s okay to clone this object. If you try to clone an object that does not implement the Cloneable interface, a CloneNotSupportedException is thrown.

Equality

The equals() method compares two objects for equality. Because equals() is a method of java.lang.Object, any object can be compared for equality to any other object. That is, every object at least inherits an equals() method.

The equals() method has this signature:

     public boolean equals(Object  obj)

The java.lang.Object.equals() method just checks to see if two reference variables refer to the same object. For example, consider the following three reference variables:

     Integer i1 = new Integer(7);
     Integer i2 = new Integer(7);
     Integer i3 = i2;

Using the equals() method in java.lang.Object, i1 is not equal to i2 because i1 and i2 refer to two different objects, even though those objects have the same value. On the other hand, i2 and i3 are considered equal to each other because they refer to the same object.

This is often not the behavior you want. Therefore, most classes override equals() with a method that is more appropriate to the specific class. For example, the equals() method in java.lang.Integer does in fact test the values of the Integer objects to see if they’re the same. Thus, i1.equals(i2) returns true because the equals() method in java.lang.Integer behaves differently than the equals() method in java.lang.Object.

It’s nonetheless important to realize that there are often multiple, sensible ways to decide whether two objects are equal. For example, consider these two URL objects:

     URL u1 = new URL("http://www.inch.com/");
     URL u2 = new URL("http://worm.inch.com/");

www.inch.com and worm.inch.com are different names for the same machine, so these URLs point to the same page. But are the URLs the same? Maybe not. After all, one of the host names could be moved to a different machine while the other one stayed behind. Or consider these two URLs:

     URL u3 = new URL("http://sunsite.unc.edu/javafaq/");
     URL u4 = new URL("http://calzone.oit.unc.edu/javafaq/");

These two URLs point to the same page on the Web. However the first URL goes over a 100 Megabit per second (Mbps) FDDI connection and the second over a 10 Mbps Ethernet connection. So these URLs are probably best considered to be unequal.

In fact, Java considers all four of the URLs above to be unequal. The equals() method in the URL class, like most of the equals() methods in the java packages, is relatively shallow. It does not attempt to discover whether two different objects might mean the same thing if they are superficially different.

This isn’t all. What about these two URLs?

     URL u5 = new URL("http://sunsite.unc.edu/javafaq/
      javafaq.html#xtocid1902930");
     URL u6 = new URL("http://sunsite.unc.edu/javafaq/
      javafaq.html#xtocid1902931");

These point to different sections of the same page. Although they are different in one sense, Java considers them to be equal because they point to the same page.

The bottom line is that there are few guarantees about how the equals() method behaves. The only thing you can be reasonably confident about is that references to the same object will be equal to each other. Otherwise, you have to do the best you can. Regrettably, the documentation for the equals() methods is often incomplete. The only real way to find out what a particular equals() method really does is to look at the source code. If that’s not possible, equals() methods tend to be simple enough to understand from decompiled or disassembled byte code. If for some reason you can’t disassemble or decompile a class, then you have to run as many tests as you can think of to determine what’s really going on.

Finalization

The finalize() method is called when an object is garbage-collected. The finalize()method is the programmer’s last chance to do something with an object. The finalize() method of the java.lang.Object class is an empty method; that is, it does absolutely nothing. It looks something like this:

     protected void finalize() throws Throwable {

     }

You may ask yourself, why even have a finalize() method in java.lang.Object if it never does anything? The reason is so that the runtime knows that it can always call an object’s finalize() method. A method doesn’t have to do anything to be called. Subclasses of java.lang.Object may override finalize(). If they do, their finalize() method is called. Otherwise the finalize() method in java.lang.Object is called. It’s much simpler to know you can always call finalize() for any object at all than to have to check whether an object has a finalize() method before calling it.

Runtime type information

In Java, an object can tell you what class it belongs to via the getClass() method of java.lang.Object, which has the following signature:

     public final Class getClass()

getClass() returns a Class object (an object of type Class) that can be manipulated with the methods of the last section.

There’s little reason to override this method in your own classes.

Hash codes

Hash codes are integers used as keys in hash tables. Each object that can serve as a key in a hash table must be associated with a precise integer. The list of items in the hash table is then indexed with these integer keys. Equal objects — objects which compare equal to each other with the equals() method — are supposed to have identical hash codes. Unequal objects normally have different hash codes, although this is not always true. The efficiency of a hash table is closely related to the percentage of objects in the table with unique keys.

The default hashCode() method used by java.lang.Object is the numeric value of the reference to the object. Although Java programs aren’t allowed to convert 32-bit references to 32-bit ints, you can do this in native C code, and that’s exactly what Java does, at least on 32-bit platforms. (Porting Java to 64-bit platforms like the DEC Alpha or 16-bit platforms like Windows 3.1 is decidedly non-trivial, for this and many other similar reasons.)

No reference can point to two different objects. Therefore, hash codes calculated in this fashion will always be unique. Conversely, no object can have two different addresses, so an object always has the same hash code. (An object can be referred to by two different reference variables, but these two variables will still have the same value.)

The hashCode() method is closely tied to the equals() method. When you override equals(), you need to override hashCode(), too. Remember that all objects that are equal according to the equals() method must have the same hash code.

Threading

Discussion among many people in the same place at the same time tends to degenerate rapidly into babble as everyone begins talking at once. To make discussion possible among large groups of people, a special object, sometimes called a “magic feather,” is created and endowed with the special power that only the person holding the magic feather may speak. Because no more than one person can hold the magic feather at a time, no more than one person can talk at one time.

Many different threads talking at the same time can be a huge problem for Java programs as well. It is extremely important to guarantee that two different threads don’t modify the same object at the same time. Therefore, each object is created with its own magic feather. The magic feather for an object can be held by at most one thread at any given time. As long as a thread holds an object’s magic feather, it can do anything it’s normally allowed to do with the object. All other threads that want to use the object have to wait until they get the object’s magic feather.

I should note that “magic feather” isn’t a sufficiently impressive technical term for most programmers. Instead, the commonly used word is monitor.

If you search the java packages, you won’t find any class called monitor or magic feather. A monitor is not a separate object. It is a part of each individual object. Threads ask for an object’s monitor when they execute a synchronized instance method of the object, execute the body of a synchronized statement that synchronizes on the object, or invoke a synchronized static method of a class. (In the latter case, the Class object associated with the class is synchronized.) Threads give back the monitor when they finish executing the synchronized code.

By calling one of an object’s wait() methods, a thread can yield possession of the monitor and put itself to sleep until the monitor is available again. The thread can then be awakened with the object’s notify() or notifyAll() methods. There are three polymorphic wait() methods. These are

     public final void wait() throws InterruptedException
     public final void wait(long  ms) throws InterruptedException
     public final void wait(long  ms, int  ns) throws InterruptedException

Each of these methods causes the calling thread to release the object’s monitor and go to sleep until another thread notifies threads waiting on this object’s monitor to wake up. At that point, the thread wakes up, waits until it can regain the object’s monitor, and then resumes running. The first wait() method, with no arguments, sleeps indefinitely. The second, with a single long argument, sleeps for at most the specified number of milliseconds and then wakes up whether it has been notified or not.

The third and final wait() method allows more finely grained control, down to a nanosecond. The first argument is the number of milliseconds to wait before waking, and the second argument is the number of nanoseconds to add to that. Not all architectures allow such finely grained timing. You shouldn’t rely on accuracy of more than a millsecond or two.

There are two notify methods: notify() and notifyAll(). Their signatures are

     public final void notify()
     public final void notifyAll()

The notify() method wakes up a single thread that’s waiting on this object’s monitor. The notifyAll() method wakes up all threads waiting on this object’s monitor. Both of these methods should be called only in a thread that owns the object’s monitor.


 

Strings

The final thing that all objects must be able to do is to provide a string representation of themselves. They can do this with the toString() method. The default toString() method from the java.lang.Object class merely prints the name of the class of the object. Most classes will override this method with one that provides more information.

The toString() method is rarely called explicitly. It is instead invoked implicitly when an object is passed as an argument into a print() method or is concatenated with a string using a + sign. I’ll talk more about strings and toString() methods later in this chapter.

Arrays

What is an array? To a high-level programmer, an array is an indexed list of values of the same type with a fixed length. The most important feature of an array is that you can retrieve any particular element of the array in constant time. In other words, it takes no more or less time to retrieve the seventh component of the array than it takes to retrieve the 70th or the 700th.


Secret:  Internally, arrays are contiguous blocks of memory in the heap. To find the size of an array, multiply the size of the array’s type by the length of the array. For example, an int[] array with length 60 takes up 240 bytes of heap memory because it has space for 60 four-byte ints. The memory is used even if the space isn’t occupied.


Primitive data types like shorts and ints are stored directly in the array. They take up no more space than the array itself. In fact, an array of shorts, bytes, or chars takes up less space than the same number of short, byte, or char variables. Individual variables of these types are always promoted to ints. Thus, a byte variable occupies four bytes, and four byte variables occupy 16 bytes because each byte is promoted to an int. However, promotion does not occur inside arrays, so an array of four bytes occupies exactly four bytes. Similarly, an array of four shorts or an array of four chars occupies exactly eight bytes.

Arrays of objects are a little different. Each entry in the array is not the object itself but rather a reference to the object. Each reference requires four bytes, whether or not that reference is null. However, if a component of the object array is indeed non-null, then somewhere else in the heap is an object that also needs memory. When you calculate the total memory needed for an array of objects, you have to account for the array and the objects themselves separately. For example, consider this array:

     Integer[] iarray = new Integer[10];
     for (int i = 0; i < iarray.length; i++) {
      iarray[i] = new Integer(i);
     }

By looking at the source code for the Integer class, you discover that each Integer object has a single non-static int field. Thus, each Integer object occupies four bytes of heap memory. When the Integer[] array iarray is created, it has space for ten references. This takes up 40 bytes. As new Integer objects are created and added to the array in the for loop, each of these takes an additional four bytes. When the loop is complete, the array occupies 80 bytes of the heap. If you then set some of the array components to null, the corresponding Integer objects would eventually be garbage-collected, and the array would shrink to somewhere between 40 and 80 bytes of the heap.

Whether the array ever really occupies more than 40 bytes is a semantic question. You could just as reasonably say that the array always occupies 40 bytes and additional space may be occupied by the objects to which the array’s components refer. As long as you understand what’s really going on, you can say it however you want.

Multidimensional arrays

Java does not have true multidimensional arrays like Fortran does. Instead, it fakes them with one-dimensional arrays of references to one-dimensional arrays, much in the fashion that C does. For example, consider a two-dimensional array of doubles like this:

     double[][] matrix = new double[4][3];

This is allocated in two parts. First, a one-dimensional, length four array of references is allocated; then, each of these is pointed at a one-dimensional, length three array of doubles. In other words:

     matrix = new double[4][];
     for (int i = 0; i < 4; i++) matrix[i] = new double[3];


Note:  As you’ll see in Chapter 5, Java can accomplish this with one byte code instruction that has the same effect as the above code.


This means that even though matrix is declared as a two-dimensional array of doubles, matrix[0], matrix[1], matrix[2], and matrix[3] are all legitimate Java entities. For example, you can copy the first row of matrix into the third row using System.arraycopy() like this:

     System.arraycopy(matrix[1], 0, matrix[3], 0, 3);

You can use matrix[0], matrix[1], matrix[2], and matrix[3] anywhere a one-dimensional array of doubles is expected. This also means you can create ragged arrays — that is, arrays that do not have a fixed length in one dimension. For example,

     int[][] triangle = new int[12][];
     for (int i = 0; i < 12; i++) {
      triangle[i] = new int[i+1];
     }

The zeroth row of the triangle array has length one, the first row has length two, the second row has length three, and so on.

Higher dimensional arrays just have additional levels of indirection. For example,

     double[][][] Datacube = new double[4][3][7];

Datacube[0] through Datacube[3] are references to two-dimensional arrays of doubles; more precisely, they’re references to arrays of references to one-dimensional arrays of doubles. Datacube [0][2] is a reference to a one-dimensional array of doubles and can be used anywhere a one-dimensional array of doubles is needed.

Array classes and objects

Arrays are objects. They extend the java.lang.Object class, and you can call toString(), equals(), hashCode(), wait(), notify(), and all the other methods of the object class on a reference variable of array type. An array can be assigned to variables of type Object and passed to methods that expect a reference to an Object type. Arrays live in the heap like all other objects do; therefore, you use reference variables to refer to arrays.

However, there is no java.lang.Array class. Each array has an implicit type of the most primitive type it holds, followed by a number of left-bracket signs ([) equal to its dimension. Thus, an int[][] has type int[[ and a String[] array has type String[. These are not legal Java identifiers, and you won’t find them in Java source code, but you will find them in Java byte code.

As well as the methods inherited from java.lang.Object, arrays have a single field called length. This field contains the length of the array. I used this field above in the line

     for (int i = 0; i < iarray.length; i++) {

Other than this field, arrays mostly inherit the methods of java.lang.Object. The one they override is clone(). Arrays implicitly implement Cloneable, so you can call clone() without getting a CloneNotSupportException. However, the clone of a multidimensional array is a shallow copy. Only the initial references to the sub-arrays are copied, not the sub-arrays themselves.

System.arraycopy()

The System class contains one important method for working with arrays:

     public static void arraycopy(Object src_array, int src_index,
      Object dst_array, int dst_index, int length)

The System.arraycopy() method copies length values from the source array into the destination array. This is used in the StringBuffer and Vector classes as well as many other places. The basic algorithm looks like this:

     for (int i = 0; i < length; i++) {
      dst_array[dst_index+i] = src_array[src_index+i];
     }

In other words, the length components of src_array starting with component src_index are copied in order into the length components of dst_array starting at dst_index. The arrays src_array and dst_array can be the same array. If so, the copy is made as if the source components were first copied into a temporary array and then copied back into the array. This allows overlapping copies to work. If the arrays have reference types, then the copy is a shallow copy. That is, only the references are copied. The objects to which the references point are not copied.

If the arrays to be copied contain primitive data types, they must contain the same primitive data type. For example, an array of shorts can only be copied to another array of shorts, not to an array of ints or longs. This is one of the few places in Java where arithmetic promotion does not take place. If you try to copy an array of one primitive type such as short to an array of a wider primitive type such as int, then an ArrayStoreException is thrown and the destination array is left unchanged. Similarly if you try to copy an array of a reference type to an array of primitive type or vice versa, an ArrayStore-Exception is thrown, and the destination array is left unchanged.

Copies between arrays of reference types are a little more complex. As long as the reference type in one array can be converted to the reference type of the other array, the copy takes place. Thus, it is acceptable to copy a Float[] array to a Number[] array or an Object[] array to a String[] array because Floats can be cast to Numbers and Objects can be cast to Strings. (The latter cast works only if the Objects are in fact Strings. Otherwise, an ArrayStoreException will be thrown.) However if the reference types are incompatible — for example, Float and String — then an ArrayStoreException will be thrown.

It’s possible for some of the components of the source array to be compatible with the type of the destination array and some to be incompatible. For example, if src_array has type Object[], then it can contain both Floats and Strings. If dst_array has type String, then you can copy the Strings from src_array to dst_array but not the Floats. In this case, the components of the src_array that can be copied starting with component 0 are copied. However, as soon as an incompatible component is encountered, an ArrayStoreException is thrown, and no further components are copied, compatible or incompatible.

This method can also throw an ArrayIndexOutOfBoundsException, in which case the destination array is not changed. (The source array is never changed.) An ArrayIndexOutOfBoundsException is thrown if srcIndex, dstIndex, or length is negative, if srcIndex+length is greater than srcArray.length, or if dstIndex+length is greater than dst.length.

The System.arraycopy() method is normally implemented in native code for maximum performance. While it would be possible to write an equivalent routine in pure Java, most architectures have extremely efficient native instructions for copying large blocks of memory from one place to another. Because an array is natively a large block of memory, this is one of the places where native code helps a lot.

Strings

Strings are objects like any other object in Java. There is a java.lang.String class. However, the compiler has special support for a number of things you might want to do with strings. For example, you can create a new String object like this:

     String s="Hello world! ";

You don’t need to write

     String s = new String("Hello world! ");

although you could. You can't do that with any other kind of object. For example, if you write

     Double d = 7.5;

you get this compiler error:

     Error:    Incompatible type for declaration. Can't convert
      double to java.lang.Double.

Strings are the only class that can be initialized from literals without an explicit constructor call. In point of fact, what’s really happening is that the compiler figures out that it needs to call a String constructor. It translates the statement “String s=”Hello world! “;” into the following code:

     StringBuffer sb = new StringBuffer("Hello world! ");
     String s = new String(sb);

This is not the only way the String s="Hello world! "; statement could be compiled. However, somewhere there has to be a call to a String() constructor, even if you didn’t write one in the source code.


Note:  Actually, depending on what use is made later of the String objects, an optimizing compiler may include only the String literal in the byte code and not actually create an object.


The Java compiler handles many other idioms for manipulating strings. For example, you probably know that you can concatenate strings with a plus sign (+) like this:

     String s3 = s2 + s1;

The compiler translates that into a sequence of statements like this:

     StringBuffer sb = new StringBuffer();
     sb.append(s1);
     sb.append(s2);
     s3 = sb.toString();

Everything is accomplished with method calls. There are no additive operators that truly operate on strings. That’s an illusion supported by the compiler. Several other illusions are also supported by the compiler. Consider this common statement:

     System.out.println("Count is: " + i);

Here you have a String literal concatenated with an int variable. The compiler will translate this into something like this:

     StringBuffer sb = new StringBuffer("Count is ");
     sb.append(i);
     String s = sb.toString();
     System.out.println(s);

It is much simpler to just write System.out.println("Count is: " + i); than to put all the pieces together inside a StringBuffer yourself. That’s why this intelligence was built into the compiler and the language specification. Much more complex concatenations are compiled similarly.

String implementation

In Java, at least the version of Java written by Sun, Strings are implemented as arrays of chars. The String class has three non-static fields: value (an array of chars), offset (an int), and count (also an int). Every string therefore takes up at least 12 bytes of heap memory. However, you also need space for the char array.

The char array value has one space for each character in the string. In theory, using the offset and length fields, the value array can have slots for more chars than are actually present in the string. The offset field says which part of the string contains the first character of the string, and the length field says how many characters there are in the string. In practice, there’s no way for that to happen. The value array for the string “Hello world!” thus has length 12.

Strings are immutable. Once a string is created, it is never changed. A String variable may change, but a String object never changes. Consider the following code fragment:

     String s;
     s = "Hello World!";
     s = "Goodbye World";

Here the String variable s is first null. It then refers to an object with the value “Hello World!” and then to a String object with the value “Goodbye World”. However, these are two different objects, not different versions of the same object.

Similarly, if you were to concatenate another string to s, it would require the creation of still another String object. For example,

     String s;
     s = "Hello World!";
     s += " Isn't it a beautiful morning?";

This compiles to this sequence of statements:

     StringBuffer sb = new StringBuffer("Hello World!");
     sb.append(" Isn' it a beautiful morning?");
     s = sb.toString();

Note that the string itself is never changed.

However, in general, Java compilers are fairly smart about optimizing out unnecessary transformations to String objects. If you actually did something with the string between the second and third lines, then the compiler would be forced to actually create the String objects. For example, consider this code fragment:

     String s;
     s = "Hello World!";
     System.err.println(s);
     s += " Isn't it a beautiful morning?";

This compiles to this sequence of statements:

     String s1 = new String("Hello World!");
     System.out.println(s1);
     StringBuffer sb = new StringBuffer();
     sb.append(s1);
     String s2 = new String(" Isn't it a beautiful morning?");
     sb.append(s2);
     String s3 = sb.toString();

Three separate strings are created while this program runs. In the source code, however, it looks like one string is being changed. This is an illusion supported by the compiler.

StringBuffers

Java strings are immutable; that is, a string’s value may not be changed after the string is constructed. This makes strings very thread-safe and fairly fast, but also makes them excessively inefficient for many operations. For example, suppose you write

     String s = "one";
     s += " two";
     s += " three";
     s += " four";
     s += " five";

Compiling these statements with only immutable strings would require the construction of five separate strings: first “one,” then “one two,” then “one two three,” then “one two three four,” and finally “one two three four five.” There’s no way, using only strings, to create one string and then append the new parts to it.

A StringBuffer is a string that can be changed. You can add additional characters to the end or the beginning or the middle of a StringBuffer without creating new objects. On the other hand, because StringBuffers are mutable, they’re not inherently thread-safe, and thus many of the methods of the StringBuffer class are synchronized. This can slow down the execution of a program when Strings aren’t changing.

In short, the String class is optimized for strings that don’t change; the StringBuffer class is optimized for strings that do change. In general, the Java compiler is fairly smart about figuring out which class to use where. Nonetheless, the more manipulation of a string you’re doing, the more efficient it becomes to use a StringBuffer and convert it to a string only when you’re done.

The main methods of the StringBuffer class are append() and insert(). These methods are polymorphic, so they can accept any type or class of data. The append() method adds characters at the end of the StringBuffer; insert() places the new characters at a specified position in the StringBuffer. These methods are sufficiently polymorphic to handle all Java data types. There are ten different append()methods:

     public StringBuffer append(boolean  b)
     public StringBuffer append(char  c)
     public StringBuffer append(double  d)
     public StringBuffer append(float  f)
     public StringBuffer append(int  i)
     public StringBuffer append(long  l)
     public StringBuffer append(String  str)
     public StringBuffer append(Object  obj)
     public StringBuffer append(char  str[])     
     public StringBuffer append(char  str[], int  offset, int length)

Each of these methods converts its argument to a string format and appends it to the StringBuffer. The first six of these methods handles the primitive data types. Shorts and bytes are promoted to ints before conversion. Strings are also appended directly. All other object types are converted to a String object, using their toString() method, and then appended. The final append() method appends an array of chars to the StringBuffer. The second-to-last method appends the entire array, while the last method appends only the length characters in the array beginning with the character at offset. The only thing that you cannot append to a StringBuffer is an array of type other than char[].

The insert() methods are almost equally polymorphic with the exception of the methods to handle arrays of chars. These methods are

     public StringBuffer insert(int  offset, boolean  b)
     public StringBuffer insert(int  offset, char  str[])
     public StringBuffer insert(int  offset, double  d)
     public StringBuffer insert(int  offset, float  f)
     public StringBuffer insert(int  offset, int  i)
     public StringBuffer insert(int  offset, long  l)
     public StringBuffer insert(int  offset, Object  obj)
     public StringBuffer insert(int  offset, String  str)

These methods insert the string representation of their second argument beginning at the offset specified in the first argument. All characters after offset are shifted to the right. How far they’re shifted is completely dependent on the length of the string format of the second argument.

Like Strings, a StringBuffer is fundamentally a private array of chars called value:

     private char[] value;

It helps to keep this array in mind when considering where to insert an item. Because a StringBuffer is an array, the first character of the StringBuffer is number zero, not number one.

Also like with Strings, the length of the value array is set when the StringBuffer object is first constructed. There are three constructors:

     public StringBuffer()
     public StringBuffer(int  length)
     public StringBuffer(String  s)

The noargs constructor starts with an array of length 16. The second constructor initializes the array to the specified length. The third constructor starts with an array of length s.length() + 16 — that is, the length of the string plus 16 empty spaces. This is because of the expectation that whatever length of the string that the StringBuffer initially holds, it will be expanded later.

So far this is very much like the String class. The difference, however, is that the buffer can expand as necessary to hold more characters. Suppose you have the following code:

     StringBuffer sb = new StringBuffer(6);
     sb.append("Hello world!");

The string “Hello world!” has 12 characters. Because 12 characters is six too many for the six character StringBuffer sb, Java expands the array with the ensureCapacity() method:

     public synchronized void ensureCapacity(int minimumCapacity)

The ensureCapacity() method expands the array to two times the current capacity plus one (2 * (capacity() + 1)) or to the requested minimum capacity, whichever is greater. A new array is allocated of the appropriate size; the old value array is copied into the new value array with the System.arrayCopy()method; and the reference is set to the new array. In code, it looks something like this:

     char[] value;
     ...
     char[] newValue = new char[Math.max(minimumCapacity,
       2*(value.length + 1))];
     System.arraycopy(value, 0, newValue, 0, value.length);
     value = newValue;

Notice that the length of the value array is at least doubled. It is not nearly expanded to the minimum necessary length. Although you don’t want to waste space unnecessarily, it’s even more important not to waste CPU cycles by growing the array every time you have to add a character to it. You will see this scheme for growing arrays again very shortly. The java.util.Vector class does almost exactly the same thing.


Note:  Although ensureCapacity() is public, you rarely need to call it directly. The insert() and append() methods of the StringBuffer class will call it when they need to.


The value array is also expanded when an item is inserted into the middle of a StringBuffer with one of the insert() methods. However, the value array will not be expanded to provide additional space if you try to insert an item past the end of the StringBuffer. If you try, a StringIndexOutOf BoundsException will be thrown.

There are a few other useful methods in the StringBuffer class. The charAt(int i) method returns the ith char in the StringBuffer. This is easy to do because you can just return the ith char in the value array.

The setCharAt(int i, char c) method changes the character at index i in the StringBuffer to the char c. This differs from the insert(int i, char c)method because it actually replaces the character at i rather than shifting it and all following characters to the right. A StringIndexOutOfBounds Exception is thrown if i equals or exceeds the length of the string.

The length() method returns an int giving the number of characters currently present in the StringBuffer. This is generally not the same as the length of the value array. That number is called the StringBuffer’s capacity and is accessed with the capacity() method.

The ensureCapacity() method already discussed changes a StringBuffer’s capacity. This is not the same as a StringBuffer’s length. The capacity of a StringBuffer is the number of characters that can be stored in it without taking time to expand the internal value array. The length of a StringBuffer is the number of characters currently stored in the internal value array. To change a StringBuffer’s length, invoke its setLength(int i) method. This does one of two things: if i is less than the current length of the StringBuffer, then the StringBuffer is truncated to the length i; however, if i is greater than the current length of the StringBuffer, then it is expanded with null characters to the requested length.

The reverse() method reverses the characters in the StringBuffer in place. For example, if sb contains the string “dam”, after calling sb.reverse() it will contain the string “mad”.

The getChars() method copies characters from the StringBuffer into an array. Its signature is

     public void getChars(int srcBegin, int srcEnd, char dst[],
      int dstBegin)

A substring of characters from the StringBuffer is copied into the char array dst. The substring to be copied is delineated on the left by the index srcBegin and on the right by srcEnd-1. The characters of this substring are copied into the subsection of the char array dst beginning at dstBegin. A StringIndexOutOfBoundsException is thrown if either srcBegin or srcEnd is less than zero or greater than or equal to the length of the string. An ArrayIndexOutOfBoundsException is thrown if dstBegin is less than zero or greater than or equal to the length of the array or if the substring exceeds the bounds of the array.

Finally, like most other classes, StringBuffer contains a toString() method. This method is often invoked as the last step in a long sequence of string operations.

java.util Data Structures

The java.util class includes several abstract data structures to hold collections of objects. On one level, the programmer is supposed to be shielded from the internal workings of these classes. The whole point of a class library and abstract container classes is to shield the programmer from low-level details.

However, there are performance issues that can be addressed only by learning how a class is implemented. For example, if the Vector class is implemented as a linked list, then insertions into the list will be very fast. However, finding a particular element in the list will be rather slow. On the other hand, if the Vector class is really an array, then insertions into the middle of the list may be quite slow but retrieving an element from the list can be quite fast.

If you’re writing an applet or application for public distribution, you may not wish to rely on what I say here about the internals of these classes. Although these data structures are implemented similarly on all Java implementations to which I have access, it is possible (though unlikely) that the implementation details will change in the future. If the performance of your program depends on a specific implementation of a class, then you may wish to copy the source code for the class, change its name, and recompile it. Then you would use your modified class instead of the original. Of course, this will increase the size of the program you have to distribute. This could be important for an applet, but is unlikely to be significant for an application. As with many other things, you must make the tradeoffs that are appropriate for your situation. You have to decide what’s more important to you, guaranteed performance characteristics or smaller download size.

Vectors

A Vector is Java’s basic list class. A list is an ordered collection of items. The fundamental operations on a list are

!  Creating a new list

!  Adding an element to the list

!  Removing an element from the list

!  Finding the nth element in the list

The main difference between a list and an array is that a list generally doesn’t have a fixed size. It can grow or shrink as needed. Furthermore, a list doesn’t have empty spaces. If you remove a component from the middle of an array, you leave an empty space. All the other components stay where they are. On the other hand, if you remove an item from a list, all the items above it in the list are moved down to fill the space left.

Implementation

There are many different data structures that you can use to implement these operations. The very word list suggests a linked list data structure to many programmers. However, the Vector class is not a linked list, although you can do with a Vector anything you can do with a linked list. Instead, a Vector is a growable array.

If that sounds funny to you, it should. Java arrays are not growable. You cannot take an array that was initialized with space for ten doubles and expand it so that it has space for twenty doubles. Once an array is created, its size is fixed.

However, you can, memory permitting, create a new array with space for 20 doubles. Then you can copy the components of the original array into the new array. If you then adjust all references to the old array to point to the new array instead, it’s as if you grew the old array.

The hard part of this procedure is finding all the references to the old array. Java solves this problem by making the array you want to grow a protected field in a public class: Vector. Only the Vector class ever holds any references to the array, and it has only one reference to the array. Other objects have references only to Vector objects. Therefore, when the array needs to be grown, you need to adjust only a single reference in the Vector class. This sort of data encapsulation is one of the primary advantages of object-oriented programming.

You should recall that this is almost exactly what the StringBuffer class does when it needs to expand. StringBuffers use an array of chars instead of an array of objects like Vectors, but otherwise the logic is the same.

     Object[] elementData;
     ...
     // double the vector's capacity
     Obj ect[] newData = new Object[2*elementData.length];
     System.arraycopy(elementData, 0, newData, 0,
      elementData.length);
     elementData= newData;

The array that grows is a protected field of the Vector class called elementData; the number of elements currently stored in the array is a protected int field called elementCount. The elementCount, the number of elements in the array right now, should not be confused with the capacity of the array, which is the number of elements that can be stored in elementData before you need to grow it. In other words, the capacity is elementData.length.

The fact that vectors are really just arrays behind the scenes has many implications for the performance and efficiency of the fundamental list operations. It means that insertions at the end of the list are quick, unless the vector has run out of space, in which case they can be quite slow. Insertions into the middle of the list are always slow. Finding the item at a given position in the list is fast.

The array implementation of the Vector class also sets an upper limit on how many items you can place in a Vector. A Vector can hold up to 2,147,483,647 objects because an array is indexed with a signed int. In practice, you’d run out of memory before you ever stuffed that many objects into a Vector.

Creating a new Vector

There are three constructors that create a new Vector:

     public Vector()
     public Vector(int  initialCapacity)
     public Vector(int  initialCapacity,  int  capacityIncrement);

Every Vector has a capacity — that is, the number of objects it can hold before it must be expanded. Because it takes time to expand a Vector, you generally want to have some empty space in a Vector so you don’t need to expand it every time you add an element to it. However you don’t want to waste space if you can avoid it.

The noargs Vector() constructor creates a new Vector with an initial capacity of ten. When that capacity is exceeded, the Vector doubles in size. That is, when you add the eleventh element to the Vector, it expands itself to a capacity for 20 elements. When you add the 21st element, the Vector expands itself to a capacity for 40 elements, and so on.

If you know how many elements that the Vector will probably need to hold when you construct it, you can speed up your program by using the Vector(int initialCapacity) constructor. For example, if you know you’re going to put about 30 elements in a Vector, give or take five, then you should construct it like this:

     Vector myVector = new Vector(35);

Unless the maximum number of elements you’ll place in the Vector is substantially greater than the average number of elements you’ll put there, you should always construct a Vector with the maximum number of elements you expect. Each empty space in a Vector only takes up four bytes. Therefore, unless you’re really pressed for space, it’s much more important to avoid unnecessary resizing than to worry about the space wasted by a few empty slots.

If you don’t want a Vector’s capacity to double every time it needs to be expanded, you can also pass a capacity increment to the constructor. For example, to create a Vector whose capacity increases in blocks of ten at a time and has an initial capacity of 35, you would write:

     Vector tenVector = new Vector(35, 10);

It’s rather unusual to do this, however. Most of the time the disadvantage of the CPU time you’ll lose to the extra resizing outweighs the small intermediate space savings you’ll achieve.

Inserting and removing elements

There are two methods to put a new object into a vector, addElement() and insertElementAt(). Their signatures are

     public final void addElement(Object  o)
     public final void insertElementAt(Object  o, int  index)

addElement() places the object at the end of the Vector. This method takes essentially constant time as long as there’s empty space left in the vector. It takes slightly longer if the Vector first needs to be grown.

insertElementAt() places the object at the specified position in the vector. All elements of the vector at or beyond the specified index are moved up one to make room for the newcomer. The efficient native method System.arraycopy() is used to move the remaining elements up, but it’s still less than ideal. If a program requires frequent insertions into the middle of a list, you may well be better off writing your own linked list class.

You can also replace elements in vectors. Because the size of the vector stays the same, this is always quite fast. The relevant method is:

     public final void setElementAt(Object  o, int  index)

The object that was previously at index is no longer there. Instead, it is replaced by the new Object o. If there are no other references to the old object, then it will eventually be garbage-collected.

Elements can also be removed from a vector. This generally requires moving elements down that were above the object. Again this can be done with the System.arraycopy() method, but if you’re doing a lot of this, you should consider using a linked list instead. The method is

     public final void removeElementAt(int  index)

You can also remove an object from a Vector without knowing its index. The method to do this is

     public final boolean removeElement(Object  o)

However, this method requires Java to traverse the entire Vector, searching for the requested object. Thus, its execution time is proportional to the length of the array. In computer science terms, this is an O(n) (pronounced “order en”) operation where n is the number of elements in the Vector.

Furthermore, only the first occurrence of the object in the Vector is removed. This may or may not be what you expect. Removing all references to the object from the Vector requires multiple passes through the vector. If the object is found and removed, removeElement() returns true. Otherwise, it returns false. You can remove all references to an object o from a Vector v in one line of code like this:

     while (v.removeElement(o)) {;}

However, this is deceptively simple. If there are n references to o in v, then this single line of code requires n+1 passes through the vector to remove them all.

The removeAllElements() method deletes every element in the Vector. Its signature is

     public final void removeAllElements()

In the current implementation, Java deletes all elements from a Vector by looping through the Vector and setting each element to null. This is an O(n) operation, but it’s less than optimally efficient. It looks something like this:

     for (int j = 0; j < elementCount; j++) elementData[j] = null;
     elementCount = 0;

Other implementations are possible. For example, you could simply reallocate a new elementData array like this:

     elementData = new Object[elementData.length];
     elementCount = 0;

This only takes constant time; this is an O(1) operation.

You could even leave the elementData array untouched and change only the elementCount field. This is even quicker. The old elementData array components will simply be overwritten as necessary. However, this has the potential to cause memory leaks because references to the old components still exist in the array, and thus those objects will not be garbage-collected.

Finding Objects in Vectors

In addition to the above methods for manipulating the contents of a Vector, there are several methods to find objects in a Vector. The elementAt(int i) method returns a reference to the Object at index i in the Vector. If i is not a valid index into the Vector, an ArrayIndexOutOfBoundsException is thrown.

     public final Object elementAt(int  i)

The firstElement() and lastElement() methods return references to the first and last elements in the Vector respectively. If the Vector has no elements, and thus no first or last element, a NoSuchElementException is thrown.

     public final Object firstElement()
     public final Object lastElement()

Five methods search for a particular object. Because this requires a traversal of the array, these methods are proportional to the number of elements in the Vector. These methods are:

     public final boolean contains(Object  o)
     public final int indexOf(Object  o)
     public final int lastIndexOf(Object  o)

The contains() method returns true if o is in the Vector and false if it isn’t. The indexOf() and lastIndexOf() methods return the first and last indices of the specified object in the Vector respectively. If the object is not found in the Vector, then -1 is returned.

You can pass an extra int to these methods to indicate the element at which to begin searching.

     public final int indexOf(Object  o, int  index)
     public final int lastIndexOf(Object  o, int  index)

The indexOf() method searches forward beginning at that index, and lastIndexOf() searches backward beginning at that index. An ArrayIndex OutOfBoundsException is thrown if the index is less than zero or greater than or equal to the number of elements in the Vector.

Miscellaneous methods

There are three other useful methods in the Vector class. The isEmpty() method returns true if the Vector has no elements and false if it has one or more elements.

     public final boolean isEmpty()

The setSize() method either shrinks or expands the Vector to a given non-negative size. If the vector is expanded, then the new elements are set to null. If the vector is shrunk, then elements past the requested size are removed from the Vector.

     public final void setSize(int  newSize)

Finally, the elements() method returns a reference to an object which implements the Enumeration interface. This provides a convenient way for external classes to process each and every element of the Vector. For example,

     Enumeration e = v.elements();
     while (e.hasMoreElements()) {
      Object o  = e.nextElement();
      // work with o
     }

When should you use a Vector?

Vectors are one of the most useful container classes in the java.util package. However, they’re not right for everything. In particular, Vectors are slower than raw arrays; they can’t handle primitive data types, and they can be quite slow when inserting or removing objects from any place except the end of the Vector.

If you can use a fixed array instead of a Vector, you should. As a rough rule of thumb, any operation that uses a Vector is about three times slower than the same operation performed with a plain array. That’s primarily a result of the extra method calls needed to perform basic insertions, deletions, and accesses. It’s always better to do it directly if you can.

The most common reason to use a Vector is that you don’t know how many objects you’ll need to deal with at compile time, only at runtime. However, if the Vector is going to be filled only once and then not modified, you’re better off using a real array.

For example, one place common place vectors are used unnecessarily is in processing <PARAM> tags passed to an applet. It’s quite common to pass an undetermined number of parameters to an applet like this:

     <PARAM NAME="String1" VALUE="Kalel">
     <PARAM NAME="String2" VALUE="Kara">
     <PARAM NAME="String3" VALUE="Jorel">
     <PARAM NAME="String4" VALUE="Lara">
     <PARAM NAME="String5" VALUE="Lois">

You may not know when you compile an applet how many of these parameters there will be. At first glance this appears to be a perfect place for a Vector. You can read these parameters into the Vector in your applet’s init() method like this:

     Vector v = new Vector();
     String name = null;
     int i = 0;
     while ((name = getParameter("String" + ++i)) != null) {
      v.addElement(name);
     }

However, this is overkill. You do not need the full power of a Vector just to collect an undetermined number of quantities. There are several alternative solutions to this problem that don’t involve the overhead of a Vector. For example, you could create the Vector as above, then copy it into an array with an Enumeration, like this:

     String[] names = v.size();
     Enumeration e = names.elements();
     int i = 0;
     while (e.hasMoreElements()) {
      names[i++]  =(String)  e.nextElement();
     }

Alternatively, you could just use the first loop to determine how many PARAM tags you have to deal with, then use a second loop to read them, like this:

     String name = null;
     int i = 0;
     while ((name = getParameter("String" + ++i)) != null) {
     }
     String[] names = new String[i];
     for (int j = 0; j <= i; j++) {
      names[j] = getParameter("String" + (j+1));
     }

There are some other more complicated solutions, such as implementing your own growable array, but the bottom line is that you shouldn’t use a Vector as merely an array whose size will be determined at runtime. You should use a Vector only when the array is going to be constantly changing size and having elements removed and deleted. If you’re only calling addElement() and never insertElementAt(), contains(), removeElement(), or the other such methods of the Vector class, then using a Vector instead of a plain array will make your program slower than it needs to be.

Bitsets

The Bitset class is an indexed list of bits. Each component of a Bitset has a boolean value: true or false. A one or set bit is considered to be true and a zero or unset bit is considered to be false. The primary purpose of a Bitset is to provide an extremely space-efficient means of storing many binary values. Bitsets are not necessarily very fast, but they should be very small.

Java implements Bitsets as arrays of longs. The first 64 bits — that is, bits 0 through 63 — are stored in the zeroth component of the array. The second 64 bits — that is, bits 64 through 127 — are stored in the first component of the array, and so on.

Extensive manipulation of these longs with the bit-level operators discussed in Chapter 2 is used to extract and set the values of individual bits. For example, consider the process of extracting the value of bit 97 from the Bitset bs. At the high level, all you need to do is this:

     boolean b = bs.get(97);

Now consider what you have to do to get the value of the 97th bit from an array of longs called la:

     boolean b;
     long L = la[1];  // the 97th bit is in the first component
     L = L & 0x00008000; // mask off the 97th bit
     if (L == 0) b = false; // bit 97 wasn't set
     else b = true; // bit 97 was set

That’s a lot more complex, which is why this is hidden inside a class in the first place. Of course you want a general method for finding an arbitrary bit. That takes even more effort, such as this:

     long[] la;

     public boolean get(int i) {

      int index = i / 64; // find the right component of the array
      long bit = 1L << (i % 64); // find the right bit
      long result = la[index] & bit; // mask off the desired bit
      if (result == 0) return false; bit wasn't set
      else return true;

     }

Sun’s actual code is a little faster than this, but a little harder to understand.

Some operations are easier. For example, the logical operations and, or, not, and xor simply require performing the equivalent bitwise operations between each corresponding component of the arrays forming the two Bitsets. The only catch is that you may need to expand one array if it’s smaller than the other is.


 

Stack

The Stack class implements a classic stack data structure, that is a last-in-first-out (LIFO) set of objects. There are three fundamental operations you can perform with a stack:

1.  Put an object on the top of the stack. This is called pushing.

2.  Take an object off the top of the stack. This is called popping.

3.  Look at the object on the top of the stack, but leave it there. This is called peeking.

Notice that all three of these operations operate only on the top of the stack. To see what’s further down in the stack, you must first remove everything that’s on top of it.


Note:  Some people claim that there are only two fundamental stack operations: pushing and popping. Peeking can be considered to be a pop followed by a push of the object back onto the stack.


Java provides methods to perform all three fundamental operations: pushing, popping, and peeking. They are

     public Object peek()
     public Object pop()
     public Object push(Object  item)

When an object is placed in a stack, it loses its type information. Therefore, when you retrieve it from the stack, you have to cast it back. For example,

     String s = (String) theStack.pop();

Java has two more utility methods that, while useful, are not part of the minimal requirements for a stack. They are

     public boolean empty()
     public int search(Object  o)

The empty() method returns true if the Stack contains no elements or false if it does not.

The search method returns an int that tells you how deep in the stack the object o is. The object on the top of the stack is at position 0; the second to the top object is at position 1, and so on. If the object is not found in the stack search() returns -1.

Objects are tested for their presence in the stack with the equals() method, not with ==. Thus, in some sense, search() is looking for an object equivalent to the requested object, not necessarily the same object. For example, consider the following code:

     Stack theStack = new Stack();
     URL u1 = new URL("http://sunsite.unc.edu/
      javafaq/javafaq.html#xtocid1902961");
     URL u2 = new URL("http://sunsite.unc.edu/
      javafaq/javafaq.html");
     theStack.push(u1);
     theStack.push("Here's a string");
     theStack.push("Here's another string");
     theStack.push(u2);

When this code has finished, the stack looks like this:

     0 (URL u2) http://sunsite.unc.edu/javafaq/javafaq.html
     1 (String) "Here's another string"
     2 (String) "Here's a string"
     3 (URL u1) http://sunsite.unc.edu/javafaq/javafaq.html#xtocid1902961

Now suppose you search for u1 like this:

     int result = theStack.search(u1);

If you print out the result, you will see that it’s 0, not 3, even though u1 is at position 3 in the stack, not position 0. The stack is searched from top to bottom for the first object for which o.equals(u1) is true. In this example that’s u2, http://sunsite.unc.edu/javafaq/javafaq.html, because the URL class’s equals() method does not consider the ref to be part of a URL.

In Java, the Stack class extends the Vector class. Therefore, you can do anything with a stack that you can do with a Vector. Most significantly this means that you can use the at methods: elementAt(), insertElementAt(), removeElementAt(), and setElementAt(). This is unfortunate because it allows the integrity of a stack to be violated. The whole point of a stack is that operations always take place only on the top of the stack. Many algorithms depend on this assertion. It is extremely bad style to violate this directive.

It is certainly true that there are situations in which it is useful and convenient to operate on elements in the middle of a list. However, if this is what you need to do, you should use a raw Vector, not a stack.

The Stack class was written by Jonathan Payne. My best guess is that he decided to implement Stack as a subclass of Vector in order to save time through code reuse. However, what he probably should have done was to have used the Vector class via encapsulation rather than inheritance. Listing 3.2 demonstrates how the Stack class could have been quickly written without allowing too much access to the nether regions of the stack.

Listing 3-2 An alternative vision for the Stack class

     /*
      * @(#)Stack.java 1.12 96/12/09
      *
      * Copyright 1996 Elliotte Rusty Harold.
      *
      * Permission to use, copy, modify, and distribute this software
      * and its documentation for all purposes and without
      * fee is hereby granted provided that this copyright notice
      * appears in all copies.
      */

     package com.macfaq.util;

     import java.util.*;

     /**
      * A stack that guarantees its own integrity.
      *
      * @version   1.0, 12/09/96
      * @author  Elliotte Rusty Harold
      */
     public class Stack {

      Vector theStack = new Vector();

      public Object push(Object o) {

       theStack.addElement(item);
       return o;

      }

      public Object pop() {

       Object  o;

       int len = theStack.size();
       o = peek();
       theStack.removeElementAt(len - 1);
       return o;

      }

      public Object peek() {

       Object  o;

       int len = theStack.size();
       if (len == 0) throw new EmptyStackException();
        return theStack.elementAt(len - 1);

      }

      public boolean empty() {

       if (theStack.size() == 0) return true;
       else return false;

      }

      public int search(Object o) {

       int i = theStack.lastIndexOf(o);

       if (i >= 0) return theStack.size() - i;
    return -1;

       }

     }

This Stack class performs all the necessary stack operations; it’s still based on the Vector class, so it can take advantage of any optimizations made there, but it does not allow other classes to violate the stack structure.

Summary

This chapter is really about distinctions between concepts and words that are very closely related.

!  You learn what Java objects really are: chunks of memory in the heap. You also learn that Java uses references to locate those areas of memory in the heap and that all access to objects takes place through references. You learn how a reference to an object differs from the object itself.

!  There is a class for objects (java.lang.Object), and classes are themselves objects of type java.lang.Class. A little-c class is not quite the same thing as a capital-C Class, nor is a little o-object quite the same thing as a capital-O Object. You learn about the methods of the java.lang.Object class and how you use and override those in classes you write.

!  You learn that Strings are immutable arrays of Unicode characters. You also learn that StringBuffers are growable arrays of Unicode characters and that behind the scenes, many String operations are performed with StringBuffers.

!  One-dimensional arrays of primitive data types are contiguous blocks of memory containing primitive values. However, one-dimensional arrays of reference data types contain only references to objects that live elsewhere in the heap; Multidimensional arrays are arrays of arrays, and all arrays implicitly subclass java.lang.Object.

!  Finally, you learn about the internal structure of several common Java container classes: Vector, Stack, and Bitset. By understanding how these classes operate, you can now make intelligent decisions about when and whether these classes are appropriate for your needs and when you should rewrite or replace them with classes of your own devising.

 

Hosted by www.Geocities.ws

1