Chapter 5. The Class Loader and Class File Verifier

In this chapter we explore a number of topics:

How the components of the Java virtual machine work together to implement the Java security model
How the class loader locates and loads class files
How the class file verifier ensures that class files are legal prior to execution

In addition, we discuss issues to keep in mind when designing your own ClassLoader.

Overview of the Java Security Model

Before examining the components of the security model in detail, we'll take a high-level look at the whole process involved in loading and running a class.

The figure Steps in Loading a Class illustrates the steps involved in loading a class into the JVM.

Steps in Loading a Class

When an applet or application requests a class file, the execution environment, whether it be a browser or the Java VM running from a command line, invokes a class loader to locate and load the class. 1
The class loader receives the class as an array of bytes and converts it into a Class object in the class area of the JVM. The class area may be a part of the JVM heap (where all other objects are created and stored) or a separate region of memory.
Depending on the class loader which loaded the class file, the JVM may also run the class file verifier. The verifier is responsible for making sure that class files contain only legal Java bytecodes and that they behave in a consistent way (for example, they do not attempt to underflow or overflow the stack, forge illegal pointers to memory or in any other way subvert the JVM). More details are given in the section The Class File Verifier.
Assuming that the class passes verification, the JVM is handed a loaded class. It then links the class by resolving any references to other classes within it. This may result in additional calls to the class loader to locate and load other classes.
Next, static initialization of the class is performed; that is, static variables and static initializers are run. Finally, the class is available to be executed.
In the context of an applet executing within a Web browser, there will always be an instance of the SecurityManager constructed. This may also be true in a Java application. When a SecurityManager is present, calls which could result in the system's integrity being violated (such as file read and write requests, network access requests, or requests to access the environmental variables) are presented to the SecurityManager for validation. If the SecurityManager refuses access, it does so by throwing a SecurityException. Since access to these key system functions is controlled by API calls within the trusted classes, there is no way to avoid the SecurityManager other than by replacing these classes.
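The difference between loading a class and running its static initialization can be observed from ordinary Java code. The following is a minimal sketch, not part of the original text; the class names LoadVersusInitialize and Lazy are hypothetical. It loads a nested class without initializing it and then triggers initialization by reading a static field.

```java
import java.util.ArrayList;
import java.util.List;

class LoadVersusInitialize {
    // Records the order in which the events occur.
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical helper class; its static initializer runs only at initialization.
    static class Lazy {
        static int value = 42;
        static { EVENTS.add("initialized"); }
    }

    static List<String> demo() {
        String name = LoadVersusInitialize.class.getName() + "$Lazy";
        ClassLoader loader = LoadVersusInitialize.class.getClassLoader();
        try {
            // Load the class but ask the JVM not to initialize it yet.
            Class.forName(name, false, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        EVENTS.add("loaded");
        // First active use: reading a non-constant static field triggers initialization.
        EVENTS.add("value=" + Lazy.value);
        return EVENTS;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

On the first call, the "loaded" event is recorded before "initialized", which shows that static initializers are deferred until the class is actually used.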

Class Loaders

A class loader has a number of duties. Class loaders are the gatekeepers of the JVM, controlling what bytecode may be loaded and what should be rejected. As such they have two primary responsibilities:

To separate Java code from different sources, thus preventing malicious code from corrupting known good code
To protect the boundaries of the core Java class packages (trusted classes) by refusing to load classes into these restricted packages

The class loader has another, useful, side effect. By controlling how the JVM loads code, all platform-specific file I/O is channelled through one part of the JVM, thus making porting the JVM to different platforms a much simpler task.

Let's look a little more closely at these two aims and why they are necessary. First, Java code can be loaded from a number of different sources. These include but are not limited to:

The trusted core classes which ship with the JVM (java.lang.*, java.applet.* etc.)
Classes stored in the local file store and locatable via the CLASSPATH environmental variable
Classes retrieved from Web servers (as parts of applets)

Clearly, we would not want to overwrite a trusted JVM class with an identically named class from a Web server since this would undermine the entire Java security model (the SecurityManager class is responsible for a large part of the JVM runtime security and is a trusted local class; consider what would happen to security if the SecurityManager could be replaced by an applet loaded from a remote site). The class loader must therefore ensure that trusted local classes are loaded in preference to remote classes where a name clash occurs.

Secondly, where classes are loaded from Web servers, it is possible that there could be a deliberate or unintentional collision of names (although the Sun Java naming conventions exist to prevent unintentional name collisions). If two versions of a class exist and are used by different applets from different Web sites then the JVM, through the auspices of the class loader, must ensure that the two classes can coexist without any possibility of confusion occurring. Class type confusion is a key way of attacking the JVM and is discussed later in this chapter.

The last point, that the class loader must protect the boundaries of the trusted class packages, merits further explanation. The core Java class libraries that ship with the JVM reside in a series of packages which begin "java.", for example, java.lang and java.applet. Within the Java programming language, it is possible to give special access privileges to classes which reside in the same package; thus, a class which is part of the java.lang package has access to methods and fields within other classes in the java.lang package which are not accessible to classes outside of this package.

If it were possible for a programmer to add his or her own classes to the java.lang package, then those classes would also have privileged access to the core classes. This would be an exposure of the JVM and consequently must not be allowed.

The class loader must therefore ensure that classes cannot be dynamically added to the various core language packages. It achieves this by examining the name of the class which it is being asked to load and refusing to load those which start with "java."
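This name check can be seen in the standard java.lang.ClassLoader, whose defineClass method throws a SecurityException for names beginning with "java.". The sketch below is illustrative only; ProbeLoader, isRefused and the class names passed to it are hypothetical, and an empty byte array is used because the name is examined before the bytes are.

```java
class RestrictedPackageDemo {
    // Minimal loader that exposes the protected defineClass method.
    static class ProbeLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Returns true if the loader refuses the class name on security grounds.
    static boolean isRefused(String className) {
        try {
            new ProbeLoader().define(className, new byte[0]);
            return false;
        } catch (SecurityException e) {
            return true;   // name rejected before the bytes were examined
        } catch (ClassFormatError e) {
            return false;  // name accepted; the empty byte array was rejected instead
        }
    }

    public static void main(String[] args) {
        System.out.println(isRefused("java.lang.Intruder"));
        System.out.println(isRefused("com.example.Harmless"));
    }
}
```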

How Class Loaders Are Implemented

The JVM architecture diagram (Steps in Loading a Class) shows two class loaders. In fact, the JVM may have many class loaders operating at any point in time, each of which is responsible for locating and loading classes from different sources.

One of the class loaders, the primordial class loader, is a built-in part of the JVM; that is, it is written in C or whatever language the JVM is written in and is an integral part of the JVM. It is the root class loader and is responsible for loading trusted classes; these are classes from the core Java classes and those classes which can be found in the CLASSPATH and usually in the local filestore.

Classes loaded by the primordial class loader are regarded as special insofar as they are not subject to verification prior to execution; that is, they are assumed to be well formed, safe Java classes. Obviously if would-be attackers could somehow inveigle a malicious class into the CLASSPATH of a JVM they could cause serious damage. 2

In addition to this primordial class loader, application writers (including JVM implementors) are at liberty to build more class loaders to handle the loading of classes from different sources such as the Internet, an intranet, local storage or perhaps even from ROM in an embedded system. These class loaders are not a part of the JVM; rather, they are part of an application running on top of the JVM, written in Java and extending the java.lang.ClassLoader class.

The most obvious example of this is in the context of a Web browser which knows how to load classes from an HTTP (Web) server. The class loader which does this is generally known as the applet class loader and is itself a Java class which knows how to request and load other Java class files from a Web server across a TCP/IP network.

In addition, application writers can implement their own class loaders by subclassing the ClassLoader class (note that such behavior may be disallowed by the SecurityManager in an applet; we discuss more of this in the next chapter).

It is clear then that there can be many types of class loader within a Java environment at any one time. In addition, there may be many instances of a particular type of class loader operating at once.

To summarize the above:

There will always be one and only one primordial class loader. It is part of the JVM, like the execution engine.
There will be zero or more additional ClassLoader derivatives, written in Java and extending the ClassLoader abstract class. In a Web browser environment there will be at least one additional class loader: the applet class loader.
For each additional ClassLoader type, there will be zero or more instances of that type created as Java objects.

Let's look at this last point more closely. Why would we want to have multiple instances of the same class loader running at any one time?

To answer this question we need to examine what class loaders do with a class once it has been loaded.

Every class present in the JVM has been loaded by one and only one class loader. For any given class, the JVM "remembers" which class loader was responsible for loading it. If that class subsequently requires other classes to be loaded, the JVM uses the same class loader to load those classes.
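The loader which the JVM has remembered for a class can be queried with Class.getClassLoader(). As a small sketch (the class name WhoLoadedMe and the label text are our own), note that for classes loaded by the primordial class loader this method conventionally returns null, since that loader is not a Java object.

```java
class WhoLoadedMe {
    // Describes the class loader that the JVM has recorded for the given class.
    static String loaderOf(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        // A null result is how the built-in (primordial) loader is reported.
        return loader == null ? "primordial (bootstrap)" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("String      -> " + loaderOf(String.class));
        System.out.println("WhoLoadedMe -> " + loaderOf(WhoLoadedMe.class));
    }
}
```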

This gives rise to the concept of a name space: the set of all classes which have been loaded by a particular instance of a class loader. Within this name space, duplicate class names are prohibited. More importantly, there is no cross name space visibility of classes; a class in one name space (loaded by a particular class loader) cannot access a class in another name space (loaded by a different class loader).

Returning to the question "Why would we want to have multiple instances of a given ClassLoader derivative?", consider the case of the applet class loader. It is responsible for loading classes from a Web server across the Internet or intranets. On most networks (and certainly the Internet) there are many Web servers from which classes could be loaded and there is nothing to prevent two Webmasters from having different classes on their sites with the same name.

Since a given instance of a class loader cannot load multiple classes with the same name, if we didn't have multiple instances of the applet class loader we would very quickly run into problems when loading classes from multiple sites. Moreover, it is essential for the security of the JVM to separate classes from different sites so that they cannot inadvertently or deliberately cross reference each other. This is achieved by having classes from separate Web sites loaded into separate name spaces which in turn is managed by having different instances of the applet class loader for each site from which applets are loaded.
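The effect of separate name spaces can be sketched with two instances of one loader type. In the example below, which is our own illustration, the class name Widget, the loader SiteLoader and the helper emptyClass (which writes a minimal, method-less class file by hand) are all hypothetical. Each loader instance stands in for the applet class loader of one Web site.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class NameSpaceDemo {
    // Builds the bytes of an empty public class with the given simple name.
    static byte[] emptyClass(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8 format)
            out.writeShort(5);                                  // constant pool count: 4 entries + 1
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF(name);               // #2 Utf8: this class name
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8: superclass name
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces
            out.writeShort(0);                                  // fields
            out.writeShort(0);                                  // methods
            out.writeShort(0);                                  // attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // One instance of this loader represents one name space.
    static class SiteLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    // Two loader instances can each hold their own class called Widget.
    static boolean separateNameSpaces() {
        byte[] bytes = emptyClass("Widget");
        Class<?> fromSiteA = new SiteLoader().define("Widget", bytes);
        Class<?> fromSiteB = new SiteLoader().define("Widget", bytes);
        return fromSiteA != fromSiteB && fromSiteA.getName().equals(fromSiteB.getName());
    }

    // A single loader instance refuses a second class with the same name.
    static boolean duplicateRejected() {
        SiteLoader loader = new SiteLoader();
        byte[] bytes = emptyClass("Widget");
        loader.define("Widget", bytes);
        try {
            loader.define("Widget", bytes);
            return false;
        } catch (LinkageError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("separate name spaces: " + separateNameSpaces());
        System.out.println("duplicate rejected:   " + duplicateRejected());
    }
}
```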

The Class Loading Process

The ability to create additional class loaders is a very powerful feature of Java. This becomes particularly apparent when you realize that user-written class loaders have first refusal when it comes to loading classes; that is, they take priority over the primordial class loader. This enables a user-written class loader to replace any of the system classes, including the SecurityManager. In other words, since the class loader is Cerberus to the JVM's Hades, you had better be sure that when you replace it, you don't inadvertently install a lapdog in its place.

We have already stated that a class loader which has loaded a particular class is invoked to load any dependent classes. We also know that a class loader generally has responsibility for loading classes from one particular source such as Web servers.

What if the class first loaded requires access to a class from the trusted core classes such as java.lang.String? This class needs to be loaded from the local core class package, not from across a network. It would be possible to write code to handle this within the applet class loader but it is unnecessary. We already have a class loader in the shape of the primordial class loader which knows how to load classes from the trusted packages.

This leads us to our second observation about class loaders: they frequently interoperate, one class loader asking another to load a class for it.

To illustrate how this works, consider the PointlessButton applet. As a reminder, PointlessButton uses a second class, JamJar.examples.Button which represents a push button on the browser display. Pushing the button results in nothing happening and a display being updated to inform you how many times nothing has happened to date.

When a Web browser encounters the PointlessButton applet in a Web page the following sequence of events occurs:

1. The browser finds the <APPLET> tag in the Web page and determines that it needs to load PointlessButton.class from the Web server. It creates an instance of the applet class loader (specific to this Web site) to fetch the class.
2. The applet class loader first asks the primordial class loader to load PointlessButton.class. The primordial class loader, which only knows about the trusted classes, fails to locate the class and returns control to the applet class loader.
3. The applet class loader connects to the Web site using HTTP and downloads the class.
4. The JVM begins executing the PointlessButton applet.
5. PointlessButton needs to create an instance of JamJar.examples.Button, a class which currently has not been loaded. It requests the JVM to load the class.
6. The JVM locates the applet class loader which loaded PointlessButton and invokes it to load JamJar.examples.Button.
7. The applet class loader again first asks the primordial class loader to load the JamJar.examples.Button class and again the primordial class loader fails to find it and returns control to the applet class loader, which is able to load the class from the Web server.
8. JamJar.examples.Button creates a java.lang.String object as the title of the button. The String class has not yet been loaded so again the JVM is requested to load the class.
9. The applet class loader which loaded both PointlessButton and JamJar.examples.Button is now invoked to load the java.lang.String class.
10. The applet class loader requests the primordial class loader to load the String class. This time, the primordial class loader is able to locate and load the class since it is part of the trusted classes package. Since the primordial class loader was successful, the applet class loader needs look no further and returns.

There are a couple of interesting points to note here.

First, at step 7, if we were using a regular java.awt.Button class then the primordial class loader would have been able to find the class in the trusted packages and the search would have stopped.

Secondly, there are actually many references to the java.lang.String class in the code. However, only the first reference results in the class being loaded from disk. Subsequent requests to the class loader will result in it returning the class already loaded. Since it is the primordial class loader which loads the String class, if there are multiple applets on a single page, only the first one to request a String class will result in the primordial class loader loading the class from disk.

Note also the order in which the applet class loader searches for classes. An applet class loader could always search the Web server from which it loaded the applet first for any subsequent classes and this would cut out some calls to the primordial class loader. This would have been incredibly bad practice for two reasons:

Most of the class load requests for an applet will be for trusted classes from the java.* packages.
More importantly, if classes were sought on the Web server before being sought in the trusted package, it would allow subversion of built-in types, enabling malicious programmers to substitute their own implementations of core, trusted classes such as the SecurityManager or even the applet class loader itself.

For this reason all commercially available browsers have applet class loaders which implement the following search strategy: 3

Ask the primordial class loader to load the class from the trusted packages.
If this fails, request the class from the Web server from which the original class was loaded.
If this fails, report the class as not locatable by throwing a ClassNotFoundException.

This search strategy ensures that classes are loaded from the most trusted source in which they are available.
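The three-step strategy can be sketched with the standard ClassLoader, whose inherited loadClass method asks the parent loader first and calls findClass only when the parent fails. The code below is an illustration under stated assumptions, not a browser's actual applet class loader: SiteLoader, RemoteWidget and emptyClass are hypothetical names, and an in-memory map stands in for the Web server.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class SearchStrategyDemo {
    // Loader that consults its parent first, then a map standing in for the Web server.
    static class SiteLoader extends ClassLoader {
        private final Map<String, byte[]> site;

        SiteLoader(Map<String, byte[]> site) {
            super(ClassLoader.getSystemClassLoader()); // step 1 is handled by the parent chain
            this.site = site;
        }

        // Called by the inherited loadClass only after the parent chain has failed.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] b = site.get(name);                  // step 2: ask the "server"
            if (b == null) {
                throw new ClassNotFoundException(name); // step 3: give up
            }
            return defineClass(name, b, 0, b.length);
        }
    }

    // Writes a minimal, method-less class file for the given simple name.
    static byte[] emptyClass(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8 format)
            out.writeShort(5);                                  // constant pool count
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF(name);               // #2 Utf8: this class name
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8: superclass name
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces
            out.writeShort(0);                                  // fields
            out.writeShort(0);                                  // methods
            out.writeShort(0);                                  // attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Loads a class through a fresh SiteLoader; returns null if it cannot be found.
    static Class<?> load(String name) {
        Map<String, byte[]> site = new HashMap<>();
        site.put("RemoteWidget", emptyClass("RemoteWidget"));
        try {
            return new SiteLoader(site).loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(load("java.lang.String").getClassLoader()); // trusted class
        System.out.println(load("RemoteWidget").getClassLoader());     // class from the "server"
        System.out.println(load("NoSuchWidget"));                      // not locatable
    }
}
```

Because the parent is always consulted first, a copy of java.lang.String placed in the map would never be used.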

Why You Might Want to Build Your Own Class Loader

If it is done correctly, a user-built class loader can significantly enhance the security of an application deployed on an intranet, particularly if it is used in conjunction with a firewall and other local security measures.

Note that at the time of writing, Web browsers use the security manager to prohibit the creation of new derivatives of ClassLoader, although this may change with the new Java security model and the various permissions APIs which are being implemented. The chapter Playing in the Sandbox examines the security manager in more detail.

Some of the situations in which a user-written class loader could be used are:

To restrict searches for trusted classes to a particular directory or path other than the CLASSPATH
To allow the JVM to load classes from a particular source such as from EPROM or a non-TCP/IP network
To specify paths which should be searched in advance of the CLASSPATH
To provide auditing information about access to classes

In each of these cases you will need to build your own class loader and implement your own search strategy for locating classes.

It is beyond the scope of this book to show you how to write your own extension to ClassLoader and there are other resources, both books and on-line, which will teach you the specifics. For the serious codeheads out there, there is a sample ClassLoader included on the CD accompanying this book which implements a simple audit trail for class libraries.
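To give a flavor of the auditing case only, here is a very small sketch. It is not the sample from the CD; the class name AuditingClassLoader and its helper methods are our own, and a real implementation would need to consider where the audit trail is stored and who may read it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AuditingClassLoader extends ClassLoader {
    // Names of all classes requested through this loader, in order of request.
    private final List<String> requests = Collections.synchronizedList(new ArrayList<String>());

    AuditingClassLoader(ClassLoader parent) {
        super(parent);
    }

    // Record every request, then fall back on the normal parent-first search.
    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        requests.add(name);
        return super.loadClass(name, resolve);
    }

    // Returns a snapshot of the audit trail.
    List<String> requests() {
        synchronized (requests) {
            return new ArrayList<>(requests);
        }
    }

    // Convenience for the example: request each name and return the resulting trail.
    static List<String> audit(String... names) {
        AuditingClassLoader loader = new AuditingClassLoader(ClassLoader.getSystemClassLoader());
        for (String name : names) {
            try {
                loader.loadClass(name);
            } catch (ClassNotFoundException e) {
                // Failed requests are still part of the audit trail.
            }
        }
        return loader.requests();
    }

    public static void main(String[] args) {
        System.out.println(audit("java.util.ArrayList", "no.such.Class"));
    }
}
```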

The Class File Verifier 4

Once a class has been located and loaded by a class loader (other than the primordial class loader), it still has another hurdle to cross before being available for execution within the JVM. At this point we can be reasonably sure that the class file in question cannot supplant any of the core classes, cannot inveigle its way into the trusted packages and cannot interfere with other safe classes already loaded.

We cannot, however, be sure that the class itself is safe. There is still the safety net of the SecurityManager which will prevent the class from accessing protected resources such as network and local hard disk, but that in itself is not enough. The class might contain illegal bytecode, forge pointers to protected memory, overflow or underflow the program stack, or in some other way corrupt the integrity of the JVM.

As we have said in earlier chapters, a well behaved Java compiler produces well behaved Java classes and we would be quite happy to run these within the JVM since the Java language itself and the compiler enforce a high degree of safety. Unfortunately we cannot guarantee that everyone is using a well behaved Java compiler. Nasty devious hacker types may be using home made compilers to produce code designed to crash the JVM or worse, subvert the security thereof. In fact, as we saw in Chapter 4, we can't even be sure that the source language was Java in the first place!

In addition to this there is the problem of release-to-release binary compatibility. Let's say that you have built an applet which uses a class called TaxCalculator from a third party. You have constructed your applet with great care and have purchased and installed the TaxCalculator class on the server with your applet code.

At this point you are certain that the methods you call in TaxCalculator are present and valid but what happens if/when you upgrade TaxCalculator? Of course you should make sure that the API exposed by TaxCalculator hasn't changed and that your class will still work, but what if you forget? In practice it is quite possible that TaxCalculator has changed between versions and methods or fields which were previously accessible have become inaccessible, been removed or changed type from dynamic to static fields. In this case, when your applet is downloaded to a browser and it tries to make method calls or access fields within TaxCalculator those calls may fail.

This is because the binary (code) compatibility between the classes has been broken between releases. These problems exist with all forms of binary distributable libraries. On most systems this results in at best a system message and the application refusing to run; at worst the entire operating system could crash. The JVM has to perform at least as well as other systems in these circumstances and preferably better.

For all of the above reasons, an extra stage of checking is required before executing Java code and this is where the class file verifier comes in.

After loading an untrusted class via a ClassLoader instance, the class file is handed over to the class file verifier which attempts to ensure that the class is fit to be run. The class file verifier is itself a part of the Java Virtual Machine and as such cannot be removed or overridden without replacing the JVM itself.

The Duties of the Class File Verifier

Before we discuss what the class file verifier actually does we look at the possible ways in which a class file could be "unsafe." By understanding the threat, we can see better how the Java architecture goes about countering it and expose any holes in the security provided by the class file verifier.

The following are some of the things that a class file could do which could compromise the integrity of the JVM:

Forge illegal pointers. If a Java class can obtain a reference to an object of one type and treat it as an object of a different type then it effectively circumvents the access modifiers (private, protected or whatever) on the fields of that object. This type of attack is known as a class confusion attack since it relies on confusing the JVM about the class of an object.
Contain illegal bytecode instructions. The JVM's execution engine is responsible for running the bytecode of a program in the same way as a conventional processor runs machine code.

When a conventional processor encounters an illegal instruction in a program, there is nothing that it can do other than stop execution. You may have seen this in Windows programs where the operating system can at least identify that an illegal instruction has been found and display a message.

Similarly, if the execution engine finds a bytecode instruction that it cannot execute, it is forced to stop executing. Even in a well-written execution engine this is undesirable, but in a poorly written one it is possible that the entire JVM, the Web browser in which it is embedded or even the underlying operating system might be halted. This is obviously unacceptable.

Contain illegal parameters for bytecode instructions. Passing too many or too few parameters to a bytecode instruction, or passing parameters of the wrong type, can lead to class confusion or errors in executing the instruction.
Overflow or underflow the program stack. If a class file could underflow the stack (by attempting to pop more values from it than it had placed on it) or overflow the stack (by placing values on it that it did not remove) then it could at best cause the JVM to execute an instruction with illegal parameters or at worst crash the JVM by exhausting its memory.
Perform illegal casting operations. Attempting to convert from one data type to another - for example, from an integer to a floating point or from a String to an Object - is known as casting. Some types of casting can result in a loss of precision (such as converting a floating point number to an integer) or are simply illegal (such as converting a String to a DataInputStream).

The legality of other types of casts is less clear. For example, all Strings are Objects (since the String class is derived from the Object class) but not all Objects are Strings. Trying to cast from an Object to a String is legal only if the Object was originally created as a String. Allowing illegal casts to be performed will result in class confusion and thus must be prevented.

Attempt to access classes, fields or methods illegally. As discussed above, a class file may attempt to access a nonexistent class. Even if the class does exist, it may attempt to make reference to methods or fields within the class which either do not exist or to which it has no access rights. This may be part of a deliberate hacking attempt or as a result of a break in release-to-release binary compatibility.

By tagging each object with its type, the JVM could check for illegal casts. By checking the size of the stack before and after each method call, stack overflows and underflows can be caught. The JVM could also test the stack before each bytecode was executed and thus avoid illegal or wrongly numbered parameters.
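The runtime check on casts is visible from the Java language itself: a cast from Object to String succeeds only when the object really is a String, and otherwise the JVM throws a ClassCastException. A small sketch follows, with the class and method names being our own.

```java
class CastCheckDemo {
    // Returns true if the JVM allows the given object to be cast to String.
    static boolean canCastToString(Object o) {
        try {
            String s = (String) o; // checked against the object's actual type at runtime
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canCastToString("a real String"));
        System.out.println(canCastToString(Integer.valueOf(1)));
    }
}
```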

In fact, all of these tests could be made at runtime but the performance impact would be significant. Any work that the class file verifier can do in advance of runtime to reduce the performance burden is welcome. With some idea of the magnitude of the task before the class file verifier, we now look at how it meets this challenge.

The Four Passes of the Class File Verifier

Before we go into any detail on how the class file verifier works it is important to note that the Java specification requires the JVM to behave in a particular way when it encounters certain problems with class files, which is usually to throw an error and refuse to use the class.

The precise implementation varies from one vendor to the next and is not specified. Thus some vendors may make all checks prior to making a class file available; others may defer some or all checks until runtime. The process described below is the way in which Sun's HotJava Web browser works; it has been adopted by most JVM writers, not least because it saves the effort of reinventing a complex process.

The class file verifier makes four passes over the newly loaded class file, each pass examining it in closer detail. Should any of the passes find fault with the code then the class file is rejected. For reasons which we explain below, not all of these tests are performed prior to executing the code. The first three passes are performed prior to execution and only if the code passes the tests here will it be made available for use.

The fourth pass, really a series of ad hoc tests, is performed at execution time, once the code has already started to run.

Pass 1 - File Integrity Check

The first and simplest pass checks the structure of the class file. It ensures that the file has the appropriate signature (first four bytes are 0xCAFEBABE) and that each of the structures within the file is of the appropriate length. It checks that the class file itself is neither too long nor too short and that the constant pool contains only valid entries. Of course class files may have varying lengths but each of the structures (such as the constant pool) has its length included as part of the file specification.

If a file is too long or too short, the class file verifier throws an error and refuses to make the class available for use.
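The signature check at the start of this pass is easy to reproduce. The sketch below reads only the fixed-size header that begins every class file (the four-byte signature followed by two-byte minor and major version numbers); it is a simplified illustration of the idea, not the verifier's own code, and the names ClassFileHeader and majorVersion are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    // Returns the major version number, or -1 if the header is missing or malformed.
    static int majorVersion(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != MAGIC) {
                return -1;                 // wrong signature
            }
            in.readUnsignedShort();        // minor version, not needed here
            return in.readUnsignedShort(); // major version
        } catch (IOException e) {
            return -1;                     // includes EOFException for a truncated file
        }
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        byte[] bad = {1, 2, 3, 4, 0, 0, 0, 52};
        System.out.println(majorVersion(good));
        System.out.println(majorVersion(bad));
    }
}
```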

Pass 2 - Class Integrity Check

The second pass performs all other checking which is possible without examining the actual bytecode instructions themselves. Specifically, it ensures that:

The class has a superclass (unless this class is Object).
The superclass is not a final class and that this class does not attempt to override a final method in its superclass.
Constant pool entries are well formed, and that all method and field references have legal names and signatures.

Note that in this pass, no check is made as to whether fields, methods or classes actually exist, merely that their names and signatures are legal according to the language specification.
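The final-superclass rule can be exercised by handing the JVM a hand-built class file that names java.lang.String as its superclass. The following is a sketch under stated assumptions: the class name Probe and the helpers classExtending, Loader and accepted are hypothetical, and the exact error type raised varies between JVM versions, so the example catches the common supertype LinkageError.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class FinalSuperclassDemo {
    // Writes a minimal class file for a class Probe with the given superclass
    // (in internal form, for example "java/lang/Object").
    static byte[] classExtending(String superInternalName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8 format)
            out.writeShort(5);                                  // constant pool count
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF("Probe");            // #2 Utf8: this class name
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF(superInternalName);  // #4 Utf8: superclass name
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces
            out.writeShort(0);                                  // fields
            out.writeShort(0);                                  // methods
            out.writeShort(0);                                  // attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static class Loader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    // Returns true if the JVM accepts a class with the given superclass.
    static boolean accepted(String superInternalName) {
        try {
            new Loader().define("Probe", classExtending(superInternalName));
            return true;
        } catch (LinkageError e) {
            return false; // the concrete subclass of LinkageError depends on the JVM
        }
    }

    public static void main(String[] args) {
        System.out.println(accepted("java/lang/Object")); // ordinary superclass
        System.out.println(accepted("java/lang/String")); // final superclass
    }
}
```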

Pass 3 - Bytecode Integrity Check

This is the pass in which the bytecode verifier runs and is the most complex pass of the class file verifier. The individual bytecodes are examined to determine how the code will actually behave at runtime. This includes data-flow analysis, stack checking and static type checking for method arguments and bytecode operands.

It is the bytecode verifier which is responsible for checking that the bytecodes have the correct number and type of operands, that datatypes are not accessed illegally, that the stack is not over or underflowed and that methods are called with the appropriate parameter types.
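As an illustration of the stack check, the sketch below builds two tiny class files by hand, each with a single static method run(). One method pushes a value before popping it; the other pops from an empty stack. Everything named here (Probe, run, classWithRunMethod, passesVerification) is hypothetical, and the example assumes a JVM that verifies classes loaded by user-written class loaders, which is the usual default.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class StackUnderflowDemo {
    // Writes a class Probe with one method: public static void run(), using the given bytecode.
    static byte[] classWithRunMethod(byte[] code) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8 format)
            out.writeShort(8);                                  // constant pool count: 7 entries + 1
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF("Probe");            // #2 Utf8: this class name
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8: superclass name
            out.writeByte(1); out.writeUTF("run");              // #5 Utf8: method name
            out.writeByte(1); out.writeUTF("()V");              // #6 Utf8: method descriptor
            out.writeByte(1); out.writeUTF("Code");             // #7 Utf8: attribute name
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces
            out.writeShort(0);                                  // fields
            out.writeShort(1);                                  // one method
            out.writeShort(0x0009);                             // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                                  // name: run
            out.writeShort(6);                                  // descriptor: ()V
            out.writeShort(1);                                  // one attribute: Code
            out.writeShort(7);                                  // attribute name: Code
            out.writeInt(12 + code.length);                     // attribute length
            out.writeShort(1);                                  // max_stack
            out.writeShort(0);                                  // max_locals
            out.writeInt(code.length);                          // code length
            out.write(code);                                    // the bytecode itself
            out.writeShort(0);                                  // exception table length
            out.writeShort(0);                                  // attributes of Code
            out.writeShort(0);                                  // class attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static class Loader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    // Returns true if a class containing the given bytecode links without a VerifyError.
    static boolean passesVerification(byte[] code) {
        Loader loader = new Loader();
        try {
            loader.define("Probe", classWithRunMethod(code));
            Class.forName("Probe", true, loader); // forces linking, and hence verification
            return true;
        } catch (VerifyError e) {
            return false;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // iconst_0, pop, return: the stack is balanced.
    static final byte[] BALANCED = {0x03, 0x57, (byte) 0xB1};
    // pop, return: pops a value that was never pushed.
    static final byte[] UNDERFLOW = {0x57, (byte) 0xB1};

    public static void main(String[] args) {
        System.out.println(passesVerification(BALANCED));
        System.out.println(passesVerification(UNDERFLOW));
    }
}
```

Note that the offending method is never run; the class is rejected on the basis of static analysis alone.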

The precise details of how the bytecode verifier operates may be found in The Bytecode Verifier in Detail. For now, it is important to state two points:

First, the bytecode verifier analyzes the code in a class file statically. It attempts to reconstruct the behavior of the code at runtime, but does not actually run the code.

Secondly, some very important work has been done in the past and more recently by one of the authors of this book which demonstrates that it is impossible for static analysis of code to identify all of the problems which may occur at runtime. We include this proof in An Incompleteness Theorem for Bytecode Verifiers.

To restate this in simple terms, any class file falls into one of three categories:

Runtime behavior is demonstrably safe.
Runtime behavior is demonstrably unsafe.
Runtime behavior is neither demonstrably safe nor demonstrably unsafe.

Clearly the bytecode verifier should accept those class files in the first category and reject those in the second category. The problem arises with class files in the third category.

These class files may or may not contain code which will cause a problem at runtime, but it is impossible from static analysis of the code alone to determine which is the case.

The more complex the bytecode verifier becomes, the more it can reduce the number of cases which fall into the third category but no matter how complex the verifier, it can never completely eliminate the third category and for this reason there will always be bytecode programs which pass verification, but which may contain illegal code.

This means that simply having the bytecode verifier is not enough to prevent runtime errors in the JVM and that the JVM must perform some runtime checking of the executable code.

Lest you be panicking at this stage you should comfort yourself with the thought that the level of verification performed by the JVM prior to executing bytecode is significantly higher than that performed by traditional runtime environments for native code (that is, none at all).

Pass 4 - Runtime Integrity Check

As we have hinted, the JVM must make a tradeoff between security and efficiency. For that reason, the bytecode verifier does not exhaustively check for the existence of fields and classes in pass 3. If it did, then the JVM would need to load all classes required by an applet or application prior to running it. This would result in a very heavy overhead which is not strictly required.

We'll examine the following case with three classes, MyClass, MyOtherClass and SubclassOfMyClass, which is derived from MyClass. MyOtherClass has two public methods:

methodReturningMyClass() which returns an instance of MyClass (huzzah! for meaningful method names!) and
methodReturningSubclassOfMyClass( ) which returns an instance of SubclassOfMyClass.

Against this background, consider the following code snippet.

MyOtherClass x = new MyOtherClass( );

MyClass y = x.methodReturningMyClass( );

In pass 3, the class file verifier has ascertained that the method methodReturningMyClass( ) is listed in the constant pool as a method of MyOtherClass which is public (and therefore reachable from this code).

It also checks that the return type of methodReturningMyClass( ) is MyClass. Having made this check and assuming that the classes and methods in question do exist, the assignment statement in the second line of code is perfectly legal. The bytecode verifier does not in fact need to load and check class MyOtherClass at this point.

Now consider this similar code:

MyOtherClass x = new MyOtherClass( );

MyClass y = x.methodReturningSubclassOfMyClass( );

In this case, the method call does not return an object of the same class as y, but the assignment is still legal since the method returns a subclass of MyClass. This is not, however, obvious from the code alone: the verifier would need to load the class file for the return type SubclassOfMyClass and check that it is indeed a subclass of MyClass.

Loading this class involves a possible network access and running the class file verifier for the class and it may well be that these lines of code are never executed in the normal course of the program's execution in which case loading and checking the subclass would be a waste of time.

For that reason, class files are only loaded when they are required, that is when a method call is executed or a field in an object of that class is modified. This is determined at runtime and so that is when the fourth pass of the verifier is executed.
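This deferral can be demonstrated with a hand-built class whose only method creates an instance of a class that does not exist. In the sketch below, which is our own illustration, the names Probe, run and NoSuchClass are hypothetical. The class loads and links successfully, and the missing class is reported only when the method is actually executed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

class LazyResolutionDemo {
    // Writes a class Probe whose static method run() executes: new NoSuchClass; pop; return.
    static byte[] probeClass() {
        byte[] code = {(byte) 0xBB, 0x00, 0x08, 0x57, (byte) 0xB1};
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8 format)
            out.writeShort(10);                                 // constant pool count: 9 entries + 1
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF("Probe");            // #2 Utf8: this class name
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8: superclass name
            out.writeByte(1); out.writeUTF("run");              // #5 Utf8: method name
            out.writeByte(1); out.writeUTF("()V");              // #6 Utf8: method descriptor
            out.writeByte(1); out.writeUTF("Code");             // #7 Utf8: attribute name
            out.writeByte(7); out.writeShort(9);                // #8 Class -> #9
            out.writeByte(1); out.writeUTF("NoSuchClass");      // #9 Utf8: the missing class
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces
            out.writeShort(0);                                  // fields
            out.writeShort(1);                                  // one method
            out.writeShort(0x0009);                             // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                                  // name: run
            out.writeShort(6);                                  // descriptor: ()V
            out.writeShort(1);                                  // one attribute: Code
            out.writeShort(7);                                  // attribute name: Code
            out.writeInt(12 + code.length);                     // attribute length
            out.writeShort(1);                                  // max_stack
            out.writeShort(0);                                  // max_locals
            out.writeInt(code.length);                          // code length
            out.write(code);                                    // the bytecode itself
            out.writeShort(0);                                  // exception table length
            out.writeShort(0);                                  // attributes of Code
            out.writeShort(0);                                  // class attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static class Loader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    // Returns the name of the error raised when run() executes, or "no failure".
    static String failureWhenRun() {
        try {
            Loader loader = new Loader();
            Class<?> probe = loader.define("Probe", probeClass());
            Class.forName("Probe", true, loader); // linking and initialization succeed
            probe.getMethod("run").invoke(null);  // the missing class is only sought here
            return "no failure";
        } catch (InvocationTargetException e) {
            return e.getCause().getClass().getName();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(failureWhenRun());
    }
}
```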

Summary

You have now seen the types of checking which take place before a class file from an untrusted source can be loaded and run inside the JVM. While not perfect, this is significantly more checking than is performed on any conventional operating system (that is, none at all).

Once it is running, code from untrusted sources is subject to further checking at the hands of the security manager which we have mentioned briefly here. The chapter Playing in the Sandbox describes how the security manager works and looks at ways in which it is possible to reduce the burden placed on the class loader and class file verifier by extending the range of classes which the JVM regards as trusted.

1. Throughout this chapter we refer to "class loaders" by which we mean the general mechanism by which class files are located and loaded into a JVM and "ClassLoader" by which we mean the specific Java ClassLoader class or classes derived from it.

2. This was the basis of one of the attacks discovered by the Secure Internet Programming team at Princeton University. Their attack, "Slash and Burn", is described more fully in Java Security: Hostile Applets, Holes, and Antidotes, by Gary McGraw and Ed Felten.

3. This is common practice but note that it is not enforced by the JVM architecture. Class loader writers are at liberty to implement any search strategy they choose for locating classes.

4. Important note : The class file verifier is sometimes referred to as the bytecode verifier, but as we show in this section, running the bytecode verifier is only one part of the class file verification process.