Under The Hood

The Java Virtual Machine

This part of the book is aimed primarily at people who wish to understand the inner workings of the Java security model. The level of detail is deliberately high and you should ensure that you are seated comfortably with some soothing music and a scratch pad to hand.

You should probably consult your doctor before attempting to read the whole of Part 2 at once.

Understanding how the various components of the JVM cooperate to provide a secure execution environment for Java code will enable you to implement your own extensions to the JVM in order to provide a more tailored security policy.

The Java Virtual Machine, Close Up

Later chapters in Part 2 examine the various components of the JVM in detail. In this chapter we identify the key components of the JVM and, in particular, those which are found in Web browsers.

The figure "Components of the JVM" shows a simplified representation of the JVM. The highlighted components are generally found only in Web browsers rather than in the stand-alone implementations required to execute Java applications.

Components of the JVM

As you will see, these extra components provide the additional security needed when executing Java bytecode which has been loaded from an untrusted source such as a Web server.

The Class Loader

Before the JVM can run a Java program, it needs to locate and load the classes which comprise that program into memory. In a traditional execution environment, this service is provided by the operating system which loads code from the filing system in a platform-specific way.

The operating system has access to all of the low level I/O functions and has a set of locations on the filing system which it searches for programs or shared code libraries. On PC and UNIX systems this is some combination of PATH settings which specify a list of directories to be searched for files. In mainframe environments the same function is provided by the LINKLIST.

In the Java runtime environment things are complicated a little by the fact that not all class files are loaded from the same type of location. In general classes can be divided into three categories:

Classes forming the core Java API

These are the classes shipped with the JVM which provide access to network, GUI and threading functions. They are shipped with the JVM implementation and are part of the Java specification. As such they are regarded as highly trusted classes and are not subject to the same degree of scrutiny at runtime as classes brought into the JVM from an external source.

Classes installed in the local filing system

While not a part of the core Java class set, these classes are assumed to be safe since the user has at some point installed them onto his or her machine and presumably accepted the associated risks. In many cases these classes are treated in the same way as those classes in the core Java API.

Classes loaded from other sources

In a Web browser, these would be the classes constituting an applet loaded from a remote Web server. These are the least trusted classes of all as they are being brought into the safe environment of the JVM from potentially hostile sources and often without the specific consent of the user. For this reason, these classes must be subjected to a high degree of checking before being made available for use in the JVM.

Given the diverse range of possible sources for class files and the different checking requirements of the JVM it is clear that different mechanisms will be required to locate and load classes. The class loader comes in various flavors, each responsible for locating and loading one type of class file.
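One way to see these different flavors at work is to ask a class which loader defined it. The following is a minimal sketch rather than a definitive recipe: the class name LoaderReport and its helper method are our own inventions for illustration, and it assumes a JVM that, like most common implementations, reports the built-in loader for core classes as null.

```java
// Sketch: report which class loader defined a given class.
// LoaderReport and loaderOf are hypothetical names used for illustration.
class LoaderReport {

    // Returns a printable name for the loader that defined the class.
    // Many JVMs represent the built-in loader for core classes as null.
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return (loader == null) ? "bootstrap (core API)" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // A core API class and a locally installed class usually differ here.
        System.out.println("java.lang.String -> " + loaderOf(String.class));
        System.out.println("LoaderReport     -> " + loaderOf(LoaderReport.class));
    }
}
```

The exact name printed for the second class depends on the JVM and on how the program was launched.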

Users may also implement their own class loaders to load classes from particular locations or to impose more rigorous checking of class files loaded from normally trusted sources. This allows you to implement highly flexible security policies.
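As an illustration of the idea, the sketch below shows a user-written class loader which refuses to load classes from certain packages and delegates everything else to its parent. It is an assumption-laden example, not a complete security policy: the class name RestrictingClassLoader and the notion of "blocked prefixes" are hypothetical, and the check only applies to classes requested through this particular loader.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical loader: refuses classes whose names start with a blocked
// prefix and delegates every other request to the parent loader.
class RestrictingClassLoader extends ClassLoader {
    private final List<String> blockedPrefixes;

    RestrictingClassLoader(ClassLoader parent, String... blockedPrefixes) {
        super(parent);
        this.blockedPrefixes = Arrays.asList(blockedPrefixes);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Apply the extra policy check before the normal lookup takes place.
        for (String prefix : blockedPrefixes) {
            if (name.startsWith(prefix)) {
                throw new ClassNotFoundException("Blocked by policy: " + name);
            }
        }
        return super.loadClass(name, resolve);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = new RestrictingClassLoader(
                RestrictingClassLoader.class.getClassLoader(), "java.net.");
        System.out.println(loader.loadClass("java.util.ArrayList"));
        try {
            loader.loadClass("java.net.Socket");
        } catch (ClassNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A loader which fetched class files from a particular location would instead override findClass and pass the bytes it obtained to defineClass.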

The Class File Verifier

As mentioned above, some of the class files loaded by the JVM will come from untrusted sources. These files need to be checked prior to execution to ensure that they do not threaten the integrity of the JVM. The class file verifier is invoked by the class loader to perform a series of tests on class files which are regarded as potentially unsafe.

These tests check all aspects of a class file from its size and structure down to its runtime characteristics. Only when these tests have been passed is the file made available for use.
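To give a flavor of the simplest structural test, the sketch below checks only the first eight bytes of a class file: the magic number 0xCAFEBABE followed by the minor and major version fields. This is a tiny fraction of what a real verifier does, and the class name ClassFileHeaderCheck is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch of the very first structural check on a class file.
// A real verifier goes on to perform many further tests.
class ClassFileHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    // True if the data starts with the magic number followed by the two
    // 16-bit version fields (minor version, then major version).
    static boolean hasValidHeader(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readInt() != MAGIC) {
                return false;
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            return true;
        } catch (IOException e) {
            // Truncated input raises EOFException, a kind of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        byte[] bad = {0x4D, 0x5A, 0, 0, 0, 0, 0, 0};
        System.out.println(hasValidHeader(good));
        System.out.println(hasValidHeader(bad));
    }
}
```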

The Heap

The heap is an area of memory used by the JVM to store Java objects during the execution of a program. Precisely how objects are stored on the heap is implementation specific, and this adds another level of security since it means that an attacker cannot rely on knowing how a particular JVM represents objects in memory. This in turn makes it far more difficult to mount an attack that depends on accessing memory directly.

When an object is no longer used, the JVM marks it for garbage collection and at some point the memory on the heap is freed up for reuse.

The Class Area

The class area is where the JVM stores class-specific information such as methods and static fields. When a class is loaded and has been verified, the JVM creates an entry in the class area for that class.

Often the class area is simply a part of the heap. In this case classes may also be garbage-collected once they are no longer used. Alternatively, the class area may be a separate part of memory and will require additional logic on the part of the JVM implementor to clean up classes which are not being used.

When a JIT compiler (see section 3.1.10) is present, the native code generated for class methods is also stored in the class area.

The Native Method Loader

Many of the core Java classes, such as those classes representing GUI elements or networking features, require native-code implementations to access the underlying OS functions. Programmers may also implement their own native methods, assuming of course that they don't want their code to be portable. These native methods are composed of a Java wrapper - which specifies the method signature - and a native-code implementation - often a DLL or shared library.

Core Java classes aren't hindered by the fact that they use native-code; they're part of the JVM implementation for a particular operating system. Applets and applications, on the other hand, are more useful if they are portable, but they are portable only if they eschew native methods.

The native method loader is responsible for locating and loading these shared libraries into the JVM. Note that it is not possible for the JVM to perform any validation or verification of native code and installing such code exposes you to all of the risks associated with running untrusted programs on your machine.
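The split between the Java wrapper and the native-code implementation can be sketched as follows. The class NativeAdder, its add method and the library name "nativeadder" are all hypothetical; no such library is assumed to exist, so the example concentrates on what happens when the native method loader cannot find the shared library.

```java
// Sketch of a native method: the Java side declares the signature only,
// while the body would live in a DLL or shared library.
// NativeAdder and the library name "nativeadder" are hypothetical.
class NativeAdder {

    // Java wrapper: no body, just the method signature.
    static native int add(int a, int b);

    // Asks the JVM to locate and load the named shared library.
    // Returns false if the library cannot be found or loaded.
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (tryLoad("nativeadder")) {
            System.out.println("2 + 3 = " + add(2, 3));
        } else {
            System.out.println("Native library not available; add() cannot be called.");
        }
    }
}
```

Note that the JVM performs no verification of the library's contents at any point in this sequence; once loaded, the native code runs with the full privileges of the JVM process.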

The Native Method Area

Once native code has been loaded, it is stored in the native method area for speedy access when required.

The Security Manager

Even when untrusted code has been verified, it is still subject to runtime restrictions. The security manager is responsible for enforcing these restrictions. In a Web browser, the security manager is provided by the browser manufacturer and is the component of the JVM which prevents applets from reading or writing to the file system, accessing the network in an unsafe way, making inquiries about the runtime environment, printing and so on.

By default, in a stand-alone JVM implementation there is no security manager (since there is no mechanism to load classes from an untrusted source). It is, however, possible for an application writer to implement a security manager to enforce a particular security policy.
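The following sketch suggests what such an application-written security manager might look like. The class ReadDenyingManager and its single-directory policy are hypothetical. Because recent Java releases deprecate the SecurityManager class and restrict or disable installing one, the sketch calls the check method directly instead of installing the manager with System.setSecurityManager.

```java
// Hypothetical policy: deny file reads beneath one directory prefix.
// SecurityManager is deprecated for removal in recent Java releases,
// so this sketch never installs the manager; it calls the check directly.
@SuppressWarnings("removal")
class ReadDenyingManager extends SecurityManager {
    private final String deniedPrefix;

    ReadDenyingManager(String deniedPrefix) {
        this.deniedPrefix = deniedPrefix;
    }

    // When a manager is installed, the core classes call this method
    // before opening a file for reading.
    @Override
    public void checkRead(String file) {
        if (file.startsWith(deniedPrefix)) {
            throw new SecurityException("Read denied: " + file);
        }
    }

    public static void main(String[] args) {
        ReadDenyingManager manager = new ReadDenyingManager("/etc/");
        manager.checkRead("/tmp/notes.txt"); // permitted: returns normally
        try {
            manager.checkRead("/etc/passwd");
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design point is that a check method signals refusal by throwing SecurityException and signals approval simply by returning.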

The Execution Engine

The execution engine is the heart of the JVM. It is the virtual processor which executes bytecode. Memory management, thread management and calls to native methods are also performed by the execution engine.

The Trusted Classes

The trusted Java classes are those classes which ship as part of the JVM implementation. This includes all classes in packages which start "java." and "sun." as well as any vendor-provided classes used to implement the platform-specific parts of core classes (such as the GUI components).

They are often stored in the filing system (usually in a ZIP archive called classes.zip) but may be held as part of the JVM executable itself.

The Just In Time (JIT) Compiler

Since Java bytecodes are interpreted at runtime in the execution engine, Java programs generally execute more slowly than the equivalent native platform code. This performance overhead occurs because each bytecode instruction must be translated into one or more native instructions each time it is encountered.
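The cost of interpretation is easier to appreciate with a toy example. The sketch below is a drastically simplified interpreter loop handling five real JVM opcodes (bipush, iconst_2, iadd, imul and ireturn). The class name ToyInterpreter is hypothetical and a real execution engine is far more elaborate, but the fetch, decode and dispatch work shown here is what must be repeated every time an instruction is encountered.

```java
// Toy interpreter for a handful of JVM integer opcodes.
// ToyInterpreter is a hypothetical name; this is not how any real JVM is written.
class ToyInterpreter {
    static final int ICONST_2 = 0x05;
    static final int BIPUSH = 0x10;
    static final int IADD = 0x60;
    static final int IMUL = 0x68;
    static final int IRETURN = 0xAC;

    static int run(byte[] code) {
        int[] stack = new int[16]; // operand stack
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // index of the next instruction
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF; // fetch and decode, every time
            switch (opcode) {
                case ICONST_2:
                    stack[sp++] = 2;
                    break;
                case BIPUSH:
                    stack[sp++] = code[pc++]; // one signed byte operand
                    break;
                case IADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case IMUL: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left * right;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalArgumentException(
                            "Unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        throw new IllegalStateException("Code ended without ireturn");
    }

    public static void main(String[] args) {
        // bipush 40; iconst_2; iadd; ireturn
        byte[] program = {(byte) BIPUSH, 40, (byte) ICONST_2, (byte) IADD, (byte) IRETURN};
        System.out.println(run(program)); // prints 42
    }
}
```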

The performance of Java is still significantly better than that of other interpreted languages because the bytecode instructions were designed to be very low level - the simplest instructions have a one-to-one correlation with native machine code instructions.

Nevertheless, Sun saw that there would be a need to improve the execution performance of Java, and to do so in a way which compromised neither the "write once, run anywhere" goal nor the security of the JVM.

Since all bytecode instructions are ultimately translated to native machine code, the principal ways of speeding performance involve making this translation as quick as possible and performing it as few times as possible.

On the other hand, the security and portability of Java are dependent on the bytecode and class file format, which enable code to be run on any JVM and to be rigorously tested to ensure that it is safe prior to execution. Thus, any translation must occur after a class file has been loaded and verified.

Two options present themselves:

Translate the whole class file into native code as soon as it is loaded and verified.

Translate the class file on a method-by-method basis as needed.

The first option seems quite attractive but it is possible that many of the methods in a class file will never be executed. The time spent translating these methods is therefore wasted. The second option was the one selected by Sun. In this case, the first time a method is called, it is translated into native code, which is then stored in the class area. The class specification is updated so that future calls to the method run the native code rather than the original bytecode.

This meets our requirement that bytecode should be translated as few times as is necessary - once when the code is executed and not at all in the case of code which is not executed.
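The "translate on first call, then reuse" behavior can be modeled in a few lines. The sketch below is a simulation only: the class LazyTranslator is hypothetical, Java lambdas stand in for both bytecode and native code, and a counter records how many translations take place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Simulation of method-by-method translation. LazyTranslator is a
// hypothetical name and no native code is actually generated.
class LazyTranslator {
    // Stands in for the bytecode of each method in a loaded class.
    private final Map<String, IntUnaryOperator> bytecode = new HashMap<>();
    // Stands in for translated code stored in the class area.
    private final Map<String, IntUnaryOperator> translated = new HashMap<>();
    private int translations = 0;

    void define(String name, IntUnaryOperator body) {
        bytecode.put(name, body);
    }

    int call(String name, int argument) {
        IntUnaryOperator code = translated.get(name);
        if (code == null) {
            IntUnaryOperator body = bytecode.get(name);
            if (body == null) {
                throw new IllegalArgumentException("No such method: " + name);
            }
            translations++;             // translation happens on the first call only
            code = body;                // a real JIT would emit native code here
            translated.put(name, code); // later calls reuse the stored result
        }
        return code.applyAsInt(argument);
    }

    int translationCount() {
        return translations;
    }

    public static void main(String[] args) {
        LazyTranslator jit = new LazyTranslator();
        jit.define("square", x -> x * x);
        jit.define("neverCalled", x -> -x);
        jit.call("square", 3);
        jit.call("square", 4);
        System.out.println(jit.call("square", 7));  // prints 49
        System.out.println(jit.translationCount()); // prints 1
    }
}
```

The method which is never called is never translated, while the method called three times is translated once.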

The process of translating the bytecode to native code on the fly is known as just in time (JIT) compilation and is performed by the JIT compiler. Sun provided a specification for how and when JIT compilers should execute and vendors were left to implement their own JIT compilers as they chose.

JIT-compiled code executes much more quickly than interpreted bytecode - between 10 and 50 times faster - without impacting portability or security.

Summary

You now have a good idea of how the various components of the JVM work together. The next four chapters examine the principal elements of the Java security architecture - the class file structure, the class loader, the bytecode verifier and the security manager - in greater detail.