- Special Edition Using Java, 2nd Edition -

Chapter 26

Content Handlers


by David W. Baker

Java's extensibility goes beyond protocol handlers, which enable you to define mechanisms for using new application protocols. Content handlers are Java's way of dealing with various data formats, such as text files, images, and sounds. By creating new content handlers, you can process and render additional data types.

Writing Content Handlers

Documents on the Web are transmitted with a MIME content-type identifier, indicating to the receiving agent how the data is formatted. The client must understand how to decode and render that data. A content handler is a Java class that is called by either a URL or URLConnection object. The content handler obtains an input stream from the calling object and then receives data from that stream. It then processes the data and returns an Object which contains that data.
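The flow just described can be tried without a network. The following is a minimal sketch, not the handler developed later in this chapter: the names HandlerFlowDemo, ShoutHandler, and InMemoryConnection are hypothetical, and the in-memory connection stands in for a real URLConnection so that the handler can be called by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ContentHandler;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: drives a content handler directly, with no network.
class HandlerFlowDemo {

    // Hypothetical handler: reads the stream and returns the text upper-cased.
    static class ShoutHandler extends ContentHandler {
        @Override
        public Object getContent(URLConnection conn) throws IOException {
            StringBuilder text = new StringBuilder();
            try (InputStream in = conn.getInputStream()) {
                int c;
                while ((c = in.read()) != -1) { // read() returns -1 at end of stream
                    // Treats each byte as one character (ASCII input assumed).
                    text.append(Character.toUpperCase((char) c));
                }
            }
            return text.toString();
        }
    }

    // Hypothetical connection that serves its bytes from memory.
    static class InMemoryConnection extends URLConnection {
        private final byte[] data;

        InMemoryConnection(URL url, byte[] data) {
            super(url);
            this.data = data;
        }

        @Override
        public void connect() {
            connected = true; // nothing to open
        }

        @Override
        public InputStream getInputStream() {
            return new ByteArrayInputStream(data);
        }
    }

    // Passes the given text through the handler and returns the resulting Object as a String.
    static String fetch(String body) {
        try {
            // The URL is only a label here; nothing is read from it.
            URL url = URI.create("file:///in-memory-example.txt").toURL();
            URLConnection conn =
                new InMemoryConnection(url, body.getBytes(StandardCharsets.US_ASCII));
            Object content = new ShoutHandler().getContent(conn);
            return (String) content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetch("hello, handler"));
    }
}
```

The handler never learns where the bytes came from; it only sees the URLConnection it is given, which is why the same class can serve any protocol.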

Java and HotJava provide a core set of content handlers to manage commonly used types. You can write your own handlers to deal with new content types. This empowers you to extend your Java applications or your HotJava browser to understand new document formats.

The process of creating new content handlers is quite similar to creating protocol handlers. If you have read the previous chapter, some of these instructions will seem quite familiar. As an example, this chapter demonstrates a content handler that processes plain text documents, overriding the existing handling.

Because this is the last in a series of chapters on networking, this example provides the somewhat frivolous task of making incoming text files appear as though spoken by a famous bald cartoon character inclined towards hunting rabbits.
 

 

See “Writing a Protocol Handler,” Chapter 25 for more information.
 

Step 1: Decide Upon a Package Name

Like protocol handlers, content handlers must reside within a specific package. The package name must end with content.type, where type is the MIME type of the data. For instance, the type of text/plain documents is text, while for image/gif it is image. As in the previous chapter, I prepend ORG.netspace.dwb to indicate the distribution source and author, which gives:

ORG.netspace.dwb.content.text

See “Step One: Decide on a Package Name,” Chapter 25 for more information.
 

Step 2: Create the Directories

The content handler class must be placed into a directory that corresponds to the package name. Such directories usually reside within a directory called classes within your home directory. If you have previously installed other content handlers, protocol handlers, or personal Java classes, you may have already created some of these directories.

For Windows NT and Windows 95 users, the following sequence of commands creates the directories:
 

cd \users\myid
mkdir classes
mkdir classes\ORG
mkdir classes\ORG\netspace
mkdir classes\ORG\netspace\dwb
mkdir classes\ORG\netspace\dwb\content
mkdir classes\ORG\netspace\dwb\content\text

For UNIX users, the analogous commands would be:

cd ~
mkdir classes
mkdir classes/ORG
mkdir classes/ORG/netspace
mkdir classes/ORG/netspace/dwb
mkdir classes/ORG/netspace/dwb/content
mkdir classes/ORG/netspace/dwb/content/text
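The mapping from package name to directory is mechanical: each dot becomes a directory separator. As a rough sketch, the same directories could be created from Java itself. The class name PackageDirs and its methods are hypothetical; the base directory is taken from the command line so that nothing is created unless you ask for it.

```java
import java.io.File;

// Hypothetical helper: maps a package name to the directory that must hold its classes.
class PackageDirs {

    // Turns "a.b.c" into "a<sep>b<sep>c".
    static String toRelativePath(String packageName, char separator) {
        return packageName.replace('.', separator);
    }

    public static void main(String[] args) {
        String pkg = "ORG.netspace.dwb.content.text";
        String relative = toRelativePath(pkg, File.separatorChar);
        if (args.length != 1) {
            // No base directory given: only show the path that would be needed.
            System.out.println(relative);
            return;
        }
        File target = new File(args[0], relative);
        // mkdirs() also creates any missing parent directories.
        boolean created = target.mkdirs();
        System.out.println(target + (created ? " created" : " not created (it may already exist)"));
    }
}
```

Running it as java PackageDirs classes from your home directory would have the same effect as the mkdir sequences above.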

Step 3: Set Your CLASSPATH

The CLASSPATH environment variable tells the Java compiler and interpreter where to find Java classes, enabling the dynamic linking feature of the Java execution environment. When installing the JDK, HotJava, or a Java-aware browser, you might have set the CLASSPATH environment variable. If so, it is critical that you avoid overwriting that data. Follow these steps:

Find out what your current CLASSPATH setting is. Under Windows NT/95, just type the following command:

Look for the CLASSPATH value. Under UNIX systems, you can display the CLASSPATH value with this command:

Reset your CLASSPATH, including the previous data, if any. Under Windows 95, if your CLASSPATH was

.;C:\JAVA\LIB\CLASSES.ZIP

you can add the following line to your AUTOEXEC.BAT and reboot:

SET CLASSPATH=.;C:\USERS\MYID\CLASSES;C:\JAVA\LIB\CLASSES.ZIP

Under Windows NT, presuming that the CLASSPATH value was the same as under our Windows 95 example, you would use the System Control Panel to add a CLASSPATH environment variable with the value

.;C:\USERS\MYID\CLASSES;C:\JAVA\LIB\CLASSES.ZIP

Under UNIX, assume that your old CLASSPATH was .:/usr/java/lib. If you are using the C shell, place the following into your .cshrc file:

setenv CLASSPATH .:/home/myid/classes:/usr/java/lib

If you are on a UNIX system using the Korn shell or another POSIX-compliant shell, add this line to whatever file your ENV environment variable points to. If ENV is unset, you could add this line to your ~/.profile file instead:

CLASSPATH=.:/home/myid/classes:/usr/java/lib; export CLASSPATH
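To confirm which class path a running program actually sees, you can ask the Java runtime directly. This is a small sketch under the assumption that your runtime exposes the standard java.class.path system property; the class name ShowClassPath is hypothetical. Note that on current runtimes a -cp or -classpath command-line option takes precedence over the CLASSPATH environment variable, so the output reflects whichever one was used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: lists the entries of a class path, one per line.
class ShowClassPath {

    // Splits a class path string on the given separator, skipping empty entries.
    static List<String> split(String classPath, String separator) {
        List<String> entries = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path", "");
        // File.pathSeparator is ";" on Windows and ":" on UNIX.
        for (String entry : split(classPath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

If the classes directory created in step 2 does not appear in the output, the handler will not be found.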

Step 4: Write the Content Handler

The content handler must be a class that extends java.net.ContentHandler. It must also have the same name as the subtype of the MIME content-type it processes. That is, for image/gif, the class would be called gif, while my example, which overrides the normal text/plain handler, is named plain.

The class must have a getContent() method that takes a URLConnection as an argument and returns a generic Object. For now, HotJava supports the following returned Object instances:

The code for the example used in this chapter is shown in listing 26.1. This content handler has only one method—getContent(). It obtains an InputStream from the URLConnection object and then enters an infinite loop. Within the loop, it reads the incoming characters and makes a number of substitutions, altering the text to appear as though spoken by our cartoon friend.

The filtered characters are placed into a StringBuffer object. Once the last character has been read, the next call to read() returns -1, and the content handler breaks from the loop. It closes the InputStream and then returns a String object.

If there is an exception, the method returns a String providing information about the problem.
 

Listing 26.1 plain.java

// This is the package identified for this content handler.
package ORG.netspace.dwb.content.text;

import java.lang.*; // Import the package names used.
import java.net.*;
import java.io.*;

/**
 * This is a text/plain content handler which "fuddifies"
 * the text it receives.
 * @author David W. Baker
 * @version 1.1
 * @see java.net.ContentHandler
 */
public class plain extends ContentHandler {
    // Stream to receive text/plain file from.
    private InputStream input;
    // Some standard replacement strings.
    private static final String QUIET = "(be vewy quiet, ";
    private static final String HEH = ", eheheheh.";
    private static final String SCREWY = "? Awe you scwewy?";
    private static final String RASCAL = ", you wascal!";
    private static final String MISCREANT =
        ", you miscweant:";

    /**
     * This method returns an Object containing the
     * processed content from the given URLConnection.
     * @param contentConn Connection used to obtain the content.
     * @return The content.
     * @see java.net.ContentHandler#getContent
     */
    public Object getContent(URLConnection contentConn) {
        // Create a buffer to store the filtered data.
        StringBuffer fuddBuff = new StringBuffer();
        int intChar;   // An int representation of a char.
        char nextChar; // A char.
        try {
            // Get the input.
            input = contentConn.getInputStream();
            // Loop until the end of the stream is reached.
            filter: while (true) {
                // Read in next character.
                intChar = input.read();
                // Make sure we aren't at the end.
                if (intChar == -1) {
                    break filter; // Break if end.
                }
                // Convert it to a char.
                nextChar = (char)intChar;
                // Substitute QUIET for "("
                if (nextChar == '(') fuddBuff.append(QUIET);
                // Substitute "W" for "L"
                else if (nextChar == 'L') fuddBuff.append('W');
                // Substitute "w" for "l"
                else if (nextChar == 'l') fuddBuff.append('w');
                // Substitute "W" for "R"
                else if (nextChar == 'R') fuddBuff.append('W');
                // Substitute "w" for "r"
                else if (nextChar == 'r') fuddBuff.append('w');
                // For periods at the end of the file or periods
                // followed by a space, substitute HEH.
                else if (nextChar == '.') {
                    intChar = input.read();
                    if (intChar == -1) {
                        fuddBuff.append(HEH);
                        break filter; // Break if end.
                    }
                    nextChar = (char)intChar;
                    if (nextChar == ' ')
                        fuddBuff.append(HEH + " ");
                    else fuddBuff.append("." + nextChar);
                }
                // For ? at the end of the file or ?
                // followed by a space, substitute SCREWY.
                else if (nextChar == '?') {
                    intChar = input.read();
                    if (intChar == -1) {
                        fuddBuff.append(SCREWY);
                        break filter; // Break if end.
                    }
                    nextChar = (char)intChar;
                    if (nextChar == ' ')
                        fuddBuff.append(SCREWY + " ");
                    else fuddBuff.append("?" + nextChar);
                }
                // For ! at the end of the file or !
                // followed by a space, substitute RASCAL.
                else if (nextChar == '!') {
                    intChar = input.read();
                    if (intChar == -1) {
                        fuddBuff.append(RASCAL);
                        break filter; // Break if end.
                    }
                    nextChar = (char)intChar;
                    if (nextChar == ' ')
                        fuddBuff.append(RASCAL + " ");
                    else fuddBuff.append("!" + nextChar);
                }
                // For : at the end of the file or :
                // followed by a space, substitute MISCREANT.
                else if (nextChar == ':') {
                    intChar = input.read();
                    if (intChar == -1) {
                        fuddBuff.append(MISCREANT);
                        break filter; // Break if end.
                    }
                    nextChar = (char)intChar;
                    if (nextChar == ' ')
                        fuddBuff.append(MISCREANT + " ");
                    else fuddBuff.append(":" + nextChar);
                }
                // Everything else passes through unchanged.
                else fuddBuff.append(nextChar);
            }
            input.close();
        } catch(IOException excpt) {
            return "Unable to load document: "
                + contentConn.getURL();
        }
        return fuddBuff.toString();
    }
}
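To experiment with the substitution rules without installing the handler, the same rules can be restated over a plain String. The sketch below is such a restatement, not a replacement for Listing 26.1: the class FuddRules and its method names are hypothetical, and it works on a String index rather than an InputStream. It keeps the listing's behavior of copying the character that follows a punctuation mark without filtering it.

```java
// Hypothetical helper: the substitution rules of Listing 26.1, applied to a String.
class FuddRules {
    static final String QUIET = "(be vewy quiet, ";

    // Returns the replacement text for a sentence-ending mark.
    static String tailFor(char mark) {
        switch (mark) {
            case '.': return ", eheheheh.";
            case '?': return "? Awe you scwewy?";
            case '!': return ", you wascal!";
            default:  return ", you miscweant:"; // only ':' reaches this branch
        }
    }

    static String fuddify(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i++);
            if (c == '(') {
                out.append(QUIET);
            } else if (c == 'L' || c == 'R') {
                out.append('W');
            } else if (c == 'l' || c == 'r') {
                out.append('w');
            } else if (c == '.' || c == '?' || c == '!' || c == ':') {
                if (i == text.length()) {
                    out.append(tailFor(c));           // mark at the very end of the input
                } else {
                    char next = text.charAt(i++);     // consume one lookahead character
                    if (next == ' ') {
                        out.append(tailFor(c)).append(' ');
                    } else {
                        out.append(c).append(next);   // lookahead is copied unfiltered
                    }
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fuddify("Hello world.")); // prints "Hewwo wowwd, eheheheh."
    }
}
```

One consequence of the single-character lookahead is visible here: in an input such as "a.lb", the "l" after the period is emitted unchanged, because it was consumed while deciding how to treat the period.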

Step 5: Compile the Source

Use javac to compile the content handler, and leave the compiled class within the directory created in step 2 (for example, classes\ORG\netspace\dwb\content\text for Windows NT/95 or classes/ORG/netspace/dwb/content/text for UNIX).

Using Content Handlers with HotJava

As with protocol handlers, HotJava's goal is to eventually support dynamically downloaded content handlers. For now, only manually installed handlers are supported, created as described in the section “Writing Content Handlers.” In addition, at the time of this writing, HotJava only supports content handlers that extend existing MIME types. That is, the example can override the handling of text/plain, but HotJava does not support one that handles a new content-type like text/fuddify.

HotJava also needs to deal with the conflict between MIME type names and Java class names. MIME content-types can, and under certain circumstances should, contain hyphens. However, hyphens are not allowed in Java class identifiers. Because the class name of the content handler must be the same as the MIME content subtype, this presents an obvious problem.
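For comparison, JDK documentation for URLConnection.getContent() has described one way around this conflict: the class name is formed from the content-type by converting each slash to a period and every other non-alphanumeric character to an underscore. The sketch below re-implements that documented rule in a hypothetical helper class, MimeClassName. Whether the HotJava release discussed here applies the same rule is not something this chapter establishes, so treat it as an illustration only.

```java
// Hypothetical helper: derives a handler class name from a MIME content-type.
class MimeClassName {

    // Converts "type/subtype" to "type.subtype", replacing characters
    // that cannot appear in a Java identifier with underscores.
    static String toClassSuffix(String contentType) {
        StringBuilder name = new StringBuilder(contentType.length());
        for (char c : contentType.toCharArray()) {
            boolean alphanumeric = (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9');
            if (c == '/') {
                name.append('.');
            } else if (alphanumeric) {
                name.append(c);
            } else {
                name.append('_');
            }
        }
        return name.toString();
    }

    // Joins a package prefix (ending in "content") with the derived suffix.
    static String candidate(String packagePrefix, String contentType) {
        return packagePrefix + "." + toClassSuffix(contentType);
    }

    public static void main(String[] args) {
        System.out.println(candidate("ORG.netspace.dwb.content", "text/plain"));
        System.out.println(candidate("ORG.netspace.dwb.content", "application/x-tar"));
    }
}
```

Under this rule a handler for application/x-tar would be a class named x_tar in a package ending in content.application.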

The following steps illustrate how to use the new content handler, as created in the previous section, with the HotJava browser.

JavaSoft makes the HotJava browser and instructions for its installation available at http://www.javasoft.com/java.sun.com/HotJava/CurrentRelease/installation.html.
 

Step 1: Disable Special MIME Handling

On certain systems, a file called MAILCAP may have been created to indicate that a special helper application should be used for an incoming MIME type, regardless of which browser is loading the data. If such a file exists, ensure that any line indicating special processing is removed for the content-type you want your handler to process. Thus, remove any entry for text/plain for this example.

Step 2: Update the properties File

HotJava stores per-user customizations in a file called properties. This file is located within a directory named .hotjava that resides within your home directory. Edit this file to set the java.content.handler.pkgs property. Its value should be everything up to and including the content token in the content handler's package name. If this property has not been set, add the following line to use the example handler:

java.content.handler.pkgs=ORG.netspace.dwb.content

If that property has already been set, append a pipe character (|) and ORG.netspace.dwb.content. For example:

java.content.handler.pkgs=COM.company.content|ORG.netspace.dwb.content
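The append-or-create rule above is easy to get wrong by hand, so here is a small sketch of it in code. The class HandlerPkgs and its append() method are hypothetical. The main() method applies the result to the java.content.handler.pkgs system property of the running program only; it does not edit HotJava's file.

```java
// Hypothetical helper: maintains a pipe-separated list of handler package prefixes.
class HandlerPkgs {
    static final String PROPERTY = "java.content.handler.pkgs";

    // Returns the property value with the prefix added, unless it is already listed.
    static String append(String current, String prefix) {
        if (current == null || current.trim().isEmpty()) {
            return prefix; // property not set yet
        }
        for (String existing : current.split("\\|")) {
            if (existing.trim().equals(prefix)) {
                return current; // already present, leave unchanged
            }
        }
        return current + "|" + prefix;
    }

    public static void main(String[] args) {
        String updated = append(System.getProperty(PROPERTY), "ORG.netspace.dwb.content");
        System.setProperty(PROPERTY, updated);
        System.out.println(PROPERTY + "=" + updated);
    }
}
```

Checking for an existing entry first keeps the list from growing each time the code runs.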

Step 3: Run HotJava

Execute HotJava and load up a text file to see the "fuddified" information. Figure 26.1 demonstrates this effect upon the HTML RFC. To view this page yourself, choose File, Open Page, and then enter the following:


FIG. 26.1

When HotJava uses the Fuddify content handler, the HTML spec looks slightly more interesting.

Using Content Handlers with Your Own Applications

Content handlers can be used by your own applications, in addition to their usefulness with HotJava. Content handlers use a concept similar to protocol handlers for registering a new handler, that of a factory. The FetchFuddify application, shown in listing 26.2, demonstrates this functionality.

Listing 26.2 FetchFuddify.java

import java.net.*; // Import package names used.
import java.io.*;

/**
 * This is an application which utilizes the new
 * text/plain content handler which "fuddifies"
 * the text.
 * @author David W. Baker
 * @version 1.1
 */
public class FetchFuddify {
    /**
     * This method starts the application.
     * @param args The program arguments - should be URL.
     */
    public static void main (String args[]) {
        // Check the arguments.
        if (args.length != 1) {
            System.err.println("usage: " +
                "java FetchFuddify <url of Fudd document>");
            System.exit(1);
        }
        // Create an instance of FetchFuddify to do its stuff.
        FetchFuddify app = new FetchFuddify(args[0]);
    }

    /**
     * This constructor does all of the work of obtaining
     * the data with the appropriate content handler and
     * sending it to standard output.
     * @param url The URL to obtain.
     */
    public FetchFuddify(String url) {
        URL fuddURL;            // URL object to resource.
        URLConnection fuddConn; // Connection to resource.
        Object fuddObject;      // Object returned.
        // Register the content handler with our own
        // factory.
        URLConnection.setContentHandlerFactory(
            new fuddifyCHFactory());
        try {
            // Create the URL object with the command line
            // argument used.
            fuddURL = new URL(url);
            // Open the connection.
            fuddConn = fuddURL.openConnection();
            // Get the content.
            fuddObject = fuddConn.getContent();
            // Convert the content to a String and print it.
            System.out.println(fuddObject.toString());
        } catch(MalformedURLException excpt) {
            System.err.println("Malformed URL: " + excpt);
        } catch(IOException excpt) {
            System.err.println("Failed I/O: " + excpt);
        }
    }
}

/**
 * This class implements the ContentHandlerFactory
 * interface to register our own content handler.
 * @see java.net.ContentHandlerFactory
 */
class fuddifyCHFactory implements ContentHandlerFactory {
    /**
     * This method returns our own custom content
     * handler when given a "text/plain" content type.
     * @param contenttype MIME type - should be "text/plain".
     * @return The content handler to use.
     * @see java.net.ContentHandlerFactory#createContentHandler
     */
    public ContentHandler
    createContentHandler(String contenttype) {
        // Ensure the content type is "text/plain".
        if (contenttype.equalsIgnoreCase("text/plain")) {
            // Create an instance of our content handler.
            return new ORG.netspace.dwb.content.text.plain();
        }
        // Otherwise, print an error message and return null.
        System.err.println("Unknown data type: "
            + contenttype);
        return null;
    }
}

Start FetchFuddify

The main() method checks to see that the program was invoked with a single argument, which corresponds to the URL of a text file to filter. Then it creates a FetchFuddify object, passing it the String command line argument.

The constructor performs the essential task in using a new content handler: invoking the static method of the URLConnection class, setContentHandlerFactory(). Factories should be a familiar concept; this time, the factory allows the URLConnection class to choose an appropriate content handler. The setContentHandlerFactory() method takes an object that implements the java.net.ContentHandlerFactory interface. This example's implementation, fuddifyCHFactory, is described in the upcoming section, “The ContentHandlerFactory Implementation.”

The constructor then creates a URL object and opens a connection to the resource. It calls the getContent() method of the URLConnection class, which causes the code of the content handler to be invoked. getContent() returns an Object, which the constructor converts to a String with the toString() method and prints to standard output.

The ContentHandlerFactory Implementation

This interface enables you to register new content handlers with the URLConnection class. A class that implements this interface must have a createContentHandler() method. This method takes a String instance containing the value of the MIME content-type of the resource being accessed. This method returns a ContentHandler object.

The example first checks to see that the contenttype argument is text/plain. If so, it creates an instance of the content handler and returns it. If the method is called with a contenttype other than text/plain, it prints an error message and returns null.
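When an application needs more than one handler, the same interface can be implemented with a lookup table instead of a chain of if statements. The following is a sketch of that variation, not the factory from Listing 26.2: TableFactoryDemo, TableFactory, and EchoTypeHandler are hypothetical names, and the stand-in handler merely reports the content type it was asked to handle.

```java
import java.io.IOException;
import java.net.ContentHandler;
import java.net.ContentHandlerFactory;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical demo class: a ContentHandlerFactory driven by a lookup table.
class TableFactoryDemo {

    // Hypothetical stand-in handler that only reports the content type.
    static class EchoTypeHandler extends ContentHandler {
        @Override
        public Object getContent(URLConnection conn) throws IOException {
            return "content of type " + conn.getContentType();
        }
    }

    // Maps lower-cased MIME types to suppliers of handlers.
    static class TableFactory implements ContentHandlerFactory {
        private final Map<String, Supplier<ContentHandler>> table = new HashMap<>();

        TableFactory register(String contentType, Supplier<ContentHandler> supplier) {
            table.put(contentType.toLowerCase(Locale.ROOT), supplier);
            return this;
        }

        @Override
        public ContentHandler createContentHandler(String contentType) {
            Supplier<ContentHandler> supplier =
                table.get(contentType.toLowerCase(Locale.ROOT));
            // Returning null signals that this factory has no handler for the type.
            return supplier == null ? null : supplier.get();
        }
    }

    static ContentHandlerFactory exampleFactory() {
        return new TableFactory().register("text/plain", EchoTypeHandler::new);
    }

    public static void main(String[] args) {
        ContentHandlerFactory factory = exampleFactory();
        System.out.println(factory.createContentHandler("text/plain") != null);
        System.out.println(factory.createContentHandler("image/gif") == null);
    }
}
```

The sketch deliberately does not call URLConnection.setContentHandlerFactory(): that method may be called at most once in a given Java virtual machine, and a second call throws an Error, so an application should register a single factory that covers every type it cares about.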



© 1996, QUE Corporation, an imprint of Macmillan Publishing USA, a Simon and Schuster Company