
Chapter 31

Extending Java with Content and Protocol Handlers

Java’s URL class gives applets and applications easy access to the World Wide Web using the HTTP protocol. This is fine and dandy if you can get the information you need into a format that a Web server or CGI script can access. However, wouldn’t it be nice if your code could talk directly to the server application without going through an intermediary CGI script or some sort of proxy? Wouldn’t you like your Java-based Web browser to be able to display your wonderful new image format? This is where protocol and content handlers come in.

What Are Protocol and Content Handlers?

Handlers are classes that extend the capabilities of the standard URL class. A protocol handler provides an InputStream (and OutputStream, where appropriate) that retrieves the content of a URL. Content handlers take an InputStream for a given MIME type and convert it into a Java object of the appropriate type.

MIME Types

MIME, or Multipurpose Internet Mail Extensions, is the Internet standard for specifying what type of content a resource contains. As you might have guessed from the name, it originally was proposed for the context of enclosing nontextual components in Internet e-mail. This allows different platforms (PCs, Macintoshes, UNIX workstations, and others) to exchange multimedia content in a common format.

The MIME standard, described in RFC 1521, defines an extra set of headers similar to those on Internet e-mail. The headers describe attributes such as the method of encoding the content and the MIME content type. MIME types are written as type/subtype, where type is a general category such as text or image and subtype is a more specific description of the format such as html or jpeg. For example, when a Web browser contacts an HTTP daemon to retrieve an HTML file, the daemon’s response looks something like the following:

Content-type: text/html

<HEAD><TITLE>Document moved</TITLE></HEAD>

<BODY><H1>Document moved</H1>

The Web browser parses the Content-type: header and sees that the data is text/html, that is, an HTML document. If it were a GIF image, the header would have been Content-type: image/gif.
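As a rough illustration of how a client might take such a header apart, here is a minimal sketch. The MimeType class and its parse method are hypothetical names invented for this example; they are not part of the JDK, and the sketch ignores any parameters that follow a semicolon.

```java
import java.util.Locale;

// Hypothetical helper (not a JDK class): splits a Content-type value
// such as "text/html" into its type and subtype.
final class MimeType {
    final String type;
    final String subtype;

    private MimeType(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    // Accepts a bare value ("image/gif") or a whole header line
    // ("Content-type: image/gif"). Parameters after ';' are dropped.
    static MimeType parse(String header) {
        String value = header.trim();
        String prefix = "content-type:";
        if (value.toLowerCase(Locale.ROOT).startsWith(prefix)) {
            value = value.substring(prefix.length());
        }
        int semi = value.indexOf(';');
        if (semi >= 0) {
            value = value.substring(0, semi);
        }
        value = value.trim();
        int slash = value.indexOf('/');
        if (slash <= 0 || slash == value.length() - 1) {
            throw new IllegalArgumentException(
                "Not a type/subtype pair: " + header);
        }
        // MIME type names are case-insensitive, so normalize them.
        return new MimeType(
            value.substring(0, slash).toLowerCase(Locale.ROOT),
            value.substring(slash + 1).toLowerCase(Locale.ROOT));
    }
}
```

Calling MimeType.parse on the header shown above would yield the type text and the subtype html.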

IANA (Internet Assigned Numbers Authority), the group that maintains the lists of assigned protocol numbers and the like, is responsible for registering new content types. A current copy of the official MIME types is available from ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/. This site also has specifications or pointers to specifications for each type.

Getting Java to Load New Handlers

The exact procedure for loading a protocol or content handler depends on the Java implementation. The following instructions are based on Sun’s Developer’s Kit and should work for any implementation derived from Sun’s. If you have problems, check the documentation for your particular Java version.

In the JDK implementation, the URL class and helpers look for classes in the sun.net.www package. Protocol handlers should be in a package called sun.net.www.protocol.ProtocolName, where ProtocolName is the name of the protocol (such as ftp or http). The handler class itself should be named Handler. For example, the full name of the HTTP protocol handler class, provided by Sun with the JDK, is sun.net.www.protocol.http.Handler. In order to get your new protocol handler loaded, you need to construct a directory structure corresponding to the package names and add the directory to your CLASSPATH environment variable. Assume that you have a handler for a protocol—we’ll call it the foo protocol. Your Java library directory is …/java/lib/ (…\java\lib\ on Windows machines). You will need to take the following steps:

If you place the ZIP file of network classes from the CD-ROM in your CLASSPATH, the example handlers should load correctly.
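The naming convention described above can be sketched as a small helper. HandlerNames is a hypothetical class written only for illustration. The protocol rule comes straight from the text; the content-type rule (slash becomes a dot, other punctuation becomes an underscore) is an assumption inferred from the text/tab-separated-values example later in this chapter, so check it against your own Java implementation.

```java
// Hypothetical helper: computes the class names that a JDK-derived
// implementation is described as looking for.
final class HandlerNames {

    // "finger" -> "sun.net.www.protocol.finger.Handler"
    static String protocolHandlerClass(String protocol) {
        return "sun.net.www.protocol." + protocol + ".Handler";
    }

    // Assumed rule: "text/tab-separated-values" ->
    // "sun.net.www.content.text.tab_separated_values"
    static String contentHandlerClass(String mimeType) {
        StringBuilder sb = new StringBuilder("sun.net.www.content.");
        for (char c : mimeType.toCharArray()) {
            if (c == '/') {
                sb.append('.');          // type and subtype become packages
            } else if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            } else {
                sb.append('_');          // '-' is not legal in a class name
            }
        }
        return sb.toString();
    }
}
```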

Creating a Protocol Handler

We will start extending Java by using the fingerClient class developed in Chapter 29, “Network Programming,” to implement a handler for the Finger protocol. Our handler will take URLs of the form finger://hostname/user, where hostname is the host to query and user is the user name we want information for (or all users, if omitted). The urlFetcher applet will be used to demonstrate the protocol handler.

Design

The first decision that needs to be made is how to structure URLs for our protocol. We’ll imitate the HTTP URL and specify that Finger URLs should be of the following form:

finger://host/user

host is the host to contact and user is an optional user to ask for information about. If the user name is omitted, we will return information about all users. This is the same behavior as the fingerClient developed in Chapter 29. The only modification to the fingerClient class that needs to be made is to insert the package statement to put the class in the correct package.
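To make the URL layout concrete, the following sketch splits a Finger URL into its host and user parts. It uses java.net.URI rather than java.net.URL, because URL rejects unknown protocols until a handler is installed. FingerTarget is a hypothetical class name, and the fallback to localhost mirrors the default the handler applies later in this chapter.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical value class holding the two parts of finger://host/user.
final class FingerTarget {
    final String host;
    final String user;   // empty string means "all users"

    private FingerTarget(String host, String user) {
        this.host = host;
        this.user = user;
    }

    static FingerTarget parse(String spec) {
        URI uri;
        try {
            uri = new URI(spec);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad URL: " + spec, e);
        }
        if (!"finger".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Not a finger URL: " + spec);
        }
        String host = uri.getHost();
        if (host == null || host.isEmpty()) {
            host = "localhost";              // same default as the handler
        }
        String path = uri.getPath();         // "/user", "/", or ""
        String user = (path != null && path.length() > 1)
            ? path.substring(1)
            : "";
        return new FingerTarget(host, user);
    }
}
```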

Because we already have the fingerClient, we only need to write subclasses of URLStreamHandler and URLConnection. Our stream handler will use the client object to retrieve the information and format it as HTML. The handler will write the content into a StringBuffer, which will be used to create a StringBufferInputStream. The fingerConnection, a subclass of URLConnection, will take this stream and implement the getInputStream( ) and getContent( ) methods.

In our implementation, the protocol handler object does all the work of retrieving the remote content, while the connection object simply returns the data from the stream it was given. Usually, you would have the connection object retrieve the content. The openConnection( ) method would open a connection to the remote location, and getInputStream( ) would return a stream to read the contents.
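The more usual division of labor can be sketched as follows. This is only an illustration under stated assumptions: the demo protocol, LazyConnection, LazyHandler, and LazyDemo are all invented names, and the "remote" content is fabricated in memory where a real connection would open a socket. The point is that the handler merely creates the connection object, and the work is deferred until connect( ) or getInputStream( ) is called.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

// Hypothetical connection: nothing is fetched until connect( ) runs.
class LazyConnection extends URLConnection {
    private InputStream in;

    LazyConnection(URL u) {
        super(u);
    }

    @Override
    public void connect() throws IOException {
        if (connected) {
            return;                       // connect( ) may be called twice
        }
        // A real handler would open a socket to url.getHost( ) here.
        String body = "content for " + url.getHost() + url.getFile() + "\n";
        in = new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        connected = true;
    }

    @Override
    public InputStream getInputStream() throws IOException {
        connect();                        // connect on first use
        return in;
    }
}

// The handler only builds the connection object.
class LazyHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) {
        return new LazyConnection(u);
    }
}

final class LazyDemo {
    // Reads the whole resource; the handler is passed explicitly so no
    // CLASSPATH setup is needed for this sketch.
    static String fetch(String spec) {
        try {
            URL u = new URL(null, spec, new LazyHandler());
            StringBuilder sb = new StringBuilder();
            try (Reader r = new InputStreamReader(
                    u.openConnection().getInputStream(),
                    StandardCharsets.UTF_8)) {
                int c;
                while ((c = r.read()) != -1) {
                    sb.append((char) c);
                }
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Recent Java releases mark the URL constructors as deprecated, so expect a compiler warning there.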

Copying the fingerClient

The first thing to do is copy the fingerClient source from Chapter 29 into the right subdirectory. The only modification that needs to be made to the class is to add the following statement to the top of the file:

package sun.net.www.protocol.finger;

Everything else in the file remains the same. You just need to recompile the source so that you get a class file with the correct package information.

fingerConnection Source

We’ll go ahead and present the source for the URLConnection subclass. This class should go in the same file as the Handler class. The constructor stores the InputStream it is passed and calls the URLConnection constructor. It also marks the connection as read-only, because a Finger connection cannot accept output from the caller. Listing 31.1 contains the source for this class.

// Lives in Handler.java, which supplies the java.io and java.net imports.
class fingerConnection extends URLConnection {
  InputStream in;

  fingerConnection( URL u, InputStream in ) {
    super( u );
    this.in = in;
    // Read-only connection: callers read from it but cannot write to it.
    this.setDoOutput( false );
  }

  // Nothing to do: the handler has already retrieved the data.
  public void connect( ) {
    return;
  }

  public InputStream getInputStream( ) throws IOException {
    return in;
  }

  // Drains the stream and returns its contents as a String.
  public Object getContent( ) throws IOException {
    StringBuffer retval = new StringBuffer( );
    int nbytes;
    byte buf[] = new byte[ 1024 ];

    try {
      while( (nbytes = in.read( buf, 0, 1024 )) != -1 ) {
        // High byte of 0: treat each byte as one 8-bit character.
        retval.append( new String( buf, 0, 0, nbytes ) );
      }
    } catch( Exception e ) {
      System.err.println(
        "fingerConnection::getContent: Exception\n" + e );
      e.printStackTrace( System.err );
    }
    return retval.toString( );
  }
}

Handler Source

First we’ll rough out the skeleton of the Handler.java file. We need the package statement so that our classes are compiled into the package where the runtime will be looking for them. We also import the fingerClient class here. The outline of the class is shown in Listing 31.2.

package sun.net.www.protocol.finger;

import java.io.*;
import java.net.*;
import sun.net.www.protocol.finger.fingerClient;

// fingerConnection source goes here

public class Handler extends URLStreamHandler {
  // openConnection( ) Method
}

openConnection( ) Method

Now we’ll develop the method responsible for returning an appropriate URLConnection object to retrieve a given URL. Our method starts out by allocating a StringBuffer to hold our return data. We also will parse out the host name and user name from the URL argument. If the host was omitted, we default to localhost. The code for openConnection( ) is given in Listings 31.3 through 31.6.

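public synchronized URLConnection openConnection( URL u ) {
  StringBuffer sb = new StringBuffer( );
  String host = u.getHost( );
  // getFile( ) returns "/user"; drop the leading slash. The user part
  // is optional, so guard against an empty file.
  String file = u.getFile( );
  String user = file.length( ) > 1 ? file.substring( 1 ) : "";
  if( host.equals( "" ) ) {
    host = "localhost";
  }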

Next, the method will write an HTML header into the buffer. This will allow a Java-based Web browser to display the Finger information in a nice-looking format.

  sb.append( "<HTML><head>\n" );
  sb.append( "<title>Fingering " );
  sb.append( (user.equals( "" ) ? "everyone" : user) );
  sb.append( "@" + host );
  sb.append( "</title></head>\n" );
  sb.append( "<body>\n" );
  sb.append( "<pre>\n" );

We now will use the fingerClient class to get the information into a String and then append it to our buffer. If there is an error while getting the Finger information, we will put the error message from the exception into the buffer instead.

  try {
    String info = null;
    info = (new fingerClient( host, user )).getInfo( );
    sb.append( info );
  } catch( Exception e ) {
    sb.append( "Error fingering: " + e );
  }

Finally we’ll close off the open HTML tags and create a fingerConnection object which will be returned to the caller, as follows:

  sb.append( "\n</pre></body>\n</html>\n" );
  return new fingerConnection( u,
    (new StringBufferInputStream( sb.toString( ) ) ) );
}
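The page-building portion of openConnection( ) can be pulled out into a standalone function, sketched below. FingerPage is a hypothetical class, and the escaping of &, <, and > is an addition of this sketch rather than something the chapter’s handler does; it is included because Finger output is arbitrary text and could otherwise be mistaken for markup.

```java
// Hypothetical helper: builds the same HTML wrapper as openConnection( ),
// with the extra (assumed) step of escaping markup characters.
final class FingerPage {

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;");  break;
                case '>': sb.append("&gt;");  break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // user may be empty, meaning "all users".
    static String render(String host, String user, String info) {
        String who = user.isEmpty() ? "everyone" : user;
        StringBuilder sb = new StringBuilder();
        sb.append("<HTML><head>\n");
        sb.append("<title>Fingering ");
        sb.append(escape(who)).append("@").append(escape(host));
        sb.append("</title></head>\n");
        sb.append("<body>\n<pre>\n");
        sb.append(escape(info));
        sb.append("\n</pre></body>\n</html>\n");
        return sb.toString();
    }
}
```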

Using the Handler

Once you have all of the code compiled and in the right locations, load the urlFetcher applet from Chapter 29 and enter a Finger URL. If everything loads right, you should see something that looks like Figure 31.1. If you get an error that says Finger is an unknown protocol, check that you have your CLASSPATH set correctly.

Figure 31.1. The urlFetcher applet displaying a Finger URL.

Creating a Content Handler

This content handler example will be for the MIME type text/tab-separated-values. This type will be familiar if you have ever used a spreadsheet or database program. Many such applications can import and export data in an ASCII text file, with each column of data in a row separated by a Tab character (“\t”). The first line is interpreted as the names of the fields, and the remaining lines are the actual data.

Design

Our first design decision is to figure out what type of Java object or objects the tab-separated values should map to. Because this is textual content, some sort of String object would seem to be the best solution. The spreadsheet characteristics of rows and columns of data can be represented by arrays. Putting these two facts together gives us a data type of String[][], or an array of arrays of String objects. The first array is an array of String[] objects, each representing one row of the data. Each of these arrays consists of a String for each cell of the data.
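To show what the target data type looks like in practice, here is a minimal sketch that turns tab-separated text into a String[][]. It deliberately uses String.split, not the StreamTokenizer approach this chapter goes on to develop, and TsvSplit is a hypothetical class name. The sample data in the usage note is made up.

```java
// Hypothetical helper: one String[] per line, one String per cell.
final class TsvSplit {
    static String[][] parse(String text) {
        // Trailing empty lines are dropped by split( ).
        String[] rows = text.split("\r?\n");
        String[][] table = new String[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            // The -1 limit keeps empty cells, including trailing ones.
            table[i] = rows[i].split("\t", -1);
        }
        return table;
    }
}
```

Given the text "name\tage\nalice\t30\n", the result has two rows: the field names in table[0] and the first data row in table[1].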

Also, we’ll need to have some way of breaking the input stream into the separate fields. We will make a subclass of java.io.StreamTokenizer to handle this task. The StreamTokenizer class provides methods for breaking an InputStream into individual tokens. You might want to browse over the entry for StreamTokenizer in the API reference if you are not familiar with it.

Content Handler Skeleton

Content handlers are implemented by subclassing the java.net.ContentHandler class. These subclasses are responsible for implementing a getContent() method. We’ll start with the skeleton of the class. We’ll import the networking and I/O packages, as well as the java.util.Vector class. We also will define the skeleton for our tabStreamTokenizer class. This is shown in Listing 31.7.

/*
 * Handler for text/tab-separated-values MIME type.
 */

// This needs to go in this package for JDK-derived
// Java implementations
package sun.net.www.content.text;

// All imports must come before the first class declaration.
import java.net.*;
import java.io.*;
import java.util.Vector;

class tabStreamTokenizer extends StreamTokenizer {
  public static final int TT_TAB = '\t';
  // Constructor
}

public
  class tab_separated_values extends ContentHandler {
  // getContent method
}

The tabStreamTokenizer Class

First we will define the class that will break the input up into the separate fields. Most of the functionality we need is provided by the StreamTokenizer class, so we only need to define a constructor that will specify the character classes needed to get the behavior we want. For the purposes of this content handler there are three types of tokens: TT_WORD tokens, which represent fields (the TT_TAB characters between them act only as separators); TT_EOL tokens, which signal the end of a line (that is, the end of a row of data); and TT_EOF, which signals the end of the input file. Because this class is relatively simple it will be presented in its entirety in Listing 31.8.

class tabStreamTokenizer extends StreamTokenizer {
  public static final int TT_TAB = '\t';

  tabStreamTokenizer( InputStream in ) {
    super( in );
    // Undo parseNumbers( ) and whitespaceChars(0, ' ')
    ordinaryChars( '0', '9' );
    ordinaryChar( '.' );
    ordinaryChar( '-' );
    ordinaryChars( 0, ' ' );
    // Everything but TT_EOL and TT_TAB is a word
    wordChars( 0, ('\t'-1) );
    wordChars( ('\t'+1), 255 );
    // Tabs separate fields, so treat TT_TAB as whitespace. Make the
    // newline an ordinary character so it is returned as TT_EOL.
    whitespaceChars( TT_TAB, TT_TAB );
    ordinaryChar( TT_EOL );
  }
}
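To see what token stream such a configuration produces, the following sketch sets up a plain StreamTokenizer along similar lines and lists the tokens it returns. It is not the chapter’s class: TabTokens is a hypothetical name, it reads from a Reader instead of an InputStream, and it uses resetSyntax( ) with eolIsSignificant( true ) instead of the ordinaryChar( TT_EOL ) approach in the listing.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: returns field values, with "<EOL>" marking row ends.
final class TabTokens {
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();                 // start from a blank table
        st.wordChars(0, 255);             // every character is field text...
        st.whitespaceChars('\t', '\t');   // ...except tab, which separates
        st.whitespaceChars('\n', '\n');   // line ends are whitespace, but
        st.whitespaceChars('\r', '\r');
        st.eolIsSignificant(true);        // reported as TT_EOL tokens

        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_EOL) {
                    out.add("<EOL>");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

One consequence of treating the tab as whitespace, in this sketch and in the listing alike, is that two adjacent tabs collapse into a single separator, so an empty cell is skipped and later cells in that row shift left.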

The getContent Method

Subclasses of ContentHandler need to provide an implementation of getContent( ) that returns a reference to an Object. The method takes as its parameter a URLConnection object from which the class can obtain an InputStream to read the resource’s data.
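Before building the tab-separated handler, it may help to see the smallest possible content handler. The sketch below is an assumption-laden illustration: PlainTextSketchHandler and ContentHandlerDemo are hypothetical names, the handler simply returns the stream’s text as a String, and the demo feeds it a stub URLConnection backed by an in-memory stream so that no server is contacted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.ContentHandler;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical handler: converts the resource's bytes into a String.
class PlainTextSketchHandler extends ContentHandler {
    @Override
    public Object getContent(URLConnection con) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                con.getInputStream(), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}

final class ContentHandlerDemo {
    // Runs the handler against a stub connection holding the given text.
    static Object contentOf(final String text) {
        try {
            URL u = new URL("http://localhost/");   // never contacted
            URLConnection stub = new URLConnection(u) {
                @Override
                public void connect() {
                    // Nothing to open: the data is already in memory.
                }

                @Override
                public InputStream getInputStream() {
                    return new ByteArrayInputStream(
                        text.getBytes(StandardCharsets.UTF_8));
                }
            };
            return new PlainTextSketchHandler().getContent(stub);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```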

getContent Skeleton

First, we’ll define the overall structure and method variables. We need a flag, which will be called done, to signal when we’ve read all of the field names from the first line of text. The number of fields (columns) in each row of data will be determined by the number of fields in the first line of text, and will be kept in an int variable numFields. We also will declare another integer, index, for use while inserting the rows of data into a String[].

We will need some method of holding an arbitrary number of objects because we cannot tell the number of data rows in advance. To do this we’ll use the java.util.Vector object, which we’ll call lines, to keep each String[]. Finally, we will declare an instance of our tabStreamTokenizer, using the getInputStream( ) method from the URLConnection passed as an argument to the constructor. Listing 31.9 shows the skeleton code for the method.

public Object getContent( URLConnection con )
  throws IOException
{
  boolean done = false;
  int numFields = 0;
  int index = 0;
  Vector lines = new Vector( );
  tabStreamTokenizer in =
    new tabStreamTokenizer( con.getInputStream( ) );

  // Read in the first line of data (Listing 31.10 & 31.11)
  // Read in the rest of the file (Listing 31.12)
  // Stuff all data into a String[][] (Listing 31.13)
}

Reading the First Line

The first line of the file will tell us the number of fields and the names of the fields in each row for the rest of the file. Because we don’t know beforehand how many fields there are, we’ll be keeping each field in a Vector firstLine. Each TT_WORD token that the tokenizer returns is the name of one field. We know we are done once it returns a TT_EOL token and can set the flag done to true. We will use a switch statement on the ttype member of our tabStreamTokenizer to decide what action to take. This is done in the code in Listing 31.10.

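  Vector firstLine = new Vector( );

  // Case labels must be compile-time constants, so the token types are
  // named through the StreamTokenizer class, not through the variable.
  while( !done && in.nextToken( ) != StreamTokenizer.TT_EOF ) {
    switch( in.ttype ) {
    case StreamTokenizer.TT_WORD:
      firstLine.addElement( in.sval );
      numFields++;
      break;
    case StreamTokenizer.TT_EOL:
      done = true;
      break;
    }
  }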

Now that we have the first line in memory, we need to build an array of String objects from those stored in the Vector. To accomplish this we’ll first allocate the array to the size just determined. Then we will use the copyInto( ) method to transfer the strings into the array just allocated. Finally, the array will be inserted into lines. (See Listing 31.11.)

  // Copy first line into array
  String curLine[] = new String[ numFields ];
  firstLine.copyInto( curLine );
  lines.addElement( curLine );

Read the Rest of the File

Before reading the remaining data, we need to allocate a new array to hold the next row. Then we loop until encountering the end of the file, signified by TT_EOF. Each time we retrieve a TT_WORD, we will insert the String into curLine and increment index.

The end of the line will let us know when a row of data is done, at which time we will copy the current line into our Vector. Then we will allocate a new String[] to hold the next line and set index back to zero (to insert the next item starting at the first element of the array). The code to implement this is given in Listing 31.12.

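  curLine = new String[ numFields ];

  while( in.nextToken( ) != StreamTokenizer.TT_EOF ) {
    switch( in.ttype ) {
    case StreamTokenizer.TT_WORD:
      // Ignore extra fields in rows longer than the header row.
      if( index < numFields ) {
        curLine[ index++ ] = in.sval;
      }
      break;
    case StreamTokenizer.TT_EOL:
      lines.addElement( curLine );
      curLine = new String[ numFields ];
      index = 0;
      break;
    }
  }

  // Keep a final row that is not terminated by a newline.
  if( index > 0 ) {
    lines.addElement( curLine );
  }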

Stuff All Data Into String[][]

At this point all of the data has been read in. All that remains is to copy the data from lines into an array of arrays of String, as follows in Listing 31.13.

  String retval[][] = new String[ lines.size( ) ][];
  lines.copyInto( retval );
  return retval;
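The copy step works because every element stored in the Vector is itself a String[], so each one can be placed directly into a slot of the outer array. The sketch below isolates that step; RowsToArray is a hypothetical class, and it uses a generic Vector<String[]>, which was not available when the listing above was written.

```java
import java.util.Vector;

// Hypothetical helper: copies a Vector of rows into a String[][].
final class RowsToArray {
    static String[][] toArray(Vector<String[]> rows) {
        // Only the outer dimension is allocated; each slot receives a
        // reference to an existing row, so rows may differ in length.
        String[][] out = new String[rows.size()][];
        rows.copyInto(out);
        return out;
    }
}
```

Note that the rows are shared, not duplicated: changing a cell through the returned array also changes the array held in the Vector.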

Using the Content Handler

In order to show how the content handler works, we’ll be modifying the urlFetcher applet from Chapter 29. We’ll be changing it to use the getContent( ) method to retrieve the contents of a resource rather than reading the data from the stream returned by getInputStream( ). Because we’re only changing the doFetch() method, we won’t include the entire applet again, only the portions that change. The first change is to call the getContent( ) method and get an Object back rather than getting an InputStream. Listing 31.14 shows this change.

try {
  boolean displayed = false;
  URLConnection con = target.openConnection( );
  Object obj = con.getContent( );

Next come tests using the instanceof operator. We handle String objects and arrays of String objects by placing the text into the TextArea. Arrays are printed item by item. If the object is a subclass of InputStream, we read the data from the stream and display it. Image content will just be noted as being an Image. For any other content type, we simply throw our hands up and remark that we cannot display the content (because we’re not a full-fledged Web browser). The code to do this is shown below in Listing 31.15.

  if( obj instanceof String ) {
    contentArea.setText( (String) obj );
    displayed = true;
  }

  if( obj instanceof String[] ) {
    String array[] = (String []) obj;
    StringBuffer buf = new StringBuffer( );
    for( int i = 0; i < array.length; i++ )
      buf.append( "item " + i + ": " + array[i] + "\n" );
    contentArea.setText( buf.toString( ) );
    displayed = true;
  }

  if( obj instanceof String[][] ) {
    String array[][] = (String [][]) obj;
    StringBuffer buf = new StringBuffer( );
    for( int i = 0; i < array.length; i++ ) {
      buf.append( "Row " + i + ":\n\t" );
      for( int j = 0; j < array[i].length; j++ )
        buf.append( "item " + j + ": "
                   + array[i][j] + "\t" );
      buf.append( "\n" );
    }
    contentArea.setText( buf.toString( ) );
    displayed = true;
  }

  if( obj instanceof Image ) {
    contentArea.setText( "Image" );
    displayed = true;
  }

  if( obj instanceof InputStream ) {
    int c;
    StringBuffer buf = new StringBuffer( );
    while( (c = ((InputStream) obj).read( )) != -1 )
      buf.append( (char) c );
    contentArea.setText( buf.toString( ) );
    displayed = true;
  }

  if( !displayed ) {
    contentArea.setText( "Don't know how to display "
      + obj.getClass( ).getName( ) );
  }

  // Same code to display content type and length
} catch( IOException e ) {
  showStatus( "Error fetching \"" + target + "\": " + e );
  return;
}
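The type tests can also be tried outside the applet. The following sketch reproduces the String, String[], and String[][] branches as a plain function that returns the text the applet would place in its TextArea. ContentText is a hypothetical class, the Image and InputStream branches are left out to keep it free of AWT and I/O, and the null check is an addition of this sketch.

```java
// Hypothetical helper: renders handler results the way the applet does.
final class ContentText {
    static String describe(Object obj) {
        if (obj == null) {
            return "No content";              // added guard, not in the applet
        }
        if (obj instanceof String) {
            return (String) obj;
        }
        if (obj instanceof String[]) {
            String[] array = (String[]) obj;
            StringBuilder buf = new StringBuilder();
            for (int i = 0; i < array.length; i++) {
                buf.append("item " + i + ": " + array[i] + "\n");
            }
            return buf.toString();
        }
        if (obj instanceof String[][]) {
            String[][] array = (String[][]) obj;
            StringBuilder buf = new StringBuilder();
            for (int i = 0; i < array.length; i++) {
                buf.append("Row " + i + ":\n\t");
                for (int j = 0; j < array[i].length; j++) {
                    buf.append("item " + j + ": " + array[i][j] + "\t");
                }
                buf.append("\n");
            }
            return buf.toString();
        }
        return "Don't know how to display " + obj.getClass().getName();
    }
}
```

A String[][] is not an instance of String[], so the order of the two array tests does not matter.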

The complete modified applet source is on the CD as urlFetcher_Mod.java in the tsvContentHandler directory. Figure 31.2 illustrates what it will show when displaying text/tab-separated-values.

Figure 31.2. The urlFetcher_Mod applet.

The file displayed is included as example.tsv. Most HTTP daemons should return the correct content type for files ending in .tsv. If the data does not show up as text/tab-separated-values, you might need to try one of the following things:

Summary

After reading this chapter you should have an understanding of how Java can be extended fairly easily to deal with new application protocols and data formats. You should know what classes you have to derive your handlers from (URLConnection and URLStreamHandler for protocol handlers, ContentHandler for content handlers), and how to get Java to load the new handler classes.

If you want to try your hand at writing a handler, try something simple at first. For a protocol handler you could try the echo protocol shown in Chapter 28. A more challenging task might be writing a content handler for application/postscript which prints the file out (this would more than likely need some native code, or some way to use the java.lang.Runtime.exec( ) method to call a local printing program).
