Notice: This material is excerpted from Special Edition Using HTML, 2nd Edition, ISBN: 0-7897-0758-6. This material has not yet been through the final proof reading stage that it will pass through before being published in printed form. Some errors may exist here that will be corrected before the book is published. This material is provided "as is" without any warranty of any kind.
In previous chapters you have learned how to mark up content for your Web site using the HTML standard. Now, we will begin our exploration of the CGI (Common Gateway Interface), which will greatly enhance the level of interactivity on your site. With the use of CGI scripts, you can make your Web presentations more responsive to your users' needs by allowing them to have a more powerful means of interaction with your material.
In this chapter, you will learn:
Here is the answer to the hundred dollar question. What is the CGI anyway? Well, in order to answer that, you are going to need a little background information first.
Each time you sit down in your favorite chair (I hope it is anyway) and start surfing the WWW, you are a client from the Internet's point of view. Each time you click on a link to request a new Web document, you are sending a request to the document's server. The server then receives the request, gets the document, and sends it back to your browser for you to view.
The client/server relationship that is set up between your browser and a Web server works very well for serving up HTML and image files from the server's Web directories. Unfortunately, there is a large flaw with this simple system. The Web server is still not equipped to handle information from your favorite database program or from other applications that require more work than simply transmitting a static document.
One option the designers
of the first Web server could have chosen was to build in an interface
for each external application from which a client may want to get information.
It is hard to imagine trying to program a server to interact with every
known application and then trying to keep the server current on each new
application as it is developed. Needless to say, it would be impossible.
So they developed a better way.
These wise developers anticipated this problem and solved it by designing the Common Gateway Interface, or CGI. This gateway provides a common environment and a set of protocols for external applications to use while interfacing with the Web server. Thus, any application engineer (including yourself) can use the CGI to allow an application to interface with the server. This extends the range of functions the Web server has to include the features provided by a potentially limitless number of external applications.
Now that you have read a little background, you should have a basic idea of what the CGI is, and why it is needed. The next step in furthering your understanding of the CGI is to learn the basics of how it works. To help you achieve this goal, I will break down this material into the following sections:
The CGI is the common gateway or door that is used by the server to interface -- or communicate -- with applications other than the browser. Thus, CGI scripts act as a link between whatever application is needed and the server while the server is responsible for receiving information from, and sending data back to, the browser.
As a technical note, you should be aware that some people like to use the term program to refer to longer, usually compiled, code and applications written in languages like C and C++. When this is the case, the term script is then used to indicate shorter, noncompiled code written with languages like SH and PERL. However, for the purpose of this and the following chapter, the terms program and script will be used interchangeably as the divisions between them are being rapidly broken down.
For example, when you enter a search request at your favorite search engine, a request is made by the browser to the server to execute a CGI script. At this time, the browser passes the information that was contained in the online form plus the current environment to the server. From here, the server passes the information to the script. This script provides an interface with the database archive and finds the information that you have requested. Once this information is retrieved, the script sends it to the server which feeds it back to your browser as a list of matches to your query.
There is a very nice online description of the CGI at The Common Gateway Interface:
URL address: http://hoohoo.ncsa.uiuc.edu/cgi/
Another way of looking at the CGI is to see it as a socket that attaches an extra arm on your server. This new arm, the CGI script, adds new features and abilities to the server that it was previously lacking.
The most common use for these new features is to give the server the ability to dynamically respond to the client. One of the most often seen examples of this is allowing the client to send a search query to a CGI script which then queries a database and returns a list of matching topics from the database. Besides information retrieval, another common theme for using CGI scripts is to customize the user interface on the Web site. This commonly takes the form of counters and animations.
If you see bin or cgi-bin in the path-names of images or links, it is a good indication that the given effect was produced by a CGI script.
These and some of the other common uses for CGI
scripts will be discussed in more detail later in this chapter, so stay
tuned.
It won't be long into your CGI programming career before you want to write a script that sends information to the server for it to process. Each file that is sent to the server must contain an output header. This header contains the information the server and other applications need to transmit and handle the file properly.
The use of output headers in CGI scripts is an expansion of a system of protocols called MIME (Multipurpose Internet Mail Extensions). Its use for e-mail began in 1992 when the Network Working Group published RFC (Request For Comments) 1341, which defined this new type of e-mail system. This system greatly expanded the ability of Internet e-mail to send and receive various non-text file formats.
Since the release of RFC 1341, a series of improvements has been made to the MIME conventions. You can find some additional information about this by looking at RFC 1521 and RFC 1522. A list of all the RFC documents can be found online at http://ds0.internic.net/rfc/. These documents contain a lot of useful information published by the Network Working Group relating to the function and structure of the Internet backbone.
Each time you, as a client, send a request to the server, it is sent in the form of a MIME message with a specially formatted header. Most of the information in the header is part of the client's protocol for interfacing with the server. This includes the request method, a URI (Uniform Resource Identifier), the protocol version, and then a MIME message. The server then responds to this request with its own message which usually includes the server's protocol version, a status code, and a different MIME message.
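To make the shape of such a request a little more concrete, here is a minimal sketch in SH that prints the text of a simple request. The function name build_request and the header values are assumptions made up for illustration; a real browser sends more headers than this.

```shell
#!/bin/sh
# Print the text of a minimal HTTP/1.0 GET request for the given path.
# build_request and the header values below are illustrative assumptions.
build_request() {
    path=$1
    # Request line: the method, the URI, and the protocol version.
    printf 'GET %s HTTP/1.0\r\n' "$path"
    # MIME-style header lines that describe the client.
    printf 'User-Agent: ExampleClient/0.1\r\n'
    printf 'Accept: text/html\r\n'
    # A blank line marks the end of the header.
    printf '\r\n'
}

build_request /index.html
```

The server's reply follows the same pattern: a status line, some header lines, a blank line, and then the document itself.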
The bulk of this client/server communication process is handled automatically by the WWW client application -- usually your Web browser -- and the server. This makes it easier for everyone, since you don't have to know how to format each message in order to access the server and get information. You just need a WWW client. However, to write your own CGI scripts, you will need to know how to format the Content-type line of the MIME header in order for the server to know what type of document your script is sending. Also, you will need to know how to access the server's environment variables so you can use that information in your CGI scripts. In the following sections, you will learn everything necessary to accomplish both of these tasks.
If you decide to write your own WWW client, then you will need to understand the client/server communication process before you can begin. A good place to start your search for more information about this is the W3C Reference Library at http://www.w3.org/hypertext/WWW/Library/.
Each document that is sent via a CGI script to the server, whether it was created "on-the-fly" or is simply being opened by the script, must contain a Content-type output header as the first part of the document so the server can process it accordingly. In table 23.1 you will see examples of some of the more commonly used MIME Content-types and their associated extensions.
Table 23.1 Examples of MIME types and Extensions
Content-type: | Extensions |
---|---|
application/octet-stream | bin exe |
application/postscript | ai eps ps |
application/pdf | pdf |
application/x-csh | csh |
application/x-sh | sh |
application/x-wais-source | src |
application/x-gtar | gtar |
application/x-gzip | gz |
application/x-tar | tar |
application/zip | zip |
audio/x-wav | wav |
image/gif | gif |
image/jpeg | jpeg jpg jpe |
text/html | html htm |
text/plain | txt |
text/richtext | rtx |
video/mpeg | mpeg mpg mpe |
video/quicktime | qt mov |
video/x-msvideo | avi |
video/x-sgi-movie | movie |
x-world/x-vrml | wrl |
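As a rough illustration of how a script might use this table, the following SH sketch maps a file name to a Content-type using a few of the rows above. The function name content_type_for is hypothetical, and falling back to application/octet-stream for unknown extensions is my own assumption, not a rule from the table.

```shell
#!/bin/sh
# Map a file name to a MIME Content-type using a few rows of table 23.1.
# content_type_for is a made-up helper name; the fallback is an assumption.
content_type_for() {
    case $1 in
        *.gif)              echo "image/gif" ;;
        *.jpeg|*.jpg|*.jpe) echo "image/jpeg" ;;
        *.html|*.htm)       echo "text/html" ;;
        *.txt)              echo "text/plain" ;;
        *.ai|*.eps|*.ps)    echo "application/postscript" ;;
        *.zip)              echo "application/zip" ;;
        *)                  echo "application/octet-stream" ;;
    esac
}

content_type_for logo.gif
content_type_for index.htm
```

A script could then print "Content-type: " followed by the result and a blank line before sending the file itself.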
To help you better understand how to properly use Content-types within a CGI script, let's work through an example. Suppose you have decided to write a CGI script that will display a GIF each time it is executed by a browser.
The first line of code you need is a special comment that contains the path to the interpreter for the scripting language that you are using to write the program. In this case it is PERL 4. The comment symbol "#" must be followed by an exclamation point "!" and then the path. This special combination of "#!" on the first line of the file is the standard format for letting the server know which interpreter to use to execute the script. A comment is used because, while UNIX servers use this line of code to locate the script's interpreter, other types of server systems have alternate methods of specifying the interpreter's location. Since this line of code starts with a "#" symbol, it is still a valid PERL comment and does not cause problems on non-UNIX servers.
You should double check to make sure you include the correct path-name to your language's interpreter.
#!/usr/local/bin/perl
The next line you will need simply sets the variable "$gif" to the full path name of the image you wish to display.
$gif = "/file/path/your.gif";
Now it is time to let the server know that it will be receiving an image file from this script to display on the client's browser. This is done using the MIME Content-type line. The print statement prints the information between the quotation marks to the server. Each "\n" that you see on this line adds a newline: the first ends the Content-type line, and the second gives you the required blank line that must occur after the Content-type information. A blank line lets the server know where the MIME header stops and where the body of information, in this case the gif, starts.
print "Content-type: image/gif\n\n";
The next line creates a file handle named IMAGE that forms a link from this script to the file contained in the variable "$gif" which we set earlier.
open(IMAGE, $gif) || die "Cannot open $gif: $!";
Now, we create a loop that sends the entire contents of the gif to the server as the body of the MIME message we began with the Content-type line.
while(<IMAGE>) { print $_; }
To avoid being sloppy, we will close the file handle to the gif now that we are done sending the image.
close(IMAGE);
Finally, we let the PERL interpreter know that the CGI script is finished running and can be stopped.
exit;
This type of script can be modified into something a little more useful. For example, you could turn it into a random image viewer. Each time someone clicks on the link to the script, it executes and feeds a random gif to the client's browser.
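Here is one way the random image viewer idea might look, sketched in SH rather than PERL. The directory in IMAGE_DIR and the helper names pick_random and serve_random_gif are assumptions for illustration. Note that awk seeds its random numbers from the clock, so two requests in the same second may return the same image.

```shell
#!/bin/sh
# Sketch of a random image viewer. IMAGE_DIR is a hypothetical directory
# of GIF files; change it to match your own server.
IMAGE_DIR=${IMAGE_DIR:-/file/path/images}

# Print one of the arguments, chosen at random.
pick_random() {
    [ "$#" -gt 0 ] || return 1
    # awk picks an index from 0 to (number of arguments - 1).
    n=$(awk -v max="$#" 'BEGIN { srand(); print int(rand() * max) }')
    shift "$n"
    printf '%s\n' "$1"
}

# Send a random GIF from the given directory, or a short notice if none exist.
serve_random_gif() {
    set -- "$1"/*.gif
    if [ -f "$1" ]; then
        gif=$(pick_random "$@")
        # The header, the required blank line, then the image itself.
        printf 'Content-type: image/gif\n\n'
        cat "$gif"
    else
        printf 'Content-type: text/plain\n\n'
        echo "No images found."
    fi
}

serve_random_gif "$IMAGE_DIR"
```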
Hopefully, you now have a little better understanding of what is involved as the client and server communicate with each other. Along with the information that I discussed earlier, a host of environment variables are sent during the client/server communications. Although each server can have its own set of environment variables, for the most part, they are all subsets of a large set of standard variables described by the Internet community to help promote uniform standards.
If you have bin access on a UNIX server, then you can use the following script to easily determine which environment variables your server supports. In addition, this script should also work on other server types such as Microsoft Windows NT server if you properly configure the server to recognize and execute PERL scripts.
Once again, this is the magic line that lets the server know which type of CGI script this is so it can launch the appropriate interpreter.
#!/usr/local/bin/perl
This next line, as was described above, is the MIME output header that lets the server know to expect an HTML document to follow.
print "Content-type: text/html\n\n";
Now that the server is expecting to receive an HTML document, we will send it a list of each environment variable's name and current value by using a "foreach" loop.
foreach $key (keys(%ENV)){ print "\$ENV{$key} = \"$ENV{$key}\"<br>\n"; }
Finally, we need to tell the interpreter that the script is finished.
exit;
Using the CGI script environment.pl from a browser will generate a screen similar to this one.
If the browser you use doesn't support an environment variable, the value of the variable is set to null and is left empty.
As you can see from the example, most of the variables contain protocol version information, and location information such as the client's IP address and the server's domain. However, if you are creative, you can put some of these variables to good use in your CGI scripts.
The best example I have seen so far is the use of the environment variable "HTTP_USER_AGENT". This contains the name and version number of the client application, which is usually a Web browser. As you can see from figure 23.1, the Netscape 2.0 browser that I used when running this script has an HTTP_USER_AGENT value of Mozilla/2.0 (Win95; I).
Once you know what the values are for various browsers, it is possible to write a CGI script to serve different Web documents based on browser type. Thus, a text-only browser might receive a text version of your Web page, while image-capable browsers will receive the full version.
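A minimal SH sketch of this idea might look like the following. The page file names and the helper name page_for_agent are assumptions, and the patterns only cover two browsers (Lynx as the text-only case and the Mozilla value shown earlier), so treat it as a starting point rather than a complete solution.

```shell
#!/bin/sh
# Choose which version of a page to serve from the HTTP_USER_AGENT value.
# The file names and the function name are illustrative assumptions.
page_for_agent() {
    case $1 in
        Lynx*)    echo "index-text.html" ;;   # text-only browser
        Mozilla*) echo "index-full.html" ;;   # image-capable browser
        *)        echo "index-text.html" ;;   # unknown: fall back to text
    esac
}

page=$(page_for_agent "$HTTP_USER_AGENT")
printf 'Content-type: text/html\n\n'
if [ -r "$page" ]; then
    cat "$page"
else
    echo "<P>Page not available.</P>"
fi
```

Falling back to the text version for unknown browsers is a design choice: a plain page is readable everywhere, while an image-heavy page is not.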
Web sites are interactive by their very nature. Every time you click on a hyper link, you are actively involved in the site, rather than passively reading information. Most users enjoy this added level of interactivity and the feeling of participation it brings. However, hyper links are just the beginning. With CGI scripts, you have access to a whole new set of tools to make your Web site more interactive and dynamic.
The list of uses for CGI scripts is always growing. Here are but a few of the more common ones.
As you can see, you probably have already interacted with many CGI scripts, possibly without even realizing it.
Processing the information entered into a form is by far the most common use of CGI scripts. These scripts are activated when you press the form's submit/send button, which is usually found near the bottom of the form. Once the script is executed, the server sends the script the information that was entered. Then, the script processes this information and, if appropriate, sends some information back to the browser via the server. This information is then displayed on your monitor.
If you execute a script that sends nothing back to the browser, let the browser know this by using the following line, followed by a blank line, in place of the Content-type line.
Status: 204 No response
You can take a look at the following URL to see an example of a simple
form on the Web for adding a response to a guestbook.
URL Address: http://www.missouri.edu/~bchemkm/guestbook.htm
If you use the browser's "View Source" command (with Netscape, pull down the View menu and select the View Source option), you should be able to find a line in the HTML document that looks something like this.
<FORM ACTION="http://absolute_path_name/cgi-bin/scriptname.type" METHOD="POST or GET">
The "ACTION" attribute tells the browser which script to execute each time the information from the form is sent to the Web server. By using the absolute pathname for the script, you provide a means for the Web server to find the desired script. It is important to remember that you should always use the absolute pathname when indicating the location of scripts on a server.
The "METHOD" attribute lets the script know what format the form's information is sent in (either GET or POST). This allows the script to process the form's data correctly. For more information on the METHOD attribute, you can look in Chapter 21 on forms.
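To show what the difference between the two methods means for a script, here is a small SH sketch. With GET, the CGI passes the form data in the QUERY_STRING environment variable; with POST, the data arrives on standard input and CONTENT_LENGTH says how many bytes to read. The function name read_form_data is my own, and the sketch leaves the data URL-encoded rather than decoding it.

```shell
#!/bin/sh
# Print the raw, still URL-encoded form data for the current request.
# read_form_data is a hypothetical helper name.
read_form_data() {
    case $REQUEST_METHOD in
        GET)
            # GET: the data arrives in the QUERY_STRING variable.
            printf '%s' "$QUERY_STRING"
            ;;
        POST)
            # POST: the data arrives on standard input; read exactly
            # CONTENT_LENGTH bytes, since no end-of-file is guaranteed.
            dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
            ;;
        *)
            return 1
            ;;
    esac
}

printf 'Content-type: text/plain\n\n'
if data=$(read_form_data); then
    printf 'Raw form data: %s\n' "$data"
else
    echo "Unsupported request method."
fi
```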
See "Form Layout and Design"
Notice that you can create a nice looking form by inserting the form fields within table tags.
Fig. 23.3
Here is a sample of the source code that is used to produce the table in figure 23.2.
You can use borderless tables, as with this response page, to nicely layout the script's output.
The script that processes this form has several common features that you can find in other forms as you explore the Web.
CGI scripts are also commonly used to collect survey information, or update the contents of a database. Later, in Chapter 24, you will learn exactly how each of these features works as you learn to write your own guestbook script, much like this one.
CGI scripts are commonly used, as is discussed in detail in Chapter 12, for running image maps. Each time you use one of these clickable images, you are executing a CGI script that comes packaged with the Web server. This script compares the coordinates of your "click" with those in the image map's configuration file to determine which URL to send to the server. The server then transmits the information to the browser.
See "Imagemaps: From Browser to Server and Back"
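The coordinate comparison described above can be sketched in a few lines of SH. This is only an illustration: the map file format used here (a URL followed by the corners of a rectangle), the function name lookup_click, and the /default.html fallback are all assumptions of mine, not the format your server's image map program actually reads.

```shell
#!/bin/sh
# Print the URL of the first rectangle that contains the point (x, y).
# Each line of the map file uses a made-up format: URL x1 y1 x2 y2
# Usage: lookup_click X Y MAPFILE
lookup_click() {
    awk -v x="$1" -v y="$2" '
        # A hit: the click lies inside this rectangle.
        x >= $2 && x <= $4 && y >= $3 && y <= $5 { print $1; found = 1; exit }
        # No rectangle matched: fall back to a default page.
        END { if (!found) print "/default.html" }
    ' "$3"
}
```

For example, with a map file containing the line "/about.html 100 0 199 49", a click at 120,10 would select /about.html.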
Think back to when you were a kid in grade school. Do you remember drawing stick men, one on a page, and then flipping the pages quickly to animate them (instead of listening to what the teacher was saying)? Well, this same kind of sequential image animation is done on Web sites using a simple CGI script.
At http://www.missouri.edu/~bchemkm/guestbook.htm you will find an example I created to demonstrate what this type of animation looks like. Each image is one in a series of 10 gifs from the well known Duke JAVA animation. This sequence is repeated so that the actual animation plays several times.
The Duke animation that is described above was originally designed by Sun Microsystems for use with their JAVA animation applet called ImageLoop. You can see their original version of this animation at http://java.sun.com/applets/applets/ImageLoop/index.html if you have a browser that supports JAVA, such as Netscape 2.0.
By using JAVA to perform the animation instead of a CGI script, they are able to add several key features. First, the JAVA applet downloads onto the client's system and runs using that system's resources. This removes some of the processing overhead from the remote server. Also, since the animation applet runs locally, there is no delay in the animation while each image is downloaded to the client's system. Thus, the animation is a lot smoother.
To give you a better feel for how an animation script works, you will need to have a basic understanding of the concept of a boundary. When the script runs, it happily creates the HTML document until it comes to the boundary -- another way of saying an artificial divider. Then, the script inserts the graphic for the first animation. Once the first image is accounted for, the script generates the rest of the HTML document. However, the script remembers where the boundary is in the document and overlays each new image on top of the previous one, creating the animation. This is done using the MIME Content-type for multi-part documents.
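As a rough sketch of the boundary idea, the following SH function sends a series of images as one multi-part document. It assumes the experimental multipart/x-mixed-replace Content-type that Netscape browsers understand for this purpose, and it sends only the image stream, not a surrounding HTML page. The boundary string, the FRAME_DELAY variable, and the function name send_frames are my own choices; this is not the script from the next chapter.

```shell
#!/bin/sh
# Sketch of a multi-part animation stream. BOUNDARY, FRAME_DELAY, and
# send_frames are illustrative names, not part of any standard script.
BOUNDARY="FrameBoundary"

# Usage: send_frames frame1.gif frame2.gif ...
send_frames() {
    # Announce a multi-part document and name the divider string.
    printf 'Content-type: multipart/x-mixed-replace;boundary=%s\n\n' "$BOUNDARY"
    for frame in "$@"; do
        # Each part starts at a boundary and carries its own header.
        printf '%s\n' "--$BOUNDARY"
        printf 'Content-type: image/gif\n\n'
        cat "$frame"
        printf '\n'
        # Pause so the viewer can see the frame before it is replaced.
        sleep "${FRAME_DELAY:-1}"
    done
    # The closing boundary ends the document.
    printf '%s\n' "--$BOUNDARY--"
}
```

Each new part replaces the one before it in the browser, which is what produces the flip-book effect.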
Would you like to have this type of simple CGI animation on your own Web site? If so, all you need to do is keep reading. I have provided a very simple PERL animation script to produce these for your own pages in the next chapter. Along with this script is a more detailed discussion of how animation scripts work.
See "Sample of using CGI for Animation"
Another nifty trick using simple CGI scripts is to generate customized HTML pages. These pages produced "on the fly" by the script can include such things as the current time and date, the name and version of the user's browser or even the user's name.
You can use a simple SH shell script, for example, to generate a little clock (with the date) and indicate which browser the client is using to view your site. To make everything look better, the output can be displayed using table formatting.
See "HTML Table 101" p.[Ch. 13]
Now, I will walk you through this short SH CGI script.
The first line of code is the special comment line that lets the server know what language interpreter to use as it tries to execute the script. In this case, it is the SH shell scripting language usually located in the bin directory on the server.
#!/bin/sh
The SH command "cat << top" appears in the next line. The cat command (short for concatenate), used together with "<<", tells the shell to print to the browser everything between this line and a later line that contains only the same word. In this case "top" is used.
cat << top
Now, we tell the server what type of document it is receiving so that it can notify the browser. This is done using the MIME Content-type output header discussed earlier in this chapter.
Content-type: text/html
As a reminder, you must leave at least one blank line below the Content-type line for the command to work properly. Basically, the blank line lets the server know that the header information is finished and that the rest of the information is the message body.
These are standard HTML structural tags.
<HTML> <HEAD>
The next line is a META tag. As you learned in chapter 5, this tag can be used to reload a page after an indicated amount of time, in this case one minute. Thus, after each minute elapses, the script is executed again and the page is rebuilt on the fly. This way, the clock maintains the current time.
If the browser you use does not support META tags, then you will need to reload the page each time you wish to update the time.
<META HTTP-EQUIV="refresh" CONTENT="60; URL=http://www.missouri.edu/bchemkm-bin/timescript.sh">
Some more vanilla HTML.
<TITLE>Sample Time Script</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF">
<HR><P>
<CENTER>
<TABLE BORDER=5 CELLSPACING=10 CELLPADDING=2>
<TR>
<TD>
top
Here, we execute the built-in UNIX command "date" and pass it a format string. The leading "+" tells the date command that a format string follows. Within that string, a "%" symbol followed by a character is a format code that tells the date command what to include in the output.
/bin/date "+ %I:%M %p %Z"
You can get a full list of formatting switches for the date command using the UNIX command "man". This will display the manual pages for the requested command. For the date command just type the following on a UNIX command line.
$ man date
The echo commands used here print the information contained within the quotation marks to the browser. Also, we see another use of the "date" command with a different formatted request.
echo "<BR></TD>"
echo "<TD>"
/bin/date "+%A %B %d, %Y"
echo "<BR></TD>"
echo "</TR><TR>"
echo "<TD COLSPAN=2>"
Now, here is an example of incorporating an environment variable to tell the client which browser he is using to view your page.
echo "$HTTP_USER_AGENT"
Now that you have created the clock and let the user know which browser she is using, it is time to finish off the HTML page. This is done with the "cat" command again, sandwiching the desired HTML between two identical parameters, this time "bottom".
cat << bottom
<BR></TD>
</TR>
</TABLE>
</CENTER><P>
<HR>
<P>The rest of your page's content goes here.<P>
<HR>
</BODY>
</HTML>
bottom
If you have copied everything correctly, and are using a browser that supports META tags, you should see something that looks like figure 23.5.
This is an example of a simple clock produced by using a CGI script.
If you surf the Web much, you have probably seen several pages that tell you what number visitor you are to the site. The way these sites keep track of the number of visitors is by using a counter. This is a CGI script that increments an internal counter each time the page is requested from the server and then displays the appropriate series of graphics to indicate the current "count".
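The heart of such a counter is small enough to sketch in SH. The version below keeps the count in a file and prints the new value as plain text instead of graphics. The function name bump_counter, the COUNT_FILE variable, and its default location are assumptions for illustration, and there is no file locking, so two visitors arriving at the same instant could lose a count.

```shell
#!/bin/sh
# Add one to the number stored in a file and print the new count.
# bump_counter and COUNT_FILE are hypothetical names. No locking is
# done, so simultaneous requests may lose a count.
bump_counter() {
    file=$1
    count=0
    if [ -r "$file" ]; then
        read -r count < "$file"
    fi
    # Fall back to zero if the file was empty or did not hold a number.
    case $count in
        ''|*[!0-9]*) count=0 ;;
    esac
    count=$((count + 1))
    printf '%s\n' "$count" > "$file"
    printf '%s\n' "$count"
}

# Act as a CGI script only when a Web server started us; servers set
# the GATEWAY_INTERFACE environment variable for CGI programs.
if [ -n "$GATEWAY_INTERFACE" ]; then
    printf 'Content-type: text/plain\n\n'
    bump_counter "${COUNT_FILE:-/tmp/hits.count}"
fi
```

The counter packages mentioned in this section do the same bookkeeping and then turn each digit of the count into an image.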
If you would like to have a counter on your Web site, there are several ways you can go about setting one up. If you have root access to your server, you can install a counter that is accessible by any user on the server. With this option, you will use fewer system resources than if everyone on the system has his own counter script. A nice choice for this type of script is WWW Homepage Access Counter [Counter Release 2.2] which can be found at http://www.semcor.com/~muquit/Count.html.
If you have a working cgi-bin directory, there are several counter scripts you can install for your use. By placing the script in your cgi-bin directory, you will be the only user on the system who will have access to it, but if you don't have root access on the server, then this is your best bet. One such script is HTML Access Counter - Counter 4.0 located at http://www.webtools.org/counter/.
Unfortunately, your site may be hosted on a server that is not configured for CGI use. If you find yourself in this situation, you can still have an access counter, but you will need to use one that is hosted by a remote site. Each time someone visits your site, a CGI script is executed on the remote server that exports the count information back to the client's browser. One of the most popular hosted access counters for Web sites is The Web Counter at http://www.digits.com.
There is a lot of information available about access counters on the Internet already. The FAQ - How do I set up an HTML Counter at http://pantheon.cis.yale.edu/~nakamura/counterfaq.html is an excellent source for further information. Also, if you are running a WinNT server, you can take a look at ED Counters, counters... at http://charon.assert.ee/counters.htm. If you're operating a Mac server then you can try Simple Counters at http://cy-mac.welc.cam.ac.uk/CGI-simplecounter.html for more information.
Once you have your counter set up on your site, you should take a look at Counter Digits at http://www.issi.com/people/russ/digits/digits.html. Here you will find a nice collection of images for use with counter scripts.
Two of my favorite image sets from Digit Mania's counter archive.
A common stopping point on the Web is the search engine. These massive information repositories are easily searched thanks to CGI scripts that allow you to interface with them.
Some of the most well known search engines include:
For example, if you enter "search engine" into the Lycos search engine, as in Fig. 23.7, you should get back a list of hits. Each hit in the list is formatted as in Fig. 23.8.
The Lycos search engine's front page.
The first match of the search query "search engine".
Some of the more advanced search engines, like Lycos, will allow you to use the logical operators "and" and "or" to help widen or narrow your search. You can even control the amount of information listed for each site in the search results and the number of matches that are returned.
If one search engine fails to meet your needs, try another. No one search engine can keep a complete list of all Web sites.
If your site has a large amount of information to present, then you might want to look into getting your own search engine. This allows people using your site to quickly and efficiently locate the information they need. If you feel that a search engine is what your site needs to improve its presentation of information, then you should consider the following options:
If these search engines are not enough to satisfy your site's information distribution needs, you might want to consider implementing a version of WAIS (Wide Area Information Server, pronounced "ways") like freeWAIS on your site. One of the best features of this system is that it catalogues many more types of information than the standard HTML documents that are collected by the web wanderers for use with the standard search engines. A WAIS server keeps track of gifs and other image documents as well as several types of audio and video files. If you have a lot of information in formats other than HTML, then this is a great means of allowing clients to search your site for the information they need.
The WAIS server was originally designed to allow multi-national corporations and other organizations the ability to search their internal databases. Each WAIS server forwards incoming queries to the next server on a list. As the request passes along the chain of servers the amount of collected information grows until all the server locations are searched and one large summary document is sent back to the client.
Recently, the WAIS server has been successfully put to use on stand alone systems. So, you shouldn't feel the need to have multiple server and database locations before you start considering a WAIS server as a means of allowing clients quick and easy access to your site's information.
If you are interested in having these search capabilities on your site, consider getting a current version of freeWAIS (a version of WAIS in the public domain). For more information, you can consult the online FAQ at http://www.cis.ohio-state.edu/hypertext/faq/usenet/wais-faq/freeWAIS-sf/faq.html. Also, you should definitely take a look at the information on the WAIS homepage at http://kaos.erin.gov.au/technical/retrieval/wais/wais.html. Finally, if you would rather have a proprietary version of WAIS software, you should visit WAIS Inc.'s homepage at http://www.wais.com/ for more information. WAIS Inc. is now a part of AOL Productions, Inc.
As you have seen earlier, search engines are used to search vast archives of information on the Web. But how does all that information get compiled? The answer is with programs called Web wanderers, Web robots, spiders, or webcrawlers. These robots are constantly moving from server to server, site to site, methodically searching for links and pages to process.
You can think of a robot as an automated Web browser. In fact, these programs use the same protocols to access servers and retrieve Web documents that browsers do. They just do it much faster. Each time a robot moves to a new server, it proceeds to systematically archive each Web document's title and URL directory by directory. It may even note the outgoing links and use them to hunt down the next server to visit.
These programs are usually written for one of three major purposes. The most obvious one is to attempt to maintain a single archive that contains information on every document on the Web. However, it is currently taking the fastest robots more than half a year to travel the entire Web. So, it appears that a complete, up-to-date archive of Web documents will become increasingly difficult to maintain. For this reason most newer robots are only looking for information on a specific topic. This helps these archives stay more current than the larger global search sites. Finally, some robots are built to synchronize mirrored sites.
For a well kept listing of all the currently known (more than 50) robots on the Internet and a nice starting point for finding more information, see Martijn Koster's site on web wanderers at:
Hopefully, you now have a good idea of some of the more common uses for CGI scripts. As you can see, many of them provide helpful tools that you can incorporate into your personal Web site. If you would like to use some of these tools to make your site more dynamic, then you will need to consider a few things before you start.
Before you can get started writing your own CGI scripts, you need to find out if your server is specially configured to allow you to use them. The best thing to do is contact your system administrator and find out if you are allowed to run CGI scripts on the server. If you can, you also need to ask what you need to do to use them, and where you should put the scripts once they are written.
In some cases, system administrators do not allow clients to use CGI scripts because they feel they cannot afford the added security risks. In that case, you will have to find another means of making your site more interactive.
If you find that you can use CGI scripts and are using a UNIX server, then you will probably have to put your scripts into a specially configured directory, which is usually called cgibin or cgi-bin. If you are using Microsoft's Internet Server, then you will probably put your CGI programs in a directory called scripts. This allows the system administrator to configure the server to recognize that the files placed in that directory are executable. If you are using an NCSA version of HTTPD on a UNIX system, then this is done by adding a ScriptAlias line to the conf/srm.conf file on the server.
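As a rough illustration, a ScriptAlias line pairs a URL prefix with the directory on disk whose files the server should treat as executable. The filesystem path shown here is only an assumed example; your administrator's layout will almost certainly differ.

```
# conf/srm.conf (NCSA HTTPD); the directory path is an assumed example
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/
```

With a line like this in place, a request for a URL beginning with /cgi-bin/ runs the matching program in that directory instead of returning the file's contents.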
It is important to remember that although CGI scripts are not necessarily complex, you need to have some basic understanding of the programming language you wish to use and the server you plan to run the scripts on. Poorly written scripts can easily become more trouble than they are worth. For example, you could delete entire directories of information or shut down your server if your script were to start forking off new processes in a geometric fashion.
Before starting down the road to becoming a CGI scripter, you should do the following:
Now that you know what a CGI script is, how it works, and what it can do, the next thing you need to consider is which language you should use. You can write a CGI script in almost any language. So, if you can program in a language already, there is a good chance you can use it to write your scripts. This is usually the best way to start learning how to write CGI scripts, since you are already familiar with the basic syntax of the language. However, you still need to know which languages your Web server is configured to support.
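Whatever language you pick, the outward shape of a simple CGI script is the same: it writes a content-type header, a blank line, and then the document body to standard output. The sketch below shows that shape in Python, chosen here purely for illustration; the function name build_response and the page text are invented for the example.

```python
import sys


def build_response(body_html):
    """Return a complete CGI response: header, blank line, then the body."""
    return "Content-type: text/html\n\n" + body_html + "\n"


if __name__ == "__main__":
    # Placeholder page; a real script would generate this dynamically.
    page = "<html><body><h1>Hello from a CGI script</h1></body></html>"
    sys.stdout.write(build_response(page))
```

The blank line after the header matters, because it is how the server tells where the headers end and the document begins.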
UNIX-based NCSA and CERN Web servers are by far the most common. These platforms are easily configured to support most of the major scripting languages including C, C++, JAVA, PERL, and the basic shell scripting languages like SH. On the other hand, if your Web server is using the Mac server, then you might be limited to using AppleScript as your scripting language. Likewise, if you are using Windows NT server, then you might need to use Visual Basic as your scripting language. However, it is possible to configure both these systems to support other scripting languages like C and PERL, or even Pascal.
If you are interested in finding out which scripting languages your server is configured to support, you should ask your system administrator to give you a listing of what is available on your server.
Also, if you have access to a UNIX-based server and can log into a shell account, then you can find out which languages your system supports by using the UNIX command "which".
If you are using the SH shell, you should see something like the following (the exact paths vary from system to system):
$ which sh
/usr/bin/sh
$ which perl5
/usr/local/bin/perl5
Many scripting languages are freely distributable and fairly easy for an experienced administrator to install. As a last resort, you can always request that a new language be considered for addition to your local system.
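If you would rather check for several languages at once, the same lookup that "which" performs can be scripted. The following is a minimal sketch in Python; the function name find_interpreters and the list of command names are only examples, so substitute whatever interpreters you are curious about.

```python
import shutil


def find_interpreters(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# The command names below are only examples.
for name, path in find_interpreters(["sh", "perl", "python3", "tclsh"]).items():
    print(f"{name}: {path or 'not found'}")
```

Keep in mind that this only tells you what is installed for your shell account. Whether the Web server is allowed to run scripts in that language is still a question for your system administrator.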
If you are lucky, you may find that your server is already configured to support several CGI scripting languages. In this case, you just need to compare the strengths and weaknesses of each language you have available with the programming tasks you anticipate writing the scripts for. Once you do this, you should have a good idea of which programming language is best suited to your specific needs.
When it comes to the CGI, anything goes. Of the vast numbers of programming languages out there, many more than you could possibly learn in a lifetime, most can work with the CGI. So, you will have to spend a little time sifting through the long list to find the one that will work best for you.
Even though there are a lot of different languages available, they tend to fall into several categories based on the way they are processed (compiled, interpreted, or compiled/interpreted) and on the logic behind how the source is written (procedural or object-oriented).
This chapter will discuss the most common scripting languages that are available for use on a UNIX server. All of the major languages presented here should eventually be available for both MacHTTPD and WinHTTPD, if they are not already. You should note that MacHTTPD comes with AppleScript as its built-in scripting language, while WinHTTPD comes with Visual Basic.
If you would like some more information on either AppleScript or Visual Basic, you can consult the following:
Shell languages are easier to learn than more robust languages like C or PERL. At the other extreme, object-oriented languages like C++, PERL 5, and JAVA are the hardest to get used to.
Some of the available programming languages are compiled rather than being interpreted. The two most commonly used are C and C++. When using a compiled language, the program as it appears when you write it is referred to as the source code. This source code is then processed by the language's compiler into a much smaller version that is in the machine's native language and is usually referred to as object code. Once the source code is successfully compiled, the object code can be run by the server without fear of syntax errors. In this more compact form, the object code usually executes much faster than code from scripting languages that are compiled at runtime. Unfortunately, this does mean that you have to recompile the source code each time a change is made in the script.
One of the most popular CGI scripting languages is C. It was developed by Dennis Ritchie at Bell Labs in 1972. This procedural language is already familiar to a large number of programmers and thus is their scripting language of choice. As such, there are many large archives of existing C source code that you can adapt to fit your specific programming needs. Since C is a compiled language, it must be processed into a small binary object code before it can be executed. As was mentioned earlier, this allows these scripts to execute very quickly. So, if a quick response from the script is your primary consideration for picking a scripting language, you should stick with a compiled language like C. The best use for CGI scripts coded in C is for processing large amounts of numeric information quickly and efficiently.
Unfortunately, most of the CGI scripts written today focus on complex regular expressions and string data. These types of programs can be very awkward to write in C. This is one major reason why many CGI programmers are using PERL instead.
Most UNIX-based servers come equipped with C, C++, and at least one shell language such as SH.
Like its predecessor C, C++ (developed by Bjarne Stroustrup at AT&T) is a compiled language that executes small binary object code very quickly. However, C++ is not as similar to C as you might anticipate from the name. While C is a procedural language, C++ is part of the object-oriented paradigm. What this means is that as an object-oriented language, C++ is much more concerned with the function, interaction and reusability of its objects than it is with the actual steps it takes to get the job done.
Since C++ is object-oriented, it will take quite an adjustment if you aren't already familiar with this type of programming. So, expect a steep learning curve if you will be writing your first object-oriented source. However, if you do take the time to learn it, you will find that C++ objects are much easier to reuse and to extend than source written in a procedural language.
The only other major drawback to using C++ for your CGI scripting is that there is not a lot of public domain source. Only recently have software engineers started to program object-oriented solutions for CGI scripting needs. Thus, you might have to wait a while before you start to see large archives of code for public use. However, as time goes on, this will become much less of an issue.
A good source for more information on C++ is the Usenet group comp.lang.c++.moderated.
Unlike C and C++, some languages are not compiled into tight binary code before they are executed. Some, like the shell language SH, are interpreted during execution. This means that any syntax errors in the script will not be detected until the program has already started to run. This, coupled with the limited power of the shell languages, means that they are not as useful for larger scripting jobs as some of the other languages dealt with in this chapter.
PERL, along with several other interpreted languages, avoids this problem by being compiled at runtime. What this means is that the PERL interpreter checks each line of code for proper syntax before the code is compiled. Then, the code is compiled and executed. However, unlike C, this doesn't result in a truly compiled object that can then be reused. PERL scripts are interpreted and compiled each time they are executed. Thus, there is no need to keep track of separate source and object files for the same script.
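The difference between checking a whole script before running it and discovering an error part way through can be seen in a small experiment. The sketch below uses Python as a stand-in, since it also compiles a script in full before executing any of it; it is an analogy, not a description of how the PERL interpreter is built. The names run_checked and emit are made up for the example.

```python
def run_checked(source):
    """Compile the whole script first; execute it only if it parses cleanly."""
    output = []
    try:
        code = compile(source, "<script>", "exec")
    except SyntaxError as err:
        # Nothing has been executed yet, so output is still empty.
        return output, f"syntax error on line {err.lineno}"
    # Give the script an emit() function so we can observe which steps ran.
    exec(code, {"emit": output.append})
    return output, None


good = 'emit("step 1")\nemit("step 2")\n'
bad = 'emit("step 1")\nemit("step 2"))\n'  # stray closing parenthesis

print(run_checked(good))
print(run_checked(bad))
```

For the faulty script, the output list comes back empty: the error on the second line is caught before the first line has a chance to run. A shell script with the same kind of mistake would already have carried out its first step.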
There are several commonly available shell scripting languages, or command interpreters as they are sometimes called. The most common ones are SH and C shell. Although these are among the most important user interfaces for the UNIX environment, they are not the best choice for a CGI scripting language.
These shell languages are designed as UNIX tools and thus lack much of the power and features of true programming languages. However, they can be put to good use when writing simple, rather disposable CGI scripts or when you need a little job done in a hurry.
If you do decide to write a script using one of these languages, you should remember that they are not compiled. Rather, they are interpreted line by line, each line of code being executed before the next is read into the command interpreter. Thus, if you have any syntax errors in your script, you won't find them until the script has already executed part way. At that time, your application will crash and could cause serious problems with your system.
One of the most commonly used languages for CGI scripting is PERL 4.036. PERL, which stands for "Practical Extraction and Report Language," was developed by Larry Wall, who still maintains it. All versions of PERL except the newest one are procedural. The newest release, version 5, is object-oriented and represents a major restructuring of the PERL language. Even so, most PERL 4 programs should run fine using PERL 5. This latest version will be discussed briefly later in this chapter.
A key feature of PERL is that it is very open ended. It doesn't confine the user to a certain rigorous set of syntax. Instead, PERL usually provides several methods of doing each task, which makes it easier to program using your own personal style. Also, PERL supports almost all the common features of C, so a C programmer can write PERL code that looks very much like the C they are used to.
Another key feature of PERL is its powerful handling of strings and regular expressions. Using the built-in string manipulation functions of PERL, many scripts are easily written that would be much harder to program in C. Since the overwhelming majority of all CGI scripts handle string data, it is no wonder that so many CGI scripts are written in PERL.
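To give a feel for the kind of string work a typical CGI script does, here is a minimal sketch of decoding URL-encoded form data with a regular expression. It is written in Python rather than PERL purely for illustration, and the function names url_decode and parse_form and the sample data are invented for the example. It assumes plain single-byte characters; for real work, Python's standard urllib.parse module handles the corner cases more robustly.

```python
import re


def url_decode(text):
    """Turn '+' into spaces and %XX escapes into the characters they encode."""
    text = text.replace("+", " ")
    # Each %XX escape is two hex digits; this sketch assumes single-byte characters.
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


def parse_form(query_string):
    """Split 'name=value&name=value' form data into a dictionary."""
    fields = {}
    for pair in query_string.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        fields[url_decode(name)] = url_decode(value)
    return fields


# Hypothetical form submission, used only to exercise the sketch.
print(parse_form("name=Ada+Lovelace&email=ada%40example.com&comment="))
```

Doing the same thing in C means managing buffers and character pointers by hand, which is why string-heavy scripts tend to be written in a language with regular expressions built in.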
Another thing to keep in mind is that PERL is completely interpreted and compiled at runtime. This means that you won't get a syntax error after the program is already running, as you might when programming in a shell language. At the same time, it means that you can simply make a change in your source code and it will take effect. You don't have to pre-compile your source into object code each time you make a change, as you do when using C.
Since PERL 4 is currently the most widely used CGI scripting language on the Web, and as it can be run on a wide variety of server types, I have chosen to use it for the majority of the CGI scripting examples used in both this and the following chapter. If you would like more information about this scripting language, you should take a look at the PERL Language Home Page at http://www.perl.com/perl/index.html.
At this point, you may be asking yourself why is this guy telling me about PERL 5 when he just got finished making PERL 4 seem like the perfect CGI scripting language? Well, the answer, my friend, is simple. PERL 5 is to PERL 4 what C++ is to C. What this means is that while PERL 4 is procedural, PERL 5 is object-oriented. Also, while PERL 4 is forced mostly to go it alone, PERL 5 comes equipped to handle reusable modules along with a lot of other new features.
PACKAGE - A package is a programming context in which local variables are defined and used, as in a subroutine.
This description of the PERL 5 modules comes directly from the hypertext version of the PERL 5 manual, which can be found at http://www.phlab.missouri.edu/perl/perl5man/.
PERL Modules
In PERL 5, the notion of packages has been extended into the notion of modules. A module is a package that is defined in a library file of the same name, and is designed to be reusable. It may do this by providing a mechanism for exporting some of its symbols into the symbol table of any package using it. Or it may function as a class definition and make its semantics available implicitly through method calls on the class and its objects, without explicit exportation of any symbols. Or it can do a little of both.
For a very up-to-date list of all the PERL 5 modules, see the PERL 5 Module List at ftp://rtfm.mit.edu/pub/usenet/news.answers/perl-faq/module-list
As it stands, PERL 5 represents a total renovation of this language. Almost every line in the original code has been redone. This, coupled with the transition from a procedural to an object-oriented language with a lot of new bells and whistles, will make PERL 5 a very popular CGI scripting language for a long time to come.
For more information on this new version of PERL, see the PERL 5 WWW Page at http://www.metronet.com/1h/perlinfo/perl5.html. Or, you can subscribe to the PERL Usenet group at comp.lang.perl.
So far you have been given some examples of compiled and interpreted languages. Recently, though, a language has been developed that is both compiled and interpreted. This programming language is JAVA, which is first compiled into a platform-independent binary bytecode. Then, when the script is executed, the pre-compiled bytecode is interpreted by the local platform into a platform-specific machine code. Thus, as long as there is a JAVA interpreter for the platform you are using, you can use any JAVA bytecode regardless of the platform it was written for. This design allows these programs to become truly platform independent. Thus, programmers will no longer have to grapple with porting their software across platforms.
The JAVA language is being hailed on the Internet as the scripting language of the future and a possible replacement for the CGI. When Sun Microsystems first started developing JAVA, they intended to write it entirely in C++. However, as time went on, they decided that there were too many limitations within the language for it to be optimally suited for Internet programming. So, they struck out on their own. However, they have endeavored to stick closely to C++ while designing the language. As a result, JAVA is a member of the object-oriented programming paradigm and should be fairly easy for experienced C++ programmers to pick up.
The object-oriented structure of JAVA is what makes its applications modular, while its platform independence makes it very portable. JAVA was defined by Sun Microsystems in its first white paper as follows:
JAVA: A simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi-threaded, and dynamic language.
If JAVA can actually live up to this description, then it might very well become the dominant scripting language on the Internet.
See "Java and JavaScript"
As you advance down the path to mastery (or at least proficiency) in your favorite CGI scripting language, you need to know where to look for help and the latest online information.
My personal favorite is using listserves. These are groups of people who share a common interest. Each time someone posts a message to the list, everyone who is subscribed will get a copy. Then, any of the hundreds or even thousands of people who received your post may choose to answer your mail and give you the information you requested. The fastest way to find a list that is right for you is to check out L-Soft's search engine for their listserves at http://www.lsoft.com/lists/LIST_Q.html. Just pick a topic like HTML, CGI, or JAVA and you will get a series of mailing lists with information on how to subscribe to each one.
If you like the idea of a listserve, but don't want your mailbox filled with mail every day, then a newsgroup may be for you. These are similar to a listserve except that you read the posts off of a news spool rather than out of your inbox. Also, many newsgroup applications allow you to search the posts by subject, author, or keyword. Here is a list of some of my favorite newsgroups on CGI programming.
You will avoid upsetting others on listserves and newsgroups if you remember to always try to figure out problems on your own before asking for help.
Another great source of online CGI information is personal Web sites. Many individuals have amassed a mountain of links to key information archives on the net for their favorite scripting language. Finding a couple of these gems can save hours of surfing the Web for information.
As is inevitable with most technology, the CGI, for all its worth, is already becoming outdated. With the explosive growth of technology in this day and age, the CGI is starting to show its age as new and exciting alternatives to CGI scripting are being developed. In this, the final section of this chapter, I will discuss a few of these alternatives, including SSI (Server Side Includes) as well as JavaScript and Visual Basic Script.
If you are using an NCSA server on a UNIX system, then you have access to a special feature of this server commonly referred to as Server Side Includes (SSI). If you turn on this feature of the server, the server will recognize .shtml files as HTML documents that need to be treated specially. When the server sends a .shtml file, it doesn't passively send the requested document to the browser, but rather actively parses it. This means that the server looks at the HTML document line by line as it is sending it to see if the HTML page includes any special instructions that the server should carry out while it is sending the page. Usually these instructions take one of the following forms.
For example, if you have a standard footer that you need to place on every page of your Web site, with SSI you can simply place one of the following lines of code at the bottom of each document where you want the footer to appear:
<!--#include file="footer.html"-->
or
<!--#include virtual="/footer.html"-->
Just remember that if you use file, then you must give the path of the included file relative to the current document, and the file must be in the same directory as the main document or in one of its subdirectories. If you use virtual instead, you give the file's virtual path on the server, that is, the path portion of its URL. Or, if you have a script that generates a custom footer for each page, then you can include the output from that script by placing the following line where you would like the script's output to appear within the document.
<!--#exec cgi="/cgibin/footer.pl"-->
The main advantage of using SSI's within your Web pages is that they allow your documents to display current information like the date and time without the use of a CGI script. They also let you maintain only a single copy of information that you would otherwise have to repeat on many pages.
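To make the parsing step more concrete, here is a rough sketch of what a server does when it meets an include directive that uses file. It is written in Python for illustration only and is not how NCSA HTTPD is actually implemented; the function name expand_includes, the error text, and the sample footer are all assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path

# Matches directives of the form <!--#include file="name"-->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(html, base_dir):
    """Replace each include-file directive with the named file's contents."""
    base = Path(base_dir).resolve()

    def replace(match):
        target = (base / match.group(1)).resolve()
        # Only allow files in the document's directory or below it.
        if base not in target.parents or not target.is_file():
            return "[include failed]"  # placeholder error text for this sketch
        return target.read_text()

    return INCLUDE.sub(replace, html)


# Demonstration with a throwaway directory and a made-up footer.
with tempfile.TemporaryDirectory() as doc_root:
    Path(doc_root, "footer.html").write_text("<address>webmaster@example.com</address>")
    page = '<body>Welcome!<!--#include file="footer.html"--></body>'
    print(expand_includes(page, doc_root))
```

Because the footer lives in one file, changing it once changes every page that includes it. The price is that the server has to scan each page for directives on every request.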
However, there is one drawback of using SSI's that you should be aware of. By forcing the server to parse each document it sends to the browser, line by line, a lot of processing time is required, which both slows down the server and makes the Web pages take longer to load. If a high traffic site were to parse every page that it sent out to check for SSI's, the server would very likely experience a very marked decrease in efficiency.
For a more detailed discussion of SSI's you should refer to NCSA's online SSI tutorial at http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html.
Along with the development of the new programming language JAVA that was briefly introduced earlier, JavaScript is providing Web authors with alternatives to more traditional CGI programming. Because JavaScript code is embedded directly in the Web page, newer browsers like Netscape 2.0 are able to execute these scripts directly on the client's machine without the need to make a call to the server. This can greatly increase the speed at which the client gets feedback from their actions and reduce the load on the Web server at the same time. It is hoped by many that this new scripting language will reduce the heavy server load imposed by many traditional CGI programs by moving much of the processing overhead to the client's machine.
JavaScript is a simpler, object-based language modeled loosely on JAVA that is interpreted at runtime, much like PERL, rather than having to be compiled before it can be executed. Although JavaScript is simpler than the JAVA language, it still retains much of its power. Also, JavaScripts can be written to recognize and react to such things as mouse clicks, form field data, and the use of page navigation.
The complete JavaScript Authoring Guide by Netscape can be found at http://cgi.netscape.com/eng/mozilla/Gold/handbook/javascript/index.html and is an excellent place to start your exploration of this alternative to CGI programming.
Another very promising alternative to CGI will be Visual Basic Script, or VBScript, which is a cross-platform subset of Visual Basic 4.0 by Microsoft. This scripting language will be in direct competition with JavaScript and will provide much the same functionality as a similar scripting language embedded within the HTML pages themselves.
Like JavaScript, VBScript's major function will be to reduce server overhead by moving the processing load to the client's machine and, in the process, greatly speed up the response to the client's actions. VBScripts will be able to link and automate many types of objects, including OLE objects and JAVA applets. Currently, Microsoft plans for their VBScripting language to be fully implemented in the 3.0 release of Microsoft Internet Explorer.
You can find the latest information on VBScript from the Visual Basic Microsoft Web site at http://www.microsoft.com/VBASIC/vbscript/vbscript.htm.