Notice: This material is excerpted from Special Edition Using CGI, ISBN: 0-7897-0740-3. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.
If you've ever run across a Web page that says something like, "You are the 203rd visitor to this page," or "You are calling from 199.1.166.171," then you've probably seen server-side includes (SSI) at work.
If you view the source for such a page, you don't see a link to another page or an inserted GIF image or a CGI call. You just see normal text, mixed in with all the rest of the HTML code and plain text.
This chapter explains the magic behind SSI programming, shows you some examples, and teaches you how to write your own SSI programs. You explore the following:
Normally, a Web server doesn't look at the files it passes along to browsers. It checks security (that is, it makes sure the caller has the right to read the file) but otherwise just hands the file over.
A Web "page" is often more than one document. The most common addition is an inline graphic or two, plus a background graphic. As you learned in chapters 2 and 3, a page can contain information about other resources to display at the same time. When the browser gets back the first page, it scans the page, determines whether more parts exist, and sends out requests for the remaining bits. This scanning and interpretation process is called parsing in computer lingo, and it normally happens on the client's side of the connection.
Under certain circumstances, though, you can talk the server into parsing the document before it ever gets to the client. Instead of blindly handing over the document ignorant of the contents, the server can interpret the documents first. When this parsing occurs on the server's side of the connection, the process is called a server-side include, or SSI.
Why include? Because the first use of server-side parsing was to allow files to be included along with the one being referenced. Computer nerds love acronyms, and SSI was established quickly. Changing the term later on, when other abilities became popular too, seemed pointless.
If you are the Webmaster for a site, you might be responsible for 50, 100, or 250 pages. Because you're a conscientious Webmaster, you include your e-mail address at the bottom of each page so that people can tell you of any problems. What happens when your e-mail address changes? Without SSI, you need to edit 50, 100, or 250 pages individually. Hope you're a good typist!
With SSI, however, you can include your e-mail address on each page. Your e-mail address actually resides in one spot (say, a file called webmaster.email.txt somewhere on your server), and each page uses SSI to include the contents of this file. Then, when your e-mail address changes, all you have to do is update webmaster.email.txt with the new information. All 250 pages referencing it pick up the new information automatically.
Server-side includes can do more than include files. You can use special commands to include the current date and time. Other commands let you report the last-modification date of a file or its size. Yet another command lets you execute a subprogram in the manner of CGI and incorporate its output right into the flow of the text.
Generally, the hallmark of SSI is that the end result is a text document. If you implement an SSI page hit counter, for instance, it would report the hits using text, not inline graphical images. From your browser's point of view, the document is all text, with nothing odd about it. SSI works without the browser's consent, participation, or knowledge. The magic is that the text is generated on the fly by SSI, not hard-coded when you created the HTML file.
Unlike many protocols, options, and interfaces, SSI isn't governed by an Internet RFC (Request for Comments) or other standard. Each server manufacturer is free to implement SSI on an ad hoc basis, including whichever commands suit the development team's fancy, using whatever syntax strikes them as reasonable. Some servers, such as the freeware EMWAC server for Windows NT, don't support SSI at all.
Therefore, I can't give you a list of commands and syntax rules that will apply in all situations. Most servers follow NCSA's specification up to a point. Although you may not find the exact commands, you can probably find functions similar to those in NCSA's arsenal.
Because SSI isn't defined by a standard, server developers tend to modify their implementations of SSI more frequently than they modify other things. Even if I listed all the known servers and how they implement SSI today, the list would be out of date by the time this book got into your hands.
The only way to determine what SSI functions your server supports and what syntax your server uses for each command is to find and study your server's documentation. This chapter shows you the most common functions on the most common servers, and you'll probably find that the syntax is valid. On the other hand, the only authority is your particular server's documentation, so get a copy and keep it handy as you work through this chapter.
Although plenty of FAQ sheets (Frequently Asked Questions, usually with answers, too) are available on the Internet, configuring SSI to work on NCSA seems to be a common stumbling block. The other servers are a little easier to use.
On most servers, SSI must be "turned on" before it will work. By default, SSI is not enabled. This is for your protection, because mismanaged SSI can be a huge security risk. What if, for instance, you give any caller or any user on the system privileges to run any program or read any file anywhere on the server? Maybe nothing bad would happen, but that's not the safe way to bet. That's the reason that SSI comes turned off.
In an NCSA (UNIX) environment, you enable SSI by editing the configuration files. You must have administrative privileges on the server to edit these files, although you can probably look at them with ordinary user privileges.
You need to make these changes to enable SSI on NCSA:
That's really all there is to editing the configuration files. If you can puzzle through the documentation well enough to use the Options and AddType directives, then you're home free. Play around using one hand on the keyboard and the other holding the documentation until you understand. Of course, finding the files in the first place might be a challenge, but, hey, that's UNIX. You either love it or already use Windows NT.
Enabling SSI on Windows NT machines is usually a matter of naming your HTML files correctly and clicking on a check box somewhere in the Configuration dialog box. Process Software's Purveyor server uses .HTP as the default file-name extension for parsed files. Most other servers emulate NCSA and use .SHTML instead. However, changing the extension is usually pretty simple. Hunt up the MIME types dialog box and add a MIME type of text/x-server-parsed for whatever file-name extension you want. (As always, check your particular server's documentation to find out whether this technique will work.)
One last note on configuration: Many, if not most, servers either allow you to require, or require by default, that all SSI executables be located in your CGI-BIN or SCRIPTS directory. If your server doesn't require this behavior by default, hunt up the documentation and enable it. If the only programs that can be run are located in a known, controlled directory, the chances for errors (and hacking) are greatly reduced.
Now that you've gotten SSI enabled on your server (or talked your system administrator into doing it for you), you're ready to learn how to use SSI. Sit back and relax a bit: What you've done already is by far the hardest part. From here on, you simply need to hunt up syntax in your particular server's documentation (you did keep it handy, right?) and try things out.
Of special interest at this point is the one thing all SSI implementations have in common: All SSI commands are embedded within regular HTML comments.
Having embedded commands makes it easy to implement SSI while still making the HTML portable. A server that doesn't understand SSI passes the commands on to the browser, and the browser ignores them because they're formatted as comments. A server that does understand SSI, however, does not pass the commands on to the browser. Instead, the server parses the HTML from the top down, executing each comment-embedded command, replacing the comment with the output of the command.
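To make the replace-the-comment idea concrete, here is a minimal sketch in C of the kind of loop a parsing server might run. It is not taken from any real server. The names ssi_expand, ssi_handler, and demo_handler are hypothetical, and the fixed buffer sizes are assumptions chosen only to keep the example short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: receives the text found between "<!--#"
   and "-->" and writes the replacement text into out. */
typedef void (*ssi_handler)(const char *directive, char *out, size_t outsize);

/* Illustration only: a handler that echoes the directive in brackets. */
void demo_handler(const char *directive, char *out, size_t outsize)
{
    snprintf(out, outsize, "[%s]", directive);
}

/* Copy src to dst, replacing each <!--# ... --> comment with the
   handler's output. Ordinary comments and all other text pass through
   untouched. Returns 0 on success, -1 if a buffer is too small. */
int ssi_expand(const char *src, char *dst, size_t dstsize, ssi_handler handler)
{
    size_t used = 0;

    if (dstsize == 0) return -1;
    dst[0] = '\0';

    while (*src) {
        const char *start = strstr(src, "<!--#");
        const char *end = start ? strstr(start, "-->") : NULL;
        size_t plain = (start && end) ? (size_t)(start - src) : strlen(src);
        char directive[256];
        char output[512];
        size_t dlen, olen;

        /* Copy the ordinary text that precedes the command (or all of
           the remaining text if no complete command is left). */
        if (used + plain >= dstsize) return -1;
        memcpy(dst + used, src, plain);
        used += plain;
        dst[used] = '\0';
        if (!(start && end)) break;

        /* Isolate the command text that follows the pound sign. */
        dlen = (size_t)(end - (start + 5));
        if (dlen >= sizeof directive) return -1;
        memcpy(directive, start + 5, dlen);
        directive[dlen] = '\0';

        /* Let the handler produce the replacement and splice it in. */
        output[0] = '\0';
        handler(directive, output, sizeof output);
        olen = strlen(output);
        if (used + olen >= dstsize) return -1;
        memcpy(dst + used, output, olen + 1);
        used += olen;

        src = end + 3;   /* continue after the closing --> */
    }
    return 0;
}
```

Calling ssi_expand() with demo_handler on a page containing both an SSI command and an ordinary comment replaces only the former. A real server would swap in a handler that actually carries out echo, include, exec, and so on.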
This process is not as complicated as it sounds. You go through some step-by-step examples later in this chapter, but first you examine HTML comments.
Because anything untagged in HTML is considered displayable text, comments must be tagged like any other directive. Tags are always marked with angle brackets (< and >) and a keyword, which may be as short as a single letter. For example, the familiar paragraph marker, <p>, is a monatomic tag. Monatomic means that no closing tag is necessary. Diatomic tags, such as <a href...>...</a>, enclose displayable information between the opening and closing tags. Monatomic tags have no displayable information, so they don't need a closing tag.
The comment tag is monatomic, and the keyword is !-- for some strange reason. Thus, all comments have the form <!--comment text here-->. No one quite understands why a bang (exclamation point) and two dashes were chosen to indicate a comment. For my money, the word "comment" would have worked; or a single glitch; or the old C convention /*; or the new C convention //; or the Basic convention rem; or even the assembler convention of a semicolon. But you're stuck with !-- whether it makes sense or not, so memorize it. You certainly can't make a mnemonic for it. Notice also that comments end with --> instead of just >.
Although half the servers and browsers in the world can understand the <!--comment text here> syntax, the remaining ones want the comment to end with --> instead of just the expected closing angle bracket. Why? Because this lets you comment out sections of HTML code, including lines containing < and > symbols. Although not all servers and browsers require comments to end with -->, all of them will understand the syntax. Therefore, you're better off surrounding your comments with <!-- at the front and --> at the end.
So a comment is anything with the format <!-- -->. Browsers know to ignore this information. Servers don't even see it, unless SSI is enabled.
What happens to comments when SSI is enabled? The server looks for comments and examines the text inside them for commands. The server distinguishes comments that are really SSI commands from comments that are just comments by a simple convention: All SSI commands start with a pound sign (#).
All SSI commands thus begin with <!--# followed by information meaningful to your server. Typically, each server supports a list of keywords, and it expects to find one of these keywords snuggled up against the pound sign. After the keyword come any parameters for the command (with syntax that varies both by command and by server) and then the standard comment closing (-->).
Most SSI commands have the form <!--#command tagname="parameter" -->, where command is a keyword indicating what the server is supposed to do, tagname is a keyword indicating the type of parameter, and parameter is the user-defined value for that command.
Note that the first space character is after the command keyword. Most servers refuse to perform SSI if you don't follow this syntax exactly. SSI syntax is probably the fussiest you'll encounter.
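The general form can be picked apart with a few lines of C. The following is only a sketch under the assumption that a command carries exactly one tagname="parameter" pair, as in the form shown above. The function name ssi_parse is hypothetical, and real servers have their own (usually fussier) parsers.

```c
#include <ctype.h>
#include <string.h>

/* Split the text found between "<!--#" and "-->" into command, tag
   name, and parameter. Each output buffer must hold at least size
   bytes. Returns 1 on success, 0 if the text does not fit the form
   command tagname="parameter". */
int ssi_parse(const char *text, char *command, char *tag,
              char *value, size_t size)
{
    size_t n;

    /* The command keyword must sit directly against the pound sign,
       so anything other than a letter here is an error. */
    if (size == 0 || !isalpha((unsigned char)*text)) return 0;

    /* The command ends at the first space. */
    n = strcspn(text, " \t");
    if (n >= size || text[n] == '\0') return 0;
    memcpy(command, text, n);
    command[n] = '\0';
    text += n;
    text += strspn(text, " \t");

    /* The tag name runs up to the equal sign. */
    n = strcspn(text, "=");
    if (n == 0 || n >= size || text[n] != '=') return 0;
    memcpy(tag, text, n);
    tag[n] = '\0';
    text += n + 1;

    /* The parameter must be enclosed in double quotes. */
    if (*text != '"') return 0;
    text++;
    n = strcspn(text, "\"");
    if (n >= size || text[n] != '"') return 0;
    memcpy(value, text, n);
    value[n] = '\0';
    return 1;
}
```

Given the text echo var="DATE_LOCAL", the sketch fills in echo, var, and DATE_LOCAL. It rejects text that starts with a space or that leaves the parameter unquoted, which mirrors the fussiness described above.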
The following sections provide step-by-step examples of SSI commands in action.
The following is the syntax for echo:
The current date is <!--#echo var="DATE_LOCAL" -->
This syntax expands to something like the following when executed by an NCSA server:
The current date is 28 Feb 1999 12:00:13 GMT-6
The command is echo, the tagname is var (short for variable), and the parameter is DATE_LOCAL. DATE_LOCAL is a variable that is defined by the NCSA server and that represents the local time on the server. When the server processes this line, it sees that the command requires it to echo (print) something. The echo command takes only one parameter, the keyword var, which is followed by a value specifying which variable you want echoed.
Most servers let you echo at least a subset of the standard CGI variables, if not all of them. You can usually find some special variables, too, that are available only to SSI. DATE_LOCAL is one of them.
Again on the NCSA server, you can change the time format using the SSI config command, as follows:
<!--#config timefmt="format string" -->
Substitute a valid time format string for "format string" in the preceding example. The syntax of the format string is compatible with the string you pass to the UNIX strftime() library function. For example, %a %d %b %y gives you Sun 28 Feb 99.
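If you want to try out a format string before putting it in a page, a few lines of C will do it. This sketch hard-codes the sample date used in this section so that the result is predictable. The function name format_sample_date is hypothetical, and the output assumes the default "C" locale.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format the sample date 28 Feb 1999, 12:00:13, with the given
   strftime() pattern. Returns the number of characters written,
   or 0 if the result did not fit in out. */
size_t format_sample_date(const char *fmt, char *out, size_t outsize)
{
    struct tm when;

    memset(&when, 0, sizeof when);
    when.tm_year = 99;   /* years since 1900 */
    when.tm_mon  = 1;    /* zero-based, so 1 is February */
    when.tm_mday = 28;
    when.tm_hour = 12;
    when.tm_min  = 0;
    when.tm_sec  = 13;
    when.tm_wday = 0;    /* Sunday; filled in by hand for this sketch */
    when.tm_yday = 58;   /* zero-based day of the year */

    return strftime(out, outsize, fmt, &when);
}
```

Passing "%a %d %b %y" produces Sun 28 Feb 99, matching the example above. Whether your server accepts every strftime() code in config timefmt is something only its documentation can tell you.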
Here are some other useful variables you can echo:
You are calling from <!--#echo var="REMOTE_ADDR"-->
outputs a line like
You are calling from 199.1.166.172
Here's another example:
This page is <!--#echo var="DOCUMENT_NAME"-->
yields a line resembling
This page is /home/joeblow/ssitest.shtml
Spend some time learning which variables your server lets you echo, and the syntax for each. Often related commands (such as the config timefmt command) affect the way a variable is printed.
The include command typically takes one tag, file, with a single parameter specifying which file to include. NCSA limits the included file to something relative to, but not above, the current directory. Thus, ../ is disallowed, as is any absolute path, even if the HTTPd server process would normally have access there.
Other servers let you specify any path at all, or work with the operating system to limit access in a more flexible way than hard-coding forbidden paths. Purveyor, for instance, lets you use UNC file specifications, thus allowing your include to pull its data from anywhere reachable on the network. Regular Windows NT file permission requirements must be met, of course. Don't give the user ID under which Purveyor runs access to areas you don't want include-able.
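The NCSA restriction described above is easy to express in code. The following C sketch is my own illustration of the rule (relative paths only, and no .. component anywhere), not NCSA's actual source. The function name include_path_allowed is hypothetical.

```c
#include <string.h>

/* Returns 1 if path is relative and never refers to a parent
   directory, 0 otherwise. Components are separated by '/'. */
int include_path_allowed(const char *path)
{
    const char *p = path;

    /* Reject empty and absolute paths outright. */
    if (*p == '\0' || *p == '/') return 0;

    while (*p) {
        size_t n = strcspn(p, "/");   /* length of this component */

        /* A component that is exactly ".." climbs the tree. */
        if (n == 2 && p[0] == '.' && p[1] == '.') return 0;

        p += n;
        if (*p == '/') p++;           /* step over the separator */
    }
    return 1;
}
```

With this check, email.htm and sub/email.htm pass, while ../email.htm and /etc/passwd are refused. Note that the sketch refuses any .. component, even one that would not actually leave the current directory, which is the conservative reading of the rule.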
A typical use for the include command is a closing tag line at the bottom of a page. Say you're working in the directory /home/susan, and you create a simple text file called email.htm:
Click <a href="mailto:susan@nowhere.com">here</a> to send me email.
Next, you create index.shtml, which is the default page for /home/susan. Make it short and sweet, as follows:
<html>
<head><title>Susan's Home Page</title></head>
<body>
<h1>Susan's Home Page</h1>
Hi, I'm Susan.
<!--#include file="email.htm"-->
See you later!
</body>
</html>
When index.shtml is displayed, the contents of email.htm get sucked in, resulting in the following being sent to the browser:
<html>
<head><title>Susan's Home Page</title></head>
<body>
<h1>Susan's Home Page</h1>
Hi, I'm Susan.
Click <a href="mailto:susan@nowhere.com">here</a> to send me email.
See you later!
</body>
</html>
You may use the email.htm file in as many other files as you want, thus limiting the places where you need to change Susan's e-mail address to exactly one.
You can turn off the exec command on some servers while leaving other SSI functions enabled. If you are the system administrator of your server, study your setup and security arrangements carefully before enabling exec.
exec is a very powerful and almost infinitely flexible command. An SSI exec is very much like regular CGI in that it spawns a subprocess and lets it open files, provide output, and do just about anything else an executable can do.
On Netscape and NCSA servers, your SSI executable must be named *.cgi and probably will have to live in a centrally managed CGI-BIN directory. Check your particular server's documentation and your system setup to find out. Keep the documentation handy, too; you'll need it again in just a moment.
The exec command typically takes one tag, called cgi most frequently, but also exe, script, and cmd on various servers. Some servers let you specify two different ways to execute programs. For example, <!--#exec cgi or <!--#exec exe usually means to launch a program and treat it just like a CGI program. <!--#exec cmd usually means to launch a shell script (called a batch file in the PC world). Shell scripts often, but not always, get treated specially by the server. In addition to launching the shell, or command processor, and passing the script name as the parameter, the server often forges the standard MIME headers, relieving the script of that duty. You have only one way to know how your server handles this process: If you haven't found your server's documentation yet, stop right now and get it. There are no rules of thumb, no standards, and no rational ways to figure out the syntax and behavior.
Here's a trivial example of using a shell script on a UNIX platform to add a line of text. Start with a file called myfile.shtml, which contains the following somewhere in the body:
Now is the time <!--#exec cgi="/cgi-bin/foo.cgi" --> to come to the aid of their country.
Then create the shell script foo.cgi, and place it in the /cgi-bin directory:
#!/bin/sh
echo "for all good persons"
When you then access myfile.shtml, you see the following:
Now is the time for all good persons to come to the aid of their country.
Note that this example assumes you have configured your server to require SSI scripts to live in the /cgi-bin subdirectory and that you have designated .cgi as the correct extension for scripts.
Some implementations of SSI allow you to include command-line arguments. Sadly, NCSA isn't one of them. Each server has its own way of handling command-line arguments, of course; you have to consult your trusty documentation yet again to find out if, and how, your server allows this feature.
The SPRY Mosaic server from CompuServe actually uses an args key for arguments. A typical SPRY Mosaic script might be invoked this way: <!--#exec script="scriptname.exe" args="arg1 arg2 arg3" -->.
Process Software's Purveyor allows arguments, even though no documentation is available to support the mechanism. With Purveyor, you supply the arguments exactly as you would on a real command line: <!--#exec exe="\serverroot\cgi-bin\scriptname arg1 arg2 arg3" -->.
Your server probably supports as many as a dozen commands besides the three covered in the preceding sections. Here are some of the most common, with a brief explanation of each:
This section presents the complete C code for several useful SSI programs. Some of them are platform-independent; others make use of some special features in the Windows NT operating system. You can find the source code, plus compiled executables for the 32-bit Windows NT/Windows 95 environment.
The SSIDump program is a handy debugging utility that just dumps the SSI environment variables and command-line arguments back to the browser (see listing 16.1). Because the code is so short, I'll let it speak for itself.
Listing 16.1 ssidump.c: SSI Program to Dump SSI Environment Variables

// SSIDUMP.C
// This program dumps the SSI environment variables
// to the screen. The code is platform-independent.
// Compile it for your system and place it in your
// CGI-BIN directory.

#include <windows.h> // only required for Windows machines
#include <stdio.h>
#include <stdlib.h>  // declares _environ on Windows compilers

int main(int argc, char * argv[])
{
    // First declare our variables. This program
    // only uses one, i, a generic integer counter.
    int i;

    // Print off some nice-looking header
    // information. Note that unlike a CGI
    // program, there is no need to include the
    // standard HTTP headers.
    printf("<h1>SSI Environment Dump</h1>\n");
    printf("<b>Command-Line Arguments:</b>\n");

    // Now print out the command-line arguments.
    // By convention, argv[0] is the path to this
    // program at run-time. argv[1] through
    // argv[argc-1] are passed to the program as
    // parameters. Only some servers will allow
    // command-line arguments. We'll use a nice
    // bulleted list format to make it readable:
    printf("<ul>\n");
    for (i = 0; i < argc; i++)
    {
        printf("<li>argv[%i]=%s\n",i,argv[i]);
    }
    printf("</ul>\n");

    // Now print out whatever environment variables
    // are visible to us. We'll use the bulleted
    // list format again. (_environ is the Windows
    // spelling; UNIX compilers call it environ.)
    printf("<b>Environment Variables:</b>\n<ul>\n");
    i = 0;
    while (_environ[i])
    {
        printf("<li>%s\n",_environ[i]);
        i++;
    }
    printf("</ul>\n");

    // Flush the output and we're done
    fflush(stdout);
    return 0;
}
The RQ program hunts up a random quotation (or other bit of text) from a file and outputs it. The quotation file uses a simple format: Each entry must be contiguous but can span any number of lines. Entries are separated from each other by a single blank line. Listing 16.2 shows a sample quotation file. The entries were chosen randomly by RQ itself. Make of that what you will.
Listing 16.2 rq.txt: Sample Text File for Use with the RQ Program

KEEPING THIS A HAPPY FILE:
o All entries should start flush-left.
o Entries may be up to 8K in length.
o Entries must be at least one line.
o Entries may contain 1-9999 lines (8K max).
o Line length is irrelevant; CRs are ignored.
o Entries are separated by ONE blank line.
o The last entry must be followed by a blank line, too.
o The first entry (these lines here) will never get picked,
  so we use it to document the file.
o Length of the file doesn't change retrieval time.
o Any line beginning with "--" is treated as a byline.
  It must be the last line in the block, otherwise the
  quotation might get cut off.
o You can use HTML formatting tags.

Drunk is feeling sophisticated when you can't say it.
--Anon

What really flatters a man is that you think him worth flattery.
--George Bernard Shaw

True patriotism hates injustice in its own land more than anywhere else.
--Clarence Darrow

If by "fundies" we mean "fanatics," that's okay with me, but in that
case shouldn't we call them fannies?
--Damon Knight

My <i>other</i> car is <i>also</i> a Porsche.
--Bumper Sticker

The death sentence is a necessary and efficacious means for the Church
to attain its ends when rebels against it disturb the ecclesiastical
unity, especially obstinate heretics who cannot be restrained by any
other penalty from continuing to disturb ecclesiastical order.
--Pope Leo XIII
Note that although the preceding sample file has text quotations in it, you can just as easily use RQ for random links or graphics, too. For random links or graphics, leave off the bylines and use standard <a href> format. You can even use RQ for single words or phrases used to complete a sentence in real time. For example, the phrases in parentheses could come from an RQ file to complete this sentence: "If you don't like this page, you're (a pusillanimous slug) (a cultured person) (pond scum) (probably dead) (quite perceptive) (drunk) (an editor)." I'll leave it to you to figure out which are compliments and which are insults.
RQ has security precautions built in. RQ does not read from a file that's located anywhere other than the same directory as RQ itself or a subdirectory under it. This precaution prevents malicious users from misusing RQ to read files elsewhere on the server. RQ looks for a double-dot, in case the user tries to evade the path requirement by ascending the directory tree. RQ checks for a double-backslash, in case it finds itself on an NT server and the user tries to slip in a UNC file specification. RQ checks for a colon, in case the user tries to specify a drive letter. If RQ finds any of these situations, it spits out an error message and dies.
RQ can accept the name of a quotation file from a command-line argument. If you're unlucky enough to run RQ on a server that doesn't support command-line arguments, or if you leave the command-line arguments off, RQ tries to open RQ.TXT in the same directory as itself. You can have multiple executables, each reading a different file, simply by having copies of RQ with different names. RQ looks for its executable name at runtime, strips the extension, and adds .TXT. So if you have a copy of RQ named RQ2, it opens RQ2.TXT.
Listing 16.3 shows the code for the rq.c program.
Listing 16.3 rq.c: Source Code for the RQ Program

// RQ.C
// This program reads a text file and extracts a random
// quotation from it. If a citation line is found, it
// treats it as a citation; otherwise, all text is treated
// the same. HTML tags may be embedded in the text.
// RQ is mostly platform-independent. You'll have to change
// path element separators to the correct slash if you
// compile for Unix. There are no platform-specific system
// calls, though, so a little bit of customization should
// enable the code to run on any platform.

#include <windows.h> // only required for Windows
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <io.h>

char buffer[16000]; // temp holding buffer

int main(int argc, char * argv[])
{
    FILE *f;              // file-info structure
    long flen;            // length of the file
    char fname[MAX_PATH]; // the file name
    long lrand;           // a long random number
    size_t nread;         // bytes read into the buffer
    BOOL goodpos;         // switch
    char *p;              // generic pointer
    char *soq;            // start-of-quote pointer
    char *eoq;            // end-of-quote pointer

    // Seed the random number generator
    srand(GetTickCount());

    // Set all I/O streams to unbuffered
    setvbuf(stdin,NULL,_IONBF,0);
    setvbuf(stdout,NULL,_IONBF,0);

    // Open the quote file
    // If a command-line argument is present, treat it as
    // the file name. But first check it for validity!
    if (argc > 1)
    {
        p = strstr(argv[1],"..");
        if (p==NULL) p = strstr(argv[1],"\\\\");
        if (p==NULL) p = strchr(argv[1],':');
        // If .., \\, or : found, reject the filename
        if (p)
        {
            printf("Invalid relative path "
                   "specified: %s",argv[1]);
            return 1;
        }
        // Otherwise append it to our own path
        strcpy(fname,argv[0]);
        p = strrchr(fname,'\\');
        if (p) *p = '\0';
        strcat(fname,"\\");
        strcat(fname,argv[1]);
    }
    else
    {
        // No command-line parm found, so use our
        // executable name, minus our extension, plus
        // .txt as the filename
        strcpy(fname,_pgmptr);
        p = strrchr(fname,'.');
        if (p) strcpy(p,".txt");
    }

    // We have a filename, so try to open the file
    f = fopen(fname,"r");

    // If open failed, die right here
    if (f==NULL)
    {
        printf("Could not open '%s' for read.",fname);
        return 1;
    }

    // Get total length of file in bytes.
    // We do this by seeking to the end and then
    // reading the offset of our current position.
    // There are other ways of getting this
    // information, but this way works almost
    // everywhere, whereas the other ways are
    // platform-dependent.
    fseek(f,0,SEEK_END);
    flen = ftell(f);

    // Seek to a random point in the file. Loop through
    // the following section until we find a block of text
    // we can use.
    goodpos = FALSE; // goes TRUE when we're done
    while (!goodpos)
    {
        // Make a random offset into the file. Generate
        // the number based on the file's length.
        if (flen > 65535)
        {
            lrand = MAKELONG(rand(),rand());
        }
        else
        {
            lrand = MAKELONG(rand(),0);
        }

        // If our random number is less than the length
        // of the file, use it as an offset. Seek there
        // and read whatever we find.
        if (lrand < flen)
        {
            fseek(f,lrand,SEEK_SET);
            // Leave room for a terminator so the string
            // searches below cannot run off the buffer
            nread = fread(buffer, sizeof(char),
                          sizeof(buffer)-1, f);
            if (nread != 0)
            {
                buffer[nread] = '\0';
                soq=NULL;
                eoq=NULL;
                soq = strstr(buffer,"\n\n");
                if (soq) eoq = strstr(soq+2,"\n\n");
                if (eoq)
                {
                    // skip the first CR
                    soq++;
                    // and the one for the blank line
                    soq++;
                    // mark end of string
                    *eoq='\0';
                    // look for citation marker
                    p = strstr(soq,"\n--");
                    // if found, exempt it & remember
                    if (p)
                    {
                        *p='\0';
                        p++;
                    }
                    // print the quotation; use "%s" so a
                    // stray % in the text is not taken
                    // as a format code
                    printf("%s",soq);
                    if (p)
                        // and citation if any
                        printf("<br><cite>%s</cite>",p);
                    // exit the loop
                    goodpos=TRUE;
                }
            }
        }
    }

    fclose(f);
    fflush(stdout);
    return 0;
}
The XMAS program prints out the number of days remaining until Christmas. It recognizes Christmas Day and Christmas Eve as special cases, and solves the general case problem by brute force. You can certainly find more elegant and efficient ways to calculate elapsed time, but this method doesn't rely on any platform-specific date/time routines.
The code in listing 16.4 is short enough and uncomplicated enough that it needs no further explanation.
Listing 16.4 xmas.c: Source Code for XMAS Program

// XMAS.C
// This program calculates the number of days between
// the time of invocation and the nearest upcoming 25
// December. It reports the result as a complete sentence.
// The code is platform-independent.

#include <stdio.h>
#include <time.h>

int main(void)
{
    // Some variables, all self-explanatory
    struct tm today;
    time_t now;
    int days;

    // Get the current date, first retrieving the
    // current calendar time, then converting it
    // to local time, stored in the today tm structure.
    time(&now);
    today = *localtime(&now);
    mktime(&today);

    // month is zero-based (0=jan, 1=feb, etc);
    // day is one-based
    // year is counted from 1900
    // so Christmas Eve is 11/24

    // Is it Christmas Eve?
    if ((today.tm_mon == 11) && (today.tm_mday==24))
    {
        printf("Today is Christmas Eve!");
    }
    else
    {
        // Is it Christmas Day?
        if ((today.tm_mon == 11) && (today.tm_mday==25))
        {
            printf("Today is Christmas Day!");
        }
        else
        {
            // Calculate days by adding one and comparing
            // for 11/25 repeatedly. Work at noon so that a
            // daylight-saving change cannot push the date
            // across midnight while we count.
            days = 0;
            today.tm_hour = 12;
            while ( (today.tm_mon != 11) ||
                    (today.tm_mday != 25) )
            {
                days++;
                today.tm_mday = today.tm_mday + 1;
                today.tm_isdst = -1;
                mktime(&today);
            }
            // Print the result using the customary
            // static verb formation
            printf("There are %i days until Christmas.",
                   days);
        }
    }

    // Flush the output and we're done
    fflush(stdout);
    return 0;
}
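For comparison, here is one of the less brute-force approaches hinted at earlier, sketched with the standard mktime() and difftime() functions. This is my own illustration rather than part of the XMAS program, and the function name days_until_christmas is hypothetical. It takes the date as parameters instead of reading the clock, so you can check it against dates you already know.

```c
#include <time.h>

/* Days from the given calendar date (month is 1-12) to the next
   25 December. Returns 0 on Christmas Day itself and -1 if the
   date cannot be represented. */
int days_until_christmas(int year, int month, int day)
{
    struct tm from = {0};
    struct tm to;
    time_t t_from, t_to;

    from.tm_year  = year - 1900;
    from.tm_mon   = month - 1;
    from.tm_mday  = day;
    from.tm_hour  = 12;    /* noon keeps us clear of midnight */
    from.tm_isdst = -1;    /* let the library decide about DST */

    /* The target is 25 December of this year, or of next year if
       Christmas has already passed. */
    to = from;
    to.tm_mon  = 11;
    to.tm_mday = 25;
    if (month == 12 && day > 25) to.tm_year++;

    t_from = mktime(&from);
    t_to   = mktime(&to);
    if (t_from == (time_t)-1 || t_to == (time_t)-1) return -1;

    /* Round to the nearest whole day, because a daylight-saving
       change can make the difference an hour long or short. */
    return (int)((difftime(t_to, t_from) + 43200.0) / 86400.0);
}
```

For 28 February 1999 the function returns 300, and for 26 December 1999 it returns 365 (the following year being a leap year). A page-ready version would fetch today's date with time() and localtime() and wrap the result in a sentence, just as XMAS does.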
The HitCount program creates that all-time favorite, a page hit count. The output is a cardinal number (1, 2, 3, and so on) and nothing else. HitCount works only on Windows NT. See listing 16.5 for the C source code.
Listing 16.5 hitcount.c: Source Code for the HitCount Program

// HITCOUNT.C
// This SSI program produces a cardinal number page hit
// count based on the environment variable SCRIPT_NAME.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ERROR_CANT_CREATE "HitCount: Cannot open/create registry key."
#define ERROR_CANT_UPDATE "HitCount: Cannot update registry key."
#define HITCOUNT "Software\\Greyware\\HitCount\\Pages"

int main(int argc, char * argv[])
{
    char szHits[33];    // number of hits for this page
    char szDefPage[80]; // system default pagename
    char *p;            // generic pointer
    char *PageName;     // pointer to this page's name
    DWORD dwLength=33;  // length of temporary buffer
    DWORD dwType;       // registry value type code
    LONG dwRetCode;     // generic return code from API
    HKEY hKey;          // registry key handle

    // Determine where to get the page name. A command-
    // line argument overrides the SCRIPT_NAME variable.
    if ((argc==2) && ((*argv[1]=='/') || (*argv[1]=='\\')))
        PageName = argv[1];
    else
        PageName = getenv("SCRIPT_NAME");

    // If invoked without SCRIPT_NAME or args, die
    if (PageName==NULL)
    {
        printf("HitCount 1.0.b.960121\n"
               "Copyright (c) 1995,96 Greyware "
               "Automation Products\n\n"
               "Documentation available online from "
               "Greyware's Web server:\n"
               "http://www.greyware.com/"
               "greyware/software/freeware.htp\n\n");
    }
    else
    {
        // Open the registry key
        dwRetCode = RegOpenKeyEx (
            HKEY_LOCAL_MACHINE,
            HITCOUNT,
            0,
            KEY_EXECUTE,
            &hKey);

        // If open failed because key doesn't exist,
        // create it
        if ((dwRetCode==ERROR_BADDB)
            || (dwRetCode==ERROR_BADKEY)
            || (dwRetCode==ERROR_FILE_NOT_FOUND))
            dwRetCode = RegCreateKey(
                HKEY_LOCAL_MACHINE,
                HITCOUNT,
                &hKey);

        // If couldn't open or create, die
        if (dwRetCode != ERROR_SUCCESS)
        {
            printf (ERROR_CANT_CREATE);
        }
        else
        {
            // Get the default page name. Ask for one byte
            // less than the buffer holds so the terminator
            // written below always fits.
            dwLength = sizeof(szDefPage) - 1;
            dwRetCode = RegQueryValueEx (
                hKey,
                "(default)",
                0,
                &dwType,
                (LPBYTE)szDefPage,
                &dwLength);
            if ((dwRetCode == ERROR_SUCCESS)
                && (dwType == REG_SZ)
                && (dwLength > 0))
            {
                szDefPage[dwLength] = '\0';
            }
            else
            {
                strcpy(szDefPage,"default.htm");
            }

            // If current page uses default page name,
            // strip the page name
            _strlwr(PageName);
            p = strrchr(PageName,'/');
            if (p==NULL) p = strrchr(PageName,'\\');
            if (p)
            {
                p++;
                if (_stricmp(p,szDefPage)==0) *p = '\0';
            }

            // Get this page's information
            dwLength = sizeof(szHits) - 1;
            dwRetCode = RegQueryValueEx (
                hKey,
                PageName,
                0,
                &dwType,
                (LPBYTE)szHits,
                &dwLength);
            if ((dwRetCode == ERROR_SUCCESS)
                && (dwType == REG_SZ)
                && (dwLength > 0))
            {
                szHits[dwLength] = '\0';
            }
            else
            {
                strcpy (szHits, "1");
            }

            // Close the registry key
            dwRetCode = RegCloseKey(hKey);

            // Print this page's count
            printf("%s",szHits);

            // Bump the count by one for next call
            _ltoa ((atol(szHits)+1), szHits, 10);

            // Write the new value back to the registry.
            // The size passed for a REG_SZ value includes
            // the terminating null.
            dwRetCode = RegOpenKeyEx (
                HKEY_LOCAL_MACHINE,
                HITCOUNT,
                0,
                KEY_SET_VALUE,
                &hKey);
            if (dwRetCode==ERROR_SUCCESS)
            {
                dwRetCode = RegSetValueEx(
                    hKey,
                    PageName,
                    0,
                    REG_SZ,
                    (const BYTE *)szHits,
                    (DWORD)strlen(szHits) + 1);
                dwRetCode = RegCloseKey(hKey);
            }
            else
            {
                printf(ERROR_CANT_UPDATE);
            }
        }
    }

    fflush(stdout);
    return 0;
}
HitCount takes advantage of one of NT's unsung glories, the system registry. Counters for other platforms need to worry about creating and updating a database file, file locking, concurrency, and a number of other messy issues. HitCount uses the hierarchical registry as a database, letting the operating system take care of concurrent access.
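To make the contrast concrete, here is a minimal sketch of the housekeeping a file-based counter has to do for itself on a UNIX-like system. It is not part of the Greyware code. The function name bump_counter and the one-number-per-file layout are assumptions made for illustration, and the sketch assumes a system that provides flock(). Like HitCount, it reports the stored count (starting at 1) and then stores the next value.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns the hit number for this call (1 on first
   use) and writes the next value back to the file. Returns -1 on error. */
long bump_counter(const char *path)
{
    char buf[32] = "";
    long hits = 1;
    ssize_t n;
    int len;

    /* Create the counter file if it does not exist yet */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Block until this process holds an exclusive lock, so two
       simultaneous hits cannot read the same value */
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    /* Read the stored count; an empty file means this is the first hit */
    n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        hits = atol(buf);
        if (hits < 1)
            hits = 1;
    }

    /* Rewind, truncate, and store the value for the next caller */
    len = snprintf(buf, sizeof(buf), "%ld\n", hits + 1);
    if (lseek(fd, 0, SEEK_SET) < 0 || ftruncate(fd, 0) != 0 ||
        write(fd, buf, (size_t)len) != (ssize_t)len)
        hits = -1;

    close(fd);  /* closing the descriptor also releases the lock */
    return hits;
}
```

Every step here (create, lock, read, rewrite, unlock) is work that the registry calls in HitCount hand to the operating system.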
HitCount is actually remarkably simple compared to other counters. It uses the SCRIPT_NAME environment variable to determine the name of the current page. Thus, you have no worries about passing unique strings as parameters. HitCount takes the page name and either creates or updates a registry entry for it. The information is thus always available and rapidly accessed.
HitCount, like the other samples in this chapter, is freeware from Greyware Automation Products. You can find more extensive documentation online at their site. The code is unmodified from the code distributed by Greyware for a good reason: because registry keys are named, having multiple versions of the software running around loose with different key names just wouldn't do. Therefore, I have retained the key names for compatibility.
The only configuration you might need applies when your server's default page name isn't default.htm. In that case, add this key to the registry before using HitCount for the first time:
HKEY_LOCAL_MACHINE\Software\Greyware\HitCount\Pages
After you've created the key, add a value under Pages. The name of the value is (default) (with the parentheses), and its type is REG_SZ. Fill in the name of your system's default page. Case doesn't matter.
HitCount uses this information to keep from falsely distinguishing between a hit to http://www.yourserver.com/ and http://www.yourserver.com/default.name. Some Web servers report these two as different URLs in the SCRIPT_NAME environment variable, even though they refer to the same physical page. By setting the default in the registry, you let HitCount know to strip the page name off, if found, thus reconciling any potential problems before they arise. The default is default.htm, so you need to set this value only if your SSI pages use a different name.
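The stripping step can be sketched in portable C as follows. This is an illustration, not the Greyware code: the function name normalize_page is hypothetical, and POSIX strcasecmp stands in for the stricmp call that the Windows listings use.

```c
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: lowercases `page` in place and, if its final path
   component matches `def_page` (ignoring case), cuts the string off right
   after the last slash so both URL forms map to the same name. */
void normalize_page(char *page, const char *def_page)
{
    char *p;

    /* Lowercase the whole page name */
    for (p = page; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);

    /* Find the last forward slash, falling back to a backslash */
    p = strrchr(page, '/');
    if (p == NULL)
        p = strrchr(page, '\\');

    /* Strip the file name only when it is the default page */
    if (p != NULL && strcasecmp(p + 1, def_page) == 0)
        p[1] = '\0';
}
```

With a default of default.htm, a page name such as /Docs/Default.htm becomes /docs/, while /docs/other.htm is only lowercased.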
HitCntth is a variation of HitCount. Its output is an ordinal number (1st, 2nd, 3rd, and so on). You probably understand the name by now. HitCntth provides the HitCount-th number. Get it?
HitCntth is designed to work alongside HitCount. It uses the same registry keys, so you can switch from one format to the other without having to reset the counter or to worry about duplicate counts. See the HitCount documentation for configuration details.
Creating an ordinal takes a bit more work than printing a cardinal number because the English method of counting is somewhat arbitrary. HitCntth looks for exceptions and handles them separately and then throws a "th" on the end of anything left over. Otherwise, the function is identical to HitCount. Listing 16.6 shows the source code for HitCntth.
Listing 16.6 hitcntth.c: Source Code for the HitCntth Program

// HITCNTTH.C
// This SSI program produces an ordinal number page hit
// count based on the environment variable SCRIPT_NAME.

#include <windows.h>
#include <stdio.h>

#define ERROR_CANT_CREATE "HitCntth: Cannot open/create registry key."
#define ERROR_CANT_UPDATE "HitCntth: Cannot update registry key."
#define HITCOUNT "Software\\Greyware\\HitCount\\Pages"

void main(int argc, char * argv[]) {
    char szHits[33];      // number of hits for this page
    char szDefPage[80];   // system default pagename
    char *p;              // generic pointer
    char *PageName;       // pointer to this page's name
    long dwLength=33;     // length of temporary buffer
    long dwType;          // registry value type code
    long dwRetCode;       // generic return code from API
    HKEY hKey;            // registry key handle

    // Determine where to get the page name. A command-
    // line argument overrides the SCRIPT_NAME variable.
    if ((argc==2) && ((*argv[1]=='/') | (*argv[1]=='\\')))
        PageName = argv[1];
    else
        PageName = getenv("SCRIPT_NAME");

    // If invoked from without SCRIPT_NAME or args, die
    if (PageName==NULL) {
        printf("HitCntth 1.0.b.960121\n"
               "Copyright (c) 1995,96 Greyware "
               "Automation Products\n\n"
               "Documentation available online from "
               "Greyware's Web server:\n"
               "http://www.greyware.com/"
               "greyware/software/freeware.htp\n\n");
    } else {
        // Open the registry key
        dwRetCode = RegOpenKeyEx (
            HKEY_LOCAL_MACHINE, HITCOUNT, 0, KEY_EXECUTE, &hKey);

        // If open failed because key doesn't exist,
        // create it
        if ((dwRetCode==ERROR_BADDB)
            || (dwRetCode==ERROR_BADKEY)
            || (dwRetCode==ERROR_FILE_NOT_FOUND))
            dwRetCode = RegCreateKey(
                HKEY_LOCAL_MACHINE, HITCOUNT, &hKey);

        // If couldn't open or create, die
        if (dwRetCode != ERROR_SUCCESS) {
            printf (ERROR_CANT_CREATE);
        } else {
            // Get the default page name
            dwLength = sizeof(szDefPage);
            dwRetCode = RegQueryValueEx (
                hKey, "(default)", 0, &dwType, szDefPage, &dwLength);
            if ((dwRetCode == ERROR_SUCCESS)
                && (dwType == REG_SZ)
                && (dwLength > 0)) {
                szDefPage[dwLength] = '\0';
            } else {
                strcpy(szDefPage,"default.htm");
            }

            // If current page uses default page name,
            // strip the page name
            _strlwr(PageName);
            p = strrchr(PageName,'/');
            if (p==NULL) p = strrchr(PageName,'\\');
            if (p) {
                p++;
                if (stricmp(p,szDefPage)==0) *p = '\0';
            }

            // Get this page's information
            dwLength = sizeof(szHits);
            dwRetCode = RegQueryValueEx (
                hKey, PageName, 0, &dwType, szHits, &dwLength);
            if ((dwRetCode == ERROR_SUCCESS)
                && (dwType == REG_SZ)
                && (dwLength >0)) {
                szHits[dwLength] = '\0';
            } else {
                strcpy (szHits, "1\0");
            }

            // Close the registry key
            dwRetCode = RegCloseKey(hKey);

            // Check for special cases:
            // look at count mod 100 first
            switch ((atol(szHits)) % 100) {
                case 11: // 11th, 111th, 211th, etc.
                    printf("%sth",szHits);
                    break;
                case 12: // 12th, 112th, 212th, etc.
                    printf("%sth",szHits);
                    break;
                case 13: // 13th, 113th, 213th, etc.
                    printf("%sth",szHits);
                    break;
                default: // no choice but to look at last
                         // digit
                    switch (szHits[strlen(szHits)-1]) {
                        case '1': // 1st, 21st, 31st
                            printf("%sst",szHits);
                            break;
                        case '2': // 2nd, 22nd, 32nd
                            printf("%snd",szHits);
                            break;
                        case '3': // 3rd, 23rd, 33rd
                            printf("%srd",szHits);
                            break;
                        default:
                            printf("%sth",szHits);
                            break;
                    }
            }

            // Bump the count by one for next call
            _ltoa ((atol(szHits)+1), szHits, 10);

            // Write the new value back to the registry
            dwRetCode = RegOpenKeyEx (
                HKEY_LOCAL_MACHINE, HITCOUNT, 0, KEY_SET_VALUE, &hKey);
            if (dwRetCode==ERROR_SUCCESS) {
                dwRetCode = RegSetValueEx(
                    hKey, PageName, 0, REG_SZ, szHits, strlen(szHits));
                dwRetCode = RegCloseKey(hKey);
            } else {
                printf(ERROR_CANT_UPDATE);
            }
        }
    }
    fflush(stdout);
    return;
}
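The ordinal rule that HitCntth applies can be isolated from the registry code. The following sketch restates it as a pair of small portable functions. The names ordinal_suffix and format_ordinal are hypothetical, and the sketch assumes a non-negative count.

```c
#include <stdio.h>

/* Hypothetical helper: returns the English ordinal suffix for n (n >= 0).
   The teens 11, 12, and 13 are the exceptions and are checked first;
   otherwise the last digit decides. */
const char *ordinal_suffix(long n)
{
    long last_two = n % 100;

    if (last_two >= 11 && last_two <= 13)
        return "th";            /* 11th, 112th, 213th */

    switch (n % 10) {
    case 1:  return "st";       /* 1st, 21st, 31st */
    case 2:  return "nd";       /* 2nd, 22nd, 32nd */
    case 3:  return "rd";       /* 3rd, 23rd, 33rd */
    default: return "th";       /* everything left over */
    }
}

/* Writes n followed by its suffix into buf, truncating if buf is too
   small. Returns what snprintf returns. */
int format_ordinal(long n, char *buf, size_t size)
{
    return snprintf(buf, size, "%ld%s", n, ordinal_suffix(n));
}
```

Working from the number rather than from the last character of the string keeps the rule in one place, which is one way the shared code could move into a library.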
FirstHit is a companion program for HitCount or HitCntth. It takes care of tracking the date and time of the first hit to any page. FirstHit uses the same registry scheme as HitCount or HitCntth, but it stores its information in a different key. You have to set the (default) page name here, too, if it's something other than default.htm. The proper key is
HKEY_LOCAL_MACHINE\Software\Greyware\FirstHit\Pages
You may sense a theme in a number of areas. First, all these programs use the registry to store information. Second, they use a similar, hierarchical naming scheme. Third, they share great quantities of code. Some of these functions could be moved into a library, and probably should be. I leave that as an exercise for you.
You typically use FirstHit right after HitCount. To produce the line "You are visitor 123 since Wed 23 Nov 1994 at 01:13" on the Purveyor server, your source would look like this:
You are visitor <!--#exec exe="cgi-bin\hitcount" --> since <!--#exec exe="cgi-bin\firsthit" -->.
Listing 16.7 shows the source code. It's no more complicated than HitCount or HitCntth, and writes to the registry only the first time any page is hit. Thereafter, it just retrieves the information it wrote before.
Listing 16.7 firsthit.c: Source Code for the FirstHit Program

// FIRSTHIT.C
// This SSI program keeps track of the date and time
// a page was first hit. Useful in conjunction with
// HitCount or HitCntth.

#include <windows.h>
#include <stdio.h>

#define ERROR_CANT_CREATE "FirstHit: Cannot open/create registry key."
#define ERROR_CANT_UPDATE "FirstHit: Cannot update registry key."
#define FIRSTHIT "Software\\Greyware\\FirstHit\\Pages"
#define sdatefmt "ddd dd MMM yyyy"

void main(int argc, char * argv[]) {
    char szDate[128];     // number of hits for this page
    char szDefPage[80];   // system default pagename
    char *p;              // generic pointer
    char *PageName;       // pointer to this page's name
    long dwLength=127;    // length of temporary buffer
    long dwType;          // registry value type code
    long dwRetCode;       // generic return code from API
    HKEY hKey;            // registry key handle
    SYSTEMTIME st;        // system time
    char szTmp[128];      // temporary string storage

    // Determine where to get the page name. A command-
    // line argument overrides the SCRIPT_NAME variable.
    if ((argc==2) && ((*argv[1]=='/') | (*argv[1]=='\\')))
        PageName = argv[1];
    else
        PageName = getenv("SCRIPT_NAME");

    // If invoked from without SCRIPT_NAME or args, die
    if (PageName==NULL) {
        printf("FirstHit 1.0.b.960121\n"
               "Copyright (c) 1995,96 Greyware "
               "Automation Products\n\n"
               "Documentation available online from "
               "Greyware's Web server:\n"
               "http://www.greyware.com/"
               "greyware/software/freeware.htp\n\n");
    } else {
        // Open the registry key
        dwRetCode = RegOpenKeyEx (
            HKEY_LOCAL_MACHINE, FIRSTHIT, 0, KEY_EXECUTE, &hKey);

        // If open failed because key doesn't exist,
        // create it
        if ((dwRetCode==ERROR_BADDB)
            || (dwRetCode==ERROR_BADKEY)
            || (dwRetCode==ERROR_FILE_NOT_FOUND))
            dwRetCode = RegCreateKey(
                HKEY_LOCAL_MACHINE, FIRSTHIT, &hKey);

        // If couldn't open or create, die
        if (dwRetCode != ERROR_SUCCESS) {
            strcpy(szDate,ERROR_CANT_CREATE);
        } else {
            // Get the default page name
            dwLength = sizeof(szDefPage);
            dwRetCode = RegQueryValueEx (
                hKey, "(default)", 0, &dwType, szDefPage, &dwLength);
            if ((dwRetCode == ERROR_SUCCESS)
                && (dwType == REG_SZ)
                && (dwLength > 0)) {
                szDefPage[dwLength] = '\0';
            } else {
                strcpy(szDefPage,"default.htm");
            }

            // If current page uses default page name,
            // strip the page name
            _strlwr(PageName);
            p = strrchr(PageName,'/');
            if (p==NULL) p = strrchr(PageName,'\\');
            if (p) {
                p++;
                if (stricmp(p,szDefPage)==0) *p = '\0';
            }

            // Get this page's information
            dwLength = sizeof(szDate);
            dwRetCode = RegQueryValueEx (
                hKey, PageName, 0, &dwType, szDate, &dwLength);
            if ((dwRetCode == ERROR_SUCCESS)
                && (dwType == REG_SZ)
                && (dwLength >0)) {
                szDate[dwLength] = '\0';
            } else {
                GetLocalTime(&st);
                GetDateFormat(
                    0, 0, &st, sdatefmt, szTmp, sizeof(szTmp));
                sprintf(
                    szDate, "%s at %02d:%02d",
                    szTmp, st.wHour, st.wMinute);

                // Write the new value back to the
                // registry
                dwRetCode = RegOpenKeyEx (
                    HKEY_LOCAL_MACHINE, FIRSTHIT, 0, KEY_SET_VALUE, &hKey);
                if (dwRetCode==ERROR_SUCCESS) {
                    dwRetCode = RegSetValueEx(
                        hKey, PageName, 0, REG_SZ, szDate, strlen(szDate));
                    dwRetCode = RegCloseKey(hKey);
                } else {
                    strcpy(szDate,ERROR_CANT_UPDATE);
                }
            }

            // Close the registry key
            dwRetCode = RegCloseKey(hKey);
        }
        printf("%s",szDate);
    }
    fflush(stdout);
    return;
}
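FirstHit builds its date string with the Windows GetDateFormat call and the picture string "ddd dd MMM yyyy". For readers on other platforms, a roughly equivalent string can be produced with the standard C strftime function. This is a sketch under that assumption, not a port of the Greyware code, and the name format_first_hit is hypothetical.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats `when` as abbreviated weekday, two-digit
   day, abbreviated month, four-digit year, then "at HH:MM".
   Returns the number of characters written, or 0 if buf is too small. */
size_t format_first_hit(const struct tm *when, char *buf, size_t size)
{
    /* %a %d %b %Y mirrors the "ddd dd MMM yyyy" picture string;
       %H:%M mirrors the "%02d:%02d" of wHour and wMinute */
    return strftime(buf, size, "%a %d %b %Y at %H:%M", when);
}
```

A caller would typically pass the result of localtime() for the current time, and store the string only when no earlier value exists for the page, as FirstHit does.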
LastHit is yet another Windows NT SSI program. It tracks visitor information (date, time, IP number, and browser type). Like FirstHit, LastHit uses the same registry scheme as HitCount or HitCntth, but it stores its information in its own key. You have to set the (default) page name here, too, if it's something other than default.htm. The proper key is
HKEY_LOCAL_MACHINE\Software\Greyware\LastHit\Pages
LastHit isn't really related to HitCount or FirstHit, other than by its common code and its nature as an SSI program. LastHit tracks and displays information about the last visitor to a page. Each time the page is hit, LastHit displays the information from the previous hit and then writes down information about the current caller for display next time.
The source code for LastHit is just a little more complicated than FirstHit's, as Listing 16.8 shows. It actually uses a subroutine. If nothing else, these programs should demonstrate just how easily SSI lets you create dynamic documents. There's no rocket science here.
Listing 16.8 lasthit.c: Source Code for the LastHit Program

// LASTHIT.C
// This SSI program tracks visitors to a page, remembering
// the most recent for display.

#include <windows.h>
#include <stdio.h>

#define ERROR_CANT_CREATE "LastHit: Cannot open/create registry key."
#define ERROR_CANT_UPDATE "LastHit: Cannot update registry key."
#define LASTHIT "Software\\Greyware\\LastHit\\Pages"

// This subroutine builds the info string about the
// current caller. Hence the name. It uses a pointer
// to a buffer owned by the calling routine for output,
// and gets its information from the standard SSI
// environment variables. Since "standard" is almost
// meaningless when it comes to SSI, the program
// gracefully skips anything it can't find.
void BuildInfo(char * szOut) {
    SYSTEMTIME st;
    char szTmp[512];
    char *p;

    szOut[0]='\0';
    GetLocalTime(&st);
    GetDateFormat(0, DATE_LONGDATE, &st, NULL, szTmp, 511);
    sprintf(szOut,
            "Last access on %s at %02d:%02d:%02d",
            szTmp, st.wHour, st.wMinute, st.wSecond);

    p = getenv("REMOTE_ADDR");
    if (p!=NULL) {
        szTmp[0] = '\0';
        sprintf(szTmp,"<br>Caller from %s",p);
        if (szTmp[0] != '\0') strcat(szOut,szTmp);
    }

    p = getenv("REMOTE_HOST");
    if (p!=NULL) {
        szTmp[0] = '\0';
        sprintf(szTmp," (%s)",p);
        if (szTmp[0] != '\0') strcat(szOut,szTmp);
    }

    p = getenv("HTTP_USER_AGENT");
    if (p!=NULL) {
        szTmp[0] = '\0';
        sprintf(szTmp,"<br>Using %s",p);
        if (szTmp[0] != '\0') strcat(szOut,szTmp);
    }
}

void main(int argc, char * argv[]) {
    char szOldInfo[512];
    char szNewInfo[512];
    char szDefPage[80];
    char *p;
    char *PageName;       // pointer to this page's name
    long dwLength=511;    // length of temporary buffer
    long dwType;          // registry value type code
    long dwRetCode;       // generic return code from API
    HKEY hKey;            // registry key handle

    // Determine where to get the page name. A command-
    // line argument overrides the SCRIPT_NAME variable.
    if ((argc==2) && ((*argv[1]=='/') | (*argv[1]=='\\')))
        PageName = argv[1];
    else
        PageName = getenv("SCRIPT_NAME");

    // If invoked from without SCRIPT_NAME or args, die
    if (PageName==NULL) {
        printf("LastHit 1.0.b.960121\n"
               "Copyright (c) 1995,96 Greyware "
               "Automation Products\n\n"
               "Documentation available online from "
               "Greyware's Web server:\n"
               "http://www.greyware.com/"
               "greyware/software/freeware.htp\n\n");
    } else {
        // Build info for next call
        BuildInfo(szNewInfo);

        // Open the registry key
        dwRetCode = RegOpenKeyEx (
            HKEY_LOCAL_MACHINE, LASTHIT, 0, KEY_EXECUTE, &hKey);

        // If open failed because key doesn't exist,
        //create it
        if ((dwRetCode==ERROR_BADDB)
            || (dwRetCode==ERROR_BADKEY)
            || (dwRetCode==ERROR_FILE_NOT_FOUND))
            dwRetCode = RegCreateKey(
                HKEY_LOCAL_MACHINE, LASTHIT, &hKey);

        // If couldn't open or create, die
        if (dwRetCode != ERROR_SUCCESS) {
            printf (ERROR_CANT_CREATE);
        } else {
            // Get the default page name
            dwLength = sizeof(szDefPage);
            dwRetCode = RegQueryValueEx (
                hKey, "(default)", 0, &dwType, szDefPage, &dwLength);
            if ((dwRetCode == ERROR_SUCCESS)
                && (dwType == REG_SZ)
                && (dwLength > 0)) {
                szDefPage[dwLength] = '\0';
            } else {
                strcpy(szDefPage,"default.htm");
            }

            // If current page uses default page name,
            // strip the page name
            _strlwr(PageName);
            p = strrchr(PageName,'/');
            if (p==NULL) p = strrchr(PageName,'\\');
            if (p) {
                p++;
                if (stricmp(p,szDefPage)==0) *p = '\0';
            }

            // Get this page's information
            dwLength = sizeof(szOldInfo);
            dwRetCode = RegQueryValueEx (
                hKey, PageName, 0, &dwType, szOldInfo, &dwLength);
            if ((dwRetCode == ERROR_SUCCESS)
                && (dwType == REG_SZ)
                && (dwLength >0)) {
                szOldInfo[dwLength] = '\0';
            } else {
                strcpy (szOldInfo, szNewInfo);
            }

            // Close the registry key
            dwRetCode = RegCloseKey(hKey);

            // Print this page's info
            printf("%s",szOldInfo);

            // Write the new value back to the registry
            dwRetCode = RegOpenKeyEx (
                HKEY_LOCAL_MACHINE, LASTHIT, 0, KEY_SET_VALUE, &hKey);
            if (dwRetCode==ERROR_SUCCESS) {
                dwRetCode = RegSetValueEx(
                    hKey, PageName, 0, REG_SZ, szNewInfo, strlen(szNewInfo));
                dwRetCode = RegCloseKey(hKey);
            } else {
                printf(ERROR_CANT_UPDATE);
            }
        }
    }
    fflush(stdout);
    return;
}
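BuildInfo copies environment variables into fixed 512-byte buffers with sprintf and strcat, so an unusually long HTTP_USER_AGENT value could run past the end of a buffer. If you adapt the idea, one option is to make every append length-checked. The sketch below shows that approach for the three environment variables only (the date prefix is omitted). The names append_fmt and build_caller_info are hypothetical, and this is an illustration rather than a change to the distributed program.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends formatted text to `out`, whose total
   capacity is `size` bytes. Output is truncated instead of overflowing. */
static void append_fmt(char *out, size_t size, const char *fmt, ...)
{
    size_t used = strlen(out);
    va_list ap;

    if (used + 1 >= size)
        return;                 /* no room left for anything but the NUL */

    va_start(ap, fmt);
    vsnprintf(out + used, size - used, fmt, ap);
    va_end(ap);
}

/* Builds the caller description from the same environment variables
   BuildInfo reads, skipping any that the server does not provide. */
void build_caller_info(char *out, size_t size)
{
    const char *p;

    if (size == 0)
        return;
    out[0] = '\0';

    if ((p = getenv("REMOTE_ADDR")) != NULL)
        append_fmt(out, size, "<br>Caller from %s", p);
    if ((p = getenv("REMOTE_HOST")) != NULL)
        append_fmt(out, size, " (%s)", p);
    if ((p = getenv("HTTP_USER_AGENT")) != NULL)
        append_fmt(out, size, "<br>Using %s", p);
}
```

Because the values come from the client, a production version would probably also escape HTML characters before printing them into a page.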
In Chapter 15, "Generating HTML Documents in Real Time," you examined the issue of how real-time programs can affect server performance. SSI doesn't bring anything new to the table in that regard.
See "Server Performance Considerations" for more information on how CGI and SSI affect server performance.
In general, SSI programs tend to be less of a drain on the server than full-fledged CGI. SSI programs are usually small and simple (they only have to produce text, after all) and seldom do much of any significance with files. Page hit counters that rely on generating inline graphics put far more stress on a server than an SSI counter does.
Still, a dozen (or a hundred) instances of your SSI program running at once could steal memory and processor slices needed by the server to satisfy client requests. Imagine that you are Webmaster of a large site. On each of the 250 pages for which you're responsible, you include not one, but all the SSI examples in this chapter. Each page hit would produce seven separate processes, each of which has to jostle with the others in resource contention. In a worst-case scenario, with 100 pages being hit a minute, you would have 700 scripts running each minute, 10 or more simultaneously at all times. This kind of load would seriously affect your server's capability to do anything else, such as serving pages to the users who stop by to see your wonderful SSI handiwork.
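The arithmetic behind that scenario is easy to check. The sketch below computes the launch rate and, using the rule of thumb that the average number of jobs in flight equals the arrival rate times the time each job takes, the expected number of simultaneous processes. The function names are hypothetical, and the run time per process is an assumption you would have to measure on your own server.

```c
/* Hypothetical helper: SSI processes started per second for a given
   page-hit rate and number of SSI programs on each page. */
double launches_per_second(double hits_per_minute, double programs_per_page)
{
    return hits_per_minute * programs_per_page / 60.0;
}

/* Average number of processes running at once: arrival rate multiplied
   by how long each one runs (seconds_per_run is an assumed figure). */
double average_concurrent(double launches_per_sec, double seconds_per_run)
{
    return launches_per_sec * seconds_per_run;
}
```

With 100 hits a minute and seven programs per page, launches_per_second gives 700 / 60, or a little under 12 launches a second. If each process lives for about one second, that is roughly 12 running at any moment, which matches the "10 or more" figure. Processes that finish in a tenth of a second would average closer to one at a time.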
You don't find much difference among platforms either. Some SSI utilities run more efficiently in UNIX, others work better under Windows NT, and in the end, everything balances out. Programs that use the NT registry have a distinct advantage over programs that hit the file system to save data. The registry functions like a back-end database-always open, always ready for queries and updates. The code for handling concurrency is already loaded and running as part of the operating system, so your program can be smaller and tighter. On the other hand, pipes and forks tend to run more efficiently under some flavors of UNIX, so if your program does that sort of thing, you are better off in that environment.
In short, don't pick your server operating system based on what SSI programs you plan to run. If you run into performance problems, adding RAM will usually give your server the extra headroom it needs to handle the load imposed by SSI.
Copyright ©1996, Que Corporation