Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.

Notice: This material is excerpted from Special Edition Using CGI, ISBN: 0-7897-0740-3. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.

CHAPTER 27-Understanding CGI Security Issues

After you test and debug your CGI script and it runs successfully for the first time, you'll probably be tempted to put it up immediately on your Web site. You're understandably proud of what you've done and want the world to see your work.

This impulse, although tempting, can be dangerous. Just as there are vandals and saboteurs in the real world, the Web is populated by no end of people who would like nothing more than to crash your site, purely for the malicious pleasure of it. Though the percentage of surfers who visit your Web site with evil intent will be a tiny fraction of the total, it takes only one person with the wrong motive and the right opportunity to cause you a lot of trouble.

The vindictive hacker is a familiar figure in computer lore (especially on the Internet), and although most Web servers are programmed to protect against his bag of tricks, a single security mistake in a CGI script can give him complete access to your machine: your password file, your private data, anything.

By following a few simple rules and by being constantly on the alert (even paranoid), you can make your CGI scripts far more resistant to attack, keeping all their advantages while still allowing yourself a good night's sleep.

In this chapter, you'll learn:

Scripts vs. Programs

When you sit down to begin writing a CGI script, there are several considerations that go into your decision about which language to use. One of those considerations should be security.

Shell scripts, Perl programs, and C executables are the most common forms that a CGI script takes, and each has advantages and disadvantages when security is taken into account. None is the best in every case, though; depending on other considerations (such as speed and reuse), each has its place.

Shell scripts are usually used for small, quick, almost throwaway CGI programs, and because of this, are often written without security in mind. This carelessness can result in gaping holes that anybody with even a general knowledge of your system can walk right through.

Though shell CGI programs are often the easiest to write (or even just to throw together), they can be difficult to fully control, since they usually do most of their work by executing other, external programs. This can lead to several possible pitfalls, including your CGI program instantly inheriting the security problems of every program it uses.

For instance, the common UNIX utility awk has some fairly restrictive limits on the amount of data it can handle. If you use awk in a CGI script, your program now has all those limits as well.

Perl is a step up from shell scripts. It has many advantages for CGI programming and is fairly secure in itself. But Perl can offer CGI authors just enough flexibility and peace of mind that they might be lulled into a false sense of security.

For example, Perl is interpreted. This means that a script is compiled and executed in a single step each time it's invoked, and a running script can even compile new code on the fly (with eval, for instance). This makes it easier for bad user data to end up as part of the code, be misinterpreted, and cause the script to abort.

Finally, there's C. C is rapidly becoming the de facto standard application development language, and almost all of UNIX and Windows NT are developed in it. This may seem comforting from the perspective of security until you realize that several C security problems are well known because of this popularity and can be exploited fairly easily.

For instance, C is very bad at string handling. It does no automatic allocation or cleanup, leaving coders to handle everything on their own. A lot of C programmers, when dealing with strings, simply set up a predefined space and hope that it will be big enough to handle whatever the user enters. This, of course, can be very dangerous. Robert T. Morris, the author of the infamous Internet Worm, exploited such an assumption in the C-based finger daemon (fingerd), overflowing a buffer to alter the stack and gain unauthorized access; the worm attacked the sendmail program as well, through its debug mode.
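One way to avoid the fixed-buffer trap in C is to measure the input before copying it and refuse anything that won't fit. The following is a minimal sketch, not a library routine; the function name copy_if_fits is hypothetical.

```c
#include <string.h>

/* Copy src into the fixed-size buffer dst only if it fits, including the
 * terminating NUL. Returns 0 on success, or -1 (leaving dst as an empty
 * string) if src is too long. Rejecting overlong input is usually safer
 * than silently truncating it, and far safer than overflowing dst. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || dst_size == 0 || src == NULL)
        return -1;

    len = strlen(src);
    if (len >= dst_size) {      /* no room left for the NUL terminator */
        dst[0] = '\0';
        return -1;
    }

    memcpy(dst, src, len + 1);  /* copies the NUL as well */
    return 0;
}
```

Unlike a bare strcpy() or gets(), the caller always learns whether the data fit, and the buffer is never written past its end.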

Of course, shell scripts, Perl, and C are far from the only languages that CGI scripts can be written in. In fact, any computer language that can interact with the Web server in a predefined way can be used to code CGI programs. With UNIX and Windows NT servers, data is delivered to scripts through environment variables and standard in (stdin), so any language that can read from these two sources and write to standard out (stdout) can be used to create CGI: awk, FORTRAN, C++, BASIC, COBOL. Windows programmers can use the popular Visual Basic, meaning that experienced VB coders don't need to learn a new language. The Macintosh uses AppleEvents and AppleScript to communicate with CGI programs, so any language that can read and write with them can be used.

But shell scripts (no matter which of the several possible shells you may use), Perl, and C remain the most popular. This doesn't mean that you have to use them, but most libraries (including the most tested and most secure ones) are written in these three languages. If you have a choice for your CGI programming, you could do worse than to follow those who came before you.

Trust No One

Almost all CGI security holes come from interaction with the user. By accepting input from an outside source, a simple, predictable CGI program suddenly takes on any number of new dimensions, each of which might hold the smallest crack through which a hacker can slip. It is interaction with the user, through forms or file paths, that gives CGI scripts their power, but it also makes them the most potentially dangerous part of running a Web server.

Writing secure CGI scripts is largely an exercise in creativity and paranoia. You must be creative to think of all the ways that a user, either innocently or otherwise, can send you data that has the potential to cause trouble. And you must be paranoid because, somehow, they will try every one of them.

Two Roads to Trouble

When users log on to your Web site and begin to interact with it, they can cause you headaches in two ways. One is by not following the rules, by bending or breaking every limit or restriction you've tried to build into your pages; the other is by doing just what you've asked them to do.

Most CGI scripts act as the back end to HTML forms, processing the information entered by users to provide some sort of customized output. Since this is the case, most CGI scripts are written to expect data in a very specific format. They anticipate input from the user to match the form that should have collected and sent the information. This, however, doesn't always happen. A user can get around these predefined formats in many ways, sending your script seemingly random data. Your CGI programs must be prepared for it.

Secondly, users can send a CGI script exactly the type of data it expects, with each field in the form filled in, in the format you expect. This type of submission could be from an innocent user interacting with your site as you intended, or it could be from a malevolent hacker using his knowledge of your operating system and Web server software to take advantage of common CGI programming errors. These attacks, where everything seems fine, are the most dangerous and the hardest to detect. But the security of your Web site depends on preventing them.

Don't Trust Form Data

One of the most common security mistakes made in CGI programming is to trust the data that has been passed to your script from a form. Users are an unruly lot, and they're likely to find ways to send data that you never expected, even data you thought was impossible. All your scripts must take this into account. For instance, each of the following situations (and many more like them) is possible:

Where Bad Data Comes From

These situations can come about in several ways, some innocent, some not. For instance, your script could receive data that it doesn't expect because somebody else wrote a form (one that requests input completely different from yours) and accidentally pointed its FORM ACTION at your CGI script. Perhaps they used your form as a template and forgot to edit the ACTION URL before testing it. Your script would then receive data that it has no idea what to do with, possibly causing unexpected, and dangerous, behavior.

The following code implements a form that sends garbage to the CGI script that searches the Yahoo database. The script is well designed and secure because it ignores the input it doesn't recognize.

<FORM METHOD="POST" ACTION="http://search.yahoo.com/bin/search">
      Enter your name, first then last:
      <INPUT TYPE="TEXT" NAME="first">
      <INPUT TYPE="TEXT" NAME="last">
</FORM>

Perhaps the user might have accidentally (or intentionally) edited the URL to your CGI script. When a browser submits form data to a CGI program, it simply appends the data entered into the form onto the CGI's URL (for GET METHODs), and as easily as the user can type a Web page address into his browser, he can freely modify the data being sent to your script.

For example, when you click the Submit button on a form that uses GET, Netscape will put a long string in its Location field that's made up of the CGI's URL followed by a string of data, most of which will look like the NAMEs and VALUEs defined in the form. If you want, you can freely edit the contents of the Location field and change the data to whatever you want: add fields that the form didn't have, extend text data past the limit set by the MAXLENGTH attribute, or almost anything. Figure 27.1 shows the URL that a CGI script expects submitted from a form.

Fig 27.1

When the Submit button is clicked, a browser encodes the information and sends it to a CGI script.

Figure 27.2 shows the same URL after it's modified by a user. The CGI script will still be called, but now it will receive unexpected data. To be fully secure, the script should be written to recognize this input as bad data and reject it.

Fig 27.2

A user can modify the data, however, sending the CGI script input it never anticipated.

Finally, an ambitious hacker might write a program that connects to your server over the Web and pretends to be a Web browser. This program, though, could do things that no true Web browser would do, such as send a hundred megabytes of data to your CGI script. What would a CGI script do if it didn't limit the amount of data it read from a POST METHOD because it assumed that the data came from a small form? It would probably crash, and maybe crash in a way that would allow access to the person who crashed it.

Fighting Bad Form Data

You can fight the unexpected input that can be submitted to your CGI scripts in several ways. You should use any or all of them when writing CGI.

First, your CGI script should set reasonable limits on how much data it will accept, both for the entire submission and for each NAME/VALUE pair in the submission. If your CGI script reads the POST METHOD, for instance, check the size of the CONTENT_LENGTH environment variable to make sure that it's something that you can reasonably expect. If the only data your CGI script is designed to accept is a person's first name, it might be a good idea to return an error if CONTENT_LENGTH is more than, say, 100 bytes. No reasonable first name will be that long, and by imposing the limit, you've protected your script from blindly reading anything that gets sent to it.
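In C, the check might look like the following sketch. It assumes the caller passes in the value of getenv("CONTENT_LENGTH") and a limit of its own choosing. The function name checked_content_length is hypothetical. Because the value ultimately comes from the request, it is validated as a plain number before it's trusted as a size.

```c
#include <errno.h>
#include <stdlib.h>

/* Validate a CONTENT_LENGTH value. Returns the length if it is a plain
 * decimal number between 0 and max_len inclusive, or -1 if it is missing,
 * malformed, or too large, so the script can refuse the request before
 * reading a single byte from stdin. */
long checked_content_length(const char *value, long max_len)
{
    char *end;
    long len;

    if (value == NULL || *value == '\0')
        return -1;

    /* strtol() would accept leading spaces and a sign; insist on a digit */
    if (*value < '0' || *value > '9')
        return -1;

    errno = 0;
    len = strtol(value, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;      /* overflow, or trailing junk such as "100abc" */

    if (len > max_len)
        return -1;

    return len;
}
```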

Data submitted through the GET METHOD is somewhat self-limiting. GET sends information to a CGI program in the QUERY_STRING environment variable, and because the server limits how long a request URL may be, it also limits how much data can arrive that way. The exact limit varies from server to server, however, so it's still worth checking the length of QUERY_STRING.
Of course, hackers can easily get around this built-in limit simply by changing the METHOD of your FORM from GET to POST. At the very least, your program should check that data was submitted using the method you expect; at most, it should handle both methods correctly and safely.
See "The METHOD Attribute," p. xxx, for more information about GET and POST.
See  "Request-Specific Environment Variables," p. xxx, for information about how to determine the method of a request. 

Next, make sure that your script knows what to do if it receives data that it doesn't recognize. If, for example, a form asks that a user select one of two radio buttons, the script shouldn't assume that just because one isn't clicked, the other is. The following Perl code makes this mistake.

if ($form_Data{"radio_choice"} eq "button_one")
{
      # Button One has been clicked
}
else
{
      # Button Two has been clicked
}

This code makes the mistake of assuming that because the form offered only two choices and the first one wasn't selected, the second one must have been. This is not necessarily true. Although the preceding example is pretty innocuous, in some situations such assumptions can be dangerous.

Your CGI script should anticipate situations such as these and handle them accordingly. An error can be printed, for instance, if some unexpected or "impossible" situation arises, as in the following:

if ($form_Data{"radio_choice"} eq "button_one")
{
      # Button One selected
}
elsif ($form_Data{"radio_choice"} eq "button_two")
{
      # Button Two selected
}
else
{
      # Error
}

By adding the elsif clause, which explicitly checks that "radio_choice" was, in fact, "button_two", the CGI script has become more secure; it no longer makes assumptions.
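The same explicit check can be written in C by mapping the submitted value onto an enumeration that has a separate value for "none of the above." This is a sketch; the enum and function names are hypothetical.

```c
#include <string.h>

/* Possible values of the "radio_choice" field (hypothetical names) */
enum radio_choice { CHOICE_ONE, CHOICE_TWO, CHOICE_INVALID };

/* Map the submitted value to an explicit choice. Anything other than the
 * two expected strings, including a missing field (NULL), is reported as
 * CHOICE_INVALID instead of being treated as "button_two". */
enum radio_choice parse_radio_choice(const char *value)
{
    if (value == NULL)
        return CHOICE_INVALID;
    if (strcmp(value, "button_one") == 0)
        return CHOICE_ONE;
    if (strcmp(value, "button_two") == 0)
        return CHOICE_TWO;
    return CHOICE_INVALID;
}
```

A switch on the result then needs a CHOICE_INVALID case, which makes the "impossible" situation impossible to forget.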

Of course, an error may not be what you want your script to generate in these circumstances. Overly picky scripts that validate every field and produce error messages on even the slightest unexpected data can turn users off. Having your CGI script recognize unexpected data, throw it away, and automatically select a default is a possibility too.

The balance between safety and convenience for the user is a careful one. Don't be afraid to consult with your users to find out what works best for them.

For instance, the following is C code that checks text input against several possible choices and sets a default if it doesn't find a match. This can be used to generate output that might better explain to the user what you were expecting.

if ((strcmp(help_Topic,"how_to_order.txt")) &&
 (strcmp(help_Topic,"delivery_options.txt")) &&
 (strcmp(help_Topic,"complaints.txt")))
{
      strcpy(help_Topic,"help_on_help.txt");
}
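When the list of acceptable choices grows, a table is easier to maintain than a chain of strcmp() calls. The sketch below also returns a pointer to one of its own constant strings rather than copying into help_Topic, which sidesteps any question of whether that buffer is big enough. The array and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Topics the script is willing to serve; anything else gets the default */
static const char *const allowed_topics[] = {
    "how_to_order.txt",
    "delivery_options.txt",
    "complaints.txt",
};

/* Return the requested topic if it is on the list, otherwise the default.
 * The result always points at a constant string, never at user input. */
const char *select_help_topic(const char *requested)
{
    size_t i;

    if (requested != NULL) {
        for (i = 0; i < sizeof allowed_topics / sizeof allowed_topics[0]; i++) {
            if (strcmp(requested, allowed_topics[i]) == 0)
                return allowed_topics[i];
        }
    }
    return "help_on_help.txt";
}
```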

On the other hand, your script might try to do users a favor and correct any mistakes rather than simply send an error or select a default. If a form asked users to enter the secret word, your script could automatically strip off any white-space characters from the input before doing the comparison. The following is a Perl fragment that does this.

$user_Input =~ s/\s//g;
# Remove all white space by replacing it with an empty string (/g = every match)
if ($user_Input eq $secret_Word)
{
      # Match!
}
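For comparison, a C version of the same cleanup might look like the following minimal sketch. The name strip_whitespace is hypothetical, and the function removes every character that isspace() accepts in the current locale.

```c
#include <ctype.h>

/* Remove every white-space character from s, in place; the C counterpart
 * of Perl's s/\s//g. */
void strip_whitespace(char *s)
{
    char *dst = s;

    for (; *s != '\0'; s++) {
        /* the cast matters: isspace() is undefined for negative chars */
        if (!isspace((unsigned char)*s))
            *dst++ = *s;
    }
    *dst = '\0';
}
```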

Tip
Although it's nice to try to catch the user's mistakes, don't try to do too much. If your corrections aren't really what users wanted, they'll just be annoyed.

Finally, you might choose to go the extra mile and have your CGI script handle as many different forms of input as it can. Although you can't possibly anticipate everything that can be sent to a CGI program, there are often several common ways to do a particular thing, and you can check for each.

For example, just because the form you wrote uses the POST METHOD to submit data to your CGI script, that doesn't mean that the data will come in that way. Rather than assume that the data will be on standard in (stdin) where you're expecting it, you could check the REQUEST_METHOD environment variable to determine whether the GET or POST METHOD was used and read the data accordingly. A truly well-written CGI script will accept data no matter what METHOD was used to submit it and will be made more secure in the process. Listing 27.1 shows an example in Perl.

Listing 27.1 CGI_READ.PL: A Robust Routine for Reading Form Input
# Takes the maximum length allowed as a parameter (default 1024 bytes)
# Returns 1 and the raw form data, or 0 and the error text
sub cgi_Read
{
      my $input_Max = $_[0] || 1024;
      my $input_Method = $ENV{'REQUEST_METHOD'} || "";

      # Check for each possible REQUEST_METHOD
      if ($input_Method eq "GET")
      {
            # "GET": the data is in QUERY_STRING
            my $input_Data = defined($ENV{'QUERY_STRING'}) ? $ENV{'QUERY_STRING'} : "";

            # Check the size of the input
            if (length($input_Data) > $input_Max)
            {
                  return (0,"Input too big");
            }

            return (1,$input_Data);
      }
      elsif ($input_Method eq "POST")
      {
            # "POST": CONTENT_LENGTH bytes of data wait on stdin
            my $input_Size = $ENV{'CONTENT_LENGTH'};
            my $input_Data = "";

            # CONTENT_LENGTH must be a plain non-negative number
            unless (defined($input_Size) && $input_Size =~ /^\d+$/)
            {
                  return (0,"Bad CONTENT_LENGTH");
            }

            # Check the size of the input before reading any of it
            if ($input_Size > $input_Max)
            {
                  return (0,"Input too big");
            }

            # Read exactly $input_Size bytes from stdin
            my $bytes_Read = read(STDIN,$input_Data,$input_Size);
            unless (defined($bytes_Read) && $bytes_Read == $input_Size)
            {
                  return (0,"Could not read STDIN");
            }

            return (1,$input_Data);
      }

      # Unrecognized METHOD
      return (0,"METHOD not GET or POST");
}
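The same structure carries over to C. In the sketch below, the environment values are passed in as parameters so that the routine can be exercised without a Web server. A real script would pass getenv("REQUEST_METHOD"), getenv("QUERY_STRING"), getenv("CONTENT_LENGTH"), and stdin. The function name cgi_read and its error strings are hypothetical, chosen to mirror listing 27.1.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd, NUL-terminated copy of the raw form data that the
 * caller must free(), or NULL with *err pointing at a short message.
 * in is where POST data is read from (normally stdin). */
char *cgi_read(const char *method, const char *query_string,
               const char *content_length, FILE *in,
               size_t max_len, const char **err)
{
    char *data;
    size_t len;

    if (method != NULL && strcmp(method, "GET") == 0) {
        /* GET: the data is in QUERY_STRING */
        if (query_string == NULL)
            query_string = "";
        len = strlen(query_string);
        if (len > max_len) {
            *err = "Input too big";
            return NULL;
        }
        data = malloc(len + 1);
        if (data == NULL) {
            *err = "Out of memory";
            return NULL;
        }
        memcpy(data, query_string, len + 1);
        return data;
    }

    if (method != NULL && strcmp(method, "POST") == 0) {
        /* POST: CONTENT_LENGTH bytes wait on the input stream */
        char *end;
        unsigned long n;

        if (content_length == NULL ||
            *content_length < '0' || *content_length > '9') {
            *err = "Bad CONTENT_LENGTH";
            return NULL;
        }
        n = strtoul(content_length, &end, 10);
        if (*end != '\0') {
            *err = "Bad CONTENT_LENGTH";
            return NULL;
        }
        /* Check the size before reading anything; an overflowing value
         * comes back from strtoul() as ULONG_MAX and is rejected here */
        if (n > max_len) {
            *err = "Input too big";
            return NULL;
        }
        len = (size_t)n;
        data = malloc(len + 1);
        if (data == NULL) {
            *err = "Out of memory";
            return NULL;
        }
        if (fread(data, 1, len, in) != len) {
            free(data);
            *err = "Could not read STDIN";
            return NULL;
        }
        data[len] = '\0';
        return data;
    }

    *err = "METHOD not GET or POST";
    return NULL;
}
```

A caller would typically check for NULL, report *err on an error page, and free() the returned data when it's done with it.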

Tip
Many existing CGI programming libraries already offer good built-in security features. Rather than write your own routines, you may want to rely on some of the well-known, publicly available functions.
See  "Common Perl CGI Libraries," p. xxx, for more information about free CGI libraries. 

To summarize, your script should make no assumptions about the form data that it receives. You should expect the unexpected (as much as that's a contradiction in terms) and handle it in some way. Check the input in as many ways as possible before you use it; reject bad input and print an error; automatically select a default if something is wrong or missing; even try to decode the input into something that makes sense to your program. Which path you choose will depend on how much effort and time you want to spend, but never blindly accept anything that's passed to your CGI script.

Don't Trust Path Data

Another type of data the user can alter is the PATH_INFO server environment variable. This variable is filled with any path information that follows the script's file name in a CGI URL. For instance, if foobar.sh is a CGI shell script, a URL that continues past the script's name, such as one ending in foobar.sh/extra/path/info, will cause /extra/path/info to be placed in the PATH_INFO environment variable when foobar.sh is run.

If you use this PATH_INFO environment variable, you must be careful to completely validate its contents. Just as form data can be altered in any number of ways, so can PATH_INFO-accidentally or on purpose. A CGI script that blindly acts on the path file specified in PATH_INFO can allow malicious users to wreak havoc on the server.

For instance, if a CGI script is designed to simply print out the file that's referenced in PATH_INFO, a user who edits the CGI URL will be able to read almost any file on your computer, as in the following script:

#!/bin/sh

# Send the header
echo "Content-type: text/html"
echo ""

# Wrap the file in some HTML
# DANGER: PATH_INFO is used here without any checking
echo "<HTML><HEAD><TITLE>File</TITLE></HEAD><BODY>"
echo "Here is the file you requested:<PRE>"
cat "$PATH_INFO"
echo "</PRE></BODY></HTML>"

Although this script works fine if the user is satisfied to click only predefined links, a more creative (or spiteful) user could use it to retrieve any file on your server that the Web server is able to read. If he were to jump to http://www.server.com/cgi-bin/foobar.sh/etc/passwd, the preceding script would happily return your machine's password file, something you do not want to happen.
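Whichever variable you use, it's safer to accept only path names you can account for and to resolve them against a directory you chose, never against the root of the file system. The following C sketch assumes a hypothetical policy. Each component of PATH_INFO may contain only letters, digits, the underscore, the hyphen, and the period, and it may not begin with a period, which rules out "." and "..". The function names and the base directory in the example are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Characters allowed in each component of PATH_INFO (hypothetical policy) */
static const char path_chars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_-.";

/* Return 1 if path is "/" followed by components separated by single
 * slashes, each made only of path_chars and not starting with '.';
 * otherwise 0. Rejects "..", ".", hidden files, "//", a trailing slash. */
int path_info_ok(const char *path)
{
    if (path == NULL || *path != '/')
        return 0;

    while (*path == '/') {
        size_t n;

        path++;                          /* step past the slash */
        n = strspn(path, path_chars);    /* length of this component */
        if (n == 0 || path[0] == '.')
            return 0;
        path += n;
    }
    return *path == '\0';                /* stopped on a disallowed char? */
}

/* Join base and a validated path_info into out. Returns 0 on success,
 * or -1 if path_info is rejected or the result would not fit. */
int build_safe_path(const char *base, const char *path_info,
                    char *out, size_t out_size)
{
    int n;

    if (!path_info_ok(path_info))
        return -1;
    n = snprintf(out, out_size, "%s%s", base, path_info);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

With this check in place, the /etc/passwd request above would resolve to a file named passwd inside the chosen base directory, not to the system's password file.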

A much safer course is to use the PATH_TRANSLATED environment variable. It automatically appends the contents of PATH_INFO to the root of your server's document tree, meaning that any file specified by PATH_TRANSLATED is probably already accessible to browsers and safe.

In one case, however, files that may not be accessible through a browser can be accessed if PATH_TRANSLATED is used within a CGI script. You should be aware of it and its implications.

The .htaccess file, which can exist in each subdirectory of a document tree, controls who has access to the particular files in that directory. It can be used to limit the visibility of a group of Web pages to company employees, for example.

Whereas the server knows how to interpret .htaccess, and thus knows how to limit who can and who can't see these pages, CGI scripts don't. A program that uses PATH_TRANSLATED to access arbitrary files in the document tree may accidentally override the protection provided by the server.

Everything Seems OK, But...

Now that you've seen several ways users can provide your CGI script with data that it didn't expect and what you can do about it, the larger issue remains of how to validate legitimate data that the user has submitted.

In most cases, correctly but cleverly written form submissions can cause you more problems than out-of-bounds data. It's easy to ignore nonsense input, but determining whether legitimate, correctly formatted input will cause you problems is a much bigger challenge.

Because CGI scripts have the flexibility to do almost anything your computer can do, a small crack in their security can be exploited endlessly-and that's where the greatest danger lies.

Handling File Names

File names, for example, are simple pieces of data that may be submitted to your CGI script and cause endless amounts of trouble if you're not careful (see fig. 27.3).

Fig 27.3

Depending on how well the CGI script is written, the Webmaster for this site could get in big trouble.

Any time you try to open a file based on a name supplied by the user, you must rigorously screen that name for any number of tricks that can be played. If you ask the user for a file name and then try to open whatever was entered, you could be in big trouble.

For instance, what if the user entered a name that has path elements in it, such as directory slashes and double dots? Although you expect a simple file name-say, file.txt-you could end up with /file.txt or ../../../file.txt. Depending on how your Web server is installed and what you do with the submitted file name, you could be exposing any file on your system to a clever hacker.

Further, what if the user enters the name of an existing file or one that's important to the running of the system? What if the name entered is /etc/passwd or C:\WINNT\SYSTEM32\KERNEL32.DLL? Depending on what your CGI script does with these files, they may be sent out to the user or overwritten with garbage.

Under Windows 95 and Windows NT, if you don't screen for the backslash character (\), you might allow Web browsers to gain access to files that aren't even on your Web machine through Universal Naming Convention file names. If the script that's about to run in figure 27.4 doesn't carefully screen the file name before opening it, it might give the Web browser access to any machine in the domain or workgroup.

Fig 27.4

Opening a UNC file name is one possible security hole that gives hackers access to your entire network.

What might happen if the user puts an illegal character in a file name? Under UNIX, any file name beginning with a period (.) is hidden from ordinary directory listings. Under Windows, both slashes (/ and \) are directory separators. A carelessly written Perl program can run an external program when you thought it was only opening a file, if the file name begins or ends with the pipe character (|). Even control characters (the Escape key or the Return key, for instance) can be sent to you as part of file names if the user knows how. (See the earlier section, "Where Bad Data Comes From.")

Worse yet, in a shell script, the semicolon ends one command and starts another. If your script is designed to cat the file the user enters, a user might enter file.txt;rm -rf / as a file name, causing file.txt to be returned and then the entire hard disk to be erased, without confirmation.

In with the Good, Out with the Bad

To avoid all these problems and close all the potential security holes they open, you should screen every file name the user enters. You must make sure that the input is what you expect.

The best way to do this is to compare each character of the entered file name against a list of acceptable characters and return an error if they don't match. This turns out to be much safer than trying to maintain a list of all the illegal characters and compare against that-it's too easy to accidentally let something slip through.

Listing 27.2 is an example of how to do this comparison in Perl. It allows any letter of the alphabet (upper- or lowercase), any number, the underscore, and the period. It also checks to make sure that the file name doesn't start with a period. Thus, this fragment doesn't allow slashes to change directories; semicolons to put multiple commands on one line; or pipes to play havoc with Perl's open() call.

Listing 27.2 Making Sure That All Characters Are Legal
if (($file_Name =~ /[^a-zA-Z0-9_\.]/) || ($file_Name =~ /^\./))
{
      # File name contains an illegal character or starts with a period
}
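A C version of the same whitelist test might look like the following sketch. It uses the standard strspn() function to measure how much of the name consists of allowed characters, and it adds a length limit of the caller's choosing. The function name file_name_ok is hypothetical.

```c
#include <string.h>

/* Return 1 if name is a plain file name made only of letters, digits,
 * '_' and '.', does not start with '.', and is at most max_len
 * characters long; otherwise 0. */
int file_name_ok(const char *name, size_t max_len)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_.";
    size_t len;

    if (name == NULL || name[0] == '\0' || name[0] == '.')
        return 0;

    len = strlen(name);
    if (len > max_len)
        return 0;

    /* strspn() counts the leading run of allowed characters; if it stops
     * before the end, some character is not on the list */
    return strspn(name, allowed) == len;
}
```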

Tip
When you have a commonly used test, such as the code in listing 27.2, it's a good idea to make it into a subroutine, so you can call it repeatedly. This way, you can change it in only one place in your program if you think of an improvement.
Continuing that thought, if the subroutine is used commonly among several programs, it's a good idea to put it into a library so that any improvements can be instantly inherited by all your scripts.
Although the code in listing 27.2 filters out most bad file names, your operating system may have restrictions it doesn't cover. Can a file name start with a digit, for instance? Or with an underscore? What if the file name has more than one period, or if the period is followed by more than three characters? Is the entire file name short enough to fit within the restrictions of the file system?
You must constantly be asking yourself these sorts of questions. The most dangerous thing you can do when writing CGI scripts is rely on the users following instructions. They won't. It's your job to make sure that they don't get away with it.

Handling HTML

Another type of seemingly innocuous input that can cause you endless trouble is HTML submitted when you request text from the user. Listing 27.3 is a Perl fragment that simply customizes a greeting for whoever has entered a name in the $user_Name variable, for example, John Smith (see fig. 27.5).

Listing 27.3 A Script That Sends a Customized Greeting
print("<HTML><TITLE>Greetings!</TITLE><BODY>\n");
print("Hello, $user_Name!  It's good to see you!\n");
print("</BODY></HTML>\n");


Fig 27.5

When the user enters what you requested, everything works well.

But imagine if, rather than enter just a name, the user types <HR><H1><P ALIGN="CENTER">John Smith</P></H1><HR>. The result would be figure 27.6, probably not what you wanted.

Fig 27.6

Entering HTML when a script expects plain text can change a page in unexpected ways.

But entering HTML isn't just a way for smart alecks to change the way a page appears. Imagine if a hacker entered <IMG SRC="/secret/project/cutekid.gif"> when you requested the user's name. Again, if the code in listing 27.3 were part of a CGI script with this HTML in the $user_Name variable, your Web server would happily show the hacker your secret adorable toddler picture! Figure 27.7 is an example.

Fig 27.7

Allowing HTML to be entered can be dangerous. Here a secret file is shown instead of the user's name.

And even more dangerous than entering simple HTML to change pages or access pictures, a malicious hacker might enter a server-side include directive instead.

See  "Common SSI Commands," p. xxx, for information about what server-side includes can do. 

If your Web server is configured to obey server-side includes in the HTML your script produces (for example, a guest book page written to a file that the server parses), a user could enter <!--#include file="/secret/project/plan.txt" --> instead of his name to see the complete text of your secret plans. Or he could enter <!--#include file="/etc/passwd" --> to get your machine's password file. And probably worst of all, a hacker might type <!--#exec cmd="rm -rf /" --> instead of his name, and the innocent code in listing 27.3 would proceed to delete almost everything on your hard disk.

Because of how they can be misused, server-side includes are very often disabled. Much more information is available in Chapter 16, "Using Server-Side Includes," but you might want to consider disabling them to truly secure your site against this type of attack.

But suppose for a moment that none of this bothers you. Even if you have server-side includes turned off, and even if you don't care that users might be able to see any picture on your hard disk or that they can change the way your pages look, there's still trouble that can be caused-and not just for you, but for your other users as well.

One common use for CGI scripts is the guest book: People who visit your site can sign in and let others know that they've been there. Normally, a user simply enters his name, which appears on a list of visitors.

But what if The last signee!<FORM><SELECT> was entered as the user's name? The <SELECT> tag would cause the Web browser to ignore everything between it and a nonexistent </SELECT>, including any names that were added to the list later. Even though ten people signed the guest book shown in figure 27.8, only the first three appear because the third name contains a <FORM> and a <SELECT> tag.

Fig 27.8

Because the third signee used HTML tags in his name, nobody after him will show up.

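There are two solutions to the problem of the user entering HTML rather than flat text. The simpler one is to remove the less-than (<) and greater-than (>) characters. Because every HTML tag (and every server-side include) must be enclosed in them, stripping them out keeps any markup from reaching the page:

$user_Input =~ s/[<>]//g;

The more elegant one is to translate the characters into their HTML entities, so that the browser displays them as ordinary text instead of interpreting them. The ampersand must be translated first, so that the entities added afterward aren't mangled:

$user_Input =~ s/&/&amp;/g;
$user_Input =~ s/</&lt;/g;
$user_Input =~ s/>/&gt;/g;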
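In C, the entity translation can be done while copying into a caller-supplied buffer, refusing the input if the escaped result won't fit. This is a sketch; the function name html_escape is hypothetical. It also escapes the double quote, so the result can be placed inside a double-quoted attribute value as well.

```c
#include <string.h>

/* Copy in to out, replacing &, <, > and " with HTML entities.
 * Returns 0 on success, or -1 (leaving out empty) if the escaped text
 * plus its NUL terminator would not fit in out_size bytes. */
int html_escape(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;

    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *piece = one;     /* most characters are copied as-is */
        size_t n;

        switch (*in) {
        case '&': piece = "&amp;";  break;
        case '<': piece = "&lt;";   break;
        case '>': piece = "&gt;";   break;
        case '"': piece = "&quot;"; break;
        }

        n = strlen(piece);
        if (used + n >= out_size) {  /* keep one byte for the NUL */
            out[0] = '\0';
            return -1;
        }
        memcpy(out + used, piece, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```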

Handling External Processes

Finally, how your CGI script interfaces user input with any external processes is another area where you must be ever vigilant. Because executing a program outside of your CGI script means that you have no control over what it does, you must do everything you can to validate the input you send to it before the execution begins.

For instance, shell scripts often make the mistake of concatenating a command-line program with form input, then executing them together. This works fine if the user has entered what you expected, but additional commands may be snuck in and illegally executed.

The following is an example of a script that commits this error.

FINGER_OUTPUT=`finger $USER_INPUT`
echo $FINGER_OUTPUT

If the user politely enters the e-mail address of a person to finger, everything works as it should. But if he enters an e-mail address followed by a semicolon and another command, that command will be executed as well. If the user enters webmaster@www.server.com;rm -rf /, you're in considerable trouble.

Even if a hidden command isn't snuck into user data, innocent input may give you something you don't expect. The following line, for instance, will give an unexpected result-a listing of all the files in the directory-if the user input is an asterisk.

echo "Your input: " $USER_INPUT

When sending user data through the shell, as both of these code snippets do, it's a good idea to screen it for shell meta-characters-things that will invoke behavior that you don't expect.

Such characters include the semicolon (which allows multiple commands on one line), the asterisk and the question mark (which perform file globbing), the exclamation point (which, under csh, triggers history substitution), the back quote (which executes an enclosed command), and so on. As with filtering file names, maintaining a list of allowable characters is often easier than trying to catch every character that should be disallowed. The following Perl fragment validates an e-mail address:

# Reject the address if it contains any character outside the allowed set
if ($email_Address =~ /[^a-zA-Z0-9_\-\+\@\.]/)
{
      # Illegal character!
}
else
{
      system("finger $email_Address");
}
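The same whitelist can be wrapped in a reusable check. In this sketch (the subroutine names are hypothetical), the pattern is anchored with \A and \z so the entire string must consist of allowed characters, an empty string is rejected, and a leading dash is refused because finger would read it as an option. The address is also handed to `system` as a separate list element: when `system` gets more than one argument, Perl runs the program directly without a shell, so meta-characters would have nothing to act on even if the check were too loose.

```perl
use strict;
use warnings;

# Hypothetical validator: letters, digits, and _ + @ . - only.
# \A ... \z makes the whole string match, and the first character
# may not be a dash.
sub is_safe_address {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return $addr =~ /\A[a-zA-Z0-9_+\@.][a-zA-Z0-9_+\@.-]*\z/ ? 1 : 0;
}

# Hypothetical wrapper: validate, then run finger without a shell.
sub finger_user {
    my ($addr) = @_;
    die "Illegal character in address\n" unless is_safe_address($addr);
    # List form: the shell never sees $addr, so ";" or "`" can't be interpreted.
    return system('finger', $addr);
}
```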

If you decide that you must allow shell meta-characters in your input, there are ways to make their inclusion safer-and ways that don't actually accomplish anything. Although you may be tempted to simply put quotation marks around unvalidated user input to prevent the shell from acting on special characters, this almost never works. Look at the following:

echo "Finger information:<HR><PRE>"
finger "$USER_INPUT"
echo "</PRE>"

Although the quotation marks around $USER_INPUT will prevent the shell from interpreting, say, an included semicolon that would allow a hacker to simply piggyback a command, this script still has several severe security holes. For instance, the input might be `rm -rf /`, with the back quotes causing the hacker's command to be executed before finger is even considered.

A better way to handle special characters is to escape them so that the shell simply takes their values without interpreting them. By escaping the user input, all shell meta-characters are ignored and treated instead as just more data to be passed to the program.

The following line of Perl code does this for all non-alphanumeric characters (anything other than a letter, digit, or underscore).

$user_Input =~ s/(\W)/\\$1/g;

Now, if this user input were appended to a command, each character-even the special characters-would be passed through the shell to finger.
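To see what the substitution produces, it can be wrapped in a subroutine (the name `shell_escape` is hypothetical) and fed the hostile inputs discussed earlier. This is a sketch of backslash escaping only. One limit worth knowing: in the Bourne shell a backslash followed by a newline is treated as a line continuation and removed, so an embedded newline disappears rather than reaching the program. That is harmless here, but it is another reason to prefer validation.

```perl
use strict;
use warnings;

# Hypothetical helper: put a backslash in front of every character that
# is not a letter, digit, or underscore (\W), so the shell passes it on
# literally instead of interpreting it.
sub shell_escape {
    my ($input) = @_;
    (my $escaped = $input) =~ s/(\W)/\\$1/g;
    return $escaped;
}

for my $input ('webmaster@www.server.com;rm -rf /', '`rm -rf /`') {
    print shell_escape($input), "\n";
}
```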

But all told, validating user input-not trusting anything sent to you-will make your code easier to read and safer to execute. Rather than try to defeat a hacker after you're already running commands, give data the once-over at the door.


Handling Internal Functions
With interpreted languages, such as shell and Perl, the user can enter data that will cause your program to generate errors that aren't there if the data is correct. If user data is interpreted as part of the program's execution, anything the user enters must adhere to the rules of the language or it will cause an error.
For instance, the following Perl fragment may work fine or may generate an error, depending on what the user entered.
if ($search_Text =~ /$user_Pattern/)
{
      # Match!
}
If $user_Pattern is a valid regular expression, everything will work fine. But if $user_Pattern is malformed (an unmatched parenthesis, for instance), Perl will die with a run-time error, causing your CGI program to fail, possibly in an insecure way.
To prevent this, in Perl at least, you can use the eval operator. eval runs the enclosed code in a protected context: if that code dies, eval returns an undefined value and stores the error message in the special variable $@ instead of ending your program. The following code is an improved version of the preceding code.
my $matched = eval { $search_Text =~ /$user_Pattern/ };
if ($@)
{
      # Illegal pattern! Report the error instead of crashing.
}
elsif ($matched)
{
      # Match!
}
Unfortunately, most shells (including the most popular, /bin/sh) have no easy way to detect errors such as this one, which is another reason to avoid them.
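When the user is meant to type a plain search string rather than a pattern, a simpler option than trapping errors is to stop Perl from treating the input as a pattern at all. Perl's \Q...\E escape (the same operation as `quotemeta`) backslashes every non-word character, so the input can never form a malformed pattern. The sketch below, with hypothetical subroutine names, puts that literal search next to the eval-trapped pattern search.

```perl
use strict;
use warnings;

# Literal search: \Q ... \E quotes all regex meta-characters, so input
# such as "(" or "*" is matched as ordinary text and cannot cause an error.
sub contains_literal {
    my ($text, $term) = @_;
    return $text =~ /\Q$term\E/ ? 1 : 0;
}

# Pattern search with the compile error trapped by eval.
# Returns 1 (match), 0 (no match), or undef (invalid pattern).
sub matches_pattern {
    my ($text, $pattern) = @_;
    return eval { $text =~ /$pattern/ ? 1 : 0 };
}
```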


When executing external programs, you must also be aware of how the user input you pass to those programs will affect them. You may guard your own CGI script against hacker tricks, but it's all for naught if you blithely pass anything a hacker may have entered to external programs without understanding how those programs use that data.

For instance, many CGI scripts will send e-mail to a particular person containing data collected from the user by executing the mail program.

This can be very dangerous because mail has many internal commands, any of which could be invoked by user input. For instance, if you send text entered by the user to mail and that text has a line that starts with a tilde (~), mail will interpret the next character on the line as one of the many commands it can perform. ~r /etc/passwd, for example, will cause your machine's password file to be read by mail and sent off to whomever the letter is addressed to, perhaps even the hacker himself.

In an example such as this one, rather than use mail to send e-mail from UNIX machines, you should use sendmail, the lower-level mail program that lacks many of mail's features. But, of course, you should also be aware of sendmail's commands so those can't be exploited.

As a general rule, when executing external programs, you should use the one that fits your needs as closely as possible without any frills. The less an external program can do, the less it can be tricked into doing.

Here's another problem with mail and sendmail: You must be careful that the address you pass to the mail system is a legal e-mail address. Many mail systems will treat an e-mail address starting with a pipe (|) as a command to be executed, opening a huge security hole for any hacker that enters such an address.
Again, always validate your data!
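The advice above (prefer sendmail over mail, and validate the address) can be sketched as follows. Several things here are assumptions: sendmail's location varies by system (/usr/lib/sendmail is assumed; /usr/sbin/sendmail is also common), and the subroutine names are hypothetical. The address check rejects a leading pipe or dash, the recipient is passed as an argument rather than read from user-controlled headers, carriage returns and newlines are stripped from the subject so the user can't add extra header lines, and sendmail's -oi option keeps a line containing a single dot from ending the message early.

```perl
use strict;
use warnings;

# Assumed location of sendmail; adjust for your system.
my $SENDMAIL = '/usr/lib/sendmail';

# Hypothetical check: a plain user@host address with no pipe, no spaces,
# and no leading dash (which sendmail would read as an option).
sub is_plain_address {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return $addr =~ /\A[A-Za-z0-9_.+][A-Za-z0-9_.+-]*\@[A-Za-z0-9.-]+\z/ ? 1 : 0;
}

# Collapse CR/LF so user text can't start a new header line.
sub clean_header {
    my ($value) = @_;
    $value =~ s/[\r\n]+/ /g;
    return $value;
}

sub send_feedback {
    my ($to, $subject, $body) = @_;
    die "Bad address\n" unless is_plain_address($to);
    # List-form pipe open: no shell is involved in starting sendmail.
    open(my $mail, '|-', $SENDMAIL, '-oi', $to)
        or die "Can't run sendmail: $!\n";
    print $mail "To: $to\n";
    print $mail "Subject: ", clean_header($subject), "\n\n";
    print $mail $body;
    close($mail) or die "sendmail exited with status $?\n";
}
```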

Another example of how you must know your external programs well to use them effectively is grep. Grep is a simple command-line utility that searches files for a regular expression, anything from a simple string to a complex sequence of characters. Most people will tell you that you can't get into much trouble with grep, but although grep may not be able to do much damage, it can be fooled, and how it can be fooled is illustrative. The following code is an example: It's supposed to perform a case-sensitive search for a user-supplied term among many files.

print("The following lines contain your term:<HR><PRE>");
# Backslash-escape every non-word character before handing it to the shell
$search_Term =~ s/(\W)/\\$1/g;
system("grep $search_Term /public/files/*.txt");
print("</PRE>");

This all seems fine, unless you consider what happens if the user enters -i. The escaping doesn't help: the shell removes the backslash it added, so grep still receives -i. Instead of being searched for, it functions as a switch to grep, as would any input starting with a dash. This will cause grep either to hang while waiting for data on standard input, or to fail when anything after the -i is interpreted as extra switch characters. This undoubtedly isn't what you wanted or planned for. In this case it's not dangerous, but in others it might be.
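One way to close this hole is sketched below, under the assumption that your grep honors the common "--" convention (everything after it is an operand, never an option). The subroutine names are hypothetical. Because the command runs through list-form `system` with no shell, the backslash escaping is no longer needed; it would in fact be harmful, since grep would then see the backslashes. The file list is expanded by Perl's `glob`, and the search is refused if no files match, because grep with no file arguments waits on standard input.

```perl
use strict;
use warnings;

# Build grep's argument list; "--" ends option processing, so a term
# such as "-i" is searched for instead of being treated as a switch.
sub grep_command {
    my ($term, @files) = @_;
    return ('grep', '--', $term, @files);
}

# Hypothetical wrapper: expand the file pattern in Perl and run grep
# without a shell. Returns -1 if there is nothing to search.
sub search_files {
    my ($term, $file_pattern) = @_;
    my @files = glob($file_pattern);
    return -1 unless @files;
    return system(grep_command($term, @files));
}

# Usage (not run here): search_files($search_Term, '/public/files/*.txt');
```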

Remember, there's no such thing as a harmless command, and each must be carefully considered from every angle.

In general, you should be as familiar as possible with every external program your CGI script executes. The more you know about the programs, the more you can do to protect them from bad data-both by screening that data and by disabling options or disallowing features.

External programs are often a quick, easy solution to many of the problems of CGI programming-they're tested, available, and versatile. But they can also be a wide open door through which a hacker that knows what he's doing can quietly stroll. You shouldn't be afraid of using them-often external programs are the only way to accomplish something from a CGI program-but you should be aware of the trouble they can cause.


Security Beyond Your Own
sendmail has an almost legendary history of security problems. Almost from the beginning, hackers have found clever ways to exploit sendmail and gain unauthorized access to the computers that run it.
But sendmail is hardly unique. Dozens-if not hundreds-of popular, common tools have security problems, with more being discovered each year.
The point is that it's not only the security of your own CGI script that you must worry about, but the security of all the programs your CGI script uses. Knowing sendmail's full range of documented capabilities is important, but perhaps more so is knowing what's not documented, probably because it wasn't intended.
Keeping up with security issues in general is a necessary step to maintain the ongoing integrity of your Web site. One of the easiest ways to do this is on Usenet, in the newsgroups comp.security.announce (where important information about computer security is broadcast) and comp.security.unix (which hosts a continuing discussion of UNIX security issues). A comprehensive history of security problems, including attack-prevention software, is available through the Computer Emergency Response Team (CERT) at ftp.cert.org.


Inside Attacks

Up to this point, you've considered only the people who browse your site through the Web-from thousands of miles away-as potential security risks. But another source of problems exists a lot closer to home.

A common mistake in CGI security is to forget local users. Although people browsing your site over the Web don't have access to local security considerations, such as file permissions and owners, local users of your Web server machine do, and you must guard against these threats even more than those from the Web. On most multiuser systems, such as UNIX, the Web server is run as just another program while the machine remains in use by any number of people doing any number of things. Just because someone works with you or attends your university doesn't mean that he can resist the temptation to start poking through your Web installation, causing trouble.

Local system security is a big subject and almost any reference on it will give you good tips on protecting the integrity of your machine from local users. As a general rule, if your system as a whole is safe, your Web site is safe too.

CGI Script User

Most Web servers are installed to run CGI scripts as a special user. This is the user that owns the CGI program while it runs, and the permissions granted to that user limit what the script will be able to do.

Under UNIX, the server itself usually runs as root (the superuser or administrator of the system) to allow it to use socket port 80 as the place where browsers communicate with it. (Only root is allowed to use the so-called "reserved" ports between 0 and 1023; all users may use the rest.) When the server executes a CGI program, most Web servers can be configured to run that program as a different user than the Web server itself, though not all are set up this way.

It's very dangerous to let your CGI scripts run as root! Your server should be set up to use an innocuous user, such as the commonly used nobody, to run CGI scripts. The less powerful the user, the less damage a runaway CGI script can do.
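As a second line of defense, a Perl CGI script can check its own identity and refuse to continue as root. This is a minimal sketch with hypothetical subroutine names; it relies on Perl's $< (real user ID) and $> (effective user ID) variables, both of which are 0 for root on UNIX systems.

```perl
use strict;
use warnings;

# True if either the real ($<) or effective ($>) user ID is root's (0).
sub running_as_root {
    return ($< == 0 || $> == 0) ? 1 : 0;
}

# Hypothetical guard to call at the very top of a CGI script:
#   refuse_root();
sub refuse_root {
    return unless running_as_root();
    print "Content-type: text/plain\n\n";
    print "This script will not run as root.\n";
    exit 1;
}
```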

Setuid Dangers

You should also be aware of whether the setuid bit is set on your UNIX CGI scripts. This option, when enabled on an executable, will cause the program to run with the permissions of the user who owns the file, rather than the user who executed it. If the setuid bit is set on your CGI scripts, no matter what user the server runs programs as, they will execute with the permissions of the file's owner. This, of course, has major security implications: you may lose control over the user whose permissions your script runs with.

Fortunately, the setuid bit is easy to disable. Executing chmod a-s on all your CGI scripts will guarantee that it's turned off, and your programs will run with the permissions you intended.

Of course, in some situations you may want the setuid bit set-if your script needs to run as a specific user to access a database, for example. If this is the case, you should make doubly sure that the other file permissions on the program limit access to it to those users you intend.
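If you'd rather audit from a script than by hand, the check and the chmod a-s step can be sketched in Perl with the constants from the standard Fcntl module. The subroutine names are hypothetical; like chmod a-s, the second one clears both the setuid and setgid bits and leaves every other permission bit alone.

```perl
use strict;
use warnings;
use Fcntl ':mode';    # S_ISUID, S_ISGID and friends

# True if the file has the setuid or setgid bit turned on.
sub has_setid_bit {
    my ($path) = @_;
    my @st = stat($path) or die "stat $path: $!\n";
    return ($st[2] & (S_ISUID | S_ISGID)) ? 1 : 0;
}

# Perl equivalent of "chmod a-s": keep the other permission bits as they were.
sub clear_setid_bit {
    my ($path) = @_;
    my $mode = (stat($path))[2] & 07777;
    chmod($mode & ~(S_ISUID | S_ISGID), $path) or die "chmod $path: $!\n";
}

# Usage: call clear_setid_bit() on each script in your cgi-bin directory
# unless you have a deliberate reason to keep the bit.
```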

'Community' Web Servers

Another potential problem with having Web servers execute every script as a single, common user is that a single human being isn't always in control of the server. If many people share control of a server, each may install CGI scripts that run as, say, the nobody user. This allows any of these people to use a CGI program to gain access to parts of the machine that they personally may be restricted from, but that the nobody user is allowed to enter.

Probably the most common solution to this potential security problem is to restrict CGI control to a single individual. Although this may seem reasonable in limited circumstances, it's often impossible for larger sites. Universities, for example, have hundreds of students, each of whom wants to experiment with writing and installing CGI scripts.

Using CGIWrap

A better solution to the problem of deciding which user a script runs as when multiple people have CGI access is the CGIWrap program. CGIWrap, which is included on the CD that accompanies this book, is a simple wrapper that executes a CGI script as the user that owns the file instead of the user that the server specifies. This simple precaution leaves the script owner responsible for the damage it can do.

For instance, if the user "joanne" owns a CGI script that's wrapped in CGIWrap, the server will execute the script as user "joanne." In this way, CGIWrap acts like a setuid bit but has the added advantage of being controlled by the Web server rather than the operating system. That means that anybody who sneaks through any security holes in the script will be limited to whatever "joanne" herself can do-the files she can read and delete, the directories she can view, and so on.

Because CGIWrap puts CGI script authors in charge of the permissions for their own scripts, it can be a powerful tool not only to protect important files owned by others, but to motivate people to write secure scripts. The realization that only their files would be in danger can be a powerful persuader to script authors.

CGI Script Permissions

You should also be aware of which user the CGI scripts are owned by and the file permissions on the scripts themselves. The permissions on the directories that contain the scripts are also very important.

If, for example, the cgi-bin directory on your Web server is world-writable, any local user will be able to delete your CGI script and replace it with another. If the script itself is world-writable, this nefarious person will be able to modify the script to do anything.
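A quick way to catch both problems is to list the script directory and everything directly inside it that "other" users can write. This is a sketch; the subroutine name is hypothetical and it does not descend into subdirectories.

```perl
use strict;
use warnings;
use Fcntl ':mode';    # S_IWOTH: write permission for "other" users

# Return the directory itself and any entry in it that is world-writable.
sub world_writable_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return;
    my @paths = ($dir, map { "$dir/$_" } grep { $_ ne '.' && $_ ne '..' } readdir($dh));
    closedir($dh);

    my @bad;
    for my $path (@paths) {
        my @st = stat($path) or next;
        push @bad, $path if $st[2] & S_IWOTH;
    }
    return @bad;
}

# Usage: print "$_ is world-writable\n" for world_writable_entries($your_cgi_bin);
```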

Look at the following innocuous UNIX CGI script:

#!/bin/sh
# Send the header
echo "Content-type: text/html"
echo ""
# Send some HTML
echo "<HTML><HEAD><TITLE>Fortune</TITLE></HEAD>"
echo "<BODY>Your fortune:<HR><PRE>"
fortune
echo "</PRE></BODY></HTML>"

Now imagine if the permissions on the script allowed an evil local user to change the program to the following:

#!/bin/sh
# Send the header
echo "Content-type: text/html"
echo ""
# Do some damage!
rm -rf /
echo "<HTML><TITLE>Got you!</TITLE><BODY>"
echo "<H1>Ha ha!</H1></BODY></HTML>"

The next user to access the script over the Web would cause huge amounts of damage, even though that person had done nothing wrong! Checking the integrity of user input over the Web is important, but even more so is making sure that the scripts themselves remain unaltered and unalterable!

Local File Security

Equally important is the integrity of the files that your scripts create on the local hard disk. After you feel comfortable that you've got a good file name from the Web user, how you actually go about using that name is also important. Depending on which operating system your Web server is running, permissions and ownership information can be stored on the file along with the data inside it.

UNIX, for instance, keeps track of file access permissions for the user that created the file, the group that user belongs to, and everybody else on the system. Windows NT uses a more complex system of access control lists, but accomplishes largely the same thing. Users of your Web server machine may be able to cause havoc depending on how these flags are set and what permissions are granted or reserved.

For instance, you should be aware of the permissions you give a file when you create it. Most Web server software sets the umask, or permission restrictions, to 0000, meaning that it's possible to create a file that anybody can read or write. Although the permissions on a file probably don't make any difference to people browsing on the Web, people with local access can take advantage of loose permissions to cause you and your users trouble.

Given that fact, you should always specify the most restrictive permissions that will allow your program to work when creating files.

This isn't only a good idea for CGI programs, but for all the code you write.

The simplest way to make sure that each file-open call has a set of minimum restrictions is to set your script's umask. umask() is a UNIX call that restricts permissions on every subsequent file creation. The parameter passed to umask() is a number that's "masked" against the permissions mode of any later file creation. A umask of 0022 will cause any file created to be writable only by its owner, no matter what explicit permissions are given to the group and other users on the actual open.

But even with the umask set, you should create files with explicit permissions, just to make sure that they're as restrictive as possible. If the only program that will ever access a file is your CGI script, only the user that your CGI program runs as should be given access to the file (permissions 0600). If another program needs to access the file, try to make the owner of that program a member of the same group as your CGI script so that only group permissions need to be set (permissions 0660). If you must give the world access to the file, make it so that the file can only be read, not written to (permissions 0644).
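Both precautions can be combined in a few lines of Perl. This is a sketch with a hypothetical helper name: umask(0022) masks off group and world write permission for everything the script creates afterward, and sysopen creates the file with an explicit 0600 mode. O_EXCL makes the call fail if something already exists at that path, so an existing file is never silently reused.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Mask off write permission for group and others on all later creations.
umask(0022);

# Hypothetical helper: create a new file that only the user the CGI
# script runs as can read or write (0600). Dies if the path exists.
sub create_private_file {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0600)
        or die "Can't create $path: $!\n";
    return $fh;
}
```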

Use Explicit Paths

Finally, a local user can attack your Web server in one last way-by fooling it into running an external program that he wrote instead of what you specified in your CGI script. The following is a simple program that shows a Web surfer a bit of wisdom from the UNIX fortune command.

#!/bin/sh
# Send the header
echo "Content-type: text/html"
echo ""
# Send the fortune
echo "<HTML><HEAD><TITLE>Fortune</TITLE></HEAD><BODY>"
echo "You crack open the cookie and the fortune reads:<HR><PRE>"
fortune
echo "</PRE></BODY></HTML>"

This script seems harmless enough. It accepts no input from the user, so he can't play any tricks on it that way. Because it's run only by the Web server, the permissions on the script itself can be set to be very restrictive, preventing a trouble-minded local user from changing it. And if the permissions on the directory in which it resides are set correctly, there's not much that can go wrong, is there?

Of course there is. Remember, you've got to be paranoid.

Listing 27.12 calls external programs, in this case echo and fortune. Because these programs don't have explicit paths specifying where they are on the hard disk, the shell uses the PATH environment variable to search for them, walking through each entry in the variable looking for the programs to execute.

And this can be dangerous. If, for example, the fortune program was installed in /usr/games, but PATH listed, say, /tmp before it, then any program that happened to be named "fortune" and resided in the temporary directory would be executed instead of the true fortune (see fig. 27.9).

Fig 27.9

Although the script is unaffected, a local user has tricked the Web server into running another program instead of fortune.

This program can do anything its creator wants, from deleting files to logging information about the request and then passing the data on to the real fortune-leaving the user and you none the wiser.

You should always specify explicit paths when running external programs from your CGI scripts. The PATH environment variable is a great tool, but it can be misused just like any other.
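In a Perl CGI script the same advice takes two steps: replace the PATH the server passed in with a short, trusted list, and call programs by their full path anyway. Perl's taint mode (the -T switch) insists on a clean PATH before it will run external commands, so this habit also keeps such scripts working. In the sketch below, /usr/games/fortune is the location the text above assumes, the list of deleted variables is a common precaution rather than an exhaustive one, and run_fortune is a hypothetical name.

```perl
use strict;
use warnings;

# Replace whatever PATH the server handed us with a short, trusted list,
# and drop variables that change how shells behave.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Full path to the program; adjust for your system.
my $FORTUNE = '/usr/games/fortune';

# Run fortune by explicit path; returns -1 if it isn't installed there.
sub run_fortune {
    return -1 unless -x $FORTUNE;
    $| = 1;    # flush our own output before the child writes to stdout
    return system($FORTUNE);
}
```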

Using Others' CGI Scripts

Picture yourself walking into a seedy bar on the edge of town, one where all the weirdos and misfits hang out. After your eyes adjust to the gloom, you get everybody's attention and ask if anybody has a toothbrush they could spare. A guy in the back crawls out from under a table, stumbles over to you and says, "Here! Use mine!"

Would you really use his toothbrush? Of course not. And you should have the same attitude about using CGI scripts you get off the Web-unless you take a good hard look at them first.

Yes, many, many helpful archives of CGI scripts are on the Web-each stuffed with dozens of useful, valuable programs that do exactly what you need and there just for the taking. But before you start haphazardly downloading all these gems and blindly installing them on your server, you should pause and consider a few things:

If the answer to either question is no, you could be opening yourself up to a huge con game. You would do the hacker's work for him by installing a potentially dangerous CGI program on your own server. It's like bringing a bomb into your house because you thought it was a blender.

These Trojan horse scripts-so named because they contain hidden dangers-might be wonderful time savers, doing exactly what you need and functioning perfectly until a certain time is reached or a certain signal is received. Then, they will spin out of your control and execute planned behavior that could range from the silly to the disastrous.

Go to the Source

Before installing a CGI program that you didn't write yourself, you should take care to examine it closely for any potential dangers. If you don't know the language of the script or if its style is confusing, then you might be better off looking for a different solution. Dangers can lurk just beyond your sight! For example, look at this Perl fragment:

system("cat /etc/passwd") if $ENV{"PATH_INFO"} eq "/send/passwd";

This single line of code could be hidden among thousands of others, waiting for its author or any surfer to enter the secret words that cause it to send him your password file.
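A crude scanner can at least point your eyes at the lines most worth reading first. This is a rough heuristic sketch, not a substitute for reviewing the whole script: the list of suspicious constructs is an assumption, it will flag harmless lines, and a determined author can easily disguise a call so it slips past.

```perl
use strict;
use warnings;

# Constructs worth a second look in someone else's Perl CGI code.
my @SUSPICIOUS = (
    qr/\bsystem\b/, qr/\bexec\b/, qr/`/, qr/\bqx\b/,
    qr/\beval\b/,   qr/\bopen\s*\(?[^;]*\|/, qr{/etc/passwd},
);

# Return "line-number: text" for every line matching any pattern above.
sub suspicious_lines {
    my ($source) = @_;
    my @hits;
    my $n = 0;
    for my $line (split /\n/, $source) {
        $n++;
        push @hits, "$n: $line" if grep { $line =~ $_ } @SUSPICIOUS;
    }
    return @hits;
}

# Usage: perl scan.pl downloaded_script.cgi
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "$file: $!\n";
    my $text = do { local $/; <$fh> };
    print "$file $_\n" for suspicious_lines($text);
}
```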

If your knowledge of Perl is shaky, if you didn't take the time to completely review the script before installing it, or if a friend assured you that he's running the script with no problems, you could accidentally open your site to a huge security breach, and one that you may not know about. The most dangerous Trojan horses won't even let you know that they've gone about their work. They will continue to work correctly, silently sabotaging all your site's security.

Compiled, Schlamiled

Occasionally, you may find precompiled C CGI scripts on the Web. These are even more dangerous than prewritten programs that include the source. Because precompiled programs don't give you any way of discovering what's actually going on, their "payload" can be much more complex and much more dangerous.

For instance, a precompiled program might go to the trouble not only of lying in wait for some hidden trigger, but also of informing the hacker who wrote it where it's installed! A cleverly written CGI program might mail its master information about your machine and its users every time the script is run (see fig. 27.10), and you would never know because all that complexity is safely out of sight behind the precompiled executable.

Fig 27.10

A Trojan horse CGI script can go so far as to deliver mail to its author, letting him know that it's waiting.

Though installing interpreted shell and Perl scripts can be dangerous, running precompiled programs is just downright foolish. If you don't have the source-indeed, if you didn't compile the program yourself-you probably shouldn't trust it.

And That Goes for Your Little Library, Too!

Full-blown CGI scripts aren't the only code that can be dangerous when downloaded off the Web. Also, dozens of handy CGI libraries are available, and they pose exactly the same risks as full programs. If you never bother to look at what each library function you call does, you might end up writing the program that breaks your site's security yourself.

All a hacker needs is for you to execute one line of code that he wrote, and you've handed him the keys to the kingdom. You should review, and be sure that you understand, every line of code that will execute on your server as a CGI script.

In fact, learning how to program CGI scripts, the entire point of this book, is worthwhile if only so that you can sight-check the programs and libraries you download off the Web.

Remember, always look a gift horse in the mouth!


The Extremes of Paranoia and the Limits of Your Time
Although sight-checking all the code you pull off the Web is often a good idea, it can take huge amounts of time, especially if the code is complex or difficult to follow. At some point, you may be tempted to throw caution to the wind and hope for the best, installing the program and firing up your browser. The reason you downloaded a CGI program in the first place was to save time. Right?
If you do decide to give your paranoia a rest and just run a program that you didn't write, then reduce your risk by getting the CGI script from a well-known and highly regarded site.
The NCSA httpd, for instance, is far too big for the average user to go over line by line, but downloading it from NCSA's own site is as close to a guarantee of its integrity as you're likely to get. In fact, anything downloaded from NCSA will be prescreened for you.
In truth, dozens of well-known sites on the Web will have done most of the paranoia-induced code checking for you. Downloading code from any of them is just another layer of protection that you can use for your own benefit. Such sites include the following:


Being Polite, Playing Nice

Finally, if you do appropriate CGI code off the Web to use either in its entirety or as a smaller part of a larger program you're writing, you should be aware of a few things.

Just because code is freely available doesn't mean that it's free, or free for you to do with as you want. Often, programs and libraries are protected by copyrights, and if the original author hasn't released the rights into the public domain, he may use them to impose restrictions on how his program may be used. He may forbid you to break up his script and use parts of it in yours, for example.

In general, before you use someone else's code (even if you've decided that it's secure), it's a good idea to contact the author and ask permission. At the very least, it's polite, and the vast majority of the time he will be overjoyed that someone is getting some use out of code he wrote. And, of course, it's always courteous to cite the original authors of the pieces of your program.
