In previous chapters you have learned how to mark up content for your Web site using the HTML standard. Now, we will begin our exploration of the CGI (Common Gateway Interface), which will greatly enhance the level of interactivity on your site. With the use of CGI scripts, you can make your Web presentations more responsive to your users' needs by allowing them to have a more powerful means of interaction with your material.
In this chapter, you will learn about the following:
How the CGI works
Uses for CGI scripts
Seeing if you can write CGI scripts
Common CGI scripting languages
How to find CGI Resources
Here is the answer to the hundred dollar question. What is the CGI anyway? Well, in order to answer that, you are going to need a little background information first.
Each time you sit down in your favorite chair (I hope it is anyway) and start surfing the WWW, you are a client from the Internet's point of view. Each time you click on a link to request a new Web document, you are sending a request to the document's server. The server then receives the request, gets the document, and sends it back to your browser for you to view.
The client/server relationship that is set up between your browser and a Web server works very well for serving up HTML and image files from the server's Web directories. Unfortunately, there is a large flaw with this simple system. The Web server is still not equipped to handle information from your favorite database program or from other applications that require more work than simply transmitting a static document.
One option the designers of the first Web server could have chosen was to build in an interface for each external application from which a client may want to get information. It is hard to imagine trying to program a server to interact with every known application and then trying to keep the server current on each new application as it is developed. Needless to say, it would be impossible. So they developed a better way.
These wise developers anticipated this problem and solved it by designing the Common Gateway Interface, or CGI. This gateway provides a common environment and a set of protocols for external applications to use while interfacing with the Web server. Thus, any application engineer (including yourself) can use the CGI to allow an application to interface with the server. This extends the range of functions the Web server has to include the features provided by a potentially limitless number of external applications.
Now that you have read a little background, you should have a basic idea of what the CGI is, and why it is needed. The next step in furthering your understanding of the CGI is to learn the basics of how it works. To help you achieve this goal, I will break down this material into the following sections:
The Process
Characteristics
The output Header and MIME Types
Environment Variables
The CGI is the common gateway or door that is used by the server
to interface (or communicate) with applications other than the browser. Thus, CGI
scripts act as a link between whatever application is needed and the server,
while the server is responsible for receiving information from, and sending data
back to, the browser.
For example, when you enter a search request at your favorite
search engine, a request is made by the browser to the server to execute a CGI
script. At this time, the browser passes the information that was contained in
the online form plus the current environment to the server. From here, the
server passes the information to the script. This script provides an interface
with the database archive and finds the information that you have requested.
Once this information is retrieved, the script sends it to the server which
feeds it back to your browser as a list of matches to your query.
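The round trip described above can be sketched as a tiny SH script. This is only an illustration under stated assumptions: the function names and the file catalog.txt are hypothetical stand-ins for the database archive, and the search word is assumed to arrive by itself in the QUERY_STRING environment variable (the text after the ? in the URL).

```shell
#!/bin/sh
# Sketch of the browser -> server -> script -> server -> browser round trip.
# CATALOG is a hypothetical text file standing in for the "database archive".
CATALOG="${CATALOG:-catalog.txt}"

# Print every catalog line that contains the search term (ignoring case).
search_catalog() {
    grep -i -F -- "$1" "$CATALOG" 2>/dev/null
}

# Build the complete reply: output header, blank line, then the body.
answer_query() {
    echo "Content-type: text/plain"
    echo ""
    search_catalog "$1" || echo "No matches found."
}

# The server places the text after the "?" in the URL into QUERY_STRING.
answer_query "${QUERY_STRING:-}"
```

The script never talks to the browser directly. It simply prints its answer, and the server relays that output to the client.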
Another way of looking at the CGI is to see it as a socket that attaches an extra arm on your server. This new arm, the CGI script, adds new features and abilities to the server that it was previously lacking.
The most common use for these new features is to give the server
the ability to dynamically respond to the client. One of the most often seen
examples of this is allowing the client to send a search query to a CGI script
which then queries a database and returns a list of matching topics from the
database. Besides information retrieval, another common theme for using CGI
scripts is to customize the user interface on the Web site. This commonly takes
the form of counters and animations.
These and some of the other common uses for CGI scripts will be discussed in more detail later in this chapter, so stay tuned.
It won't be long into your CGI programming career before you will want to write a script that sends information to the server for it to process. Each file that is sent to the server must contain an output header. This header contains the information the server and other applications need to transmit and handle the file properly.
The use of output headers in CGI scripts is an expansion of a
system of protocols called MIME (Multipurpose Internet Mail Extensions). Its use
for e-mail began in 1992 when the Network Working Group published RFC (Request
For Comments) 1341, which defined this new type of e-mail system. This system
greatly expanded the ability of Internet e-mail to send and receive various
non-text file formats.
Each time you, as a client, send a request to the server, it is sent in the form of a MIME message with a specially formatted header. Most of the information in the header is part of the client's protocol for interfacing with the server. This includes the request method, a URI (Uniform Resource Identifier), the protocol version, and then a MIME message. The server then responds to this request with its own message, which usually includes the server's protocol version, a status code, and a different MIME message.
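To make the shape of these messages concrete, the following SH sketch prints a request and a response of the kind just described. The function names, the sample URI /index.html, and the one-line document are illustrative assumptions, and real messages usually carry more header lines than are shown here.

```shell
#!/bin/sh
# Illustrative only: print the text of a simple HTTP/1.0 exchange.

# Request: method, URI, and protocol version on the first line,
# then header lines, then a blank line that ends the header.
print_request() {
    printf '%s %s HTTP/1.0\r\n' "$1" "$2"
    printf 'Accept: text/html\r\n'
    printf '\r\n'
}

# Response: protocol version and status code on the first line,
# then a MIME header, a blank line, and the document itself.
print_response() {
    printf 'HTTP/1.0 200 OK\r\n'
    printf 'Content-type: text/html\r\n'
    printf '\r\n'
    printf '<HTML><BODY>Hello</BODY></HTML>\n'
}

print_request GET /index.html
print_response
```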
The bulk of this client/server communication process is handled
automatically by the WWW client application (usually your Web browser) and the
server. This makes it easier for everyone, since you don't have to know how to
format each message in order to access the server and get information. You just
need a WWW client. However, to write your own CGI scripts, you will need to know
how to format the Content-type line of the MIME header in order for the server
to know what type of document your script is sending. Also, you will need to
know how to access the server's environment variables so you can use that
information in your CGI scripts. In the following sections, you will learn
everything necessary to accomplish both of these tasks.
Each document that is sent via a CGI script to the server,
whether it was created "on-the-fly" or is simply being opened by the script,
must contain a Content-type output header as the first part of the document so
the server can process it accordingly. In table 23.1 you will see examples of
some of the more commonly used MIME Content-types and their associated
extensions.
Table 23.1 Examples of MIME Types and Extensions
Content-type              | Extensions
application/octet-stream  | bin exe
application/postscript    | ai eps ps
application/pdf           | pdf
application/x-csh         | csh
application/x-sh          | sh
application/x-wais-source | src
application/x-gtar        | gtar
application/x-gzip        | gz
application/x-tar         | tar
application/zip           | zip
audio/x-wav               | wav
image/gif                 | gif
image/jpeg                | jpeg jpg jpe
text/html                 | html htm
text/plain                | txt
text/richtext             | rtx
video/mpeg                | mpeg mpg mpe
video/quicktime           | qt mov
video/x-msvideo           | avi
video/x-sgi-movie         | movie
x-world/x-vrml            | wrl
To help you better understand how to properly use Content-types within a CGI script, let's work through an example. Suppose you have decided to write a CGI script that will display a GIF each time it is executed by a browser.
The first line of code you need is a special comment that
contains the path to the scripting language that you are using to write the
program. In this case it is PERL 4. The comment symbol # must be followed
by an exclamation point ! then the path. This special combination of
#! on the first line of the file is the standard format for letting the
server know which interpreter to use to execute the script. The reason that this
special comment is used is that while UNIX servers use this line of code to
locate the script's interpreter, other types of server systems have alternate
methods of specifying the interpreter's location. However, since this line of
code starts with a # symbol, it is still a valid PERL comment and does
not cause problems on non-UNIX servers.
#!/usr/local/bin/perl
The next line you will need simply sets the variable $gif to the full path name of the image you wish to display.
$gif = "/file/path/your.gif";
Now it is time to let the server know that it will be receiving an image file from this script to display on the client's browser. This is done using the MIME Content-type line. The print statement prints the information between the quotation marks to the server. Each \n that you see on this line adds a newline character. The first one ends the Content-type line, and the second gives you the required blank line that must occur after the Content-type information. A blank line lets the server know where the MIME header stops and where the body of information, in this case the gif, starts.
print "Content-type: image/gif\n\n";
The next line creates a file handle named IMAGE that forms a link from this script to the file contained in the variable $gif which we set earlier.
open(IMAGE, $gif) || die "Cannot open $gif: $!";
Now, we create a loop that sends the entire contents of the gif to the server as the body of the MIME message we began with the Content-type line.
while(<IMAGE>) { print $_; }
To avoid being sloppy, we will close the file handle to the gif now that we are done sending the image.
close(IMAGE);
Finally, we let the PERL interpreter know that the CGI script is finished running and can be stopped.
exit;
This type of script can be modified into something a little more useful. For example, you could turn it into a random image viewer. Each time someone clicks on the link to the script, it executes and feeds a random gif to the client's browser.
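As a rough sketch of that idea, here is how a random image viewer might look as an SH script. The function names and the directory /file/path are hypothetical, and because plain SH has no random number generator, the sketch borrows the script's process ID as a cheap substitute. The choice changes from run to run, but it is not truly random.

```shell
#!/bin/sh
# Crude "random" image picker; the directory and file names are hypothetical.

# Print the image at 0-based position $1 among the remaining arguments.
pick_image() {
    index=$1
    shift
    shift "$index"
    echo "$1"
}

# Send one image: the output header, a blank line, then the raw file.
send_gif() {
    echo "Content-type: image/gif"
    echo ""
    cat "$1"
}

# Use the process ID, which differs from run to run, as a cheap
# substitute for a random number (it is not truly random).
choose_and_send() {
    count=$#
    [ "$count" -gt 0 ] || return 1
    send_gif "$(pick_image $(( $$ % count )) "$@")"
}

# Example call, assuming the images live in /file/path:
# choose_and_send /file/path/*.gif
```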
Hopefully, you now have a little better understanding of what is involved as the client and server communicate with each other. Along with the information that I discussed earlier, a host of environment variables are sent during the client/server communications. Although each server can have its own set of environment variables, for the most part, they are all subsets of a large set of standard variables described by the Internet community to help promote uniform standards.
If you have bin access on a UNIX server, then you can use the following script to easily determine which environment variables your server supports. In addition, this script should also work on other server types such as Microsoft Windows NT server if you properly configure the server to recognize and execute PERL scripts.
Once again, this is the magic line that lets the server know which type of CGI script this is so it can launch the appropriate interpreter.
#!/usr/local/bin/perl
This next line, as was described above, is the MIME output header that lets the server know to expect an HTML document to follow.
print "Content-type: text/html\n\n";
Now that the server is expecting to receive an HTML document, we will send it a list of each environment variable's name and current value by using a foreach loop.
foreach $key (keys(%ENV)){ print "\$ENV{$key} = \"$ENV{$key}\"<br>\n"; }
Finally, we need to tell the interpreter that the script is finished.
exit;
As you can see from the example, most of the variables contain protocol version information, and location information such as the client's IP address and the server's domain. However, if you are creative, you can put some of these variables to good use in your CGI scripts.
The best example I have seen so far is the use of the environment variable HTTP_USER_AGENT. This contains the name and version number of the client application, which is usually a Web browser. As you can see from figure 23.1, the Netscape 2.0 browser that I used when running this script has an HTTP_USER_AGENT value of Mozilla/2.0 (Win95; I).
Once you know what the values are for various browsers, it is possible to write a CGI script to serve different Web documents based on browser type. Thus, a text-only browser might receive a text version of your Web page, while image-capable browsers will receive the full version.
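A minimal sketch of this technique in SH might look like the following. The two file names are hypothetical, and the patterns assume that Netscape identifies itself with a value beginning with Mozilla, as shown above, while the text-only Lynx browser uses a value beginning with Lynx.

```shell
#!/bin/sh
# Choose a page based on the browser name in HTTP_USER_AGENT.
# The two file names are hypothetical.

# Print the name of the page suited to the user agent string in $1.
page_for_agent() {
    case "$1" in
        Lynx*)    echo "text-only.html" ;;   # text-only browser
        Mozilla*) echo "full.html" ;;        # Netscape and compatibles
        *)        echo "text-only.html" ;;   # unknown: play it safe
    esac
}

# Send the chosen page to the server, preceded by the output header.
serve_page() {
    echo "Content-type: text/html"
    echo ""
    cat "$(page_for_agent "${HTTP_USER_AGENT:-}")"
}
```

A CGI script built on this sketch would finish by calling serve_page. Unknown browsers fall through to the text version, since that page should be readable everywhere.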
Web sites are interactive by their very nature. Every time you click on a hyperlink, you are actively involved in the site, rather than passively reading information. Most users enjoy this added level of interactivity and the feeling of participation it brings. However, hyperlinks are just the beginning. With CGI scripts, you have access to a whole new set of tools to make your Web site more interactive and dynamic.
The list of uses for CGI scripts is always growing. Here are but a few of the more common ones.
Processing forms
Image maps
Animations
HTML "on the fly"
Counters
Search Engines
WAIS servers
Spiders, Robots, & WebCrawlers
As you can see, you probably have already interacted with many CGI scripts, possibly without even realizing it.
Processing the information entered into a form is by far the
most common use of CGI scripts. These scripts are activated when you press the
submit/send button, which is usually found near the bottom of the form. Once the
script is executed, the server sends it the information that was entered.
Then, the script processes this information and, if appropriate, sends some
information back to the browser via the server. This information is then
displayed on your monitor.
You can take a look at the following URL to see an example of a simple form on the Web for adding a response to a guestbook.
URL Address: http://www.missouri.edu/~bchemkm/guestbook.htm
Figure 23.3 : Here is a sample of the source code that is used to produce the table in figure 23.2.
The script that processes this form has several common features that you can find in other forms as you explore the Web.
Contains one or more levels of error checking to ensure that the form is filled out properly.
Provides an opportunity for the users to double-check the information they have entered.
Notifies you that the information was sent correctly, with a brief thank-you and then points out what you should do next.
Processes the form's information. In this case, the information is added to a response page and the owner of the guestbook (me) is notified via e-mail that the guestbook was signed.
CGI scripts are also commonly used to collect survey information, or update the contents of a database. Later, in chapter 24, you will learn exactly how each of these features works as you write your own guestbook script, much like this one.
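To give a feel for what processing a form involves, here is a hedged SH sketch of the first step, which is pulling individual fields out of the submitted data. It assumes the data arrives in the standard URL-encoded form, with name=value pairs joined by & and spaces sent as +. The function names and the sample field names are made up for illustration, and %XX escapes are deliberately left undecoded to keep the sketch short.

```shell
#!/bin/sh
# Split URL-encoded form data such as "name=Ann+Lee&city=Rolla" into fields.
# Only "+" (an encoded space) is decoded here; %XX escapes are left alone.

# Print the value of field $1 found in the form data $2.
form_value() {
    printf '%s\n' "$2" | tr '&' '\n' | while IFS='=' read -r name value
    do
        if [ "$name" = "$1" ]; then
            printf '%s\n' "$value" | tr '+' ' '
        fi
    done
}

# One level of error checking: succeed only if the field is not empty.
require_field() {
    [ -n "$(form_value "$1" "$2")" ]
}
```

For example, form_value name "name=Ann+Lee&city=Rolla" prints Ann Lee, and require_field can be used to reject a form whose required fields were left blank.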
CGI scripts are commonly used, as is discussed in detail in chapter 12, for running imagemaps. Each time you use one of these clickable images, you are executing a CGI script that comes packaged with the Web server. This script compares the coordinates of your "click" with those in the imagemap's configuration file to determine which URL to send to the server. The server then transmits the information to the browser.
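The coordinate comparison at the heart of an imagemap script can be sketched in a few lines of SH. The map file format used here (a URL followed by the corners of a rectangle, one region per line) is invented for this example and is not the configuration format of any particular server.

```shell
#!/bin/sh
# Look up the URL for a click at (x, y) in a list of rectangular regions.
# The map format below (URL x1 y1 x2 y2, one region per line) is made up
# for this sketch and is not the format of any particular server.

url_for_click() {
    # $1 = x, $2 = y, $3 = map file
    while read -r url x1 y1 x2 y2
    do
        if [ "$1" -ge "$x1" ] && [ "$1" -le "$x2" ] &&
           [ "$2" -ge "$y1" ] && [ "$2" -le "$y2" ]; then
            echo "$url"
            return 0
        fi
    done < "$3"
    return 1
}
```

The first region that contains the click wins, so overlapping regions should be listed in order of priority.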
Think back to when you were a kid in grade school. Do you remember drawing stick men, one on a page, and then flipping the pages quickly to animate it (instead of listening to what the teacher was saying)? Well, this same kind of sequential image animation is done on Web sites using a simple CGI script.
At http://www.missouri.edu/~bchemkm/guestbook.htm you
will find an example I created to demonstrate what this type of animation looks
like. Each image is one in a series of 10 gifs from the well-known Duke JAVA
animation. This sequence is repeated so that the actual animation plays several
times.
To give you a better feel for how an animation script works, you will need to have a basic understanding of the concept of a boundary. When the script runs, it happily creates the HTML document until it comes to the boundary, which is another way of saying an artificial divider. Then, the script inserts the graphic for the first animation. Once the first image is accounted for, the script generates the rest of the HTML document. However, the script remembers where the boundary is in the document and overlays each new image on top of the previous one, creating the animation. This is done using the MIME Content-type for multipart documents.
Would you like to have this type of simple CGI animation on your own Web site? If so, all you need to do is keep reading. I have provided a very simple PERL animation script to produce these for your own pages in the next chapter. Along with this script is a more detailed discussion of how animation scripts work.
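In the meantime, here is a rough SH sketch of the boundary idea. It assumes the multipart type in question is multipart/x-mixed-replace, which browsers that support this kind of animation use to replace one part with the next. The boundary text, function names, and frame file names are arbitrary choices for the example, and this is not the script from the next chapter.

```shell
#!/bin/sh
# Sketch of a multipart animation stream. It assumes the browser supports
# the multipart/x-mixed-replace type; the boundary text is arbitrary.
BOUNDARY="frame-divider"

# Announce a multipart document and name the boundary that divides it.
start_animation() {
    echo "Content-type: multipart/x-mixed-replace;boundary=$BOUNDARY"
    echo ""
    echo "--$BOUNDARY"
}

# Send one frame; the boundary after it marks the end of the image.
send_frame() {
    echo "Content-type: image/gif"
    echo ""
    cat "$1"
    echo ""
    echo "--$BOUNDARY"
}

# Example call with hypothetical frame files, one second apart:
# start_animation
# for frame in frame1.gif frame2.gif frame3.gif; do
#     send_frame "$frame"; sleep 1
# done
```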
Another nifty trick using simple CGI scripts is to generate customized HTML pages. These pages produced "on the fly" by the script can include such things as the current time and date, the name and version of the user's browser, or even the user's name.
You can use a simple SH shell script, for example, to generate a little clock (with the date) and indicate which browser the client is using to view your site. To make everything look better, the output can be displayed using table formatting.
Now, I will walk you through this short SH CGI script.
The first line of code is the special comment line that lets the server know what language interpreter to use as it tries to execute the script. In this case, it is the SH shell scripting language usually located in the bin directory on the server.
#!/bin/sh
The SH command cat << top appears in the next line. The cat command (short for concatenate), when used with <<, tells the shell to echo or print to the browser everything between two identical markers. In this case, the marker top is used.
cat << top
Now, we tell the server what type of document it is receiving so that it can notify the browser. This is done using the MIME Content-type output header discussed earlier in this chapter.
Content-type: text/html
CAUTION: As explained earlier in this chapter, a blank line must follow the Content-type line so the server knows where the header stops and the document begins.
These are standard HTML structural tags.
<HTML> <HEAD>
The next line is a META tag. As you learned in chapter 5, this
tag can be used to reload a page after an indicated amount of time, in this case
one minute. Thus, after each minute elapses, the script is executed again and
the page is rebuilt on the fly. This way, the clock maintains the current time.
<META HTTP-EQUIV="refresh" CONTENT="60; URL=http://www.missouri.edu/bchemkm-bin/timescript.sh">
Some more vanilla HTML.
<TITLE>Sample Time Script</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF">
<HR><P>
<CENTER>
<TABLE BORDER=5 CELLSPACING=10 CELLPADDING=2>
<TR>
<TD>
top
Here, we execute the built-in UNIX command date and pass it a format string. The + symbol at the start of the argument tells the date command that a format string follows. Within that string, the % symbol followed by a character represents a format code that tells the date command what to include in the output.
/bin/date "+ %I:%M %p %Z"
The echo commands used here print the information contained within the quotation marks to the browser. Also, we see another use of the date command with a different formatted request.
echo "<BR></TD>"
echo "<TD>"
/bin/date "+%A %B %d, %Y"
echo "<BR></TD>"
echo "</TR><TR>"
echo "<TD COLSPAN=2>"
Now, here is an example of incorporating an environment variable to tell the client which browser he is using to view your page.
echo "$HTTP_USER_AGENT"
Now that you have created the clock and let the user know which browser she is using, it is time to finish off the HTML page. This is done with the cat command again, sandwiching the desired HTML between two identical parameters, this time bottom.
cat << bottom
<BR></TD>
</TR>
</TABLE>
</CENTER><P>
<HR>
<P>The rest of your page's content goes here.<P>
<HR>
</BODY>
</HTML>
bottom
If you have copied everything correctly, and are using a browser that supports META tags, you should see something that looks like figure 23.5.
Figure 23.5 : This is an example of a simple clock produced by using a CGI script.
If you surf the Web much, you have probably seen several pages that tell you what number visitor you are to the site. The way these sites keep track of the number of visitors is by using a counter. This is a CGI script that increments an internal counter each time the page is requested from the server and then displays the appropriate series of graphics to indicate the current "count."
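The core of a counter is small enough to sketch in SH. This version reports the count as plain text rather than as a series of graphics, the path in COUNT_FILE is hypothetical, and no file locking is attempted, so treat it as an illustration of the idea rather than something ready for a busy site.

```shell
#!/bin/sh
# Minimal text counter. COUNT_FILE is a hypothetical path; the file must be
# writable by the user the server runs scripts as. No file locking is done,
# so two simultaneous visits could record the same number.
COUNT_FILE="${COUNT_FILE:-/tmp/count.txt}"

# Add one to the stored count, save it, and print the new value.
next_count() {
    count=$(cat "$COUNT_FILE" 2>/dev/null || true)
    count=$(( ${count:-0} + 1 ))
    echo "$count" > "$COUNT_FILE"
    echo "$count"
}

# Print a page fragment reporting the visitor number.
show_count() {
    echo "Content-type: text/html"
    echo ""
    echo "<P>You are visitor number $(next_count).</P>"
}
```

If the count file does not exist yet, the first visit creates it with a count of 1.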
If you would like to have a counter on your Web site, there are several ways you can go about setting one up. If you have root access to your server, you can install a counter that is accessible by any user on the server. With this option, you will use fewer system resources than if everyone on the system has his own counter script. A nice choice for this type of script is WWW Homepage Access Counter [Counter Release 2.2] which can be found at http://www.semcor.com/~muquit/Count.html.
If you have a working CGI-bin directory, there are several counter scripts you can install for your use. By placing the script in your bin directory, you will be the only user on the system who will have access to it, but if you don't have root access on the server, then this is your best bet. One such script is HTML Access Counter - Counter 4.0 located at http://www.webtools.org/counter/.
Unfortunately, your site may be hosted on a server that is not configured for CGI use. If you find yourself in this situation, you can still have an access counter, but you will need to use one that is hosted by a remote site. Each time someone visits your site, a CGI script is executed on the remote server that exports the count information back to the client's browser. One of the most popular hosted access counters for Web sites is The Web Counter at http://www.digits.com/Web_counter/.
There is a lot of information available about access counters on the Internet already. The FAQ - How do I set up an HTML Counter at http://pantheon.cis.yale.edu/~nakamura/counterfaq.html is an excellent source for further information. Also, if you are running a WinNT server, you can take a look at ED Counters, counters... at http://charon.assert.ee/counters.htm. If you're operating a Mac server then you can try Simple Counters at http://cy-mac.welc.cam.ac.uk/CGI-simplecounter.html for more information.
Once you have your counter set up on your site, you should take a look at Counter Digits at http://www.issi.com/people/russ/digits/digits.html. Here you will find a nice collection of images for use with counter scripts.
Figure 23.6 : Two of my favorite image sets from Digit Mania's counter archive.
A common stopping point on the Web is the search engine. These massive information repositories are easily searched thanks to CGI scripts that allow you to interface with them.
Some of the most well-known search engines include:
Yahoo at http://www.yahoo.com/
Lycos at http://www.lycos.com/
WebCrawler at http://webcrawler.com/
WWW Yellow Pages at http://www.mcp.com/nrp/wwwyp/
For example, if you enter "search engine" into the Lycos search engine, as in figure 23.7, you should get back a list of hits. Each hit in the list is formatted as in figure 23.8.
Figure 23.7 : The Lycos search engine's front page.
Figure 23.8 : The first match of the search query "search engine".
Some of the more advanced search engines, like Lycos, will allow
you to use the logical operators "and" and "or" to help widen or
narrow your search. You can even control the amount of information listed for
each site in the search results and the number of matches that are returned.
If your site has a large amount of information to present, then you might want to look into getting your own search engine. This allows people using your site to quickly and efficiently locate the information they need. If you feel that a search engine is what your site needs to improve its presentation of information, then you should consider the following options:
If you are a confident programmer, you can write your own search engine CGI script.
If programming is not your strong point at the moment, you can always port an existing search engine to your site from the Web. Here is a list of links to more information about some of the better freeware and shareware packages:
WILLOW at http://www.washington.edu:1180/willow/home.html
GLIMPSE at http://glimpse.cs.arizona.edu:1994/glimpsehelp.html
HIDX at http://mall.turnpike.net/~jc/hidxq.html
SWISH 1.1.1 at http://www.eit.com/software/swish/swish.html
Finally, if the previous options fail to meet your needs, you can always buy a commercially available search engine.
If these search engines are not enough to satisfy your site's information distribution needs, you might want to consider implementing a version of WAIS (Wide Area Information Server, pronounced "ways") like freeWAIS on your site. One of the best features of this system is that it catalogues many more types of information than the standard HTML documents that are collected by the web wanderers for use with the standard search engines. A WAIS server keeps track of gifs and other image documents as well as several types of audio and video files. If you have a lot of information in formats other than HTML, then this is a great means of allowing clients to search your site for the information they need.
The WAIS server was originally designed to allow multinational corporations and other organizations the ability to search their internal databases. Each WAIS server forwards incoming queries to the next server on a list. As the request passes along the chain of servers the amount of collected information grows until all the server locations are searched and one large summary document is sent back to the client.
Recently, the WAIS server has been successfully put to use on stand-alone systems. So, you shouldn't feel the need to have multiple server and database locations before you start considering a WAIS server as a means of allowing clients quick and easy access to your site's information.
If you are interested in having these search capabilities on your site, consider getting a current version of freeWAIS (a version of WAIS in the public domain). For more information, you can consult the online FAQ at http://www.cis.ohio-state.edu/hypertext/faq/usenet/wais-faq/freeWAIS-sf/faq.html. Also, you should definitely take a look at the information on the WAIS homepage at http://kaos.erin.gov.au/technical/retrieval/wais/wais.html. Finally, if you would rather have a proprietary version of WAIS software, you should visit WAIS Inc.'s homepage at http://www.wais.com/ for more information. WAIS Inc. is now a part of AOL Productions, Inc.
As you have seen earlier, search engines are used to search vast archives of information on the Web. But how does all that information get compiled? The answer is with CGI scripts called Web wanderers, Web robots, spiders, or webcrawlers. These robots are constantly moving from server to server, site to site, methodically searching for links and pages to process.
You can think of a robot as an automated Web browser. In fact, these programs use the same protocols to access servers and retrieve Web documents that browsers do. They just do it much faster. Each time a robot moves to a new server, it proceeds to systematically archive each Web document's title and URL, directory by directory. It may even note the outgoing links and use them to hunt down the next server to visit.
These programs are usually written for one of three major purposes. The most obvious one is to attempt to maintain a single archive that contains information on every document on the Web. However, it is currently taking the fastest robots more than half a year to travel the entire Web. So, it appears that a complete, up-to-date archive of Web documents will become increasingly difficult to maintain. For this reason most newer robots are only looking for information on a specific topic. This helps these archives stay more current than the larger global search sites. Finally, some robots are built to synchronize mirrored sites.
For a well-kept listing of all the currently known (more than 50) robots on the Internet and a nice starting point for finding more information, see Martijn Koster's site on web wanderers at:
URL Address: http://info.webcrawler.com/mak/projects/robots/robots.html
Hopefully, you now have a good idea of some of the more common uses for CGI scripts. As you can see, many of them provide helpful tools that you can incorporate into your personal Web site. If you would like to use some of these tools to make your site more dynamic, then you will need to consider a few things before you start.
Can you write CGI scripts?
Choosing a CGI scripting language
Before you can get started writing your own CGI scripts, you need to find out if your server is specially configured to allow you to use them. The best thing to do is contact your system administrator and find out if you are allowed to run CGI scripts on the server. If you can, you also need to ask what you need to do to use them, and where you should put the scripts once they are written.
In some cases, system administrators do not allow clients to use CGI scripts because they feel they cannot afford the added security risks. In that case, you will have to find another means of making your site more interactive.
If you find that you can use CGI scripts and are using a UNIX
server, then you will probably have to put your scripts into a specially
configured directory which is usually called cgibin or cgi-bin. If you are using
Microsoft's Internet Server, then you will probably put your CGI programs in a
directory called scripts. This allows the system administrator to
configure the server to recognize that the files placed in that directory are
executable. If you are using an NCSA version of HTTPD on a UNIX system then this
is done by adding a ScriptAlias line to the conf/srm.conf file on the server.
Now that you know what a CGI script is, how it works, and what it can do, the next thing you need to consider is which language you should use. You can write a CGI script in almost any language. So, if you can program in a language already, there is a good chance you can use it to write your scripts. This is usually the best way to start learning how to write CGI scripts, since you are already familiar with the basic syntax of the language. However, you still need to know which languages your Web server is configured to support.
UNIX-based NCSA and CERN Web servers are by far the most common.
These platforms are easily configured to support most of the major scripting
languages including C, C++, JAVA, PERL, and the basic shell scripting languages
like SH. On the other hand, if your Web server is using the Mac server then you
might be limited to using AppleScript as your scripting language. Likewise, if
you are using Windows NT server, then you might need to use Visual Basic as your
scripting language. However, it is possible to configure both these systems to
support other scripting languages like C and PERL, or even Pascal.
If you are lucky, you may find that your server is already configured to support several CGI scripting languages. In this case, you just need to compare the strengths and weaknesses of each language you have available with the programming tasks you anticipate writing the scripts for. Once you do this, you should have a good idea of which programming language is best suited to your specific needs.
When it comes to the CGI, anything goes. Of the vast numbers of programming languages out there, many more than you could possibly learn in a lifetime, most can work with the CGI. So, you will have to spend a little time sifting through the long list to find the one that will work best for you.
Even though there are a lot of different languages available, they tend to fall into several categories based on the way they are processed (compiled, interpreted, or compiled/interpreted) and on the logic behind how the source is written (procedural or object-oriented).
Some of the available programming languages are compiled rather than interpreted. The two most commonly used are C and C++. When using a compiled language, the program as you write it is referred to as the source code. The language's compiler translates this source code into the machine's native language, producing what is usually referred to as object code. Because the compiler rejects source code that contains syntax errors, a successfully compiled program can be run by the server without fear of them. Object code also usually executes much faster than code from scripting languages that are compiled at runtime. Unfortunately, this does mean that you have to recompile the source code each time a change is made in the script.
One of the most popular CGI scripting languages is C. It was developed by Dennis Ritchie at Bell Labs in the early 1970s and popularized by the book he later wrote with Brian Kernighan. This procedural language is already familiar to a large number of programmers, and thus, is their scripting language of choice. As such, there are many large archives of existing C source code that you can adapt to fit your specific programming needs.
Since C is a compiled language, it must be processed into a small binary object code before it can be executed. As was mentioned earlier, this allows these scripts to execute very quickly. So, if a quick response from the script is your primary consideration for picking a scripting language, you should stick with a compiled language like C. The best use for CGI scripts coded in C is for processing large amounts of numeric information quickly and efficiently.
Unfortunately, most of the CGI scripts written today focus on
complex regular expressions and string data. These types of programs can be very
awkward to write in C. This is one major reason why many CGI programmers are
using PERL instead.
Like its predecessor C, C++ (developed by Bjarne Stroustrup at AT&T) is a compiled language whose binary object code executes very quickly. However, C++ is not as similar to C as you might anticipate from the name. While C is a procedural language, C++ adds support for the object-oriented paradigm. What this means is that a program written in the object-oriented style is much more concerned with the function, interaction, and reusability of its objects than it is with the actual steps it takes to get the job done.
The only other major drawback for using C++ for your CGI scripting is that there is not a lot of public domain source. Only recently have software engineers started to program object-oriented solutions for CGI scripting needs. Thus, you might have to wait awhile before you start to see large archives of code for public use. However, as time goes on, this will become much less of an issue.
A good source for more information on C++ is the Usenet group comp.lang.c++.moderated.
Unlike C and C++, some languages are not compiled into tight binary code before they are executed. Some, like the shell language SH, are interpreted during execution. This means that any syntax errors in the script will not be detected until the program has already started to run. This, coupled with the limited power of the shell languages, means that they are not as useful for larger scripting jobs as some of the other languages dealt with in this chapter.
PERL, along with several other interpreted languages, avoids this problem by being compiled at runtime. What this means is that the PERL interpreter checks the entire script for proper syntax and compiles it into an internal form before any of it is executed. However, unlike C, this doesn't result in a truly compiled object that can then be reused. PERL scripts are compiled again each time they are executed. Thus, there is no need to keep track of separate source and object files for the same script.
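To make the idea of "compiled at runtime" a little more concrete, here is a small sketch in Python, which works the same way in this respect and is used here purely as a stand-in for PERL. The helper run_script and the emit function it hands to the script are hypothetical names made up for this example.

```python
def run_script(source):
    """Compile the whole script first; run it only if it compiled cleanly.

    Returns the list of lines the script emitted, or None if the script
    contained a syntax error (in which case none of it was run).
    """
    output = []
    try:
        # The syntax check covers the entire script, not just the first line.
        code = compile(source, "<script>", "exec")
    except SyntaxError:
        return None
    exec(code, {"emit": output.append})
    return output


good = 'emit("first line")\nemit("second line")\n'
bad = 'emit("first line")\nemit("second line"\n'  # missing closing parenthesis

print(run_script(good))
print(run_script(bad))
```

Notice that the faulty script produces no output at all, not even from its first (perfectly valid) line. A shell script with the same mistake would typically run the first line before tripping over the second.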
There are several commonly available shell scripting languages, or command interpreters as they are sometimes called. The most common ones are SH and C shell. Although these are among the most important user interfaces for the UNIX environment, they are not the best choice for a CGI scripting language.
These shell languages are designed as UNIX tools and thus lack
much of the power and features of true programming languages. However, they can
be put to good use when writing simple, rather disposable CGI scripts or when
you need a little job done in a hurry.
One of the most commonly used languages for CGI scripting is PERL 4.036. PERL, which stands for "Practical Extraction and Report Language," was developed by Larry Wall, who still maintains it. All the versions of PERL except the newest one are procedural. The newest release, version 5, adds object-oriented features and represents a major restructuring of the PERL language. However, most PERL 4 programs should run fine using PERL 5. This latest version will be discussed briefly later in this chapter.
A key feature of PERL is that it is very open ended. It doesn't confine the user to a certain rigorous set of syntax. Instead, PERL usually provides several methods of doing each task, which makes it easier to program using your own personal style. Also, PERL supports almost all the common features of C, so a C programmer can write PERL code that looks very much like the C he or she is used to.
Another key feature of PERL is its powerful handling of strings and regular expressions. Using the built-in string manipulation functions of PERL, many scripts are easily written that would be much harder to program in C. Since the overwhelming majority of all CGI scripts handle string data, it is no wonder that so many CGI scripts are written in PERL.
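To give you a feel for the kind of string work a typical CGI script does, here is a brief sketch that decodes a URL-encoded query string and checks one field against a regular expression. It is written in Python rather than PERL, purely as an illustration; the function names, the field names, and the sample data are all made up for this example, and the e-mail pattern is deliberately simplistic.

```python
import re
from urllib.parse import parse_qs


def parse_form(query_string):
    """Decode a URL-encoded query string into a plain dict of field values."""
    fields = parse_qs(query_string)
    # parse_qs returns a list per field; keep the first value of each.
    return {name: values[0] for name, values in fields.items()}


def looks_like_email(text):
    """Very rough pattern check, only meant to show a regular expression."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", text) is not None


form = parse_form("name=Ada+Lovelace&email=ada%40example.com")
print(form["name"])
print(looks_like_email(form["email"]))
```

The decoding step turns the plus sign back into a space and %40 back into an @ sign, which is exactly the sort of chore that takes a good deal more code in C.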
Another thing to keep in mind is that PERL compiles the whole script at runtime, just before it is run. This means that a syntax error is caught before the program starts, rather than partway through execution as can happen when programming in a shell language. At the same time, it means that you can simply make a change in your source code and it will take effect. You don't have to recompile your source into object code each time you make a change like you do using C.
Since PERL 4 is currently the most widely used CGI scripting language on the Web, and as it can be run on a wide variety of server types, I have chosen to use it for the majority of the CGI scripting examples used in both this and the following chapter. If you would like more information about this scripting language you should take a look at the PERL Language Home Page at http://www.perl.com/perl/index.html.
At this point, you may be asking yourself why this guy is telling you about PERL 5 when he just finished making PERL 4 seem like the perfect CGI scripting language. Well, the answer, my friend, is simple. PERL 5 is to PERL 4 what C++ is to C. What this means is that while PERL 4 is procedural, PERL 5 supports object-oriented programming. Also, while PERL 4 is forced mostly to go it alone, PERL 5 comes equipped to handle reusable modules along with a lot of other new features.
As it stands, PERL 5 represents a total renovation of this language. Almost every line in the original code has been redone. This, coupled with the transition from a procedural to an object-oriented language with a lot of new bells and whistles, will make PERL 5 a very popular CGI scripting language for a long time to come.
For more information on this new version of PERL, see the PERL 5 WWW Page at http://www.metronet.com/1h/perlinfo/perl5.html. Or, you can subscribe to the PERL Usenet group at comp.lang.perl.
So far you have been given some examples of compiled and interpreted languages. Recently, though, a language has been developed that is both compiled and interpreted. This programming language is JAVA, which is first compiled into a platform-independent binary bytecode. Then, when the program is executed, the precompiled bytecode is translated by a JAVA interpreter on the local platform into platform-specific machine instructions. Thus, as long as there is a JAVA interpreter for the platform you are using, you can run any JAVA bytecode regardless of the platform it was compiled on. This design allows these programs to become truly platform-independent. Thus, programmers will no longer have to grapple with porting their software across platforms.
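The two-step "compile to bytecode, then interpret" model can be sketched in a few lines. The example below uses Python only as an analogy: it is not JAVA, and Python's bytecode is tied to the interpreter version rather than being a published portable format. The variable names are made up for the example.

```python
source = "answer = 6 * 7\n"

# Step 1: compile the source once into a bytecode object.
code = compile(source, "<demo>", "exec")

# Step 2: later, the interpreter executes that bytecode.
namespace = {}
exec(code, namespace)

print(type(code.co_code).__name__)  # the compiled instructions are raw bytes
print(namespace["answer"])
```

The point to take away is that the source is translated only once, and that what the interpreter runs afterward is the compiled form, not your original text.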
The JAVA language is being hailed on the Internet as the scripting language of the future and a possible replacement for the CGI. When Sun Microsystems first started developing JAVA, it intended to write it entirely in C++. However, as time went on, it decided that there were too many limitations within the language for it to be optimally suited for Internet programming. So, it struck out on its own. However, it has endeavored to stick closely to C++ while designing the language. As a result, JAVA is a member of the object-oriented programming paradigm and should be fairly easy for experienced C++ programmers to pick up.
The object-oriented structure of JAVA is what makes its applications modular while its platform independence makes it very portable. JAVA was defined by Sun Microsystems in its first white paper as follows:
JAVA: A simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi-threaded, and dynamic language.
If JAVA can actually live up to this description, then it might very well become the dominant scripting language on the Internet.
As you advance down the path to mastery (or at least proficiency) in your favorite CGI scripting language, you need to know where to look for help and the latest online information.
My personal favorite is using listservs, or mailing lists. These are groups of people who share a common interest. Each time someone posts a message to the list, everyone who is subscribed will get a copy. Then, any of the hundreds or even thousands of people who received your post may choose to answer your mail and give you the information you requested. The fastest way to find a mailing list that is right for you is to check out L-Soft's search engine for its listservs at http://www.lsoft.com/lists/LIST_Q.html. Just pick a topic like HTML, CGI, or JAVA and you will get a series of mailing lists with information on how to subscribe to each one.
If you like the idea of a listserv, but don't want your mailbox filled with mail every day, then a newsgroup may be for you. These are similar to a listserv except that you read the posts off of a news spool rather than out of your inbox. Also, many newsgroup applications allow you to search the posts by subject, author, or keyword. Here is a list of some of my favorite newsgroups on CGI programming.
comp.infosystems.www.authoring.cgi
comp.lang.perl
comp.lang.c++
comp.lang.java
comp.lang.javascript
Another great source of online CGI information is personal Web sites. Many individuals have amassed a mountain of links to key information archives on the net for their favorite scripting language. Finding a couple of these gems can save hours of surfing the Web for information.
As is inevitable with most technology, the CGI, for all it is worth, is already becoming outdated. With the explosive growth of technology in this day and age, new and exciting alternatives to CGI scripting are being developed. In this, the final section of this chapter, I will discuss a few of these alternatives, including SSI (Server Side Includes), JavaScript, and Visual Basic Script.
If you are using an NCSA server on a UNIX system, then you have access to a special feature of this server commonly referred to as Server Side Includes (SSI). If you turn on this feature of the server, the server will recognize .shtml files as HTML documents that need to be treated specially. When the server sends a .shtml file, it doesn't passively send the requested document to the browser, but rather actively parses it. This means that the server looks at the HTML document line by line as it is sending it to see if the HTML page includes any special instructions that the server should carry out while it is sending the page. Usually these instructions take one of the following forms.
Adding the current date or time
Adding a file like a standard header or footer
Adding the output from a script
For example, if you have a standard footer that you need to place on every page of your Web site, with SSI you can simply place the following line of code at the bottom of each document where you want the footer to appear.
<!--#include file="footer.html"-->
or
<!--#include virtual="/footer.html"-->
Just remember that if you use file, then you must give the path to the included file relative to the current document, and that the file must be in the same directory as the main document or in one of its subdirectories. If you use virtual instead, you specify the file's virtual path, that is, the path portion of its URL on the same server (here, /footer.html), rather than a complete URL. Or, if you have a script that generates a custom footer for each page, then you can include the output from that script by placing the following line where you would like the script's output to appear within the document.
<!--#exec cgi="/cgibin/footer.pl"-->
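What might such a footer script look like? Here is a minimal sketch, written in Python as a stand-in for the PERL script named above. The function name and the wording of the footer are hypothetical. The one thing to notice is that the output begins with a header block and a blank line, as a CGI program's output is normally expected to.

```python
from datetime import date


def footer_response(today):
    """Build the header and HTML fragment a footer script would print."""
    fragment = (
        "<hr>\n"
        f"<p>Last generated on {today.isoformat()}.</p>\n"
    )
    # CGI output starts with a header block followed by a blank line.
    return "Content-type: text/html\n\n" + fragment


if __name__ == "__main__":
    print(footer_response(date.today()), end="")
```

Because the date is passed in as an argument, you can check the script's output for any day you like before wiring it into your pages.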
The main advantage for using SSIs within your Web pages is that it can allow your documents to display current information like the date and time without the use of a CGI script. Also, it can allow you to maintain only a single version of information you would have to repeat on many pages under normal circumstances.
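For instance, the date and time can be inserted with the echo directive, and the config directive controls how they are formatted. The particular format string and the surrounding HTML below are only an example.

```
<!--#config timefmt="%A, %B %d, %Y"-->
<p>Page served on <!--#echo var="DATE_LOCAL"--></p>
<p>Last modified on <!--#echo var="LAST_MODIFIED"--></p>
```

DATE_LOCAL holds the server's current local date and time, and LAST_MODIFIED holds the date the current document was last changed.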
However, there is one drawback of using SSIs that you should be aware of. Forcing the server to parse each document line by line before sending it to the browser takes processing time, which both slows down the server and makes the Web pages take longer to load. If a high-traffic site were to parse every page that it sent out to check for SSIs, the server would very likely experience a marked decrease in efficiency.
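To see where that extra work comes from, consider a toy version of what the server has to do with each parsed page. This sketch is written in Python and is in no way the NCSA server's actual code: it handles only the include file form, uses a dictionary to stand in for the server's disk, and substitutes a made-up placeholder when a file is missing.

```python
import re

# Matches directives of the form <!--#include file="name"-->
INCLUDE = re.compile(r'<!--#include file="([^"]+)"-->')


def expand_includes(page, files):
    """Replace each include directive with the text of the named file.

    `files` is a dict standing in for the server's disk.
    """
    def replace(match):
        # "[missing file]" is a placeholder invented for this sketch.
        return files.get(match.group(1), "[missing file]")
    return INCLUDE.sub(replace, page)


files = {"footer.html": "<hr><p>Example footer</p>"}
page = '<p>Welcome.</p>\n<!--#include file="footer.html"-->\n'
expanded = expand_includes(page, files)
print(expanded)
```

Even a page that contains no directives at all has to be scanned from top to bottom before the server knows there is nothing to do, and that scan is the overhead you pay for on every request.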
For a more detailed discussion of SSIs you should refer to NCSA's online SSI tutorial at http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html.
Along with the development of the new programming language JAVA that was briefly introduced earlier, JavaScript is providing Web authors with alternatives to more traditional CGI programming. By embedding the JavaScript code directly into the Web page, newer browsers like Netscape 2.0 are able to execute these scripts directly on the client's machine without the need to make a call to the server. This can greatly increase the speed at which the clients get feedback from their actions and reduce the load on the Web server at the same time. It is hoped by many that this new scripting language will reduce the heavy server load imposed by many traditional CGI programs by moving much of the processing overhead to the client's machine.
JavaScript is a separate, simpler object-based scripting language whose syntax resembles that of JAVA. It is interpreted at runtime much like PERL, rather than having to be compiled before it can be executed. Although JavaScript is simpler than JAVA, it still offers a good deal of power. Also, JavaScripts can be written to recognize and react to such things as mouse clicks, form field data, and the use of page navigation.
The complete JavaScript Authoring Guide by Netscape can be found at http://cgi.netscape.com/eng/mozilla/Gold/handbook/javascript/index.html and is an excellent place to start your exploration of this alternative to CGI programming.
Another very promising alternative to CGI will be Visual Basic Script or VBScript, which is a cross-platform subset of Visual Basic 4.0 by Microsoft. This scripting language will be in direct competition with JavaScript and will provide much the same functionality, since it is likewise embedded within the HTML pages themselves.
Like JavaScript, VBScript's major function will be to reduce server overhead by moving the processing load to the client's machine and, in the process, greatly speed up the response to clients' actions. VBScripts will be able to link and automate many types of objects including OLE objects and JAVA applets. Currently, Microsoft plans for its VBScripting language to be fully implemented in the 3.0 release of Microsoft Internet Explorer.
You can find the latest information on VBScript from the Visual Basic Microsoft Web site at http://www.microsoft.com/VBASIC/vbscript/vbscript.htm.