
Chapter 9: Sites That Are Really Programs

by Philip Greenspun, part of Database-backed Web Sites

Note: this chapter has been superseded by its equivalent in the new edition    


Venice Beach, California.

The classic (circa 1993) Web site comprises static .html files in a Unix file system. This kind of site is effective for one-way non-collaborative publishing of material that seldom changes.

You needn't turn your Web site into a program just because the body of material that you are publishing is changing. Sites like http://www.yahoo.com, for example, are sets of static files that are periodically generated by programs grinding through a dynamic database. With this sort of arrangement, the site inevitably lags behind the database but you can handle millions of hits a day without a major investment in computer hardware, custom software, or thought.

If you want to make a collaborative site, however, then at least some of your Web pages will have to be computer programs. Pages that process user submissions have to add user-supplied data to your Web server's disk. Pages that display user submissions have to look through a database on your server before delivering the relevant contributions.

Even if you want to publish completely static, non-collaborative material, at least one portion of your site will require server-side programming: the search engine. To provide full-text search over your material, your server must be able to take a query string from the user, compare it to the files on the disk, and then return a page of links to relevant documents.

This chapter discusses the options available to Web publishers who need to write program-backed pages.

CGI Scripting

Every Web server program provides a facility known as the Common Gateway Interface (CGI). The CGI standard is an abstraction barrier that dictates what a program can expect from the Web server (for example, user form input) and how the program must return data to the Web server for eventual delivery to the Web user. If you write a program with the CGI standard in mind, it will work with any Web server program. You can move your site from NCSA HTTPD 1.3 to Netscape Communications 1.1 to AOLserver 2.1 and all of your CGI scripts will still work. You can give your programs away to other webmasters who aren't running the same server program. Of course, if you wrote your CGI program in C and compiled it for an HP Unix box, it isn't going to run so great on their Windows NT machine.

Oops.

Most CGI scripts are written in Perl, Tcl, or some other interpreted computer language. The systems administrator installs the Perl or Tcl interpreter once and then Web site builders on that machine can easily run any script that they write or download off the Net.

Another advantage of CGI and interpreted languages is that the software development cycle is very tight. A message shows up in the error log when a user accesses "http://yourserver.nerdu.edu/bboard/subject-lines.pl". If your Web server document root is at /web (my personal favorite location), then you know to edit the file /web/bboard/subject-lines.pl. After you've found the bug and written the file back to the disk, the next time the page is accessed the new version of the subject-lines Perl script will be interpreted. You don't have to spend time searching for the responsible piece of code. You don't have to recompile and relink any code. You don't have to restart your Web server to make it aware of the new version of the software.

This tight development cycle is essential for Web projects, which tend to be hastily thrown together by overworked programmers. It isn't worth producing a jewel-like system to sit behind a Web site because the whole service may be redesigned in six months.

Considering how straightforward this task is and what a nasty first computer language Perl is, the number of dead-trees books on how to write a CGI script is rather depressing. Among the best of them is CGI Programming on the World Wide Web (Gundavaram; O'Reilly, 1996).

If you don't want to read that book, or the numerous CGI tutorials available on the Web, or the comments in other folks' source code, then here is my basic summary of Unix CGI:

A Very Simple Perl CGI Script

#!/usr/contrib/bin/perl
# the first line in a Unix shell script says where to find the
# interpreter. If you don't know where perl lives on your system, type
# "which perl", "type perl", or "whereis perl" at any shell
# and put the result after the #!
print "Content-type: text/html\n\n";
# now we have printed a header (plus two newlines) indicating that the
# document will be HTML; whatever else we write to standard output will
# show up on the user's screen
print "<h3>Hello World</h3>";

It is that easy to write Perl CGI scripts and get server independence, a tight software development cycle, and ease of distribution to other sites. With that in mind, you might ask how many of these wonderful things I have on my Web server. One. It was written by Architext and it looks up user query strings in the site's local full-text index. Why don't I have more?

My Unix Box Does Not Like to Fork 500,000 Times a Day

Otters. Audubon Zoo. New Orleans, Louisiana.

Every time a CGI script is run, the Web server computer has to start a new process (fork). Think about how long it takes to start a program on a Macintosh or Windows NT machine. It is a thousand times faster to indent a paragraph in an already-running word processor than it is to fire up that word processor to view even a one-paragraph document. I don't want my users to wait for this and I don't want to buy an eight-headed DEC AlphaServer.

My RDBMS Does Not Like to Be Opened and Closed 500,000 Times a Day

Any time that I add collaboration to my site, user data is going into and out of a relational database management system (RDBMS). The RDBMS is implemented as a server that waits for requests for connections from client programs (see Chapter 11). IBM, Oracle, Sybase, and Informix have been working for two decades to make the RDBMS fast once a connection is established. Until the Web came along, however, nobody cared too much about how long it took to open a connection. With the Web came the CGI script, a program that runs for only a fraction of a second. In its brief life, it must establish a connection to the RDBMS, get the results of a query, and then close the connection. Users would get their data in about one-tenth the time if their requests could be handled by an already-connected RDBMS client.

Server APIs

Enter the server application programming interface (API). As I discussed in Chapter 6 ("So You Want to Run Your Own Server"), most Web server programs allow you to supplement their behavior with extra software that you write. This software will run inside the Web server's process, saving the overhead of forking CGI scripts. Because the Web server program will generally run for at least 24 hours, it becomes the natural candidate to be the RDBMS client.
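AOLserver, for instance, lets Tcl code running inside the server borrow an already-open connection from a pool that the server maintains, instead of opening and closing its own. Here is a rough sketch of the idea; the bboard table and the query are invented for illustration, and ns_db is AOLserver's Tcl interface to the RDBMS:

# hypothetical fragment of Tcl running inside the AOLserver process;
# the server opened these database connections at startup and keeps
# them open between requests
set db [ns_db gethandle]        ;# borrow a pooled, already-open connection
set selection [ns_db 1row $db "select count(*) from bboard"]
set message_count [ns_set value $selection 0]
ns_db releasehandle $db         ;# hand the connection back to the pool

No fork and no connection setup; the per-request cost is just the query itself.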

Each server program has a different API. This is immediately a problem. Suppose that you write a collection of C code to run inside the Netscape Commerce Web server's API. Then you are forced to convert to AOLserver because you need built-in RDBMS connectivity or to Apache because you need to use a server where the source code is available. Now the C code that you wrote for the Netscape API will have to be rewritten.

The problem could get a lot worse. Suppose that you wrote your scripts for Oracle WebServer's PL/SQL API or AOLserver's Tcl API or Netscape Enterprise's JavaScript API. Competing Web servers don't even have APIs for those languages.

Does that mean it is wise to program in C, the language for which an API is most commonly provided? I don't think so. A bug in your program could result in the entire Web server crashing. Despite the fact that I'll be locked into a particular server program, I prefer to choose a server carefully and then program in a safe language.

AOLserver Example: Redirect

When my friend Brian and I were young and stupid, we installed the NCSA 1.3 Web server program on our research group's file server, martigny.ai.mit.edu. We didn't bother to make an alias for the machine like "www.brian-and-philip.org" so the URLs we distributed looked like "http://martigny.ai.mit.edu/samantha/".

Sometime in mid-1994 the people who depended on Martigny, whose load average had soared from 0.2 to 3.5, decided that a 100,000 hit per day Web site was something that might very nicely be hosted elsewhere. It was easy enough to find a neglected HP Unix box, which we called swissnet.ai.mit.edu. And we sort of learned our lesson and did not distribute this new name in the URL but rather aliases: "www-swiss.ai.mit.edu" for research publications of our group (known as "Switzerland" for obscure reasons); "webtravel.org" for my travel stuff; "photo.net" for my photo stuff; "pgp.ai.mit.edu" for Brian's public key server; "samantha.rules-the.net" for fun.

But what were we to do with all the hard-wired links out there to martigny.ai.mit.edu? We left NCSA 1.3 loaded on Martigny but changed the configuration files so that a request for "http://martigny.ai.mit.edu/foo/bar.html" would result in a 302 redirect being returned to the user's browser so that it would instead fetch http://www-swiss.ai.mit.edu/foo/bar.html.

Two years later, in August 1996, we upgraded Martigny from HP-UX 9 to HP-UX 10. Nobody bothered to install a Web server on the machine. People began to tell me "I searched for you on the Web but your server has been down since last Thursday." Eventually I figured out that the search engines were still sending people to Martigny, a machine that was in no danger of ever responding to a Web request since it no longer ran any program listening to port 80.

Rather than try to dig up a copy of NCSA 1.3, I decided it was time to get some experience with Apache, the world's most popular Web server. I couldn't get the 1.2 beta sources to compile. So I said, "This free software stuff is for the birds; I need the heavy duty iron." I installed the 80MB Netscape Enterprise Server and sat down with the frames- and JavaScript-heavy administration server. After fifteen minutes, I'd configured the port 80 server to redirect. There was only one problem: It didn't work.

So I spent a day going back and forth with Netscape tech support. "Yes, the Enterprise server definitely could do this. Probably it wasn't configured properly. Could you e-mail us the obj.conf file? Hmmm . . . it appears that your obj.conf file is correctly specifying the redirect. There seems to be a bug in the server program. You can work around this by defining custom error message .html files with Refresh: tags so that users will get popped over to the new server if they are running a Netscape browser."

I pointed out that this would redirect everyone to the swissnet server root, whereas I wanted "/foo/bar.html" on Martigny to redirect to "/foo/bar.html" on Swissnet.

"Oh."

They never got back to me.

So I finally installed AOLserver 2.1, which doesn't have a neat redirect facility, but I figured that the Tcl API was flexible enough that I could make the server do what I wanted.

First, I had to tell AOLserver to feed all requests to my Tcl procedure instead of going to look around in the file system:

ns_register_proc GET / martigny_redirect

This is a Tcl function call. The function being called is named ns_register_proc. Any function that begins with "ns_" is part of the NaviServer Tcl API (NaviServer was the name of the program before AOL bought NaviSoft in 1995). ns_register_proc takes three arguments: method, URL, and procname. In this case, I'm saying that HTTP GETs for the URL "/" (and below) are to be handled by the Tcl procedure martigny_redirect:

proc martigny_redirect {conn ignore} {
    append url_on_swissnet "http://www-swiss.ai.mit.edu" [ns_conn url $conn]
    ns_returnredirect $conn $url_on_swissnet
}

This is a Tcl procedure definition, which has the form "proc procedure-name arguments body". martigny_redirect is defined to take two arguments: conn (an AOLserver connection) and a second argument called ignore, which matters only when multiple URLs are registered to the same proc. When martigny_redirect is invoked, it first computes the full URL of the corresponding file on Swissnet. The meat of this computation is a call to the API procedure "ns_conn" asking for the URL that was part of the request line.

With the full URL computed, martigny_redirect's second body line calls the API procedure ns_returnredirect. This writes back to the connection a set of 302 redirect headers instructing the browser to rerequest the file, this time from "http://www-swiss.ai.mit.edu".
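In terms of what travels over the wire, the exchange looks roughly like this (headers trimmed; the exact reason phrase depends on the server):

GET /foo/bar.html HTTP/1.0

HTTP/1.0 302 Found
Location: http://www-swiss.ai.mit.edu/foo/bar.html

The browser sees the Location header and quietly issues a second GET, this time to Swissnet.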

Here's what I learned from this experience:

AOLserver Example: Bill Gates Personal Wealth Clock

Academic computer scientists are the smartest people in the world. There is an average of 800 applications for every job. And every one of those applicants has a Ph.D. Anyone who has triumphed over 799 Ph.D.s in a meritocratic selection process can be pretty sure that he or she is a genius. Publishing is the most important thing in academia. Distributing one's brilliant ideas to the adoring masses. The top computer science universities have all been connected by the Internet or ARPAnet since 1970. A researcher at MIT in 1975 could send a technical paper to all of his interested colleagues in a matter of minutes. With this kind of heritage, it is natural that the preferred publishing medium of 1990s computer science academics is . . . dead trees.

Yes, dead trees.

If you aren't in a refereed journal or conference, you aren't going to get tenure. You can't expect to achieve quality without peer review. And peer review isn't just a positive feedback mechanism to enshrine mediocrity. It keeps uninteresting papers from distracting serious thinkers at important conferences. For example, there was this guy in a physics lab in Switzerland, Tim Berners-Lee. And he wrote a paper about distributing hypertext documents over the Internet. Something he called the Web. Fortunately for the integrity of academia, this paper was rejected from conferences where people were discussing truly serious hypertext systems.

Anyway, with foresight like this, it is only natural that academics like to throw stones at successful unworthies in the commercial arena. IBM and their mainframe customers provided fat targets for many years. True, IBM research labs had made many fundamental advances in computer science, but it seemed to take at least ten years for these advances to filter into products. What kind of losers would sell and buy software technology that was a decade behind the state of the art?

Then Bill Gates came along with technology that was 30 years behind the state of the art. And even more people were buying it. IBM was a faceless impediment to progress but Bill Gates gave bloated monopoly a name, a face, and a smell. And he didn't have a research lab cranking out innovations. And every non-geek friend who opened a newspaper would ask, "If you are such a computer genius, why aren't you rich like this Gates fellow?" This question was particularly depressing for graduate students earning $1,300 a month. For them, I published Career Guide for Engineers and Scientists (http://philip.greenspun.com/careers/).

I thought starving graduate students forgoing six years of income would be cheered to read the National Science Foundation report that "Median real earnings remained essentially flat for all major non-academic science and engineering occupations from 1979-1989. This trend was not mirrored among the overall work force where median income for all employed persons with a bachelor's degree or higher rose 27.5 percent from 1979-1989 (to a median salary of $28,000)."

I even did custom photography for the page (see my nude photography tutorial for an explanation).

Naturally I maintained a substantial "Why Bill Gates is Richer than You" section on my site, but it didn't come into its own until the day my friend Brian showed me that the U.S. Census Bureau had put up a real-time population clock at http://www.census.gov/cgi-bin/popclock. There had been stock quote servers on the Web almost since Day 1. How hard could it be to write a program that would reach out into the Web and grab the Microsoft stock price and the population, then do the math to come up with what you see at http://www.webho.com/WealthClock?

This program was easy to write because the AOLserver Tcl API contains the ns_geturl procedure (subsequently supplanted by an improved ns_httpget API call). Having my server grab a page from the Census Bureau is as easy as

ns_geturl "http://www.census.gov/cgi-bin/popclock"

Tcl the language made life easy because of its built-in regular expression matcher. The Census Bureau and the Security APL stock quote folks did not intend for their pages to be machine-parsable. Yet I don't need a long program to pull the numbers that I want out of a page designed for reading by humans.

Tcl the language made life hard because of its deficient arithmetic. Some computer languages (Pascal, for example) are strongly typed. You have to decide when you write the program whether a variable will be a floating-point number, a complex number, or a string. Lisp is weakly typed. You can write a mathematical algorithm with hundreds of variables and never specify their types. If the input is a bunch of integers, the output will be integers and rational numbers (ratios of integers). If the input is a complex double precision floating-point number, then the output will be complex double precision. The type is determined at run-time. I like to call Tcl "whimsically" typed. The type of a variable is never really determined. It could be a number or a string. It depends on the context. If you are looking for a pattern, "29" is a string. If you are adding it to another number, "29" is a decimal number. But "029" is an octal number, so trying to add it to another number results in an error.
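To make the trap concrete, here is a tiny illustration (the exact error message varies with the Tcl version):

# "29" behaves as a string or as a number depending on context
regexp {2} "29"     ;# pattern matching treats it as a string: returns 1
expr "29" + 1       ;# arithmetic treats it as a decimal number: returns 30
# a leading zero makes Tcl read the token as octal, and 9 is not a legal
# octal digit, so the next line raises an error instead of returning 30
expr "029" + 1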

Anyway, here is the code. Look at the comments.

# this program copyright 1996, 1997 Philip Greenspun (philg@mit.edu)
# redistribution and reuse permitted under
# the standard GNU license
# this function turns "99 1/8" into "99.125"
proc wealth_RawQuoteToDecimal {raw_quote} {
    if { [regexp {(.*) (.*)} $raw_quote match whole fraction] } {
        # there was a space
        if { [regexp {(.*)/(.*)} $fraction match num denom] } {
            # there was a "/"
            set extra [expr double($num) / $denom]
            return [expr $whole + $extra]
        }
        # we couldn't parse the fraction
        return $whole
    } else {
        # we couldn't find a space, assume integer
        return $raw_quote
    }
}
###
#   done defining helpers, here's the meat of the page
###
# grab the stock quote and stuff it into QUOTE_HTML
set quote_html \
    [ns_geturl "http://qs.secapl.com/cgi-bin/qs?ticks=MSFT"]
# regexp into the returned page to get the raw_quote out
regexp {Last Traded at</a></td><td align=right><strong>([^A-z]*)</strong>} $quote_html match raw_quote
# convert whole number + fraction, e.g., "99 1/8" into decimal,
# e.g., "99.125"
set msft_stock_price [wealth_RawQuoteToDecimal $raw_quote]
set population_html [ns_geturl "http://www.census.gov/cgi-bin/popclock"]
# we have to find the population in the HTML and then split it up
# by taking out the commas
regexp {<H1>[^0-9]*([0-9]+),([0-9]+),([0-9]+).*</H1>} \
       $population_html match millions thousands units
# we have to trim the leading zeros because Tcl has such a
# brain damaged model of numbers and thinks "039" isn't a number
# this is when you kick yourself for not using Common Lisp
set trimmed_millions [string trimleft $millions 0]
set trimmed_thousands [string trimleft $thousands 0]
set trimmed_units [string trimleft $units 0]
# then we add them back together for computation
set population [expr ($trimmed_millions * 1000000) + \
                     ($trimmed_thousands * 1000) + \
                     $trimmed_units]
# and reassemble them in a string for display
set pretty_population "$millions,$thousands,$units"
# Tcl is NOT Lisp and therefore if the stock price and shares are
# both integers, you get silent overflow (because the result is too
# large to represent in a 32 bit integer) and Bill Gates comes out as a
# pauper (< $1 billion). We hammer the problem by converting to double
# precision floating point right here.
#
# (Were we using Common Lisp, the result of multiplying two big 32-bit
# integers would be a "big num", an integer represented with multiple
# words of memory; Common Lisp programs perform arithmetic correctly.
# The time taken to compute a result may change when you move from a
# 32-bit to a 64-bit computer but the result itself won't change.)
set gates_shares_pre_split [expr double(141159990)]
set gates_shares [expr $gates_shares_pre_split * 2]
set gates_wealth [expr $gates_shares * $msft_stock_price]
set gates_wealth_billions \
    [string trim [format "%10.6f" [expr $gates_wealth / 1.0e9]]]
set personal_share [expr $gates_wealth / $population]
set pretty_date [exec /usr/local/bin/date]
ns_return $conn 200 text/html "<html>
<head>
<title>Bill Gates Personal Wealth Clock</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h2>Bill Gates Personal Wealth Clock</h2>
just a small portion of 
<a href=\"http://www-swiss.ai.mit.edu/philg/humor/bill-gates.html\">Why Bill Gates is Richer than You
</a>
by
<a href=\"http://www-swiss.ai.mit.edu/philg/\">Philip Greenspun</a>
<hr>
<center>
<br>
<br>
<table>
<tr><th colspan=2 align=center>$pretty_date</th></tr>
<tr><td>Microsoft Stock Price:
    <td align=right> \$$msft_stock_price
<tr><td>Bill Gates's Wealth:
    <td align=right> \$$gates_wealth_billions billion
<tr><td>U.S. Population:
    <td align=right> $pretty_population
<tr><td><font size=+1><b>Your Personal Contribution:</b></font>
    <td align=right>  <font size=+1><b>\$$personal_share</b></font>
</table>
<p>
<blockquote>
\"If you want to know what God thinks about money, just look at the
 people He gives it to.\" <br> -- Old Irish Saying
</blockquote>
</center>
<hr>
<a href=\"http://www-swiss.ai.mit.edu/philg/\"><address>philg@mit.edu</address>
</a>
"

So is this the real code that sits behind http://www.webho.com/WealthClock?

Actually, no. You'll find the real source code linked from the above URL.

Why the differences? I was concerned that, if it became popular, the Wealth Clock might impose an unreasonable load on the subsidiary sites. It seemed like bad netiquette for me to write a program that would hammer the Census Bureau and Security APL several times a second for the same data. It always seemed to me that users shouldn't have to wait for the two subsidiary pages to be fetched if they didn't need up-to-the-minute data.

So I wrote a general purpose caching facility that can cache the results of any Tcl function call as a Tcl global variable. This means that the result is stored in the AOLserver's virtual memory space and can be accessed much faster even than a static file. Users who want a real-time answer can demand one with an extra mouse click. The calculation performed for them then updates the cache for casual users.
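The real caching code is part of the source linked above, but the idea fits in a few lines of Tcl. A minimal sketch (the proc names, the compute_wealth_clock_html helper, and the ten-minute lifetime are all invented for illustration):

# sketch: memoize an expensive computation in Tcl globals held in the
# server's memory; compute_wealth_clock_html stands in for the ns_geturl
# calls and page assembly shown earlier
proc cached_wealth_clock_html {} {
    global cached_html cached_at
    # ns_time returns the current time in seconds
    if { ![info exists cached_html] || [ns_time] - $cached_at > 600 } {
        # cache is empty or more than ten minutes old; recompute
        set cached_html [compute_wealth_clock_html]
        set cached_at [ns_time]
    }
    return $cached_html
}

A page that wants the live numbers can call the slow computation directly and refresh the same globals; everybody else gets the cached copy back in microseconds.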

Does this sound like overengineering? It didn't seem that way when Netscape put the Wealth Clock on their What's New page for two weeks (summer 1996). The URL was getting two hits per second. Per second. And all of those users got an instant response. The extra load on my Web server was not noticeable. Meanwhile, all the other sites on Netscape's list were unusably slow. Popularity had killed them.

Here are the lessons that I learned from this example:

The Right Way – Extending HTML

Which of the technologies we've discussed is going to dominate server-side programming? Will it be CGI/Perl on super-fast machines? Tcl, Java, or C API code? Visual BASIC?!? I predict "none of the above."

What powerful language will sit behind the Web sites of the future?

HTML.

HTML? But didn't we spend all of Chapter 3 saying how deficient it was even as a formatting language? How can HTML function as a server-side programming language?

Server-Parsed HTML

In the beginning, there was server-parsed HTML. You added an HTML comment to a file, for example

<!--#include FILE="/web/author-info.txt" -->

and then reloaded the file in a browser.

Nothing changed. Anything surrounded by "<!--" and "-->" is an HTML comment. The browser ignores it.

Your intent, though, was to have the Web server notice this command and replace the comment with the contents of the file /web/author-info.txt. To make that happen, you have to rename the file so that it has a .shtml extension. Now the server knows that you are actually programming in an extended version of HTML.
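So a page whose .shtml source contains (the contents of author-info.txt here are invented for illustration)

<p>Comments on this draft are welcome.
<!--#include FILE="/web/author-info.txt" -->

is delivered to the browser with the comment replaced by the file's contents:

<p>Comments on this draft are welcome.
<address>Philip Greenspun, philg@mit.edu</address>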

The AOLserver takes this one step further. To the list of standard SHTML commands, they've added #nstcl:

<!--#nstcl script="ns_geturl http://cirrus.sprl.umich.edu/wxnet/fcst/boston.txt" -->

which lets a basically static HTML page use the ns_geturl Tcl API function to go out on the Internet, from the server, and grab http://cirrus.sprl.umich.edu/wxnet/fcst/boston.txt before returning the page to the user. The contents of http://cirrus.sprl.umich.edu/wxnet/fcst/boston.txt are included in place of the comment tag.

This is a great system because a big Web publisher can have its programmers develop a library of custom Tcl functions that its content authors simply call from server-parsed HTML files. That makes it easy to enforce style conventions company-wide. For example,

<!--#nstcl script="webco_captioned_photo samoyed.jpg \
{This is a Samoyed, probably the best looking dog you will ever see.}" -->

might turn into

<h3>
<img src="samoyed.jpg" 
     alt="This is a Samoyed, probably the best looking dog 
          you will ever see.">
This is a Samoyed, probably the best looking dog you will ever see.
</h3>

until the day that the Webco art director decides that HTML tables would be a better way to present these images. So a programmer redefines the procedure webco_captioned_photo, and the next time they are served, thousands of image references instead turn into

<table>
<tr>
  <td><img src="samoyed.jpg" 
           alt="This is a Samoyed, probably the best looking dog
                you will ever see.">
  <td>This is a Samoyed, probably the best looking dog 
      you will ever see.
</tr>
</table>
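The Tcl procedure itself isn't shown in this chapter, but the table-producing version might be as simple as this sketch (the chapter doesn't give Webco's actual definition):

proc webco_captioned_photo {file_name caption} {
    # return the table markup with the image reference and caption
    # substituted in; content authors never see this change
    return "<table>
<tr>
  <td><img src=\"$file_name\" alt=\"$caption\">
  <td>$caption
</tr>
</table>"
}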

HTML As a Programming Language

As long as we're programming our server, why not define a new language, "Webco HTML"? Any file with a .whtml extension will be interpreted as a Webco HTML program and the result, presumably standard HTML, will be served to the requesting users. Webco HTML has the same syntax as standard HTML, just more tags. Here's the captioned photo example:

<CAPTIONED-PHOTO "samoyed.jpg" 
"This is a Samoyed, probably the best looking dog you will ever see.">

Just like the Tcl function, this Webco HTML function takes two arguments, an image file name and a caption string. And just like the Tcl function, it produces HTML tags that will be recognized by standard browsers. I think it is cleaner than the "include a Tcl function call" .shtml example because the content producers don't have to switch back and forth between HTML syntax and Tcl syntax.

How far can one go with this? Pretty far. The best of the enriched HTMLs is Meta-HTML (http://www.mhtml.com). Meta-HTML is fundamentally a macro expansion language. We'd define our captioned-photo tag thusly:

<define-tag captioned-photo image-url text>
  <h3>
    <img src="<get-var image-url>" alt="<get-var text>"> <br>
    <get-var text>
  </h3>
</define-tag>

Now that we are using a real programming language, though, we'd probably not stop there. Suppose that Webco has decided that it wants to be on the leading edge as far as image format goes. So it publishes images in three formats: GIF, JPEG, and progressive JPEG. Webco is an old company so every image is available as a GIF but only some are available as JPEG and even fewer as progressive JPEG. Here's what we'd really like captioned-photo to do:

1. Change the function to take just the file name as an argument, with no extension; for example, "foobar" instead of "foobar.jpg".

2. Look at the client's user-agent header.

3. If the user-agent is Mozilla 1, then look in the file system for foobar.jpg and reference it if it exists (otherwise reference foobar.gif).

4. If the user-agent is Mozilla 2, then look in the file system for foobar-prog.jpg (progressive JPEG) and reference it; otherwise look for foobar.jpg; otherwise reference foobar.gif.

This is straightforward in Meta-HTML:

<define-function captioned-photo stem caption>
  ;;; If the user-agent is Netscape, try using a JPEG format file
  <when <match <get-var env::http_user_agent> "Mozilla">>
    ;;; this is Netscape
    <when <match <get-var env::http_user_agent> "Mozilla/[2345]">>
      ;;; this is Netscape version 2, 3, 4, or 5(!)
      <if <get-file-properties
         <get-var mhtml::document-root>/<get-var stem>-prog.jpg>
          ;;; we found the progressive JPEG in the Unix file system
         <set-var file-to-reference = <get-var stem>-prog.jpg>>
    </when>
    ;;; If we haven't defined FILE-TO-REFERENCE yet, 
    ;;; try the simpler JPEG format next.
    <when <not <get-var file-to-reference>>>
      <if <get-file-properties
            <get-var mhtml::document-root>/<get-var stem>.jpg>
          <set-var file-to-reference = <get-var stem>.jpg>>
    </when>
  </when>
  ;;; If FILE-TO-REFERENCE wasn't defined above, default to GIF file
  <when <not <get-var file-to-reference>>>
    <set-var file-to-reference <get-var stem>.gif>
  </when>
  ;;; here's the result of this function call, four lines of HTML
  <h3>
  <img src="<get-var file-to-reference>" alt="<get-var caption>"> 
  <br>
  <get-var caption>
  </h3>
</define-function>

This example only scratches the surface of Meta-HTML's capabilities. The language includes many of the powerful constructs such as session variables that you find in Netscape's LiveWire system. However, for my taste, Meta-HTML is much cleaner and better implemented than the LiveWire stuff. Universal Access offers a "pro" version of Meta-HTML compiled with the OpenLink ODBC libraries so that it can talk efficiently to any relational database (even from Linux!).

Is the whole world going to adopt this wonderful language? Meta-HTML does seem to have a lot going for it. The language and first implementation were developed by Brian Fox and Henry Minsky, two hard-core MIT computer science grads. Universal Access is giving away their source code (under a standard GNU-type license) for both a stand-alone Meta-HTML Web server and a CGI interpreter that you can use with any Web server. They distribute precompiled binaries for popular computers. They offer support contracts for $500 a year. If you don't like Universal Access support, you can hire the C programmer of your choice to maintain and extend their software. Minsky and Fox have put the language into the public domain. If you don't like any of the Universal Access stuff, you can write your own interpreter for Meta-HTML, using their source code as a model.

What then are savvy Web technologists doing?

Tripping over themselves to use crippled server-parsed HTML systems that you could implement with 100 lines of Meta-HTML code. A typical example of the genre is NetCloak from Maxum Development (http://www.maxum.com). Here's an excerpt from their user's manual:

<HIDE_DAY day1 day2 ...>
This command hides the HTML text on the specified day(s). The valid 
days are MON, TUE, WED, THU, FRI, SAT, SUN. As always, multiple days 
may be specified, as in this example, which would hide text during 
the work week:
<HIDE_DAY MON TUE WED THU FRI>

That's about as powerful as NetCloak gets. They don't really have much respect for HTML syntax and spirit; the command to close a <HIDE_DAY> is not </HIDE_DAY> as you'd expect but rather <SHOW>. NetCloak costs money, only runs on the Macintosh, and the source code is not available. The lack of source code is probably annoying to people who are using the 2.1 version on the Macintosh; it apparently crashes the server when presented with a malformed URL.

Why am I picking on NetCloak? Because it came to my attention in an article about computer language copyright. It seems that people are fighting over the rights to implement this language and sell it. They could download Meta-HTML for free, write a NetCloak compatibility package in a few hours, and spend the rest of the day building an Oracle-backed collaborative Web site. But instead they spend their time with lawyers and dream of a world where a Web page could have a different look on weekends.

If you want to try out a brilliant Meta-HTML site, visit my friend Neil's postcard system.

Summary

Server-side programming is straightforward and can be done in almost any computer language, including extended versions of HTML itself. However, making the wrong technology decisions can result in a site that requires ten times the computer hardware to support. Bad programming can also result in a site that becomes unusable as soon as you've gotten precious publicity. Finally, the most expensive asset you are developing on your Web server is content. It is worth thinking about whether your server-side programming language helps you get the most out of your investment in content.

Note: If you like this book you can move on to Chapter 10.


philg@mit.edu