Tips for Using Illustra

Tips for Using the Illustra Relational Database

by Philip Greenspun for the Web Tools Review

Note: these tips are based on experience with Illustra version 3.2. When I need to use another database management system product for comparison, I choose Oracle because it is the most popular and because I have some experience using it.

My Favorite Illustra Features

The PLS Blade provides a really convenient way of indexing character strings. You can only have one list of stop words per installation (see Doc Blade blurb below), though I think this limitation may be removed in a future release. Make sure you have the latest release of Illustra and PLS and that you've messed with the file descriptors parameter of your Solaris operating system. Otherwise you should remove all sharp and/or sledge-hammer shaped objects from your computer room.

One of the first horrors that confronts the Oracle SQL or ANSI SQL programmer is the lack of a boolean data type. Illustra has one. It is much cleaner than using CHAR(1) filled with 'f' or 't'.

Text Searching Caveats

The Doc Blade is the worst conceived and worst realized of the Illustra products I've used. It is supposed to index text. The first problem is its programmer interface. To get anything into a doc type, you first have to write it to a Unix filesystem file. This is especially painful if you are getting text input from Web users. If you go successfully go through these gymnastics, you'll then find that the Doc Blade returns completely random results when documents are short (e.g., classified ads or other smallish chunks you'd be likely to deal with on the Web). Finally, the Doc Blade provides for a list of stop words, i.e., words to not index. Unfortunately, the list has to be the same for every table and database in your installation. However, the appropriate set of stop words is completely different for documents of different types and lengths.

[Note: the Oracle text searching products provide a custom list of stopwords per index.]

"Object-Orientation"

Dave Moon, a pioneer in modern object systems, was one of the world's best programmers but he was not a patient man. He was a founder of Symbolics and I was working there in 1984 when the company moved into a new building next to MIT. The facilities manager sent around some email telling people not to tape posters to their office walls because we'd be moving to bigger quarters in a few years and didn't want the landlord to charge us for excessive wear. That night, the Foonly crashed. A Foonly was a clone of the PDP-10, a mainframe computer designed by Digital in the 1960s. MIT and Stanford people loved the PDP-10 but couldn't afford DEC's million dollar price tags. So there were these guys in a basement in California smoking dope and wirewrapping clones that were 1/3 the speed and 1/20th the cost. Nobody ever figured out why they called the machines "Foonlies".

Moon was a superb hardware engineer and nobody doubted that he would get the Foonly up and running. Still, people were a bit surprised when a huge steel cylinder came crashing through the machine room wall. The cause of the crash had been one of those washing-machine-sized Control Data T-300 disk packs. The cylindrical missile had been the spindle holding together the bad 12" platters. Moon had hurled it through the wall after determining its guilt in the crime of the Foonly crash. I went back to my office and taped up a poster.

The point of this story is to illustrate the prevailing lack of patience among early developers and users of object systems during their first decades. One of the things that drove them crazy about the object systems of the 1970s (SmallTalk, Lisp Machine Flavors) was that if you changed a class definition, the existing instances of that class did not get modified. You'd have to restart your program, maybe even reboot your computer, if you changed your mind about how to represent something. You could lose 20 minutes or even more.

So the object systems of the 1980s, e.g., Common Lisp Object System, were designed to touch up running instances of classes if the class definition changed.

When I read the Illustra User's Guide, I came to it with 15 years of experience with object systems. It took me about an hour to figure out how to take all the columns that were common to a bunch of tables and put them in a supertable, then make subtables inherit from them. In a Web site that sold magazines, videos, books, and CDs, I made a products table with columns product_id, short_name, description, etc. Then the magazines table would inherit from products plus add a column for issues_per_year.

Everything was great until six months later when I wanted to facilitate integration with the MIS department by adding a foreign_key column to the products supertable. You can't change a table with dependents! I had to dump the data out of my tables, drop all of them, and then rebuild them in order to change the schema. I ended up rebuilding them without using inheritance, which is a potentially useful feature but not when you end up with such a brittle data model.

In short, Illustra failed to aborb the object-oriented lessons of the 1970s: class definitions change. This is much worse for a database than for those old Lisp systems. It took a few minutes to restart a Lisp Machine program; you might be off the Web for days trying to rebuild a big database.

I do not want to be around Dave Moon if he ever gets hold of a copy of Illustra.

Dating

One of the longest persisting effects of having been an MIT undergrad is the famine psychology when it comes to dating. All of my friends from the Class of '82 still think up schemes for getting women. It wouldn't matter if they were Muslims with four wives. They'd still be trying to figure out how to meet and retain feminine companionship.

Illustra won't help them. It picks up the design flaws in the ANSI SQL date-time standard and then adds a few implementation bugs.

Example 1

SQL can be very frustrating to the programmer used to procedural languages. One of the things that I liked about using Oracle, though, was that I'd never had to change a single line of SQL once I'd gotten the query to function. Certainly, I never got an error from an SQL query depending on the data in the table or the phase of the moon. Consider the following query:

select classified_ad_id from classified_ads where posted > (current_date - interval '1' month)

Since posted is a column containing the date that an ad was placed, you'd expect this to return rows of ads placed within the last month. Well, it was working great for my photography magazine's classified ad section so I forgot about it. Then I got a pile of complaints from users in my mailbox. The service had been dead for a couple of days because March 30 minus 1 month = February 30 according to Illustra. This wouldn't have been so bad if Illustra didn't then check the validity of the date it just generated and then fail with an error message.

The incredible thing about this is that it is the behavior specified by the ANSI standard and documented both in the Illustra User's Guide and the "red standard" book (Date/Darwen). Oracle has nonstandard syntax, but the same semantics would have returned February 28 or February 29 depending on the year.

Workaround: Use current_date - interval '30' day. Watch it fail with another error when you try current_date - interval '100' day.

Workaround for the workaround: Manually tell Illustra that you want three digits of precision on the day field (though you'd think it could see that): current_date - interval '100' day(3).

Example 2

First I created a little table to snapshot the integer session keys I was handing out....


* create table foobar (
	sample_time	timestamp,	-- when did the cookie have the value
	sample_value	integer
);

* select * from foobar order by sample_time;
------------------------------------------
|sample_time               |sample_value |
------------------------------------------
|1996-02-15 19:02:08.000000|75000        |
|1996-02-16 19:02:08.000000|76000        |
|1996-02-17 19:02:08.000000|77000        |
|1996-02-18 19:02:08.347617|77276        |
------------------------------------------

Then I patted myself on the back for constructing a studly "self-join with subquery" to get the intervals... (yes I think this really is the simplest query that will produce the desired data)


select h1.sample_time as s1, 
       h2.sample_time as s2, 
       h2.sample_time - h1.sample_time as gap_length
from foobar h1, foobar h2
where
s2 > s1
and s2 = (select min(h3.sample_time) from foobar h3 where h3.sample_time > h1.sample_time)
order by s1;

---------------------------------------------------------------------
|s1                        |s2                        |gap_length   |
---------------------------------------------------------------------
|1996-02-15 19:02:08.000000|1996-02-16 19:02:08.000000|0-0          |
|1996-02-16 19:02:08.000000|1996-02-17 19:02:08.000000|0-0          |
|1996-02-17 19:02:08.000000|1996-02-18 19:02:08.347617|0-0          |
---------------------------------------------------------------------

I had planned to rush out into the clubs of Manhattan with a printout of this result, expecting women to melt when they saw it. But it was not to be. Subtracting two timestamps a day apart produces a gap that is 0 years and 0 months long. I never really did find a workaround.

Final Dating Caveat

Illustra tables with a date-time columns seem to cause Microsoft Access to choke.

Archiving

One of Illustra's coolest sounding features is archiving. You get to query the system to find out what your data looked like, say, 6 months ago. I relied on this feature in a classified ad system. I would DELETE the ads from the table but still have them around when I wanted to calculate statistics on, say, how many users had successfully sold their goods because of the service.

There are several problems with archived rows. The inflexibility of Illustra data models means that at some point you will probably need to dump your data out of your tables, drop them, and rebuild them. There is no easy way to preserve the archived rows during this procedure, so you lose your history. Some bugs in the Illustra storage manager and backup/restore software can be ameliorated by purging archived rows. Of course, here again you lose your history.

My advice: If you need to keep a history, do it in a "status" column like you would in Oracle or Sybase. Your queries will be slightly uglified but at least you'll have your data when you need it.

I'm Dysfunctional; You're Dysfunctional

If you've ever puked over an Oracle PL/SQL program, then you'll appreciate Illustra's functions. Unfortunately, the semantics of Illustra functions aren't clear; sometimes they work like macros. Since Illustra does not provide a sequence generator like Oracle, I had to keep my own table of sequence numbers. I wanted to centralize the handing out of new keys, so I made a little Illustra function to update the sequence table and return the new key. I then did something like the following:



insert into products values ( next_product_id(), "AL'S HAND CREAM", ... );

The Unix box started to thrash. Web services ground to a halt. Since my products table was indexed by the product_id with a primary key constraint, Illustra had to check to see if the new ID was already present in a row. In real programming systems, the semantics of functions are well-defined. For example, in Lisp, the args are evaluated and then passed down. So next_product_id would have been evaluated once and the result compared to the existing rows. Instead, Illustra sequenced through the rows, calling my next_product_id over and over again. Each call tried to insert into the sequence numbers table. For some reason, this was just too much for Illustra 2.4 and it caused a complete failure of the database. Inserts into an audit table specified by an alerter weren't done, internal errors showed up in the server log.

My advice: program anything with a side effect in C, Tcl, perl, or whatever else you're using to talk to Illustra.

[Note: PL/SQL has the elegance of a Soviet locomotive, but this ADA derivative is rock solid in my experience. Not only that, Oracle is incredibly good about caching the results of PL/SQL function calls. So you can put a complex query into PL/SQL and it will be very slow the first time it executes, but will return immediately for subsequent calls (until you update one of the tables on which it depends).]

Oh What a Friend You Have in Seagate

Illustra appears to have finally succeeded in making their on-line backup software generate a consistent snapshot of the database. Nonetheless, you might not wish to trust it if your data is precious. An off-line dump of a 100 MB database takes about 3 hours on a typical Unix box. If you don't want to be off the Web that long, I recommend buying a big collection of disk drives and mirroring them ("RAID Level 0").

Now you never have to back up to protect against media failure. However, you may still want snapshots of your database in case someone gets a little excited with a DELETE FROM statement. Or in case your facility is torched by a distraught user. The way the Oracle studs at Boston Children's Hospital do this with their 60 GB database is to break the mirror, back up from the disks that are off-line as far as the database is concerned, and then reestablish the mirror. For both performance and recovery reasons, they presumably keep their redo log on a separate disk from the rest of the database so they probably don't even risk losing the transactions that occur when the mirror is broken.

Note: this is probably good practice with any RDBMS. I've decided that one really needs at least five disks for a db-backed Web server: one for the root disk/swap; one for writing the Web server Unix file system log; one for reading static image files; one for the database rollback segment; one for the database per se. Then five more to mirror them. If this sounds excessive, note that the Oracle DBA Handbook recommends a 7x2 disk configuration as a minimum compromise for a machine doing nothing but database service (their solutions start at 9x2 disks and go up to 22x2).

Jean Anderson wrote a nice explanation of what an outer join is and how to do one in Illustra

Return to Web Tools Review home

philg@mit.edu