
The Intermediate Table Format

This section describes how we store most of the information drawn from the HL7 text files. There are different items to extract (i.e. segments and tables). All of them are essentially tables, but they differ in how they are embedded in the text and in how they appear there. It therefore seems appropriate to have an interim table format that is easy to generate: the different extraction procedures write their information into it rather than generating the final representation directly. From these tables we can easily get our final representation by applying a common AWK script to all of them, regardless of what they are derived from. This also leaves us free to translate them into a programming language other than Prolog, to import them into a database management system, etc.

Note that when we talk about `tables' in this section, we mean this interim format; do not confuse it with the tables found in the HL7 text. The following rules tell us how these tables are built:

  1. The first line of the table is its name.
  2. The table is ended by at least one empty line.
  3. Each row of the table makes up exactly one line; the length of the line is not limited to a specific number of characters. The line is terminated by the system's native <EOL> character, i.e. the one that AWK's printf escape sequence `\n' expands into.
  4. Columns are colon separated. A colon appears only between two columns; it neither starts the first column nor ends the last one. Consequently, two consecutive colons, a colon at the beginning of the row, or a colon at the end of the row each denote an empty field.
  5. The second line of the table gives the titles of the columns.
  6. The third line of the table specifies the data type to expect in each column. This can be one of the following keywords:
    `sym'
    a symbol that will become an identifier in the target language (Prolog).
    `num'
    a number.
    `str'
    a string, i.e. a sequence of characters that will appear enclosed in string delimiters `"'.
  7. The fourth line of the table is the first row of data; this line and every immediately following line are treated as table data.
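
The rules above can be sketched as a small reader. This is only an illustration in Python, not the AWK script the text refers to; the function name `parse_simple_table` and the error handling are assumptions made for this example.

```python
def parse_simple_table(text):
    """Parse one simple interim table into (name, titles, types, rows).

    Hypothetical helper for illustration; it follows rules 1-7 above.
    """
    # Rule 2: the table ends at the first empty line.
    body = []
    for line in text.splitlines():
        if line == "":
            break
        body.append(line)
    if len(body) < 3:
        raise ValueError("a table needs a name, a title line and a type line")

    name = body[0]                 # rule 1: table name
    titles = body[1].split(":")    # rule 5: column titles
    types = body[2].split(":")     # rule 6: column data types
    for col_type in types:
        if col_type not in ("sym", "num", "str"):
            raise ValueError("unknown column type: " + col_type)

    # Rules 3, 4 and 7: one row per line, colon-separated fields.
    # str.split keeps empty fields, so "AD:ADDRESS:" yields a trailing "".
    rows = [line.split(":") for line in body[3:]]
    return name, titles, types, rows


example = """DATA TYPE
DATA TYPE:DESCRIPTION:LENGTH
sym:str:num
AD:ADDRESS:
DT:DATE:8
PN:PERSON NAME:48
TX:TEXT:
"""

name, titles, types, rows = parse_simple_table(example)
print(name, titles, types)
for row in rows:
    print(row)
```

Because the fields are split on every colon, a field cannot itself contain a colon in this sketch; the rules above do not describe any escaping mechanism.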

There are also more complex tables, which contain subtables; all subtables of one table have the same format (e.g. number and types of columns). Such complex tables are generated, e.g., from the table of `TABLE VALUES'(1), which appears in appendix A. The complex tables are basically the same as described above; notably, rules 1--6 of the definition above still apply. These additional rules apply to complex tables:

  1. The fourth line must start with `-' and defines the titles of the columns of every subtable.
  2. The fifth line must also start with `-' and declares the data types of the columns of every subtable.
  3. Any other line that does not start with `-' is a row of the main table and starts a new subtable.
  4. Any other line that starts with `-' is a row of the most recently started subtable.

The meaning of the rows is slightly different from, or extended beyond, that of the simple table. The idea is that we generate two relations from the complex table. One is the main table (i.e. the table that results if we delete every line that starts with a `-'), while the other is a relation constructed from the main table and the subtables.

Let R = {t1, ..., tc} be a relation of cardinality c, where each tuple is ti = <ri1, ..., rin> for i running from 1 to c. R corresponds to the main table. For each ti there is a relation Si(si1, ..., sim), which corresponds to a subtable. If ri1 is a key of R, then T(ri1, si1, ..., sim) is a relation that carries the same information as all the Si together, each tuple of Si being prefixed with the key ri1 of its main-table tuple. T is the second relation we produce from the complex table.

To give examples rather than exhaust the reader with definitions, first we present a simple table:

DATA TYPE
DATA TYPE:DESCRIPTION:LENGTH
sym:str:num
AD:ADDRESS:
DT:DATE:8
...
PN:PERSON NAME:48
TX:TEXT:
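
To illustrate how the type line governs the appearance of each field, here is a minimal Python sketch. It is not the AWK script mentioned above, and the names `render_field` and `render_row` are hypothetical. Two points are assumptions that the text does not settle: empty fields are passed through as empty text, and `sym' values are left unchanged rather than converted into a particular identifier form.

```python
def render_field(value, col_type):
    """Render one field according to its declared column type."""
    if col_type not in ("sym", "num", "str"):
        raise ValueError("unknown column type: " + col_type)
    if value == "":
        # Assumption: an empty field stays empty, whatever its type.
        return ""
    if col_type == "str":
        # Strings appear enclosed by string delimiters.
        # Escaping of embedded delimiters is not covered here.
        return '"' + value + '"'
    # 'sym' and 'num' values are passed through unchanged in this sketch.
    return value


def render_row(row, types):
    """Render every field of a row using the table's type line."""
    if len(row) != len(types):
        raise ValueError("row and type line differ in length")
    return [render_field(value, t) for value, t in zip(row, types)]


types = ["sym", "str", "num"]
rendered_dt = render_row(["DT", "DATE", "8"], types)
rendered_ad = render_row(["AD", "ADDRESS", ""], types)
print(rendered_dt)
print(rendered_ad)
```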

Here is an example of a complex table:

TABLE
TABLE#:DESCRIPTION
num:str
-VALUE:DESCRIPTION:VALUE#
-sym:str:num
0001:SEX
-F:Female:000345
-M:Male:000344
0002:MARITAL STATUS
-D:Divorced:000350
-M:Married:000348
-S:Single:000349

From the latter table we'll produce two relations, as if they had been defined as follows, first the main table and then the constructed table:

TABLE
TABLE#:DESCRIPTION
num:str
0001:SEX
0002:MARITAL STATUS

VALUE
TABLE#:VALUE:DESCRIPTION:VALUE#
num:sym:str:num
0001:F:Female:000345
0001:M:Male:000344
0002:D:Divorced:000350
0002:M:Married:000348
0002:S:Single:000349
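
The split of a complex table into these two relations can be sketched as follows. This is an illustrative Python version, not the AWK script mentioned earlier; the function name `split_complex_table` is hypothetical, and the sketch assumes that the first column of the main table is its key.

```python
def split_complex_table(text):
    """Split a complex table into (main, derived).

    Each result is a (name, titles, types, rows) tuple.
    """
    lines = []
    for line in text.splitlines():
        if line == "":          # the table ends at the first empty line
            break
        lines.append(line)
    if (len(lines) < 5
            or not lines[3].startswith("-")
            or not lines[4].startswith("-")):
        raise ValueError("lines four and five must start with '-'")

    name = lines[0]
    titles = lines[1].split(":")
    types = lines[2].split(":")
    sub_titles = lines[3][1:].split(":")   # subtable column titles
    sub_types = lines[4][1:].split(":")    # subtable column types

    main_rows, derived_rows = [], []
    key = None
    for line in lines[5:]:
        if line.startswith("-"):
            if key is None:
                raise ValueError("subtable row before any main-table row")
            # Prefix the subtable row with the key of its main-table row.
            derived_rows.append([key] + line[1:].split(":"))
        else:
            fields = line.split(":")
            main_rows.append(fields)
            key = fields[0]     # assumption: the first column is the key

    main = (name, titles, types, main_rows)
    # The derived table is named after the first subtable column.
    derived = (sub_titles[0], [titles[0]] + sub_titles,
               [types[0]] + sub_types, derived_rows)
    return main, derived


complex_example = """TABLE
TABLE#:DESCRIPTION
num:str
-VALUE:DESCRIPTION:VALUE#
-sym:str:num
0001:SEX
-F:Female:000345
-M:Male:000344
0002:MARITAL STATUS
-D:Divorced:000350
-M:Married:000348
-S:Single:000349
"""

main, derived = split_complex_table(complex_example)
for table in (main, derived):
    table_name, table_titles, table_types, table_rows = table
    print(table_name)
    print(":".join(table_titles))
    print(":".join(table_types))
    for row in table_rows:
        print(":".join(row))
    print()
```

Run on the complex example, the sketch prints the two tables in the simple format, so the same downstream processing can be applied to both.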

Note from the examples that the name of the derived table is the title of the first column of the subtables (here, VALUE).

The filename conventions for these interim tables are not uniform. The filename of a simple table ends with `.tb', whereas the filename of a table derived from a segment definition ends with `.stb'.

