
Extracting message definitions

Messages are defined syntactically in the HL7 document using the formal language described later in this section. We can recognize message definitions by their distinctive layout, which comprises three columns, normally separated by ASCII VT characters, except where the format was damaged or originally inconsistent. The first column contains the code in the formal language, the second column contains remarks, and the third column contains the chapter. The first row contains the message id and, in its last column, the word "Chapter" or "Appendix", which we use as a keyword. Finally, these tables are set off from the surrounding text by one empty line, both at the beginning and at the end.

The following is an example of a message definition as we find it in the file `kap2.txt'; all literal VTs have been replaced by ` - '.

WRP - Widget Report - Chapter
MSH - Message Header - II
MSA - Message Acknowledgement - II
{ WDN - Widget Description - XX
  WPN - Widget Portion - XX
  { [WPD] } - Widget Portion Detail - XX
}
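The layout rules above can be sketched in Python as follows. This is only an illustration, not the extraction code actually used: the names `split_row', `is_message_table' and `blocks' are hypothetical, and the sample text is the WRP example with real VT characters restored. Note that the text is split on `\n' explicitly, because Python's `str.splitlines' also treats VT as a line boundary and would tear the rows apart.

```python
VT = "\x0b"  # ASCII vertical tab, the column separator described above
KEYWORDS = ("Chapter", "Appendix")


def split_row(line):
    """Split one table row into its VT-separated columns, stripped of blanks."""
    return [col.strip() for col in line.split(VT)]


def is_message_table(block):
    """True if the last column of the block's first row is one of the keywords."""
    if not block:
        return False
    return split_row(block[0])[-1] in KEYWORDS


def blocks(text):
    """Yield runs of non-empty lines, i.e. blocks delimited by empty lines."""
    current = []
    for line in text.split("\n"):  # not splitlines(): that would split on VT too
        if line.strip():
            current.append(line)
        elif current:
            yield current
            current = []
    if current:
        yield current


# The example from the text, with ` - ' turned back into VT characters.
SAMPLE = "\n".join([
    "Some surrounding prose.",
    "",
    VT.join(["WRP", "Widget Report", "Chapter"]),
    VT.join(["MSH", "Message Header", "II"]),
    VT.join(["MSA", "Message Acknowledgement", "II"]),
    VT.join(["{ WDN", "Widget Description", "XX"]),
    VT.join(["  WPN", "Widget Portion", "XX"]),
    VT.join(["  { [WPD] }", "Widget Portion Detail", "XX"]),
    "}",
    "",
    "More prose.",
])

tables = [b for b in blocks(SAMPLE) if is_message_table(b)]
first_column = [split_row(row)[0] for row in tables[0]]
```

This sketch only covers well-formed tables. Tables that lack the VT separators would not be split into columns by it and would need separate handling.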

The syntax of the formal language is as follows:

<message> ::= <message id> <group>
<group> ::= <item> | <item> <group>
<item> ::= <segment id> | `[' <group> `]' | `{' <group> `}'
<message id> ::= <id>
<segment id> ::= <id>
<id> ::= <uppercase> <upper or digit> <upper or digit>
<upper or digit> ::= <uppercase> | <digit>
<uppercase> ::= `A' ... `Z'
<digit> ::= `0' ... `9'
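For illustration, a small recursive-descent recognizer for this grammar might look like the following Python sketch. It is not part of the extraction procedure. It assumes that a group is a sequence of one or more items (the WRP example needs more than two items at the top level), that white space between tokens is insignificant, and it introduces the hypothetical names `tokenize', `parse_item', `parse_group' and `parse_message'.

```python
import re

# An id is an uppercase letter followed by two uppercase letters or digits.
TOKEN = re.compile(r"\s*([A-Z][A-Z0-9]{2}(?![A-Z0-9])|[\[\]{}])")
ID = re.compile(r"[A-Z][A-Z0-9]{2}")
CLOSER = {"[": "]", "{": "}"}


def tokenize(text):
    """Split the input into ids and bracket characters, skipping white space."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_item(tokens, i):
    """Parse one <item> starting at index i; return (tree, next index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok in CLOSER:  # `[' <group> `]' or `{' <group> `}'
        group, i = parse_group(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != CLOSER[tok]:
            raise ValueError(f"missing {CLOSER[tok]!r}")
        return ("opt" if tok == "[" else "rep", group), i + 1
    if tok in CLOSER.values():
        raise ValueError(f"unexpected {tok!r}")
    return tok, i + 1  # a plain <segment id>


def parse_group(tokens, i):
    """Parse one or more items; stop at a closing bracket or at the end."""
    item, i = parse_item(tokens, i)
    items = [item]
    while i < len(tokens) and tokens[i] not in CLOSER.values():
        item, i = parse_item(tokens, i)
        items.append(item)
    return items, i


def parse_message(text):
    """Parse <message id> <group>; return (message id, list of items)."""
    tokens = tokenize(text)
    if not tokens or not ID.fullmatch(tokens[0]):
        raise ValueError("message id expected")
    group, i = parse_group(tokens, 1)
    if i != len(tokens):
        raise ValueError(f"unexpected {tokens[i]!r}")
    return tokens[0], group


tree = parse_message("WRP MSH MSA { WDN WPN { [WPD] } }")
# tree == ("WRP", ["MSH", "MSA",
#                  ("rep", ["WDN", "WPN", ("rep", [("opt", ["WPD"])])])])
```

Unlike the purely textual approach, such a parser rejects unbalanced brackets and malformed ids instead of passing them through.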

At this point, however, we can ignore the syntax;(2) instead we make the following textual changes:

  1. remove any white space
  2. change all upper case to lower case
  3. any opening bracket (`[') is replaced by `opt('
  4. any opening curly brace (`{') is replaced by `rep('
  5. any closing bracket (`]') or curly brace (`}') is replaced by a closing parenthesis (`)')
  6. append a comma `, ' unless
  7. remove any new line character (i.e. print everything on a single line)
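A minimal Python sketch of these textual changes is given below. It works on tokens rather than on raw characters, which is a variation and not necessarily how the original tool proceeds. The comma rule is an assumption inferred from the resulting Prolog term shown in this section: a `, ' is appended after an id or a closing parenthesis unless a closing bracket or the end of the input follows. The function name `to_term_body' is hypothetical.

```python
import re

CLOSING = {"]", "}"}


def to_term_body(first_column_lines):
    """Turn the first-column lines of a table body into a one-line term body."""
    # Step 7: put everything on a single line.
    text = " ".join(first_column_lines)
    # Step 2: change upper case to lower case.
    text = text.lower()
    # Step 1: keeping only ids and brackets drops all white space.
    # Like the textual approach, this does not check the syntax.
    tokens = re.findall(r"[a-z][a-z0-9]{2}|[\[\]{}]", text)
    out = []
    for k, tok in enumerate(tokens):
        if tok == "[":
            out.append("opt(")  # step 3
        elif tok == "{":
            out.append("rep(")  # step 4
        else:
            out.append(")" if tok in CLOSING else tok)  # step 5
            # Step 6 (ASSUMED rule): comma unless a closer or the end follows.
            nxt = tokens[k + 1] if k + 1 < len(tokens) else None
            if nxt is not None and nxt not in CLOSING:
                out.append(", ")
    return "".join(out)


body = to_term_body(["MSH", "MSA", "{ WDN", "  WPN", "  { [WPD] }", "}"])
# body == "msh, msa, rep(wdn, wpn, rep(opt(wpd)))"
```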

The first id that is read, i.e. the one on the header line of the table, becomes the message id. The rest of the first line, up to the keyword `Chapter' or `Appendix', is recognized as the description. For the body of the table, however, we recognize only the first column, which we handle as described above. Things would have been easier if some message definition tables did not break these rules. Some tables are formatted without the VT between the columns, which made it very hard to discard the other columns while keeping the first column intact.

The definition of each message is then stored as a single fact of the Prolog predicate message/4:

message(wrp,"","widget report",[msh, msa, rep(wdn, wpn, rep(opt(wpd)))]).
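Assembling such a fact from its four parts could be sketched in Python as below. The helper names `prolog_string' and `message_fact' are hypothetical. The sketch assumes that a missing event type code is written as an empty string, that the description is lower-cased like the rest of the table, and that the target Prolog accepts backslash escapes inside double-quoted strings.

```python
def prolog_string(text):
    """Quote text as a double-quoted Prolog string (backslash escapes ASSUMED)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def message_fact(msg_id, event_code, description, body):
    """Format one message/4 fact; body is the already converted term body."""
    return "message({},{},{},[{}]).".format(
        msg_id.lower(),
        prolog_string(event_code),            # "" when no code was found
        prolog_string(description.lower()),
        body,
    )


fact = message_fact("WRP", "", "Widget Report",
                    "msh, msa, rep(wdn, wpn, rep(opt(wpd)))")
# fact == 'message(wrp,"","widget report",[msh, msa, rep(wdn, wpn, rep(opt(wpd)))]).'
```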

The second argument of message/4 is the event type code, which further qualifies a message type. However, an event type code is not always specified (we will discuss this later). The code, if given at all, can be found in the most recent subsection heading, where one of the keywords `Event Code' or `Trigger Event' precedes the three-character id of the event type code. This id is written as the second argument of the message predicate.
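Picking the event type code out of a heading might be sketched in Python as follows. The exact heading layout is not given here, so the pattern assumes that only non-word characters (blanks, a colon and the like) stand between the keyword and the id, and that the id has the same shape as the ids of the formal language. The function name `event_code' and the headings in the usage lines are made up for illustration.

```python
import re

# Keyword, optional punctuation or blanks, then a three-character id.
EVENT_CODE = re.compile(
    r"(?:Event Code|Trigger Event)\W*([A-Z][A-Z0-9]{2})(?![A-Za-z0-9])"
)


def event_code(heading):
    """Return the event type code named in a heading, or "" if there is none."""
    m = EVENT_CODE.search(heading)
    return m.group(1) if m else ""


# Hypothetical headings, not taken from the HL7 document.
with_code = event_code("Widget Report (Event Code W01)")   # "W01"
without_code = event_code("Introduction")                   # ""
```

A caller would remember the result for the most recent subsection heading and pass it on when the next message definition table is found.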

We should rather have scanned the most recent subsection heading for a description of a segment, since this may uniquely describe one message referenced by a pair of message id and event type code. For now we have redundant descriptions that specify the messages poorly, which certainly has to be fixed soon.

