On the abstractness of abstract syntax in HL7

The HL7 documentation claims that it complies with the idea of the OSI reference model and its seven distinct layers. Terms like `abstract syntax' and `encoding rules' are used frequently in the specification. However, as we already stated above (see section A view on HL7), these distinctions are not always drawn as strictly as the document claims. Let us see why this is so.

The HL7 encoding rules are meant as an interim standard, made available until there are implementations of OSI standards. These encoding rules are very simple: any data is represented as a string of displayable ASCII characters, with a set of five delimiters defined to terminate the data items. Since it is thus very unlikely that any byte of data will interfere with the underlying transport mechanism, even the simplest kind of text processor, batch file or serial line can transmit HL7 messages. However, what seems like an advantage at first sight turns out to have a considerable impact on the higher levels of abstraction. These encoding rules impose a restriction on the higher levels of the protocol stack: they may not send unprintable characters, and they may not even use the delimiter characters as data. A presentation layer that passes its task up to the higher layers is of little use. It should rather make its own mechanisms and those of the underlying layers transparent to the upper layers.
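The restriction can be made concrete with a minimal sketch. It assumes `|' as the field delimiter and uses hypothetical helper names (`encode_segment`, `decode_segment`) that are not taken from HL7; the segment content is invented for illustration. A delimiter-joined encoding without any escaping silently changes the data as soon as the application puts a delimiter into a field:

```python
FIELD_SEP = "|"  # assumed field delimiter (the usual default)

def encode_segment(seg_id, fields):
    """Naive encoding: join the fields with the delimiter, no escaping."""
    return FIELD_SEP.join([seg_id] + list(fields))

def decode_segment(text):
    """Naive decoding: split the string on the delimiter."""
    seg_id, *fields = text.split(FIELD_SEP)
    return seg_id, fields

# The application wants to send two fields; the second contains a '|'.
original = ["12345", "blood pressure 120|80"]
wire = encode_segment("OBX", original)
seg_id, decoded = decode_segment(wire)

# The receiver sees three fields instead of two.
print(decoded)
```

The receiver has no way to tell that the last two fields were once one value, which is why the application itself ends up having to avoid the delimiter characters.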

One could argue here that the HL7 encoding rules were never meant to be perfect, and that better encoding standards that are now available would replace them. The abstract message definition, as the heart of HL7, would allow this. However, parts of the encoding rules have found their way even into the abstract message definition: the MSH segment defines the `encoding characters' as being the first data field of the MSH. This is wrong. It would have been easy to make the negotiation of encoding characters part of the lower layer protocol (LLP).
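The coupling shows up as soon as one tries to write a decoder. In the following sketch the decoder has to look at raw characters of the MSH header before it can interpret anything else in the message. The function name `read_delimiters` is hypothetical, and the order of the four encoding characters (component, repetition, escape, subcomponent) is an assumption following the common HL7 v2 convention:

```python
from typing import NamedTuple

class Delimiters(NamedTuple):
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str

def read_delimiters(message: str) -> Delimiters:
    """Read the delimiter set from the first characters of an MSH header."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("message does not start with a complete MSH header")
    field = message[3]  # the character right after the segment id
    # Assumed order of the four encoding characters that follow.
    component, repetition, escape, subcomponent = message[4:8]
    return Delimiters(field, component, repetition, escape, subcomponent)

default = read_delimiters("MSH|^~\\&|LAB")
redefined = read_delimiters("MSH+:.?-+LAB")
print(default)
print(redefined)
```

Nothing in this step belongs to the abstract message: it is pure presentation-layer bookkeeping that happens to be stored inside an application-level segment.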

These problems notwithstanding, there is a way to overcome at least part of what the encoding rules impose on the higher levels of HL7. Since all data is converted to a string of printable ASCII characters, the restriction is of little practical relevance for data types that represent numerics (i.e. NM, DT, TM, TS, SI). Text data types (i.e. ST, TX, FT), however, are set directly by the application, which should not be forced to worry about the printability of ASCII characters. The encoding module must provide at least some transparency here, but HL7 defines no standard for this, even though the solution is so obviously at hand:

There is already one encoding character defined, which is called the `escape character'. The use of escape characters is common in applications as well as in communication programs. Escape characters are commonly used for two purposes:

  1. To protect data bytes from being misinterpreted as control bytes (e.g. `\' as used in shell programs, or the DLE character used in ANSI X3.28 transparent mode).
  2. To mark a sequence of bytes in a stream of data bytes that is to be interpreted as control information, often referred to as an `escape sequence' (e.g. ANSI terminals, TeX, SGML).

The HL7 escape character is used to mark a sequence of control characters (purpose 2 above). Unfortunately, the use of escape sequences is currently limited to the TX and FT types. Escape sequences should, however, be applicable to any type where data/control ambiguities might arise. Since the delimiter characters are readily available for redefinition, this ambiguity might arise in the encoding of any data type. Consider a message that redefines the delimiters to be `+.:-?' instead of `|~^&\': even the numeric and date/time data types must then use escape sequences to encode their values unambiguously.
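A type-independent escaping step could look like the following sketch. It is an illustration under stated assumptions, not something HL7 prescribes: the one-letter codes (F, S, R, E, T for the field, component, repetition, escape and subcomponent delimiters) are borrowed from the convention of later HL7 v2 versions, the redefined set maps `+.:-?' position by position onto `|~^&\', and the function names are hypothetical.

```python
# Delimiter sets keyed by an assumed one-letter code.
DEFAULT = {"F": "|", "R": "~", "S": "^", "T": "&", "E": "\\"}
REDEFINED = {"F": "+", "R": ".", "S": ":", "T": "-", "E": "?"}

def escape(value, delims):
    """Replace every delimiter character in value by <esc><code><esc>."""
    esc = delims["E"]
    code_for = {char: code for code, char in delims.items()}
    return "".join(
        esc + code_for[ch] + esc if ch in code_for else ch for ch in value
    )

def unescape(text, delims):
    """Inverse of escape(); raises on unterminated or unknown sequences."""
    esc = delims["E"]
    out = []
    i = 0
    while i < len(text):
        if text[i] == esc:
            end = text.index(esc, i + 1)         # ValueError if unterminated
            out.append(delims[text[i + 1:end]])  # KeyError if code is unknown
            i = end + 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

# With the redefined set, even a number and a time need escaping.
number = escape("-12.5", REDEFINED)
time = escape("12:30", REDEFINED)
print(number, time)
```

Because the step works on the encoded string of any value, the encoding module can apply it uniformly to every data type, and the application never needs to know which characters are currently in use as delimiters.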

But there is a still bigger problem: some features of HL7 badly blur the distinction between the abstract syntax (application layer) and the presentation layer. All of these issues concern the lengths of fields or blocks. HL7 drags views that exist only on character streams deep into the abstract level, where we should deal with concepts rather than with strings.

The first issue is the definition of maximum lengths for fields that are not of a string or text data type. These make illegitimate assumptions about the representation of values. For example, the length of a DT value does not belong in the description of a PID(10) segment, since it presupposes how DT values are represented, which depends heavily on the encoding rules used. Giving a maximum length is not correct for the PN type either. PN is a composite type consisting of 6 ST components, yet its maximum length is defined to be 48, including the delimiter characters. Quite apart from the fact that delimiters should not be part of an abstract syntax, how can this restriction be applied? Two passes are needed for a correct encoding: the first pass has to assemble the PN encoding from the encodings of its components, and the second has to check whether the whole string that encodes the PN value exceeds the length limit. If the length is more than 48, a crucial question arises: which of the components is to be truncated? Such a restriction obviously cannot be implemented with reasonable effort; it is of no use at all and seems to exist merely for historical reasons. This sheds some light on the concept of data in the earlier days of HL7: all data, even numerics and composites, was evidently regarded as strings.
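The two passes can be written down in a few lines, which also shows where the procedure gets stuck. The sketch assumes `^' as the component delimiter; the function name `encode_pn` and the sample names are hypothetical. Since nothing says which component to shorten, the only thing the second pass can do is reject the value:

```python
PN_MAX_LENGTH = 48   # limit quoted above, delimiters included
COMPONENT_SEP = "^"  # assumed component delimiter

def encode_pn(components):
    """Encode a PN value from its six ST components."""
    if len(components) != 6:
        raise ValueError("a PN value consists of 6 components")
    # Pass 1: assemble the encoding from the encoded components.
    encoded = COMPONENT_SEP.join(components)
    # Pass 2: check the length of the whole string.
    if len(encoded) > PN_MAX_LENGTH:
        # No rule says which component to truncate, so we can only refuse.
        raise ValueError(
            f"PN encoding is {len(encoded)} characters, limit is {PN_MAX_LENGTH}"
        )
    return encoded

short = encode_pn(["Smith", "John", "", "", "Dr", ""])
print(short)
```

Note that the limit depends on the delimiters: the same six components could pass or fail depending on how the encoding rules represent a composite.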

While we could silently ignore the 48-character restriction on PN, further assumptions are made about lengths which seem inadequate to the author. One is the method of continuation segments proposed in the HL7 standard. This feature again places a burden on the application that the lower layer protocol should carry: an application would have to deal with reassembling continued messages, which is extremely cumbersome. Segments are units of data transmission, and as such their integrity should not be touched at the application layer. There is hardly any need for the continuation of a segment if there is a proper lower layer protocol. Lengthy messages should be split into packets and reassembled, and all of this should happen completely transparently to the application layer.
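What such a lower layer would have to do is small. The following sketch uses a hypothetical packet format (sequence number, total count, payload) that is not part of HL7 or of any of its lower layer protocols; it only illustrates that splitting and reassembly can be done without the application ever seeing a fragment:

```python
def split_into_packets(message: bytes, max_payload: int):
    """Split a message into (sequence, total, payload) packets."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    total = max(1, -(-len(message) // max_payload))  # ceiling division
    return [
        (seq, total, message[seq * max_payload:(seq + 1) * max_payload])
        for seq in range(total)
    ]

def reassemble(packets):
    """Rebuild the message from packets that may arrive in any order."""
    if not packets:
        raise ValueError("no packets received")
    ordered = sorted(packets, key=lambda p: p[0])
    total = ordered[0][1]
    if [p[0] for p in ordered] != list(range(total)):
        raise ValueError("missing or duplicate packet")
    return b"".join(p[2] for p in ordered)

message = b"X" * 2500
packets = split_into_packets(message, 1024)
restored = reassemble(list(reversed(packets)))
print(len(packets), restored == message)
```

The application hands over one complete message and receives one complete message; segment boundaries play no role in where the packets are cut.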

