This is Info file ProtoGen.info, produced by Makeinfo-1.64 from the
input file ProtoGen.texi.

   This text describes the implementation of HL7 that is being done at
the Universitätsklinikum Steglitz in Berlin.  It is meant as a report
about the work in general as well as a manual for the software that is
about to be developed.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Free Software Foundation.

   Copyright (C) 1994, 1995, 1996 Gunther Schadow


File: ProtoGen.info, Node: Consistency check, Next: The data item numbers, Prev: Errors, Up: The HL7 database

Consistency check
=================

   Equipped with the insight of the last section, we can now start to
check our database.  We will have to check for

   * tuples with undefined keys

   * whether one relation is included in another

   * consistency of conceptually redundant data

   * missing data

   * syntax errors in message definitions

   During the rest of this section we will mostly be looking at the
output that was generated by the Prolog check clauses:

     Removing Facts with undefined key value:
     retracting segment(G198, "")

   There was a completely void segment definition, caused while
chapter 1 was processed.  Something went wrong there, but we can safely
ignore it.

   Now we check for tables which had not been defined in appendix A
but were used in the chapters.  `chptbl.pl' was loaded and compared
against table/3; the result is `none', meaning that no undefined
tables have been found.  Next, we check for a one-to-one relationship
between table/2 and table/3, in particular whether table/3 contains
all of table/2, which is true.  The descriptions of corresponding
tables are consistent, which means that we can safely exclude table/2
from further processing without loss of information.

     Performing checks on:

     *** Tables
     chptbl compiled, 0.05 sec, 2,120 bytes.
     tables which are not yet defined: none
     tables from table/2 that are not in table/3: none
     tables with conflicting descriptions: none

   The next part checks whether there are tables referred to for which
no values are defined.  This is true for a lot of tables, but it is not
necessarily wrong.  A table denoted as `user_defined' may well be
undefined in the HL7 standard.  But how about tables marked as
`hl7_standard'?  These may be references to tables defined by other
standards; we will remember this problem in case we get into trouble,
but leave it alone for now.  We also check whether these missing
tables are defined somewhere in the chapters, which they are not in
either case.  Finally we check for values which do not belong to any
table, which would point to an error in our AWK scripts, but this is
not the case.

     tables without any value definition:
     for class `hl7_and_user': 111,112,101,30,90,63.
        in the chapters: none
     for class `hl7_standard': 50,31,13,25,59,51,52,55,56,37,14,26,18,
        75,17,29,35,41,15.
        in the chapters: none
     for class `user_defined': 117,23,19,120,21,22,32,43,44,45,46,47,
        49,114,113,57,66,60,24,64,96,68,69,42,
        72,86,73,79,118,81,83,84,10,87,88,89,92,
        93,94,115,110,98,99.
        in the chapters: none
     values without any defining table: none

   We process segment in a similar way as table above, i.e. we check
for containment and conflicting descriptions, and find no surprise.

     *** Segments
     segments from segment/2 that are not in segment/3: none
     segments with conflicting descriptions: none

   Here follows the proof of the assumptions we made about the
one-to-one relationship of field and data element.  The extra check on
conflicts in length exists for historical reasons and will disappear,
since it is done again by the general check for conflicts.

     *** Data elements
     conflicts in length:
        for data_element/9: none
        for field/10: none
     data_element/9 not in field/10: none
     data_element/9 referenced by more than one field/10: none
     field/10 and data_element/9 conflicts: [obx,4,769],[obx,6,562].

   We see that our proof of the one-to-one relationship succeeds,
although there are inconsistencies concerning the data types of two
pairs of field and data element, which are marked below:

     field(obx,4,20,*st*,_,_,_,00769,"observation subid").
     data_element(00769,"observation subid",obx,anr,20,*nm*,_,_,_).

     field(obx,6,20,*st*,_,_,_,00562,"units").
     data_element(00562,"units",obx,anr,20,*id*,_,_,_).

   The next check is as self-explanatory as its outcome is clean.

     *** Fields
     field/10 without segment/3: none
     segment/3 without field/10: none

   Finally there are the checks on messages.  The message definition
syntax is error free because Prolog has already filtered out the
mismatched parentheses.  Note that the problem was not corrected by
Prolog, it was just removed.  The absence of syntax errors merely means
that the conversion scripts have done their job well.

   But then there are undefined message types.  We do not care about
the stuff from the `Widget' non-functional area.  That the messages of
appendix C do not appear in the list of message types in appendix A is
a fact that makes us worry once more about the `computerized data
dictionary' which is said to be used to generate all the tables.  Why
are there inconsistencies?  Why are there missing objects?  In fact
some people think that the computerized data dictionary is just a
phantom.  The ADR message type is unknown as well, but these problems
could be fixed easily once we know about them.  Undefined messages,
however, are not at all easy to fix: how should we know about them if
we are told nothing but their name and functional area?  Note that ORF
and ORM are missing because of the syntax error above.  While OCF has
disappeared since v2.2, ARD and OSQ are still hanging around in the
tables.  We assume that they will be fired next time as well.

   Finally -- again forget about the widgets -- there are messages
which refer to segments which are not defined.  This is the worst
error that we have detected here, because it makes whole messages
unimplementable.  However, if we subtract the `ms' typo in an ACK, of
which there are many other definitions as well, and once we know that
we have to treat `ANY' segments specially, there remains PD1, patient
demographics, which is undefined.  After all, v2.2 has silently fired
this segment just as v2.1 silently contained it.

     *** Messages
     message definition syntax errors: none
     undefined message types used: adr,nmd,nmq,nmr,wro,wrp.
     undefined message: ard,ocf,orf,orm,osq.
     undefined segments used: [wro,wid],[wrp,wdn],[wrp,wpn],[wrp,wpd],
        [wrp,wdn],[wrp,wpn],[wrp,wpd],[wrp,wdn],[wrp,wpn],[wrp,wpd],
        [ack,ms],[adt,pd1],[orr,any].


File: ProtoGen.info, Node: The data item numbers, Next: On the abstractness of abstract syntax in HL7, Prev: Consistency check, Up: The HL7 database

The data item numbers
=====================

   The relations for fields, tables and values each have one domain
that is made up of a kind of integer number, i.e. strings of a few
digits, often beginning with zeroes.  These are not really numbers,
because leading zeroes are obviously significant.  Thus a five digit
string denotes a field, a four digit string denotes a table, and a six
digit string denotes a value.

   If we want to regard these digit strings as numbers we have to
admit that they are not a unique classification of data items, but no
more than a key to the relation where there would otherwise be no
simple key.  In fact the only use of these keys is for the relation of
tables, while fields and values have a composite key which is
sufficient.  In the latter cases we will not make use of the digit
strings.

   Nevertheless, there is evidence that what was intended with the
data item numbers is a kind of classification of the data items of
HL7.  How else could the meaning that was given to the leading zeroes
be explained?  It would have been sufficient to just assign numbers
which do not uniquely identify an HL7 data item but which do their job
within a single relation.  Anyway, we do not need an HL7 data
classification here.


File: ProtoGen.info, Node: On the abstractness of abstract syntax in HL7, Next: On trigger events, Prev: The data item numbers, Up: The HL7 database

On the abstractness of abstract syntax in HL7
=============================================

   The HL7 documentation claims that it complies with the idea of the
OSI reference model with its seven distinct layers.  Terms like
`abstract syntax' vs. `encoding rules' are frequently used in the
specification.  However, as we already stated above (*note A view on
HL7::.), these distinctions are not always made as strictly as the
document claims.  Let us see why this is so.

   There are the HL7 encoding rules, which are meant as an interim
standard made available until there are implementations of OSI
standards.  These encoding rules are very simple: any data is
represented as a string of displayable ASCII characters, with a set of
five delimiters defined which terminate the data items.  Since it is
thus very unlikely that any byte of data will interfere with the
underlying transport mechanism, it is possible even for the simplest
kind of text processor, batch file or serial line to transmit HL7
messages.

   However, what seems like an advantage at first sight turns out to
have a considerable impact on the higher levels of abstraction.  These
encoding rules impose the restriction on the higher levels of the
protocol stack that they may not send unprintable characters and may
not even use the delimiter characters as data.  A presentation layer
that forwards its task up to the higher layers is of pretty little
use.  It should rather make its mechanisms and those of the underlying
layers transparent to the upper layers.

   One could argue here that the HL7 encoding rules were not meant to
be perfect, and that better encoding standards which are now available
would replace them.  The abstract message definition, as the heart of
HL7, would allow this.
   However, there are parts of the encoding rules which have taken a
place even in the abstract message definition: the MSH segment defines
the `encoding characters' as the first data field of the MSH.  This is
wrong.  It would have been easy to let the negotiation about encoding
characters be part of the LLP.

   These problems notwithstanding, there is a way to overcome at least
part of the problems that the encoding rules impose on the higher
levels of HL7: since any data is converted to a string of printable
ASCII characters, the problem is of little practical relevance for
data types which represent numerics (i.e. NM, DT, TM, TS, SI).
However, text data types (i.e. ST, TX, FT) are set directly by the
application, which should not be forced to worry about the
printability of ASCII characters.  The encoding module must provide at
least some transparency here, but HL7 defines no standard for this,
even though the solution is so obviously at hand: there is already one
encoding character defined which is called the `escape character'.
The usage of escape characters is common among applications as well as
communication programs.  Escape characters are commonly used for two
purposes:

  1. to protect data bytes from being misinterpreted as control bytes
     (e.g. `\' as used in shell programs, or the DEL character used in
     ANSI X3.28 transparent mode), and

  2. to mark some sequence of bytes in a stream of data bytes to be
     interpreted as entities of control, often referred to as an
     `escape sequence' (e.g. ANSI terminal, TeX, SGML).

   The HL7 escape character is used to mark a sequence of control
characters (purpose 2 above).  Unfortunately, the usage of escape
sequences is currently limited to the TX and FT types.  Escape
sequences should, however, be applicable to any type where
data/control ambiguities might arise.  Since the delimiter characters
are readily available for redefinition, this ambiguity might arise in
the encoding of any data type.  Consider some message that redefines
the delimiters to be `+.:-?' instead of `|~^&\': even the numerical
and date/time data types must then use escape sequences to
unambiguously encode their values.

   But there is a still bigger problem: some features of HL7 severely
corrupt the distinction between the abstract syntax (application
layer) and the presentation layer.  All of these issues are concerned
with the lengths of fields or blocks.  HL7 drags views which only
exist on character streams far into the abstract level, where we
should rather deal with concepts than with strings.

   The first issue is the definition of maximal lengths for fields
which are not of a string or text data type.  These make unwarranted
assumptions about the representation of values.  For example, the
length of a DT value does not belong in the description of a PID(1)
segment, since this makes assumptions about how DT values are
represented, which highly depends on the encoding rules used.

   Giving a maximum length is not correct for the PN type either.  PN
is a composite type which consists of six ST components; its maximum
length is defined to be 48, including the delimiter characters.  Not
only should delimiters not be part of an abstract syntax -- how could
this restriction be applied at all?  Two passes are needed for a
correct encoding: the first pass has to assemble the PN encoding from
the encodings of its components, the second has to check whether the
whole string that encodes the PN value exceeds the length limit.  If
the length is more than 48, a crucial question arises: which of the
components is to be truncated?
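   To make the problem concrete, here is a minimal sketch of the two
passes; the function encodePN() and its argument layout are merely
assumed for illustration and are not code from the library:

     // Sketch of the two passes that the 48 character limit on PN
     // would force upon an encoder.
     #include <cstddef>
     #include <string>
     #include <vector>

     std::string encodePN(const std::vector<std::string>& components)
     {
         // pass 1: assemble the PN encoding from the encodings of its
         // six ST components, separated by the component delimiter `^'
         std::string encoded;
         for (std::size_t i = 0; i < components.size(); ++i) {
             if (i > 0) encoded += '^';
             encoded += components[i];
         }

         // pass 2: check the complete encoding against the limit
         if (encoded.size() > 48) {
             // which component should give way?  The standard does not
             // say; blind truncation is all that is left to us here.
             encoded.resize(48);
         }
         return encoded;
     }

   The second pass can only operate on the already flattened string,
i.e. on a view that belongs to the encoding rules rather than to the
abstract syntax.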
   It is obvious that such a restriction cannot be implemented with
reasonable effort, and since it is of no use at all it seems to exist
merely for historical reasons.  This may shed a light on the concept
of data in the earlier days of HL7: any data was obviously regarded as
a string, even numerics or composites.

   While we could silently ignore the 48 character restriction of PN,
there are more assumptions being made about lengths which seem
inadequate to the author.  There is the method of continuation
segments proposed in the HL7 standard.  This is a feature which again
loads a burden onto the application that the lower layer protocol
should carry.  An application would have to bother with the reassembly
of continued messages, which is extremely cumbersome.  Segments are
entities of data transmission and as such their integrity should not
be touched on the application layer.  There is hardly any need for the
continuation of a segment if there is a proper lower layer protocol.
Lengthy messages should be split into packets and reassembled, all of
which should happen completely transparently to the application layer.

   ---------- Footnotes ----------

   (1)  there is an inconsistency in v2.2 which says that PID field 7
is a `TS' value of length 8


File: ProtoGen.info, Node: On trigger events, Next: On the null value, Prev: On the abstractness of abstract syntax in HL7, Up: The HL7 database

On trigger events
=================

   The trigger events build a link from the HL7 transactions to the
real world.  They describe the circumstances in which a certain
transaction is initiated.  There is a many-to-one relation between
trigger events and message types.  Especially the ADT message is a
superset of many messages which, though syntactically similar, are
distinct in their contents and purpose.

   There is a classification of trigger events in the table of trigger
event codes.  However, there are more trigger events than are listed
in table 0003.  From the view of the HL7 database, the concept of
event type codes is unfortunately degraded to a mere table of
subselectors, which are defined only for those events for which there
is a many-to-one relation to message types.  It is desirable to have a
complete list of event codes, so that a message could be uniquely
referred to by its event code.  Until then we either have to refer to
a message by specifying the message type and the event code (if there
is one), or we have to merge the tables of message types and event
codes into one, such that any message type that has several event
codes is removed from the merged table.


File: ProtoGen.info, Node: On the null value, Prev: On trigger events, Up: The HL7 database

On the null value
=================

   HL7 makes a distinction between values that are not present and
those that are null.  While the meaning of `not present' is obvious
(it could be paraphrased as `unknown'), the semantics of the null
value remain somewhat confusing, even though the null value is
repeatedly mentioned in the HL7 document.  The problem is to arrive at
a consistent general meaning of the null value, and the crucial
question is: what does a null value mean in an NM field?  The HL7
document (v2.2) says the following (page 2-5):

     `The difference [between not present and null] appears when the
     contents of a message will be used to update a record in a
     database rather than create a new one.  If no value is sent,
     (i.e., it is omitted) the old value should remain unchanged.  If
     the null value is sent, the old value should be changed to null.'
   But what does this notion of `null' mean with respect to NM or DT
fields?  The problem fades away if we regard any data transmitted by
HL7 as strings.  Here, however, we want to provide a mapping from
diverse data types to HL7.  Should we simply map `""' to `0' then?
Even though this would solve the problem for NM types, it would cause
nonsense values in other fields like TS.  There are two ways out of
this paradox:

   * restrict the null value to string-like data only

   * open up the meaning of the null value so that it can apply to any
     data type

   Here we try to go the second way and regard an object which is not
present as unknown, whereas null indicates that an object is known to
be non-existent or that it does not make sense in a certain context.
This is still consistent with the interpretation of the null value
with respect to databases that is given in the HL7 standard: an
unknown value will not cause the database to be changed, but will
rather be bound to what is stored in the database.

   It remains subject to further discussion how the difference between
a string of length zero and a null value would have to be represented
and interpreted.  For example, a zero-length string could be encoded
as `\0' (or as `""', while the null value would be `\0').  The string
of length zero updates a database field to a string of length zero,
while a null value updates the database to a value meaning `value does
not exist'.  Thus the HL7 protocol would be able to handle nulls as
they are proposed for the nested universal relation database model
(see LEVENE (1992)) to handle incomplete information.  Since
incomplete information does concern medical informatics, it is
strongly recommended here to go this way of opening up the null value
for a general, well defined usage in any data type.

   However, a problem still remains, as the scope of the null value is
unclear in a composite data type: if a CM field is received as `""',
what does it mean?  Does it mean that the first component of the field
is null and the other components are not present?  Or does it mean
that the whole CM data is null?  This ambiguity could be resolved by
forbidding the deletion of delimiters which terminate trailing items
that are not present.  Thus `1234^^' must not be truncated to `1234'.
However, this error tolerance is a feature which the extensions of
v2.2 rely on, so it will be hard to convince the HL7 committee to
change it.  But other ways can be found (and should be found) to
overcome this ambiguity.


File: ProtoGen.info, Node: Generating C++ code, Next: Integration into the system, Prev: The HL7 database, Up: Top

Generating C++ code
*******************

   This chapter is dedicated to the actual implementation of the HL7
standard, whose specification was made accessible to machine
processing by the process shown in the preceding chapters.  I have
already pointed out some issues which will become relevant in the
implementation.  This chapter begins by presenting the general concept
that was assumed for the implementation.  We will then shed a light on
the compiler that does the job of translating the database into
program code.

* Menu:

* General concept::
* I/O methods::
* The code generator::


File: ProtoGen.info, Node: General concept, Next: I/O methods, Prev: Generating C++ code, Up: Generating C++ code

General concept
===============

   The object oriented programming technique as provided by C++ allows
a very natural view of the HL7 data items.  Any data item is an object
of HL7 (an HL7 object).
   Individual data types, segments or messages are special objects
which share common properties.  We define the properties that are
common to all HL7 objects as the class "HL7Object".  The class
"HL7Object" is an abstract base class which nevertheless contains some
method and data members, including the flag that shows whether an
object is present or not.

   There is a hierarchy of objects as shown in figure 4.  At the root
there is the HL7Object, which is inherited by every other data object.
Then come the basic HL7 types and the set of delimiters, which could
be regarded as a special data type as well.  These basic types have
been implemented manually, since they are quite heterogeneous but few
in number.  It does not seem appropriate to generate them
automatically from some kind of database.  Moreover, the description
of the data types in the HL7 standard is presented in a narrative form
which is hard to scan for specifying data.

   The composite objects, however, are generated from a simple
relation that describes their contents.  Normally a component is a
basic data type (with some exceptions where a component is itself a
composite, see below (*note I/O methods::.)), which reduces the
organization of a composite data type to merely collecting the basic
data types.  The relation that describes the composites is manually
edited.

   Segments are described in the HL7 standard by tables, which we have
brought into the form of a database.  The implementations of segments
are generated from this database.  So are the implementations of
messages.

   There is a need to introduce an abstraction of segments: the ANY
segment.  This is a segment that can be of any type.  Since segments
are encoded as tagged data, i.e. data that is preceded by some
identifying code, we can exactly discriminate the types of segments by
the tag (the segment id).  This is not possible with data types, since
there are ambiguities and no tag.  What we just said about the ANY
segment applies to the ANY message as well.  However, discriminating
messages is more complex than discriminating segments, since the
information about message types is scattered throughout several
segments.

   There are several qualities which are common to all HL7 objects; we
present these in the following subsections.

* Menu:

* Names::
* States::
* Class and type::
* Repetition and optionality::
* Register::
* Methods::


File: ProtoGen.info, Node: Names, Next: States, Prev: General concept, Up: General concept

Names
-----

   In program generators there is the question of how to choose names
for the various objects.  The more complex the data structures, the
more distinct names have to be given to objects.  There is always a
tradeoff between highly descriptive names, which are long and thus
hard to type, and short names which are easier to type but less
descriptive.  Even though program generators do well without
descriptive names, the end users of the libraries which are to be
built are human beings, programmers, for whom the naming should be
convenient.  Because C++ allows overloading of functions, names can be
shorter without the risk of name conflicts.

   Here we make the following naming conventions.  Classes for data
types and segments are given the name of their two or three character
id in upper case, with the lowercase qualifier `typ' or `seg' directly
attached to it.  Thus `NMtyp' is the class of numerics and `MSHseg' is
the class of message header segments.
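   As a minimal sketch (the member lists are placeholders, not the
real interfaces), the hierarchy and naming convention could be
pictured like this:

     // Sketch only: the real classes carry the state and methods
     // described in the following subsections.
     class HL7Object          // abstract base class of all HL7 objects
     {
         // common state, e.g. the flag showing whether the object is
         // present, and the common status methods
     };

     class NMtyp  : public HL7Object { /* numeric data type NM       */ };
     class STtyp  : public HL7Object { /* string data type ST        */ };
     class MSHseg : public HL7Object { /* message header segment MSH */ };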
   Message classes are named in a similar way by attaching `msg' to
the uppercase letters which are taken either from the list of message
type ids or from the list of event type codes(1).  Note that we try to
avoid the underscore character `_', which tends to merely lengthen a
name; readability can be achieved by alternating uppercase and
lowercase letters as well.

   Instantiations of classes, i.e. variables, as well as symbolic
names of table values, are given a name which is derived from the
object's description.  There is a simple function which produces valid
C names from arbitrary text strings.  The function is given three
arguments: the text string, the "threshold length" and the "truncate
length".  All words in the string are concatenated, with the first
letter of each in upper case and all other letters in lower case.
Words longer than the threshold length are truncated to the truncate
length.  Typically the threshold length is greater than the truncate
length, which allows short words to be kept complete while longer
words are cut down to a short length.  For example, if the threshold
is 5 and the truncate length is 3, a description like "Patient Visit -
Additional Info." becomes `PatVisitAddInfo'.

   This produces names which are sufficiently unique in most cases.
However, there are ambiguities among table values which require a
refinement of the name assembling algorithm.  These ambiguities occur
when there are single words which begin with a common prefix like the
Latin preposition "intra" or the Greek quantifier "milli".  Thus
"intravenous" and "intradermal" both become `Int'.  This can be
overcome by splitting such composite words into two ("intra venous"
giving `IntraVen').

   ---------- Footnotes ----------

   (1)  actually we merge these two lists into one


File: ProtoGen.info, Node: States, Next: Class and type, Prev: Names, Up: General concept

States
------

   Any object according to HL7 must have (at least) the following
states:

  1. present or not present

  2. null or something other than null

   The status qualities of an object can be inspected through the
member functions named `is()'.  However, modifying a status flag does
not always make sense.  Thus the `present' quality can only be set,
and the `null' quality can only be cleared, by a class which is
permitted to modify the components of the object (i.e. the object
itself, including derived objects).  On the other hand, unset() and
nullify(), which make an object not present or null respectively, may
be called from any context which has an instance of an HL7 object.

   There are two other status flags which are not defined in HL7 but
are useful as a means to test the integrity of an object.  These
qualities are

  3. broken

  4. zombie

   A `broken' object is one that was not initialized correctly.
Normally a broken condition should end up in an exception, because it
is always the result of a programming fault; however, exception
handling is currently not supported by some C++ compilers.  Until
then, it can make sense to use the broken bit.  The `zombie' status
means that an object has already been destroyed: either because it was
explicitly deleted or because it went out of scope.  Such an object is
likely to contain damaged values.  If an object is accessed via a
pointer and the zombie flag happens to be set, there is a programming
fault.  These events should result in an error exit (with a core
dump).
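   A minimal sketch of the status interface might look as follows; the
bit layout and the protected helpers are assumptions, only is(),
unset() and nullify() are taken from the description above:

     // Sketch of the status part of HL7Object.
     class HL7Object
     {
     public:
         enum Status { present = 1, null = 2, broken = 4, zombie = 8 };

         HL7Object() : status(0) {}            // starts out not present

         int  is(Status s) const { return (status & s) != 0; }
         void unset()            { status &= ~present; }  // make not present
         void nullify()          { status |= null; }      // set to null

     protected:
         // only the object itself and derived classes may do these
         void setpresent()       { status |= present; }
         void clearnull()        { status &= ~null; }

     private:
         unsigned char status;
     };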
   Finally there is a bit which shows whether an object is atomic or a
repeated object.  This bit logically belongs to the class of the
object, which is presented below (*note Repetition and
optionality::.).

  5. repeat


File: ProtoGen.info, Node: Class and type, Next: Repetition and optionality, Prev: States, Up: General concept

Classes and types of objects
----------------------------

   An HL7 object may be of one of the following classes, which are not
to be confused with C++ classes:

`dset'
     the set of delimiters

`message'
     an HL7 message

`group'
     a group in an HL7 message

`segment'
     a segment

`datyp'
     a data type, either atomic or composite

   Except for the set of delimiters, any class may be repetitive,
which is reflected in the repeat bit of the status byte.

   An HL7 object of a certain class may be of some type.  For example,
if the object is a segment, then the type would be MSH or QRR etc.
The appropriate values are taken from the tables which define the
message types, segment ids, and data types.

   The information about class and type is not as vital as the status
information.  In fact, there are object classes which do not have a
type (group, dset) or which cannot repeat (dset).  However, there are
cases when e.g. a segment has to be read without knowing in advance
which segment it will be.  Therefore we need a method to identify the
type of a segment anyway, and thus it seems reasonable to let the
object identify itself.  Classes and types can be set only by the
object's setclass() and settype() members, which are not public.  The
public can, however, see the class and type of an object with
theclass() and thetype().


File: ProtoGen.info, Node: Repetition and optionality, Next: Register, Prev: Class and type, Up: General concept

Repetition and optionality
--------------------------

Repetition
..........

   Repetition is modeled by the class `repeated', which is defined
using the template feature and thus can easily be reused for any class
of HL7 objects.  The `repeated' class is derived from the class that
is to repeat, enhancing the latter merely by a pointer to the next
object, which is called `cdr'; thus any repeated object is organized
as a linked list.  This is a more homogeneous approach than using
arrays, since there is often no maximal repetition specified.  There
are member functions which allow the input and output of repeated
objects as well as easy access to any member of the list, which is
syntactically the same as referencing a member of an array.  This
could be achieved by overloading the array reference operator `[]'.
The assignment operator is overloaded as well, in order to make a copy
of the list rather than just assigning the reference.

Optionality
...........

   Since the HL7 specifications demand tolerance with respect to
missing or unexpected objects, we do not care much about whether a
segment is marked as required in the specifications.  It is rather up
to the application to reject incomplete messages, while unexpected
objects tend to be just ignored.


File: ProtoGen.info, Node: Register, Next: Methods, Prev: Repetition and optionality, Up: General concept

Register
--------

   Since most C++ compilers impose a minimal size on a class, there
may be some bytes left over which can be used by derived classes or
are wasted otherwise.  Sometimes a derived class needs just one bit of
information which, if stored individually in the space of the derived
object, would require another word of minimal alignment width.  In
order to save memory space, the derived classes are invited to use the
`register' for the storage of their flags.
   sreg(), creg() and greg() give arbitrary access to the flags.  To
prevent conflicts along a derivation chain, the usage of the register
must be documented in a register allocation table.


File: ProtoGen.info, Node: Methods, Prev: Register, Up: General concept

Methods
-------

   Any object has at least the following methods:

  1. Two constructors, one with an empty parameter list and one with
     all components that are accessible by the public.  The former
     sets the object to non-existent, the latter sets the object to
     existent and initializes it with the given parameters.  The types
     of the parameters are the same as the types of the components.
     Optionally, a constructor can be provided which takes C base
     types as parameters and performs the conversion.

  2. A destructor, which tends to be an empty function.  However,
     dynamically allocated memory is freed from here.

  3. Selector(s) or extractor(s), one for each component of public
     relevance.  Extractors are canonically named `get' followed by
     the name of the component that the extractor accesses.  The
     component is not returned as the value of the function but is
     assigned to a reference parameter.  Selectors return an integer
     value reflecting the success of the extraction.  A return value
     of zero means that the extraction was performed successfully.
     Values less than zero mean that the object itself cannot perform
     the extraction (often because it is not present or is null).  A
     value greater than zero tells that the returned object is
     unhealthy in some way.

  4. Modifier(s), one for each component of public relevance.
     Modifiers always set an object to be present.  nullify() is a
     special modifier.

  5. An input method, which returns 1 if reading was successful, 0
     otherwise.

  6. An output method.

   Input and output methods can handle different encoding rules
depending on some properties of the stream.  See *Note I/O methods::
for more.


File: ProtoGen.info, Node: I/O methods, Next: The code generator, Prev: General concept, Up: Generating C++ code

I/O methods
===========

   The encoded segments will finally appear on a stream, just as the
decoding will process data that arrives on a stream.  The stream may
be bound to an RS-232 line, a TCP/IP connection, a batch file, a pipe
or whatever medium is required.  There is an option to choose whether
the output methods are to produce HL7 encoding rule compliant data or
human readable data, which is used to debug the protocol.  This human
readable data will be output in a LISP-like style, since LISP provides
a simple notation for complex objects.  The style can, however, be
modified at compile time.

   The mechanisms that C++ provides with the `iostream' library are
very powerful and elegant.  However, we have to extend the iostream
object by some variables which allow us to set several states for the
streams.  There are currently the following states that a stream can
assume:

`hl7er'
     The stream transports HL7 encoding rules.

`debug'
     The stream transports human readable data.

`level'
     A small integer which tells which delimiters are currently to be
     used.

   `hl7er' and `debug' are boolean values which are of course mutually
exclusive.  They are, however, not represented in a single bit because
more coding modes may be integrated into the system.  The `level' is
normally set to 0, but there are cases when the level of the stream
must be increased.  Consider a composite data type CX which refers in
one of its components to another composite data type CY: the
components of CY thus become the subcomponents of CX.  However, the
I/O methods of CY need to know whether CY is regarded as a composite
field or as a composite component.  If the latter is true, the level
is increased by one and CY uses the subcomponent delimiter to
terminate its components.
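   A minimal sketch of how the level could select the terminating
delimiter (the names `Delimiters' and item_delimiter() are made up for
this illustration):

     // Sketch only: how a composite's output method might pick the
     // delimiter that terminates its items, driven by the stream level.
     struct Delimiters {
         char field;          // e.g. '|'
         char component;      // e.g. '^'
         char subcomponent;   // e.g. '&'
     };

     char item_delimiter(const Delimiters& d, int level)
     {
         switch (level) {
         case 0:  return d.component;     // composite used as a field
         case 1:  return d.subcomponent;  // composite used as a component
         default: return '\0';            // no sub-sub-component delimiter
         }
     }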
   However, it becomes clear that CY may not in turn have a composite
component, since there is no sub-sub-component delimiter.  In fact, in
v2.1 the subcomponent feature is never used.  Rather, such composites
are flattened, terminating each component with a `^' (for an example
see the definition of the CN data type in chapter 2).


File: ProtoGen.info, Node: The code generator, Prev: I/O methods, Up: Generating C++ code

The code generator
==================

   The code generator is a program, currently written in Prolog, which
produces C++ code from the HL7 database.  This compiler is still under
development and remains tentative.  At the moment it generates code
for composite data types, segments and tables.  The last step,
generating C++ classes for messages, is not finished yet.

   The Prolog program was first created in a monolithic approach and
has then been split up into modules, which are far easier to maintain
than the monolith.  There are tables scattered throughout the modules
which enhance the database we have created with detailed information
about many different things, including composite types, required
methods and their implementation etc.  In order to keep the complexity
of the compiler as small as possible, the macro features of the C
preprocessor are used extensively.  The distinction made between the
abstract handling of objects (by macros) and the concrete
implementation (by the macro definitions) helps to ease porting of the
code to different platforms.

* Menu:

* Tables::


File: ProtoGen.info, Node: Tables, Prev: The code generator, Up: The code generator

Tables
------

   Tables are not regarded as HL7 objects as described above (*note
General concept::.), since tables are not exchanged during
transactions.  Rather, tables provide the means to interpret ID data
fields correctly.  They are merely classes with an enum type, which is
a mapping from symbolic names to integers, and a static array, which
provides the mapping from the integers to character strings.  Finally
some useful member functions are provided which look up the item
number of an ID in the table of character strings, or translate such a
number back to an ID.  Even though these table objects are simple, it
seems reasonable to enhance them to interface with a database
management system for bigger tables such as classifications and other
coding systems.

   Some identifiers used by HL7 are listed in an HL7 table, notably
the message type identifiers.  But there are others, such as segment
and data type identifiers, which are not listed in such a table.  It
would be of some use for implementations to have these tables.  Rather
than keeping distinct tables for the implementation of the protocol
and for the application using the protocol, there should be only one
set of tables which is uniformly used by both.  Thus we will generate
entries in the `Table' and `Table Value' relations which list segment
and type identifiers.  To avoid conflicts with existing table numbers
we will count down from the highest possible number (i.e. 9999 for
tables and 999999 for values).
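   A minimal sketch of what such a generated table class could look
like; the class name, the table chosen and its values are assumptions
for illustration, not actual generated code:

     // Sketch of a generated table class; enum names would be derived
     // from the value descriptions by the naming function described
     // under Names.
     #include <string.h>

     class ProcessingIdTbl      // e.g. HL7 table 0103, "Processing ID"
     {
     public:
         enum id { Debugging, Production, Training, nvalues };

         // translate an item number back to its ID string
         static const char* string(id i) { return values[i]; }

         // look up the item number of an ID string, -1 if not in table
         static int lookup(const char* s)
         {
             for (int i = 0; i < nvalues; i++)
                 if (strcmp(values[i], s) == 0)
                     return i;
             return -1;
         }

     private:
         static const char* const values[nvalues];
     };

     const char* const ProcessingIdTbl::values[ProcessingIdTbl::nvalues] =
         { "D", "P", "T" };

   An ID field could then check a received value with lookup() and
store the symbolic id rather than the raw string.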
File: ProtoGen.info, Node: Integration into the system, Next: Bibliography, Prev: Generating C++ code, Up: Top

Integration into the system
***************************

   The HL7 implementation as we have built it so far consists merely
of a library which handles encoding and decoding.  However, means have
to be provided by which these functions are controlled.  In general,
there are many possibilities for doing this.  Here we only give a
short list of what will be included in the system:

   * a batch file interpreter

   * a module that handles incoming unsolicited messages, i.e. a
     program that is invoked by the internet daemon (inetd(8)), which
     listens at a certain TCP/IP port for incoming packets

   * a module that does the same job on RS-232 lines by listening for
     incoming messages

   * a module that helps to connect to an HL7 server via TCP/IP or
     RS-232

   All these modules will use some kind of lower layer protocol as
described in the HL7 v2.1 document.  While on TCP/IP links the minimal
lower layer protocol would be sufficient, the X3.28 based data link
protocol will have to be implemented to support RS-232 connections if
SLIP or PPP is not used.


File: ProtoGen.info, Node: Bibliography, Prev: Integration into the system, Up: Top

Bibliography
************

KUPERMAN (1991)
     Gilead J. Kuperman, Reed M. Gardner, T. Allan Pryor: `HELP: A
     dynamic hospital information system', Springer-Verlag, 1991.

ROSE (1990)
     Marshall T. Rose: `The open book: A practical perspective on
     OSI', Prentice-Hall, 1990.

LEVENE (1992)
     Mark Levene: `The nested universal relation database model';
     Vol. 595 of G. Goos and J. Hartmanis (Eds.): `Lecture notes in
     computer science', Springer-Verlag, 1992.