Dear Colleagues,

The following posting was intended to be a simple follow-up to the
``Ambiguity in ORU message'' thread, but it has grown rather large.
You might regard it as a proposal to the Control WG, or at least as
the introduction to one: I have tried to give an account of the whole
ambiguity problem related to the grouping of segments. The conclusions
I draw here are IMHO important enough to be observed in the review of
the current standard and in the design of new messages.

Mark Shafarman wrote:

 > I'd state the problem the following way:
 >
 > The repeating OBX groups needs to be allowed to be 0-many, since there are
 > cases where one wants to use the ORU message as an unsolicited  results status
 > message without the OBX segments (there may be no result if a service has been
 > only scheduled, or if the specimen has been taken but not yet resulted).  The
 > real error in the message is the fact that the `]` follwing the OBX needs to
 > be after the repeating 0-many NTE's.
 >
 > Not this:
 >  
 >                 {
 >                   [OBX]     Result                        
 >                  {[NTE]}  Notes and comments             
 >                  }   
 >  
 > but this: 
 >                 {
 >                   [OBX     Result                        
 >                  {[NTE]}]  Notes and comments             
 >                  }   
 >
 > The same correction should be applied to the `]` after the PID segment.

O.K. I see your point. But I feel that there is an even deeper
discussion needed about the nature of groups. What is a group?

  A group is two or more segments (or again groups) that share
  a common fate regarding optionality or repetition, i.e. two
  or more entities enclosed within the same pair of braces (or
  brackets).

At least, this is what a first look at the HL7 standard suggests.
Unfortunately, it ties the notion of a group closely to the notions of
optionality and repeatability. Because all groups in the HL7 document
are anonymous, it is easily overlooked that a group is not only a
syntactical construct but usually has (or at least should have) a
specific meaning. Thus I would rather define a group as follows:

  A group is two or more entities (segments or again groups) that
  are semantically linked, i.e. that form an entity by their meaning.

Thus, the concept of a group is independent of any concept of
optionality or repeatability, but not vice versa: if two or more
entities form a unit by their meaning, they never occur
independently of each other, i.e. their syntactical properties with
regard to a given context are always the same. Let me introduce
a new notation for groups: a group is designated by a pair of
parentheses. The following, then, is the innermost group in the
ORU message:

	(
	  OBX
	  [NTE]
	)

Now let me explain what I mean by shared syntactical properties
with regard to a given context. The group just displayed
may occur in different contexts, notably in the ORU message, in
the ORF message and in some ADT messages (the latter is not strictly
correct, because the ADT section doesn't use NTE segments). It is the
ORU (ORF) message syntax that decides that this group is repeatable;
or, to be more precise, it is the syntax of the environment in which
the group occurs that decides whether it is optional or repeatable.
By contrast, it is not the inner meaning of the group itself that
makes it repeatable, just as it is not the inner meaning of a PV1
segment that makes it optional (in the ORU message). It is the
context.
 
To see this is IMHO very important. It makes me uncomfortable
that groups are not named anywhere in the HL7 standard.
What we should do is define groups just as we define messages or
segments, i.e. give each one a name and describe its intention/meaning,
restrictions etc. (but not whether it is ``repeatable'' or
``optional'', since this is largely independent of the group
itself!). Precisely defined groups are necessary anyway if we want to
redesign (or review) the HL7 standard under an object-oriented
paradigm. Let me show what I mean with the now well-discussed ORU
example:

If I do nothing but decompose the definition of the ORU
message into groups, the definition appears as follows. Please
note that I don't *change* anything in the original syntax
specification of the ORU message; I just decompose it.

ORU message:
	MSH				Message Header          
	{ Obs.per.Subject-Group }	Observation of a single subject
	[DSC]				Continuation Pointer


Obs.per.Subject-Group:
	(
	  [ Patient-Group ]	Subject (patient or non-patient)
	  { Battery-Group }	The observations to the Subject
	)

Patient-Group:
	(
	  PID			Patient Identification   
	  [{NTE}]		Notes and comments       
	  [PV1]			Patient Visit            
	)

Battery-Group:
	(
	  [ORC]			Order common             
	  OBR			Observations Report ID   
	  {[NTE]}		Notes and comments       
	  { Result-Group }	A number of Results
	)

Result-Group:
	(
	  [OBX]			Result
	  {[NTE]}		Notes and comments             
	)

Now let's look at what we have, group by group, starting with the
innermost group, the Result-Group. Let us try to
describe the meaning of the Result-Group as follows:

     The Result-Group denotes a single result.

A single result would naturally consist of a result segment OBX, which
may be followed by some notes and comments {[NTE]}. But please note
that this last sentence is contradicted by the definition of
the syntax of the Result-Group (which was not given by me but
inferred formally, by decomposition): the definition says that the
Result-Group need not denote any result, because the OBX segment is
optional. Now, I hope, it finally becomes obvious what I meant by
``possibly empty repeating groups''. The Result-Group as defined above
may well be empty! So we have two alternatives: either we hold the
definition of the Result-Group to be correct and have to gain a new
understanding of its meaning, or we keep our understanding and change
the definition.  Let's try to hold on to the definition, since it is
the approved standard:

     The Result-Group denotes a single result or no result.

This would imply that a result entity without any result makes
sense, and I can see no sense in that. Thus, I vote to keep the meaning
and change the syntax accordingly:

Result-Group:
	(
	  OBX			Result
	  {[NTE]}		Notes and comments             
	)
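
Whether a named group "may well be empty" can be checked mechanically.
The following is only a minimal sketch in Python: the tuple encoding
and the names can_be_empty, NOTES, ORIGINAL and CORRECTED are
assumptions made for this illustration and are not part of HL7.

```python
# Grammar nodes are plain tuples (an illustrative encoding, not HL7 notation):
#   ("seg", "OBX")    a required segment
#   ("opt", node)     [ node ]  optional
#   ("rep", node)     { node }  one or more repetitions
#   ("ref", "Name")   a reference to a named group
#   ("seq", [nodes])  the members of a group, in order

def can_be_empty(node, groups):
    """Return True if `node` can match a sequence of zero segments."""
    kind, body = node
    if kind == "seg":
        return False
    if kind == "opt":
        return True
    if kind == "rep":
        return can_be_empty(body, groups)
    if kind == "ref":
        return can_be_empty(groups[body], groups)
    if kind == "seq":
        return all(can_be_empty(child, groups) for child in body)
    raise ValueError("unknown node kind: %r" % (kind,))

NOTES = ("rep", ("opt", ("seg", "NTE")))          # {[NTE]}

# The two inner groups as obtained by decomposition of the ORU message.
ORIGINAL = {
    "Result-Group": ("seq", [("opt", ("seg", "OBX")), NOTES]),
    "Battery-Group": ("seq", [("opt", ("seg", "ORC")), ("seg", "OBR"), NOTES,
                              ("rep", ("ref", "Result-Group"))]),
}

# Same groups, but with OBX made a required member of the Result-Group.
CORRECTED = dict(ORIGINAL)
CORRECTED["Result-Group"] = ("seq", [("seg", "OBX"), NOTES])

for label, groups in (("original", ORIGINAL), ("corrected", CORRECTED)):
    for name in groups:
        print(label, name, "may be empty:", can_be_empty(("ref", name), groups))
```

With this encoding, only the original Result-Group is reported as
possibly empty; once OBX is required, it no longer is.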

The Battery-Group is quite well explained in the HL7 Standard, which
says that:

    "The OBR segment provides information that applies to all of
     the observations that follow [...] For simplicity we will
     refer to the observation set as the battery" (v2.2 Page 7-2)

But Mark Shafarman has remarked that a battery status may have to be
reported before there are any reportable results, in which case there
is no OBX segment to transmit. Unfortunately, this is not reflected by
the decomposed definition of the ORU syntax. Let me give a provocative
reason for that: the ORU definition as it stands is wrong! Why?
Because it implies a nonsensical meaning for the Result-Group and omits
an important feature of the Battery-Group! Mark Shafarman told us
how to fix the mistake in the definition of the syntax, as
cited at the top of my message, and so I will. Let's fix the
definition of the Battery-Group so that it reflects that a battery
may contain zero or more results.

Battery-Group:
	(
	  [ORC]			Order common             
	  OBR			Observations Report ID   
	  {[NTE]}		Notes and comments       
	  {[ Result-Group ]}	A number of Results, or none
	)

Now, let's recompose the ORU message to meet the convention of the
message syntax definition of the current HL7 standard:

	MSH			Message Header           
	{			Observation per Subject Group
	  [			Subject Group
	    PID			Patient Identification   
	    [{NTE}]		Notes and comments       
	    [PV1]		Patient Visit            
	  ]
	  {			Battery Group
	    [ORC]		Order common             
	    OBR			Observations Report ID   
	    {[NTE]}		Notes and comments       
	    {[			Result Group
	      OBX		Result                        
	      {[NTE]}		Notes and comments             
	    ]}  
	  }   
	}   
	[DSC]			Continuation Pointer

To summarize what we have done: by a formal decomposition of the
grammar of the ORU message into groups, we found two problems,
one in each of two groups. Both problems have been fixed
independently within their group. After recomposition we have a clear
and correct grammar with regard to the problem examined. But we are
not finished yet; there is more. Let's have a break:

			BREAK :-)

The original problem with the ORU message was not what has been just
solved. It was that the grammar of the ORU message has ambiguous
sentences. The sentence:

	MSH OBR PID OBR OBR

is perfectly correct with regard to the grammar, but we are unable to
say how the segments are grouped. Either

	MSH ( OBR ) ( PID OBR OBR )

or 

	MSH ( OBR ) ( PID OBR ) ( OBR )

is a possible answer, but we cannot know which. This is again obviously
a fault of the grammar of the ORU message. The parentheses in
the grouping alternatives correspond to the parentheses of the
Obs.per.Subject-Group found in the formal definition of the
ORU message. Why is this a problem, and how could we solve it? The
problem stems from the convention of the HL7 standard to leave groups
anonymous. We recognize groups not because the standard tells us, but
because we recognize certain structures within the formal message
definition as groups. There may still be different viewpoints about
what makes up a group (anticipating some discussion of what I
postulated about the nature of a group above :-). In the same way,
two parsers may hold different views on what makes up the groups
within the "MSH OBR PID OBR OBR" message. Indeed, if the standard
taught us better, our (assumed) disagreement would be resolved; and
the disagreement between the two parsers would be resolved as well if
what they read taught them better, that is, the encoding rules.
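
The two readings of "MSH OBR PID OBR OBR" can be enumerated
mechanically. The following Python sketch is a simplification under
stated assumptions: it writes one Obs.per.Subject-Group of the
original ORU definition as a regular expression over segment names,
leaves out the NTE segments to keep the example small, and uses the
made-up names SUBJECT_GROUP and groupings.

```python
import re

# One Obs.per.Subject-Group of the original ORU definition, written as a
# regular expression over space-terminated segment names:
#   [ PID [PV1] ]  { [ORC] OBR { [OBX] } }
# NTE segments are omitted (a simplification made for this sketch).
SUBJECT_GROUP = re.compile(r"(PID (PV1 )?)?((ORC )?OBR (OBX )*)+")

def groupings(segments):
    """Return every way to split `segments` into consecutive subject groups."""
    if not segments:
        return [[]]
    results = []
    for end in range(1, len(segments) + 1):
        head = segments[:end]
        text = "".join(name + " " for name in head)
        if SUBJECT_GROUP.fullmatch(text):
            # `head` is a valid group; group the remainder in every possible way.
            for rest in groupings(segments[end:]):
                results.append([head] + rest)
    return results

message = "MSH OBR PID OBR OBR".split()
for grouping in groupings(message[1:]):           # skip the MSH header
    chunks = ["( " + " ".join(chunk) + " )" for chunk in grouping]
    print("MSH " + " ".join(chunks))
```

The function finds exactly the two groupings shown above, and nothing
in the message itself says which one the sender meant.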

Don't worry, I am not going to propose a change to the HL7 encoding
rules now; I just want to show the problem from the perspective of
different encoding rules that are in use in EDI. Think of a strictly
tagged encoding scheme that presents any data as a triple:

	<Tag, Length, Data>

In such an encoding scheme, the ambiguity that we have would never
occur, because the parsers would read something like:

  T= ORU Msg.
  L= up to here-----------------------------+
  D= T= Obs.p.S.Gr     	       		    |
     L= up to here---------------------+    |
     D= T= Batt.Gr		       |    |
	L= up to here-------------+    |    |
	D= T= OBR Seg.		  |    |    |
	   L= up to here-----+	  |    |    |
	   D= ...	     |	  |    |    |
     T= Obs.p.S.Gr<--<--<----+----+----+    |
     L= up to here---------------------+    |
     D= T= Subj.Gr.		       |    |
     	L= up to here-------------+    |    |
	D= T= PID Seg.		  |    |    |
	   L= up to here-----+    |    |    |
	   D= ...	     |    |    |    |
        T= Batt.Gr<----------+----+    |    |
	L= up to here-------------+    |    |
	D= T= OBR Seg.		  |    |    |
	   L= up to here-----+    |    |    |
	   D= ...	     |    |    |    |
     T= Obs.p.S.Gr<----------+----+----+    |
     L= up to here---------------------+    |
     D= T= Batt.Gr		       |    |
	L= up to here-------------+    |    |
	D= T= OBR Seg.		  |    |    |
	   L= up to here-----+    |    |    |
	   D= ...	     |    |    |    |
               <-------------+----+----+----+

Even though this figure may look confusing to the human eye, parsers
tend to love it, if they are trained in it. It is a perfectly well
understood encoding mechanism for nested structures, and it is
successfully being used in ISO/OSI's Basic Encoding Rules (BER) as
well as in the encoding of DICOM (ACR/NEMA) messages. With such an
encoding mechanism every entity is unambiguously identified by a tag
(T) and delimited by a length (L), and thus ambiguous sentences can
never occur!
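
To make the idea concrete, here is a toy tag/length/data encoder and
decoder in Python. It is only a sketch: the byte layout (3-letter ASCII
tag, 4-byte big-endian length) and the tags ORU, OPS, SUB and BAT are
invented for this illustration; it is neither BER nor DICOM nor
anything defined by HL7.

```python
import struct

# Hypothetical tags for the message and its groups; every other tag
# is treated as a plain segment whose data is opaque bytes.
GROUP_TAGS = {"ORU", "OPS", "SUB", "BAT"}

def encode(tag, content):
    """Encode one entity as <Tag, Length, Data>.

    `content` is either bytes (a segment) or a list of (tag, content)
    pairs (a group, whose data is the encoding of its members).
    """
    if isinstance(content, list):
        data = b"".join(encode(t, c) for t, c in content)
    else:
        data = content
    return tag.encode("ascii") + struct.pack(">I", len(data)) + data

def decode(buf):
    """Decode a byte string into a list of (tag, content) pairs."""
    items, pos = [], 0
    while pos < len(buf):
        tag = buf[pos:pos + 3].decode("ascii")
        (length,) = struct.unpack(">I", buf[pos + 3:pos + 7])
        data = buf[pos + 7:pos + 7 + length]
        # Groups are decoded recursively; the length says where they end.
        items.append((tag, decode(data) if tag in GROUP_TAGS else data))
        pos += 7 + length
    return items

# The structure from the figure: ( OBR ) ( PID OBR ) ( OBR )
message = ("ORU", [
    ("OPS", [("BAT", [("OBR", b"obr-1")])]),
    ("OPS", [("SUB", [("PID", b"pid-1")]), ("BAT", [("OBR", b"obr-2")])]),
    ("OPS", [("BAT", [("OBR", b"obr-3")])]),
])
print(decode(encode(*message)) == [message])
```

Because each group carries its own tag and length, the decoder
recovers the sender's grouping without consulting any grammar.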

What can we learn for our HL7 from this? Our encoding rules are just
not capable of handling any nesting! So, if we want to stay with our
good-old-style encoding rules, for there is reason to do so, we have
to take special care about where our encoding rules will fail. But can
we know in advance whether they might fail or not?

In HL7, groups are implicit both in the standard document and
in the transported messages. Whether a sequence of segments is a
certain group or not depends on whether the sequence matches the
definition of that group. There are cases where the parser can tell
from the first segment that the sequence will *not* match, for
example in the Subject-Group defined above: a sequence of segments
that does not start with a PID can never match a Subject-Group. But the
parser cannot tell that the sequence *does* match a group until it has
read the last entity that matches the group *and* no required entity
is missing. Thus, every entity of the grammar
tries to match the maximum number of segments. This makes it
possible for one entity to hide another, which is the case in the
ambiguity shown above. There is no way out of this problem except
by

+----------------------------------------------------------------+
!  taking care that no two groups that can immediately follow    !
!  each other in a message may ever be able to produce the same  !
!  pattern of segments						 !
+----------------------------------------------------------------+

How can we make sure that a message definition does not violate this
rule? It is not easy to be sure at first sight, since the rule is
tricky: segments that can immediately follow each other in a message
need not be adjacent in the written grammar that defines the message,
since optional parts may separate entities in the grammar and yet
be missing from a concrete message. As a first rule of thumb, by
removing optional parts we can tell whether there are ambiguities in
the remaining parts.

Let's try this with the ORU message. I'll leave out the
Subject-Group, the ORC segment, the OBX segment and the DSC segment,
which are all optional, from the original (!) ORU message definition.
This yields:

	MSH			Message Header           
	{			Observation per Subject Group
	  {			Battery Group
	    OBR			Observations Report ID   
	    {[NTE]}		Notes and comments       
	    {			Observation Group
	      {[NTE]}		Notes and comments             
	    }  
	  }   
	}   

Look at the nesting of the form "{ { OBR ... } }". It is strikingly
clear that there is no way to decide to which level of repetition
any OBR segment belongs. This is an absolutely ambiguous structure in
the ORU message and deserves to be corrected. The sequence "{[NTE]} {
{[NTE]} }" even shows us two such ambiguities at once: first, we
recognize the same structure as above, "{ { [NTE] } }", which must be
fixed. And second, the structure is preceded by another "{ [NTE] }",
which multiplies the ambiguity. To summarize, we have found that any
grammar that, after removal of optional parts, exhibits one of these
four structures, or any combination thereof, is known to be ambiguous.

(1)    { { X } }
(2)    { X } [ X ]
(3)    [ X ] [ X ]
(4)    { [ X ] [ Y ] }

When we write the grammar rules in BNF and feed them to yacc, we
get warnings about conflicts for all four structures. Below is a BNF
grammar for each of the four structures that can be fed to yacc (where
X and Y are terminal symbols). If you try yacc on any one of them, you
will get shift/reduce or reduce/reduce conflict warnings:

(1)    a: b | b a;
       b: X | X a;

	shift/reduce conflict

(2)    a: b c | b;
       b: X | X b;
       c: X;

	shift/reduce conflict

(3)    a: b c | b | c;
       b: X;
       c: X;

	reduce/reduce conflict

(4)    a: b | b a;
       b: X Y | X | Y;

	shift/reduce conflict
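
A yacc conflict is a warning from the parser generator and does not by
itself prove ambiguity, so it may help to count parse trees directly:
a sentence with more than one parse tree is ambiguous. The following
Python sketch does this by brute force for the four grammars exactly
as written above. The function name count_parses and the dictionary
encoding of the rules are assumptions of this illustration.

```python
from functools import lru_cache

def count_parses(grammar, start, tokens):
    """Count the parse trees of `tokens` under `grammar`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples
    of symbols); any symbol that is not a key is a terminal. The rules
    must not derive the empty string, which holds for the four grammars.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in grammar:
            return int(j == i + 1 and tokens[i] == symbol)
        return sum(seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if not rhs:
            return int(i == j)
        return sum(sym(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j + 1))

    return sym(start, 0, len(tokens))

GRAMMARS = {
    1: {"a": [("b",), ("b", "a")], "b": [("X",), ("X", "a")]},
    2: {"a": [("b", "c"), ("b",)], "b": [("X",), ("X", "b")], "c": [("X",)]},
    3: {"a": [("b", "c"), ("b",), ("c",)], "b": [("X",)], "c": [("X",)]},
    4: {"a": [("b",), ("b", "a")], "b": [("X", "Y"), ("X",), ("Y",)]},
}
SENTENCES = {1: "X X", 2: "X X", 3: "X", 4: "X Y"}

for number, grammar in GRAMMARS.items():
    sentence = SENTENCES[number]
    print(number, repr(sentence), count_parses(grammar, "a", sentence.split()))
```

Each of the four sample sentences has two parse trees, which confirms
that the conflicts reported by yacc correspond to real ambiguities.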

A practical rule to avoid these conflicts is this:

|  Any group must have at least one required segment that is  |
|  unique throughout all the groups of a certain message.     |

This is not only useful to avoid ambiguities, but can also help
understand the meaning of a group as we have seen in the Result-Group.
A group where none of the entities is required is likely to reveal a
conceptual mistake, like the Result-Group without a result (OBX)
segment.
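
The practical rule lends itself to an automatic check. The sketch below
is in Python and rests on assumptions of my own: each group is reduced
to the segments that are its direct members (nested groups are
ignored), and the name groups_violating_rule and the data layout are
invented for this illustration.

```python
from collections import Counter

def groups_violating_rule(groups):
    """Return the names of groups without a required, message-unique segment.

    `groups` maps a group name to a list of (segment, required) pairs,
    listing only the segments that are direct members of the group.
    """
    # In how many groups of the message does each segment name occur?
    usage = Counter(segment
                    for members in groups.values()
                    for segment in {name for name, _ in members})
    return [group for group, members in groups.items()
            if not any(required and usage[segment] == 1
                       for segment, required in members)]

# Direct members of the ORU groups as obtained by decomposition.
ORU_ORIGINAL = {
    "Patient-Group": [("PID", True), ("NTE", False), ("PV1", False)],
    "Battery-Group": [("ORC", False), ("OBR", True), ("NTE", False)],
    "Result-Group":  [("OBX", False), ("NTE", False)],
}

# The same groups with OBX made required in the Result-Group.
ORU_CORRECTED = dict(ORU_ORIGINAL)
ORU_CORRECTED["Result-Group"] = [("OBX", True), ("NTE", False)]

print(groups_violating_rule(ORU_ORIGINAL))
print(groups_violating_rule(ORU_CORRECTED))
```

Under these assumptions the check flags only the original Result-Group,
the same group whose meaning turned out to be questionable.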

It is very interesting that the ADT Working Group has (by good
intuition?) always been aware of this problem and made sure that the
above rule is never violated! On the other hand, the Financial Working
Group made the same mistake as the Observation Results Group. Look
at the definition of the BAR message:

BAR
	MSH             Message Header                 
	EVN             Event Type                     
	PID             Patient ID Information         
	{
	  [ PV1 ]       Patient Visit                  
	  [ PV2 ]       Patient Visit - Additional Info
	  [{ OBX }]     Health Information             
	  [{ AL1 }]     Allergy Information            
	  [{ DG1 }]     Patient Diagnosis              
	  [{ PR1 }]     Procedures                     
	  [{ GT1 }]     Guarantor                      
	  [{ NK1 }]     Next of Kin                    
	  [
	   { 
	     IN1        Insurance                      
	   [ IN2 ]      Insurance - Additional Info.   
	   [ IN3 ]      Insurance - Add'l Info. - Cert.
	   }
	  ]   
	  [ACC]         Accident Information           
	  [UB1]         Universal Bill Information     
	  [UB2]         Universal Bill 92 Information  
	}

The above column of optional entities within the outer repeating group
shows the same kind of ambiguity as structure (4)
above. Thus "MSH EVN PID OBX OBX" is a correct but ambiguous sentence.

Most interesting is that the Order Entry Working Group, which uses the
Result-Group just as the Result Report Working Group does, has done its
job right by defining it as follows:

       [
        {
         OBX              Results Segment
              [{NTE}]     Notes and Comments (for Results)
         }
       ]

which is even nicer than what was demanded here, because it avoids the
awful {[X]} construct, which would be ambiguous to a parser that does
not handle it as it would handle [{X}]. The problem is that [X] is
perfectly matched by nothing at all, leaving the parser in an infinite
loop on the repetition, with an infinite number of absent optional
segments. In practice this problem is usually avoided because the
parser is told to consume a token in every step or else try a different
rule. But anyway, [{X}] is more correct than {[X]}.
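
The "consume a token in every step" guard can be shown in a few lines.
This is a hedged Python sketch of how a hand-written parser might treat
{[X]}; the function names are hypothetical and do not describe any
particular HL7 parser.

```python
def match_optional(tokens, pos, name):
    """[X]: consume one `name` if present; always succeeds."""
    if pos < len(tokens) and tokens[pos] == name:
        return pos + 1
    return pos

def match_repeated_optional(tokens, pos, name):
    """{[X]}: repeat [X], but stop as soon as a round consumes nothing.

    Returns the new position and the number of segments consumed.
    """
    rounds = 0
    while True:
        new_pos = match_optional(tokens, pos, name)
        if new_pos == pos:
            # No progress: [X] matched "nothing". Without this test the
            # loop would go on matching absent segments forever.
            return pos, rounds
        pos, rounds = new_pos, rounds + 1

print(match_repeated_optional(["NTE", "NTE", "OBR"], 0, "NTE"))
```

With the guard in place, {[X]} accepts the same sequences as [{X}],
which is why writing [{X}] in the first place is the cleaner choice.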

Now the final task of this paper is to propose fixes for the mistakes
left in the BAR and ORU messages. Since, as a European, I am not
privy to the deeper meanings of the BAR message, which is designed
primarily for American hospitals, I cannot provide a ready-made
solution. But I would ask the Financial WG to decompose the message
into groups (which is easy), find and name the meaning of the
outer group, and then try to figure out the difference between "( DG1
DG1 )" and "( DG1 ) ( DG1 )". Can there be a diagnosis without a visit?
Why is the PV1 segment left optional?

For the ORU message my proposal is this: define the Result-Group
as in the Order Entry chapter, and disallow the
Obs.per.Subject-Group.  Only ORU and ORF put a whole message body
into a repeating group. The HL7 encoding rules simply cannot handle
this kind of nesting!

Let me conclude with the main statements of this posting, which I
propose be observed throughout the review and new design of the HL7
standard:

The definition of a group:

* A group is two or more entities (segments or again groups) that
  are semantically linked, i.e. that form an entity by their meaning.

Make grouping explicit!

* Groups should be named and described just as messages and segments
  are named and described.

Account for the meaning of groups!

* Groups *must* make sense in the light of decomposed grammar
  definitions.

Ambiguity *must* be avoided!

* No two groups that may immediately follow each other in a message
  may ever be able to match the same pattern of segments.

A useful rule to avoid ambiguity is:

* Any group must have at least one required segment that is unique 
  throughout all the groups of a certain message.

regards
-Gunther Schadow