Dear Colleagues,

the following posting was intended to be a simple follow-up to the ``Ambiguity in ORU message'' thread, but it has grown rather large. You might regard it as a proposal to the Control WG. I have tried to give an account of the whole ambiguity problem related to the grouping of segments, so you might read the following as an introduction to a proposal. The conclusions I draw here should, IMHO, be observed in the review of the current standard and in the design of new messages.

Mark Shafarman wrote:

> I'd state the problem the following way:
>
> The repeating OBX groups needs to be allowed to be 0-many, since there are
> cases where one wants to use the ORU message as an unsolicited results status
> message without the OBX segments (there may be no result if a service has been
> only scheduled, or if the specimen has been taken but not yet resulted). The
> real error in the message is the fact that the `]` following the OBX needs to
> be after the repeating 0-many NTE's.
>
> Not this:
>
>     {
>     [OBX]           Result
>     {[NTE]}         Notes and comments
>     }
>
> but this:
>
>     {
>     [OBX            Result
>     {[NTE]}]        Notes and comments
>     }
>
> The same correction should be applied to the `]` after the PID segment.

O.K., I see your point. But I feel that an even deeper discussion about the nature of groups is needed.

What is a group? A group is two or more segments (or again groups) that share a common fate regarding optionality or repetition, i.e. two or more entities enclosed within the same pair of braces (or brackets). At least, this is what a first look at the HL7 standard suggests.

Unfortunately, the notion of a group is closely tied to the notion of optionality or repeatability. Because all groups in the HL7 document are anonymous, it is easy to overlook that a group is not only a syntactic construct but usually has a specific meaning (or at least should have one).
Thus I would rather define a group as follows: A group is two or more entities (segments or again groups) that are semantically linked, i.e. that form an entity by their meaning. The concept of a group is therefore independent of any concept of optionality or repeatability, but not vice versa: if two or more entities form a unit by their meaning, they will never occur independently of each other, i.e. their syntactic properties with regard to a certain context will always be the same.

Let me introduce a new notation for groups: a group is designated by a pair of parentheses. Thus, the following is the innermost group in the ORU message:

    ( OBX [NTE] )

Now let me explain what I mean by shared syntactic properties with regard to certain contexts. The group just displayed might well occur in different contexts, notably in the ORU message, in the ORF message and in some ADT messages (the latter is not quite correct, because the ADT section doesn't use NTE segments). It is the ORU (ORF) message syntax that determines that this group is repeatable; more precisely, it is the syntax of the environment in which the group occurs that decides whether it is optional or repeatable. By contrast, repeatability is not part of the inner meaning of the group itself, just as it is not the inner meaning of a PV1 segment that makes it optional (in the ORU message); it is the context.

To see this is IMHO very important. It makes me uncomfortable with the fact that groups are not named anywhere in the HL7 standard. What we should do is define groups just as we define messages or segments, i.e. give each group a name and describe its intention/meaning, restrictions, etc. (but not whether it is ``repeatable'' or ``optional'', since this is largely independent of the group itself!)
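To make the distinction between a group and its context concrete, here is a minimal Python sketch. The classes `Group` and `Use` and the functions `notation` and `definition` are hypothetical names invented for this illustration, not part of any HL7 tooling. A `Group` carries only a name, a description and its members; whether an occurrence is optional or repeatable is stored on the `Use`, i.e. on the place where the group (or segment) appears.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Use:
    """One occurrence of a segment or group inside an enclosing definition.

    Optionality and repetition belong here, to the context, not to the group.
    """
    element: Union[str, Group]  # a segment ID such as "OBX", or a Group
    optional: bool = False
    repeatable: bool = False


@dataclass(frozen=True)
class Group:
    """A named unit of meaning; deliberately has no optional/repeatable flags."""
    name: str
    description: str
    members: Tuple[Use, ...]


def notation(use: Use) -> str:
    """Render one occurrence in HL7 abstract-syntax style, e.g. [{NTE}]."""
    text = use.element if isinstance(use.element, str) else use.element.name
    if use.repeatable:
        text = "{" + text + "}"
    if use.optional:
        text = "[" + text + "]"
    return text


def definition(group: Group) -> str:
    """Render the group itself in the parenthesis notation."""
    return "( " + " ".join(notation(m) for m in group.members) + " )"


# The innermost ORU group, defined once, together with its meaning.
RESULT = Group("Result-Group", "A single result and its notes",
               (Use("OBX"), Use("NTE", optional=True)))

# The same group used in two (hypothetical) contexts with different properties.
in_oru = Use(RESULT, repeatable=True)
elsewhere = Use(RESULT, optional=True, repeatable=True)

print(definition(RESULT))   # ( OBX [NTE] )
print(notation(in_oru))     # {Result-Group}
print(notation(elsewhere))  # [{Result-Group}]
```

The group is written down once; only the `Use` objects differ between contexts, which is the separation argued for above.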
Defining groups precisely is necessary anyway if we want to redesign (or review) the HL7 standard under an object-oriented paradigm. Let me show what I mean with the now well-discussed ORU example. If I do nothing more than decompose the definition of the ORU message into groups, the definition looks as follows. Please note that I don't *change* anything in the original syntax specification of the ORU message; I just decompose it.

    ORU message:
        MSH                          Message Header
        { Obs.per.Subject-Group }    Observation of a single subject
        [DSC]                        Continuation Pointer

    Obs.per.Subject-Group:
        ( [ Subject-Group ]          Subject (patient or non-patient)
          { Battery-Group }          The observations to the Subject
        )

    Subject-Group:
        ( PID                        Patient Identification
          [{NTE}]                    Notes and comments
          [PV1]                      Patient Visit
        )

    Battery-Group:
        ( [ORC]                      Order common
          OBR                        Observations Report ID
          {[NTE]}                    Notes and comments
          { Result-Group }           A number of Results
        )

    Result-Group:
        ( [OBX]                      Result
          {[NTE]}                    Notes and comments
        )

Now let's look at what we have, group by group, starting with the innermost group, the Result-Group. Let us try to describe the meaning of the Result-Group as follows: the Result-Group denotes a single result. A single result would naturally consist of a result segment OBX, which may be followed by some notes and comments {[NTE]}.

But please note that the last sentence is contradicted by the syntax definition of the Result-Group (which was not given by me but inferred formally, by decomposition): the definition says that the Result-Group need not denote any result, because the OBX segment is optional. Now, finally, I hope it becomes obvious what I meant by ``possibly empty repeating groups''. The Result-Group as defined above may well be empty!

So there is an alternative: either we hold the definition of the Result-Group to be correct and have to gain a new understanding of its meaning, or we keep our understanding and change the definition.
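Whether a group can be empty can be decided mechanically from the decomposed definitions. The following is a minimal Python sketch; the tuple encoding (`seg`, `opt`, `rep`, `seq`, `ref`) and the function `nullable` are hypothetical notation of my own, not anything defined by HL7. A group body is nullable if it can match zero segments: a segment never can, `[x]` always can, `{x}` can only if `x` can, and a sequence can only if all of its members can.

```python
# Hypothetical encoding of the HL7 abstract message syntax as nested tuples:
#   seg("OBX") one segment         opt(x)  [x]
#   rep(x)     {x}                 seq(..) the members of a group, in order
#   ref("G")   a use of the named group G
def seg(name):
    return ("seg", name)


def opt(node):
    return ("opt", node)


def rep(node):
    return ("rep", node)


def seq(*nodes):
    return ("seq", nodes)


def ref(name):
    return ("ref", name)


def nullable(node, groups):
    """True if `node` can match an empty sequence of segments."""
    kind, arg = node
    if kind == "seg":
        return False
    if kind == "opt":
        return True
    if kind == "rep":  # {x} needs at least one x, so it is empty only if x can be
        return nullable(arg, groups)
    if kind == "seq":
        return all(nullable(n, groups) for n in arg)
    if kind == "ref":
        return nullable(groups[arg], groups)
    raise ValueError(f"unknown node kind: {kind}")


# The ORU groups exactly as obtained by decomposition (nothing changed).
ORU = {
    "Obs.per.Subject-Group": seq(opt(ref("Subject-Group")), rep(ref("Battery-Group"))),
    "Subject-Group": seq(seg("PID"), opt(rep(seg("NTE"))), opt(seg("PV1"))),
    "Battery-Group": seq(opt(seg("ORC")), seg("OBR"), rep(opt(seg("NTE"))),
                         rep(ref("Result-Group"))),
    "Result-Group": seq(opt(seg("OBX")), rep(opt(seg("NTE")))),
}

for name, body in ORU.items():
    print(name, "may be empty" if nullable(body, ORU) else "is never empty")
# Obs.per.Subject-Group is never empty
# Subject-Group is never empty
# Battery-Group is never empty
# Result-Group may be empty
```

Only the Result-Group comes out as possibly empty; replacing `opt(seg("OBX"))` with `seg("OBX")` makes it non-empty.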
Let's try to keep the definition, since it is the approved standard: the Result-Group denotes a single result or no result. This would imply that a result entity without any result makes sense, and I can see no sense in this. Thus, I vote to keep the meaning and change the syntax appropriately:

    Result-Group:
        ( OBX                        Result
          {[NTE]}                    Notes and comments
        )

The Battery-Group is explained quite well in the HL7 standard, which says:

    "The OBR segment provides information that applies to all of the
    observations that follow [...] For simplicity we will refer to the
    observation set as the battery" (v2.2, page 7-2)

But Mark Shafarman has remarked that a battery status may need to be reported before any results are available, in which case there is no OBX segment to transmit. Unfortunately, this is not reflected by the decomposed definition of the ORU syntax. Let me give a provocative reason for that: the ORU definition as it stands is wrong! Why? Because it attaches a nonsensical meaning to the Result-Group and omits an important feature from the Battery-Group!

Mark Shafarman told us how to fix the mistake in the syntax definition, as cited at the top of my message, and so I will. Let's fix the definition of the Battery-Group so that it reflects that a battery may contain zero or more results:
    Battery-Group:
        ( [ORC]                      Order common
          OBR                        Observations Report ID
          {[NTE]}                    Notes and comments
          {[ Result-Group ]}         A number of Results, or none
        )

Now let's recompose the ORU message to follow the message syntax convention of the current HL7 standard:

    MSH                              Message Header
    {                                Observation per Subject Group
      [                              Subject Group
        PID                          Patient Identification
        [{NTE}]                      Notes and comments
        [PV1]                        Patient Visit
      ]
      {                              Battery Group
        [ORC]                        Order common
        OBR                          Observations Report ID
        {[NTE]}                      Notes and comments
        {[                           Result Group
          OBX                        Result
          {[NTE]}                    Notes and comments
        ]}
      }
    }
    [DSC]                            Continuation Pointer

To summarize what we have done: by a formal decomposition of the grammar of the ORU message into groups, we found two problems, one in each of two groups. Both problems have been fixed independently within each group. After recomposition we have a clear and correct grammar with regard to the problem examined. But we are not finished yet; there is even more. Let's have a break:

    BREAK :-)

The original problem with the ORU message was not what has just been solved. It was that the grammar of the ORU message admits ambiguous sentences. The sentence

    MSH OBR PID OBR OBR

is perfectly correct with regard to the grammar, but we are unable to say how the segments are grouped. Either

    MSH ( OBR ) ( PID OBR OBR )

or

    MSH ( OBR ) ( PID OBR ) ( OBR )

is a possible answer, but we cannot tell which. This is again obviously a fault of the grammar of the ORU message. The parentheses in the two grouping alternatives correspond to the parentheses of the Obs.per.Subject-Group found in the formal definition of the ORU message.

Why is this a problem, and how could we solve it? The problem stems from the convention of the HL7 standard to leave groups anonymous. We recognize groups not because the standard tells us, but because we recognize certain structures within the formal message definition as groups.
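The ambiguity can be checked mechanically. The sketch below is a hypothetical Python illustration, not an HL7 parser; the names `SUBJECT`, `BATTERY`, `OBS_PER_SUBJECT` and `groupings` are mine. It writes the recomposed Obs.per.Subject-Group as a regular expression over space-terminated segment IDs (`{X}` as `(?:X)+`, `[X]` as `(?:X)?`, `{[X]}` and `[{X}]` as `(?:X)*`) and enumerates every way of splitting the segments between MSH and DSC into consecutive Obs.per.Subject-Groups.

```python
import re

# Hypothetical regex rendering of the recomposed ORU groups over "SEG " tokens.
SUBJECT = r"(?:PID (?:NTE )*(?:PV1 )?)?"              # [ PID [{NTE}] [PV1] ]
BATTERY = r"(?:ORC )?OBR (?:NTE )*(?:OBX (?:NTE )*)*"  # [ORC] OBR {[NTE]} {[OBX {[NTE]}]}
OBS_PER_SUBJECT = re.compile(SUBJECT + "(?:" + BATTERY + ")+")


def groupings(body):
    """Return every split of `body` into consecutive Obs.per.Subject-Groups."""
    segs = body.split()
    found = []

    def split(start, acc):
        if start == len(segs):
            found.append(acc)
            return
        for end in range(start + 1, len(segs) + 1):
            chunk = "".join(s + " " for s in segs[start:end])
            if OBS_PER_SUBJECT.fullmatch(chunk):
                split(end, acc + [segs[start:end]])

    split(0, [])
    return found


for g in groupings("OBR PID OBR OBR"):
    print("MSH", " ".join("( " + " ".join(grp) + " )" for grp in g))
# MSH ( OBR ) ( PID OBR ) ( OBR )
# MSH ( OBR ) ( PID OBR OBR )
```

Both splits are accepted: the segment sequence alone carries no group boundaries, so nothing in it lets a receiver choose between them.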
There might still be different viewpoints about what makes up a group (anticipating some discussion about what I postulated above about the nature of a group :-). In a similar sense, two parsers might have different viewpoints about what makes up the groups within the "MSH OBR PID OBR OBR" message. And indeed, if the standard taught us better, our (assumed) disagreement would be resolved, and the disagreement between the two parsers would be resolved as well, if we taught them through what they read: the encoding rules.

Don't worry, I am not going to propose a change to the HL7 encoding rules now; I just want to show the problem from the perspective of other encoding rules that are in use in EDI. Think of a strictly tagged encoding scheme that represents any data item by the triple (tag, length, data), where the data may itself consist of such triples. In such an encoding scheme the ambiguity we have would never occur, because the parsers would have to read something like this (here for the grouping MSH ( OBR ) ( PID OBR ) ( OBR ), MSH omitted):

    T= ORU Msg.
    L= up to here-----------------------------+
    D= T= Obs.p.S.Gr                          |
       L= up to here---------------------+    |
       D= T= Batt.Gr                     |    |
          L= up to here-------------+    |    |
          D= T= OBR Seg.            |    |    |
             L= up to here-----+    |    |    |
             D= ...            |    |    |    |
       T= Obs.p.S.Gr<--<--<----+----+----+    |
       L= up to here---------------------+    |
       D= T= Subj.Gr.                    |    |
          L= up to here-------------+    |    |
          D= T= PID Seg.            |    |    |
             L= up to here-----+    |    |    |
             D= ...            |    |    |    |
          T= Batt.Gr<----------+----+    |    |
          L= up to here-------------+    |    |
          D= T= OBR Seg.            |    |    |
             L= up to here-----+    |    |    |
             D= ...            |    |    |    |
       T= Obs.p.S.Gr<----------+----+----+    |
       L= up to here---------------------+    |
       D= T= Batt.Gr                     |    |
          L= up to here-------------+    |    |
          D= T= OBR Seg.            |    |    |
             L= up to here-----+    |    |    |
             D= ...            |    |    |    |
                 <-------------+----+----+----+

Even though this figure might look confusing to the human eye, parsers love it, provided they are built for it. It is a perfectly well understood encoding mechanism for nested structures, and it is successfully being used in the ISO/OSI Basic Encoding Rules (BER) as well as in the encoding of DICOM (ACR/NEMA) messages.
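The principle of such a tagged encoding fits in a few lines of Python. This is a hypothetical byte layout of my own (a flag byte, a one-byte tag length, the tag, a four-byte big-endian data length, then the data); it is *not* the actual BER or DICOM format, only an illustration that nesting becomes explicit once every entity carries a tag and a length. The functions `encode` and `decode` are likewise invented for the example.

```python
import struct

# Hypothetical tag-length-data layout (not BER, not DICOM):
#   flag (1 byte): 0 = primitive data, 1 = constructed (nested entities)
#   tag length (1 byte), tag (ASCII), data length (4 bytes, big-endian), data
PRIMITIVE, CONSTRUCTED = 0, 1


def encode(tag, content):
    """Encode bytes (primitive) or a list of (tag, content) pairs (constructed)."""
    if isinstance(content, bytes):
        flag, data = PRIMITIVE, content
    else:
        flag, data = CONSTRUCTED, b"".join(encode(t, c) for t, c in content)
    name = tag.encode("ascii")
    return struct.pack(">BB", flag, len(name)) + name + struct.pack(">I", len(data)) + data


def decode(buf, pos=0):
    """Decode one entity at `pos`; return ((tag, content), position after it)."""
    flag, n = struct.unpack_from(">BB", buf, pos)
    pos += 2
    tag = buf[pos:pos + n].decode("ascii")
    pos += n
    (length,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    end = pos + length
    if flag == PRIMITIVE:
        return (tag, buf[pos:end]), end
    children = []
    while pos < end:  # nested entities fill exactly `length` bytes
        child, pos = decode(buf, pos)
        children.append(child)
    return (tag, children), end


OBR, PID = ("OBR", b"..."), ("PID", b"...")
# MSH ( OBR ) ( PID OBR ) ( OBR )
reading_a = ("ORU", [("Obs.p.S.Gr", [("Batt.Gr", [OBR])]),
                     ("Obs.p.S.Gr", [("Subj.Gr", [PID]), ("Batt.Gr", [OBR])]),
                     ("Obs.p.S.Gr", [("Batt.Gr", [OBR])])])
# MSH ( OBR ) ( PID OBR OBR )
reading_b = ("ORU", [("Obs.p.S.Gr", [("Batt.Gr", [OBR])]),
                     ("Obs.p.S.Gr", [("Subj.Gr", [PID]), ("Batt.Gr", [OBR]),
                                     ("Batt.Gr", [OBR])])])

wire_a, wire_b = encode(*reading_a), encode(*reading_b)
print(wire_a != wire_b)                # True: the grouping is part of the encoding
print(decode(wire_a)[0] == reading_a)  # True: it survives the round trip
```

The two readings of "MSH OBR PID OBR OBR" produce different byte strings, so a receiver cannot confuse them.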
With such an encoding mechanism, every entity is unambiguously identified by a tag (T), and thus ambiguous sentences can never occur!

What can we learn from this for HL7? Our encoding rules are simply not capable of expressing any nesting! So, if we want to stay with our good old encoding rules, and there are reasons to do so, we have to take special care about where our encoding rules will fail. But can we know in advance whether they might fail or not?

In HL7, groups are implicit both in the standard document and in the transported messages. Whether a sequence of segments is a certain group or not depends on whether the sequence matches the definition of that group. In some cases the parser can tell from the first segment that the sequence will *not* match; for example, in the Subject-Group defined above, a sequence of segments that does not start with a PID can never match a Subject-Group. But the parser cannot tell whether the sequence *does* match a group until it has consumed the last entity that can belong to the group *and* no required entity remains unseen. Thus, every entity of the grammar tries to match the maximum number of segments. This makes it possible for one entity to hide another, which is what happens in the ambiguity shown above. There is no way out of this problem except by

    +----------------------------------------------------------------+
    ! taking care that no two groups that can immediately follow     !
    ! each other in a message may ever be able to produce the same   !
    ! pattern of segments                                            !
    +----------------------------------------------------------------+

How can we make sure that a message definition does not violate this rule?
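The hiding effect is easy to reproduce with an ordinary greedy matcher. In this small Python sketch (my own illustration, with segment IDs written as space-terminated tokens), a repeating OBR entity is immediately followed by an optional OBR entity. Python's `re` module accepts "OBR OBR " either way, but the greedy repetition swallows both segments, so the optional entity never receives one; only a lazy repetition reveals the other, equally valid grouping.

```python
import re

TEXT = "OBR OBR "

# { OBR } [ OBR ]: group 1 is the repeating entity, group 2 the optional one.
greedy = re.fullmatch(r"((?:OBR )+)(OBR )?", TEXT)
lazy = re.fullmatch(r"((?:OBR )+?)(OBR )?", TEXT)

print(greedy.groups())  # ('OBR OBR ', None): the repetition hides the optional OBR
print(lazy.groups())    # ('OBR ', 'OBR '): an equally valid grouping
```

Both calls succeed on the same input, which is exactly what makes the grammar ambiguous: the matching strategy, not the message, decides the grouping.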
It is not easy to be sure at first sight, since the rule is tricky: segments that can immediately follow each other in a message need not be adjacent in the written grammar that defines the message, since optional parts that separate entities in the grammar may be missing in a concrete message. As a first rule of thumb, by removing the optional parts, we can tell whether there are ambiguities in the remaining parts.

Let's try this with the ORU message. From the original (!) ORU message definition I'll leave out the Subject-Group, the ORC segment, the OBX segment and the DSC segment, all of which are optional. This yields:

    MSH                              Message Header
    {                                Observation per Subject Group
      {                              Battery Group
        OBR                          Observations Report ID
        {[NTE]}                      Notes and comments
        {                            Observation Group
          {[NTE]}                    Notes and comments
        }
      }
    }

Look at the nesting of the form "{ { OBR ... } }". It is strikingly clear that there is no way to decide to which level of repetition any OBR segment belongs. This is an absolutely ambiguous structure in the ORU message and deserves to be corrected. The sequence "{[NTE]} { {[NTE]} }" even shows us two such ambiguities at once: first, we recognize the same structure as above, "{ { [NTE] } }", which must be fixed; and second, the structure is preceded by another "{ [NTE] }", which multiplies the ambiguity.

To summarize, we have found that any grammar that, after removal of optional parts, contains one of these four structures, or any combination thereof, is ambiguous:

    (1) { { X } }
    (2) { X } [ X ]
    (3) [ X ] [ X ]
    (4) { [ X ] [ Y ] }

When we write these grammar rules in BNF form and feed them to yacc, we get conflict warnings for all four rules. Below is the BNF grammar for each of the four rules in a form that can be fed to yacc (where X and Y are terminal symbols).
If you run yacc on any one of these rules, you will get shift/reduce or reduce/reduce conflict warnings:

    (1)  a: b | b a;        b: X | X a;           shift/reduce conflict
    (2)  a: b c | b;        b: X | X b;  c: X;    shift/reduce conflict
    (3)  a: b c | b | c;    b: X;        c: X;    reduce/reduce conflict
    (4)  a: b | b a;        b: X Y | X | Y;       shift/reduce conflict

(To run them, declare the terminals with "%token X" (and "Y" for rule 4) in the declarations section, followed by "%%" and the rules.)

A practical rule to avoid these conflicts is this:

    | Any group must have at least one required segment that is |
    | unique throughout all the groups of a certain message.    |

This is not only useful for avoiding ambiguities, but can also help to understand the meaning of a group, as we have seen with the Result-Group. A group in which none of the entities is required is likely to reveal a conceptual mistake, like the Result-Group without a result (OBX) segment.

It is very interesting that the ADT Working Group has (by good intuition?) always been aware of this problem and made sure that the above rule is never violated! On the other hand, the Financial Working Group made the mistake, as did the Observation Results Group. Look at the definition of the BAR message:

    BAR
    MSH                              Message Header
    EVN                              Event Type
    PID                              Patient ID Information
    {
      [ PV1 ]                        Patient Visit
      [ PV2 ]                        Patient Visit - Additional Info
      [{ OBX }]                      Health Information
      [{ AL1 }]                      Allergy Information
      [{ DG1 }]                      Patient Diagnosis
      [{ PR1 }]                      Procedures
      [{ GT1 }]                      Guarantor
      [{ NK1 }]                      Next of Kin
      [
        {
          IN1                        Insurance
          [ IN2 ]                    Insurance - Additional Info.
          [ IN3 ]                    Insurance - Add'l Info. - Cert.
        }
      ]
      [ACC]                          Accident Information
      [UB1]                          Universal Bill Information
      [UB2]                          Universal Bill 92 Information
    }

The above column of optional entities within the outer repeating group shows the same kind of ambiguity as structure (4) above. Thus "MSH EVN PID OBX OBX" is a correct but ambiguous sentence.
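Once groups are named, the practical rule can be checked mechanically. The following Python sketch is a hypothetical helper, not an HL7 tool; `direct_segments` and `violations` are invented names. Groups are given as lists of members in HL7 notation, a member counts as required if it is not enclosed in brackets, and only segments listed directly in a group (not those inside nested groups, nor message-level segments outside any group) are considered. "Visit-Group" and "Insurance-Group" are names I made up for the anonymous BAR groups.

```python
from collections import Counter


def direct_segments(members, groups):
    """Yield (segment, required) for members that are segments, not group references."""
    for member in members:
        name = member.strip("[]{} ")
        if name not in groups:
            yield name, "[" not in member  # "OBR" or "{OBX}" is required, "[{NTE}]" is not


def violations(groups):
    """Names of groups lacking a required segment that occurs in no other group."""
    owners = Counter()
    for members in groups.values():
        owners.update({seg for seg, _ in direct_segments(members, groups)})
    return [name for name, members in groups.items()
            if not any(required and owners[seg] == 1
                       for seg, required in direct_segments(members, groups))]


ORU_DECOMPOSED = {
    "Obs.per.Subject-Group": ["[Subject-Group]", "{Battery-Group}"],
    "Subject-Group": ["PID", "[{NTE}]", "[PV1]"],
    "Battery-Group": ["[ORC]", "OBR", "{[NTE]}", "{Result-Group}"],
    "Result-Group": ["[OBX]", "{[NTE]}"],
}

BAR = {  # hypothetical names for the anonymous BAR groups
    "Visit-Group": ["[PV1]", "[PV2]", "[{OBX}]", "[{AL1}]", "[{DG1}]", "[{PR1}]",
                    "[{GT1}]", "[{NK1}]", "[{Insurance-Group}]", "[ACC]", "[UB1]",
                    "[UB2]"],
    "Insurance-Group": ["IN1", "[IN2]", "[IN3]"],
}

print(violations(ORU_DECOMPOSED))  # ['Obs.per.Subject-Group', 'Result-Group']
print(violations(BAR))             # ['Visit-Group']
```

The Obs.per.Subject-Group is flagged because it has no segment of its own at all. With the Result-Group changed to `["OBX", "{[NTE]}"]` and the Obs.per.Subject-Group removed, the ORU check comes out empty.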
Most interesting is that the Order Entry Working Group, which uses the Result-Group just as the Result Report Working Group does, has done its job right by defining it as follows:

    [ {
        OBX                          Results Segment
        [{NTE}]                      Notes and Comments (for Results)
    } ]

This is even nicer than what I demanded above, because it avoids the awful {[X]} construct, which would be ambiguous to a parser that does not handle it as it would handle [{X}]. The problem is that [X] is perfectly matched by nothing at all, which can leave the parser in an infinite loop on the repetition, matching an infinite number of absent optional segments. In practice this is usually avoided by telling the parser to consume a token at each step or else to use a different rule. But anyway, [{X}] is more correct than {[X]}.

Now the final task of this paper is to propose fixes for the mistakes left in the BAR and ORU messages. Since, as a European, I am not privy to the deeper meanings of the BAR message, which is designed primarily for American hospitals, I cannot provide a ready-made solution. But I would ask the Financial WG to decompose the message into groups (which is easy), find and name a meaning for the outer group, and then try to figure out the difference between "( DG1 DG1 )" and "( DG1 ) ( DG1 )". Can there be a diagnosis without a visit? Why is the PV1 segment left optional?

For the ORU message my proposal is this: define the Result-Group as in the Order Entry chapter, and disallow the Obs.per.Subject-Group. Only the ORU and ORF messages put a whole message body into a repeating group, and the HL7 encoding rules simply cannot handle this kind of nesting!

Let me conclude with the main statements of this posting, which I propose be observed throughout the review and new design of the HL7 standard:

The definition of a group:
  * A group is two or more entities (segments or again groups) that are semantically linked, i.e. that form an entity by their meaning.

Make grouping explicit!
  * Groups should be named and described just as messages and segments are named and described.

Account for the meaning of groups!
  * Groups *must* make sense in the light of decomposed grammar definitions.

Ambiguity *must* be avoided!
  * No two groups that may immediately follow each other in a message may ever be able to match the same pattern of segments.

A useful rule to avoid ambiguity is:
  * Any group must have at least one required segment that is unique throughout all the groups of a certain message.

regards
-Gunther Schadow