%
/* delimiters are tokens */

eom  = ''   ;
eos  = 0x0D ;
eof  = '|'  ;
eoc  = '^'  ;
eosc = '&'  ;

%
/* 1 first sketch of the syntax */

message:      segments ;
segments:     segment | segment eos segments ;
segment:      fields ;
fields:       field | field eof fields ;
field:        atom | composite ;
composite:    component | component eoc composite ;
component:    atom | subcomposite ;
subcomposite: subcomponent | subcomponent eosc subcomposite ;
subcomponent: atom ;

/* 2 second sketch of the syntax. Here we regard components and
   subcomponents as basically the same idea at different levels.
   Each level can simply terminate in an atom. */

message:      segments ;
segments:     segment | segment eos segments ;
segment:      fields ;
fields:       field | field eof fields ;
field:        data(1) ;
data(n):      atom | composite(n) ;
composite(n): data(n+1) | data(n+1) eoc(n) composite(n) ;

/* data(1) = component,    eoc(1) = eoc
   data(2) = subcomponent, eoc(2) = eosc */

/* 3 third sketch of the syntax. Here we bring the whole syntax into a
   recursive form, with a start level (message) and a terminal (atom). */

message:       segments ;
segments:      segment | segment eos segments ;
segment:       fields ;
fields:        field | field eof fields ;
field:         components ;
components:    component | component eoc components ;
component:     subcomponents ;
subcomponents: subcomponent | subcomponent eosc subcomponents ;
subcomponent:  atom ;

/* 4 here is a formalism which describes exactly the above syntax (3).
   There are terms (t) and lists (l), in which the terms are terminals: */

message : l(1) ;
---
l(n)    : t(n) | t(n) d(n) l(n) ;
t(n)    : l(n+1) ;
---
l(max)  : atom

/* 5 we can further eliminate the terms (t), ending up with a syntax
   which is made up merely of lists */

message : l(1) ;
---
l(n)    : l(n+1) | l(n+1) d(n) l(n) ;
---
l(max)  : atom

/* At this point we have a perfectly simple description of our syntax.
   However, this syntax fails to express what we want: there is no
   object left except the start (message) and the terminal (atom).
   Since we want to model this syntax with singular objects, we have
   to find a complementary approach. */

/* 6 we try to reduce the plurals (the lists of something) by swapping
   the two body lines of (4), which leads us to a definition that
   deals with objects rather than lists */

message : t(1) ;
---
t(n)    : l(n) ;
l(n)    : t(n+1) | t(n+1) d(n) l(n) ;
---
l(max)  : atom

/* 7 this can be further reduced by eliminating the lists, leaving
   each object defined with simple tail recursion */

message = t(1)
---
t(n): t(n+1) | t(n+1) d(n) t(n) ;
---
t(6) = atom

/* 8 fourth sketch of the syntax, which we get when we resolve the
   general definition (7) */

message:      segment | segment eos message ;
segment:      field | field eof segment ;
field:        component | component eoc field ;
component:    subcomponent | subcomponent eosc component ;
subcomponent: atom ;

/* At this point we have a simple and expressive syntax definition.
   However, HL7 takes different approaches to message syntax on the
   one hand and segment syntax on the other. If segments are void,
   they usually do not appear on the presentation stream at all, while
   fields, components etc. which are void and not trailing have to be
   represented by their delimiter. */

/* 9 Thus here is another approach to delimiters, which is
   complementary to the one we have taken so far */

message = t(1)
---
t(n): t(n+1) d(n) | t(n+1) t(n) ;
---
t(6) = atom

/* At least at t3 (field), this way of handling delimiters becomes
   inadequate, since a field which consists of a simple atom is
   followed by three delimiters `&^|', which is certainly wrong.
   This becomes clear when we unroll the generic syntax (9): */

t1 (=message)      : t2 d1 (=eom)  | t2 t1 ;
t2 (=segment)      : t3 d2 (=eos)  | t3 t2 ;
t3 (=field)        : t4 d3 (=eof)  | t4 t3 ;
t4 (=component)    : t5 d4 (=eoc)  | t5 t4 ;
t5 (=subcomponent) : t6 d5 (=eosc) ;
t6 (=atom)

/* We had to make the following changes, which is extremely ugly.
   Moreover, there is no message delimiter `eom'. */

t3 (=field)        : ( t4 | t5 | t6 ) d3 | t4 t3 ;
t4 (=component)    : ( t5 | t6 ) d4 | t5 | t5 t4 ;
t5 (=subcomponent) : t6 d5 ;

/* This method is not as elegant as (8) above, since every object has
   its delimiter appended, even if it is a trailing object. However,
   this is perfectly adequate for segments, since segments are ended
   by an eos if and only if they are present. */

/* 10 fifth sketch of the syntax definition */

message:      segment | segment message ;
segment:      field eos | field eof segment ;
field:        component | component eoc field ;
component:    subcomponent | subcomponent eosc component ;
subcomponent: atom ;

/* Note that we handle segment tags (ids) as fields of the ID data
   type. This field is given a common meaning in every segment,
   similar to the MSH segment which introduces every message. In order
   to respect these special cases, we make some refinements to (10): */

message:        message-header message-body ;
message-header: segment ;
message-body:   segment | segment message-body ;
segment:        segment-id segment-body ;
segment-id:     field ;
segment-body:   field eos | field eof segment-body ;
field:          component | component eoc field ;
component:      subcomponent | subcomponent eosc component ;
subcomponent:   atom ;

/* There are even more refinements to make to message-header, in order
   to reflect the handling of delimiters: */

message-header: msh-header eos
              | msh-header eof segment-body ;
msh-header:     "MSH" delimiter-definition eof ;

/* Further considerations:

   A) A null value cannot apply to all HL7 objects, as was assumed so
   far. Null is only meaningful for data items. Messages and segments,
   however, are not data items but groupings of data for the purpose
   of meaningful communication. Messages and segments give meaning to
   data items, but are not data items themselves. That is why the null
   value makes no sense for messages, segments or groups of segments.
   Since this is so, we have to revise the concept of null values
   throughout those HL7Objects which are not data items, including
   the abstract base class.

   B) The lack of abstractness of the HL7 syntax causes us further
   inconvenience: since the segment IDs `MSH', `FHS' and `BHS' are
   read before the delimiter characters are negotiated, we cannot let
   IDtyp read these IDs, since an IDtyp could (and in fact does)
   depend on delimiter characters. Thus these tags must be read by
   their length, i.e. we have to read exactly three characters before
   the delimiters, which are in turn followed by the first field
   delimiter. On the other hand, an abstract syntax wouldn't even make
   decisions about the appearance of tags on the presentation stream.
   Thus, the segment IDs do not belong in the abstract definition of a
   segment. It may be wise not to handle tags as data (IDtyp) at all
   (what was said about null values above applies to segment ids as
   well). To make things even worse, there is another tag deep inside
   the MSH segment: the message type id.

   For now we leave this question pending and work around the problem
   by handling `special segments' differently from normal segments.
   Special segments are those which carry information about the
   encoding of themselves and of subsequent segments, namely MSH, FHS
   and BHS. These are read differently in that they let the delimiters
   read themselves from the stream directly after the 3 letter tag,
   which is not regarded as an IDtyp. */

/* Refinement of Class design in HL7

                     HL7-Object
                   /     |      \
                  /      |       Structure
        Delimiters   Data Type    Object
                      /    \       /   |
              Primitive  Composite Structure   Segment
                Type       Type    of Segments
               / //|||\\            /     \
           Code ID . . . . .    Message  Group
*/

/*
 * Considerations about lower level resources
 */

All reads and writes by the encoding classes go through a stream
abstraction, which is a class derived from iostream. As such, it can
be bound to diverse sources and sinks like strings, disk files, pipes,
sockets, etc.
If any of these is not supported by default, it can be supplied with
relatively little effort.

One problem is that there is no `end of message' character defined in
HL7. The message parser must know when a message is finished in order
to distinguish between an unexpected segment and a new segment which
is beyond the scope of the parser. This can be achieved by the
following strategy:

* The parser stops when it comes across an MSH, BHS or FHS segment,
  which marks the start of a new message.

* Since HL7 LLPs want every message to be embedded between SB and EB
  characters, the stream abstraction can signal `new message' and
  `end of message' conditions on these characters. This would be like
  an implied end of file character (the eof condition) with binary
  files. This method only works for the outer level of parsing, i.e.
  a message not within a batch or file, a batch not within a file, or
  a file.

Delimiters are valid for exactly one message, one batch or one file.
Within such an entity, the delimiters must not be changed after they
have been set during the negotiation at the beginning.

We can distinguish the following abstraction levels in an HL7
transaction:

- A connection is the binding of two streams (xin and xout) to a
  source and a sink, which may be a media file, a pipe, a socket, a
  modem etc.

- A pair of streams is the abstraction of such a binding made in a
  connection.

- A transaction is the exchange of a message/response pair, a
  batch/response pair or a file/response pair over the pair of
  streams.

- A message, batch or file makes up a unit of parsing.

/*
 * General Rules for the Parser
 */

GENERAL

 1  Reading an object may succeed, fail, or take an exceptional exit.

 2  If reading fails, the stream must be backtracked.

 3  There are distinct levels of parsing. Levels are related to
    classes as shown by the following table.
    level   default     class
            delimiter
      0     none        Message or Group of Segments (SegStruc)
     -1     0x0D        Segment
     -2     |           Field
     -3     ~           Repetition
     -4     ^           Component
     -5     &           Subcomponent, Atom

 4  When a delimiter is found whose level is greater than or equal to
    the current level, reading is stopped for the current object.

 5  When reading of an object has stopped, everything up to (and
    including ???) the delimiter of the current level is consumed.
    This includes any delimiter of lower levels.

DATA OBJECTS

 6  Reading of data objects may never fail.

PRIMITIVE TYPES (ATOMS)

 7  If an atom cannot handle the characters it finds on the stream,
    it raises an exception (parse error).

 8  If an atom finds nothing but a delimiter, it unsets itself and
    succeeds.

 9  If an atom finds the null symbol (`""'), it nullifies itself and
    succeeds.

COMPOSITE TYPES

10  If a composite type has to stop reading due to a delimiter before
    it has read anything else, it unsets itself and succeeds.

11  If a composite type reads nothing but a null symbol (`""'), it
    nullifies itself and succeeds.

    Examples:

    |       not present
    ""|     null
    ^|      present (with all components not present)
    ""^|    present (with 1st component null and 2nd not present)

REPEATED DATA OBJECTS

12  If a repeated object has to stop reading due to a delimiter before
    it has read anything else, it unsets itself and succeeds.

...

GROUPS

XX
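
As an illustration of rules 8 to 11 only, here is a minimal C++ sketch
that reproduces the four examples given under rule 11. All names
(State, Atom, Composite, readAtom, readComposite) are hypothetical and
not taken from the actual class design. The sketch makes several
simplifying assumptions: the default delimiters are hard-wired instead
of negotiated, a composite consists of atoms separated by `^' only
(no subcomponents or repetitions), the terminating delimiter is left
on the stream (rule 5 leaves this open), and the parse error of rule 7
is not modelled.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical three-valued state of a data object.
enum class State { Unset, Null, Present };

struct Atom {
    State state = State::Unset;
    std::string text;               // only meaningful when Present
};

struct Composite {
    State state = State::Unset;
    std::vector<Atom> parts;        // only meaningful when Present
};

// Assumption: the default delimiters from the level table, hard-wired.
bool isDelimiter(char c) {
    return c == '\r' || c == '|' || c == '~' || c == '^' || c == '&';
}

// Rules 8 and 9: an atom stops at any delimiter and does not consume it.
Atom readAtom(std::istream& in) {
    Atom atom;
    for (;;) {
        const int c = in.peek();
        if (c == std::char_traits<char>::eof() ||
            isDelimiter(static_cast<char>(c))) {
            break;
        }
        atom.text.push_back(static_cast<char>(in.get()));
    }
    if (atom.text == "\"\"") {      // the null symbol
        atom.state = State::Null;
        atom.text.clear();
    } else if (!atom.text.empty()) {
        atom.state = State::Present;
    }                               // otherwise the atom stays Unset
    return atom;
}

// Rules 10 and 11: a composite of atoms separated by '^'.
Composite readComposite(std::istream& in) {
    Composite comp;
    for (;;) {
        comp.parts.push_back(readAtom(in));
        if (in.peek() != '^') break;    // any other delimiter ends the composite
        in.get();                       // consume the component delimiter
    }
    assert(!comp.parts.empty());
    if (comp.parts.size() == 1) {
        // Nothing but a delimiter, nothing but a null symbol, or one plain value.
        comp.state = comp.parts.front().state;
        if (comp.state != State::Present) comp.parts.clear();
    } else {
        comp.state = State::Present;    // at least one '^' was seen
    }
    return comp;
}

// Convenience wrapper for trying the examples on a string.
Composite parseComposite(const std::string& s) {
    std::istringstream in(s);
    return readComposite(in);
}
```

With these assumptions, parseComposite("|") yields an unset composite,
parseComposite("\"\"|") a null one, and parseComposite("\"\"^|") a
present composite whose first part is null and whose second part is
unset. A real implementation would take the delimiters from the
negotiated set rather than from constants.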