%
/* delimiters are tokens */

eom = '' ;
eos = 0x0D ;
eof = '|' ;
eoc = '^' ; 
eosc = '&' ;

%
/* 1
first sketch of the syntax */

message: segments ;
segments: segment
 | segment eos segments ;
segment: fields ;
fields: field
 | field eof fields ;
field: atom
 | composite ;
composite: component
 | component eoc composite ;
component: atom
 | subcomposite ;
subcomposite: subcomponent
 | subcomponent eosc subcomposite ;
subcomponent: atom ;

/* 2
second sketch of the syntax. Here we regard components and
subcomponents as basically the same idea at different levels. Each
level can simply terminate in an atom */

message: segments ;
segments: segment
 | segment eos segments ;

segment: fields ;
fields: field
 | field eof fields ;

field: data(1) ;

data(n) : atom | composite(n) ;
composite(n): data(n+1)
 | data(n+1) eoc(n) composite(n) ;

/* 
data(2) = component,    eoc(1) = eoc
data(3) = subcomponent, eoc(2) = eosc
*/

/* 3
third sketch of the syntax. Here we bring the whole syntax into a
recursive form, with a start level (message) and a terminal (atom) */

message: segments ;

segments: segment
 | segment eos segments ;
segment: fields ;

fields: field
 | field eof fields ;
field: components ;

components: component
 | component eoc components ;
component: subcomponents ;

subcomponents: subcomponent
 | subcomponent eosc subcomponents ;
subcomponent: atom ;

/* 4
here is a formalism which describes exactly the above syntax (3).
There are terms (t) and lists (l), in which the terms are terminals: */

message : l(1) ;
---
l(n) : t(n) | t(n) d(n) l(n) ;
t(n) : l(n+1) ;
---
l(max) : atom

/* 5
we can further reduce the terms (t), ending up with a syntax which is
made up merely of lists */

message : l(1) ;
---
l(n) : l(n+1) | l(n+1) d(n) l(n) ;
---
l(max) : atom

/*
at this point, we have a perfectly simple description of our syntax,
however this syntax fails to express what we want: there is no object
left except the start (message) and the terminal (atom).  Since we
want to model this syntax with singular objects, we have to find a
complementary approach */

/* 6 
we try to reduce the plurals (the lists of something) by swapping the
two body lines of (4), which leads us to a definition that deals
with objects rather than lists */

message : t(1) ;
---
t(n) : l(n) ;
l(n) : t(n+1) | t(n+1) d(n) l(n) ;
---
l(max) : atom

/* 7
this can be further reduced by eliminating the lists, leaving each
object defined by simple tail recursion */

message = t(1)
---
t(n): t(n+1) | t(n+1) d(n) t(n) ;
---
t(6) = atom
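
/* The general definition (7) can be sketched as a single recursive
function. This is only an illustration in Python under stated
assumptions: the names DELIMS and parse_level are made up here, the
delimiters are taken in the order eos, eof, eoc, eosc, and the level
below the last delimiter is treated as the atom. */

```python
# Hypothetical sketch of t(n) : t(n+1) | t(n+1) d(n) t(n).
# DELIMS and parse_level are illustrative names, not part of these notes.
DELIMS = ["\r", "|", "^", "&"]  # assumed order: eos, eof, eoc, eosc


def parse_level(text, n=0):
    """Return nested lists; below the last delimiter the text is an atom."""
    if n == len(DELIMS):
        return text                          # terminal: atom
    head, sep, tail = text.partition(DELIMS[n])
    first = parse_level(head, n + 1)
    if not sep:
        return [first]                       # t(n) : t(n+1)
    return [first] + parse_level(tail, n)    # t(n) : t(n+1) d(n) t(n)


if __name__ == "__main__":
    print(parse_level("a|b^c&d\rx"))
```

/* With this sketch a void, non-trailing field such as the middle one in
"a||c" still shows up as an (empty) element, because it is represented
by its delimiter. */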

/* 8
fourth sketch of the syntax, which we get when we resolve the
general definition (7) */

message: segment
 | segment eos message ;
segment: field
 | field eof segment ;
field: component
 | component eoc field ;
component: subcomponent
 | subcomponent eosc component ;
subcomponent: atom ;

/* at this point we have a simple and expressive syntax definition.
However, HL7 takes a different approach to message syntax on the one
hand and segment syntax on the other. If segments are void, they
usually do not appear at all on the presentation stream, while fields,
components etc. which are void and not trailing have to be represented
by their delimiter. */

/* 9 
Thus here is another approach to delimiters, which is complementary to
the one we have taken so far */

message = t(1)
---
t(n): t(n+1) d(n) | t(n+1) t(n) ;
---
t(6) = atom

/* at least at t3 (field), this way of handling delimiters becomes
inadequate, since a field which consists of a simple atom is followed
by three delimiters `&^|', which is certainly wrong. This becomes clear
when we unroll the generic syntax (9) */

t1 (=message)      : t2 d1 (=eom)  | t2 t1 ;
t2 (=segment)      : t3 d2 (=eos)  | t3 t2 ;
t3 (=field)        : t4 d3 (=eof)  | t4 t3 ;
t4 (=component)    : t5 d4 (=eoc)  | t5 t4 ;
t5 (=subcomponent) : t6 d5 (=eosc) ;
t6 (=atom)
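
/* The effect can be reproduced with a small serialiser that follows the
unrolled rules for t3..t5 literally. This is a hypothetical Python
sketch; the names D and emit are made up here, and objects are modelled
as nested lists with strings as atoms. */

```python
# Hypothetical sketch: every level appends its own delimiter, as in (9).
D = ["|", "^", "&"]  # d3 = eof, d4 = eoc, d5 = eosc


def emit(obj, n=0):
    """Serialise nested lists; each non-atom level ends with D[n]."""
    if isinstance(obj, str):
        return obj                       # t6: the atom itself
    return "".join(emit(child, n + 1) for child in obj) + D[n]


if __name__ == "__main__":
    field = [[["x"]]]    # field -> component -> subcomponent -> atom
    print(emit(field))   # x&^|
```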

/* We had to make the following changes, which are extremely ugly.
Moreover, there is no message delimiter `eom' */

t3 (=field)        : ( t4 | t5 | t6 ) d3 | t4 t3 ;
t4 (=component)    : (      t5 | t6 ) d4 | t5 | t5 t4 ;
t5 (=subcomponent) :             t6   d5 ;

/* This method is not as elegant as (8) above, since every object has
its delimiter appended, even if it is a trailing object. However, this
is perfectly adequate for segments, since segments are ended by an eos
if and only if they are present. */

/* 10 
fifth sketch of the syntax definition */ 

message: segment
 | segment message ;

segment: field eos
 | field eof segment ;

field: component
 | component eoc field ;
component: subcomponent
 | subcomponent eosc component ;
subcomponent: atom ;
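
/* The difference between the terminating eos and the separating eof in
(10) can be sketched as follows. This is an illustrative Python
fragment only; split_segments and split_fields are hypothetical names,
and the default delimiters are the ones defined at the top of these
notes. */

```python
# Hypothetical sketch of (10): eos terminates a segment, eof separates fields.

def split_segments(message, eos="\r"):
    """Split a message whose segments are each terminated by eos."""
    if message and not message.endswith(eos):
        raise ValueError("last segment is not terminated by eos")
    # The text after the final eos is empty and is not a segment.
    return message.split(eos)[:-1]


def split_fields(segment, eof="|"):
    """Split a segment whose fields are separated by eof."""
    # Void, non-trailing fields are kept as empty strings.
    return segment.split(eof)


if __name__ == "__main__":
    for seg in split_segments("A|1\rB||2\r"):
        print(split_fields(seg))
```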

/* Note that we handle segment tags (ids) as fields of ID data type.
This field is given a common meaning in every segment, similar to the
MSH segment, which introduces every message. In order to respect these
special cases, we make some refinements to (10) */

message: message-header message-body ;
message-header: segment ;
message-body: segment
 | segment message-body ;

segment: segment-id segment-body ;
segment-id: field ;
segment-body: field eos
 | field eof segment-body ;

field: component
 | component eoc field ;
component: subcomponent
 | subcomponent eosc component ;
subcomponent: atom ;

/*
there are even more refinements to make to message-header, in order
to reflect the handling of delimiters:
*/

message-header: msh-header eos
 | msh-header eof segment-body;
msh-header: "MSH" delimiter-definition eof;

/*
Further considerations:

A) A null value cannot apply to all HL7 objects, as was assumed so far.
Null is meaningful only for data items. Messages and segments,
however, are not data items but groupings of data for the purpose of
meaningful communication. Messages and segments give meaning to data
items, but are not data items themselves. That is why the null value
makes no sense for messages, segments or groups of segments. We
therefore have to revise the concept of null values for all those
HL7Objects which are not data items, including the abstract base
class.

B) The lack of abstractness of the HL7 syntax causes us a further
inconvenience: since the segment IDs `MSH', `FHS' and `BHS' are read
before the delimiter characters are negotiated, we cannot let IDtyp
read these IDs, since an IDtyp could (and in fact does) depend on
delimiter characters. Thus these tags must be read by their length,
i.e. we have to read exactly three characters before the delimiters,
which are in turn followed by the first field delimiter.

On the other hand, an abstract syntax would not even make decisions
about the appearance of tags on the presentation stream. Thus, the
segment IDs do not belong in the abstract definition of a segment.
It may be wise not to handle tags as data (IDtyp) at all (what was
said about null values above applies to segment IDs as well). To make
things even worse, there is another tag deep inside the MSH
segment: the message type ID.

For now we leave this question open and work around the problem by
handling `special segments' differently from normal segments. Special
segments are those which carry information about the encoding of
themselves and of subsequent segments, namely MSH, FHS and BHS. They
are read differently in that the delimiters read themselves from the
stream directly after the 3-letter tag, which is not regarded as an
IDtyp.
*/
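
/* Reading a special segment header by length, as described in B), might
look like the following Python sketch. The layout is an assumption
(three tag characters, one field delimiter, then four encoding
characters in the order component, repetition, escape, subcomponent);
read_special_header and SPECIAL_TAGS are hypothetical names. */

```python
# Hypothetical sketch: read the tag by length, then the delimiter definition.
SPECIAL_TAGS = ("MSH", "FHS", "BHS")


def read_special_header(segment):
    """Return (tag, delimiters, rest) for a special segment."""
    if len(segment) < 8 or segment[:3] not in SPECIAL_TAGS:
        raise ValueError("not a special segment header")
    eof = segment[3]            # the field delimiter follows the tag
    encoding = segment[4:8]     # assumed: exactly four encoding characters
    delims = {
        "eof": eof,
        "eoc": encoding[0],
        "repetition": encoding[1],
        "escape": encoding[2],
        "eosc": encoding[3],
    }
    # Skip the field delimiter that closes the delimiter definition, if any.
    rest = segment[9:] if segment[8:9] == eof else segment[8:]
    return segment[:3], delims, rest


if __name__ == "__main__":
    print(read_special_header("MSH|^~\\&|APP|FAC"))
```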

/*
Refinement of Class design in HL7


                                  HL7-Object
                              /       |          \
                  Structure       Delimiters          Data Type
                   Object                           /           \
                /      |                       Primitive         Composite
       Structure    Segment                     Type                Type
      of Segments                            /  //|||\\
      /        \                Code  ID . . . . .
Message         Group

*/

/*
 * Considerations about lower level resources
 */

All reads and writes by the encoding classes go through a stream
abstraction, which is a class derived from iostream. As such, it can
be bound to diverse sources and sinks like strings, disk files, pipes,
sockets, etc. If any of these is not supported by default, it can
be supplied with relatively little effort.
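
As a rough illustration of why a common stream interface is useful,
the following Python sketch reads a tag from any seekable text stream
and rewinds the stream when the tag does not match. It uses io.StringIO
as the string-bound stream; try_read_tag is a hypothetical name, and
the sketch assumes the stream supports tell() and seek(), which a raw
socket would not without extra buffering.

```python
import io


def try_read_tag(stream, expected):
    """Read len(expected) characters; rewind the stream on a mismatch."""
    start = stream.tell()
    if stream.read(len(expected)) == expected:
        return True
    stream.seek(start)      # restore the position so another reader can try
    return False


if __name__ == "__main__":
    s = io.StringIO("PID|1")
    print(try_read_tag(s, "MSH"), s.tell())
    print(try_read_tag(s, "PID"), s.read())
```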

One problem is that no `end of message' character is defined in
HL7. The message parser must know when a message is finished in order
to distinguish between an unexpected segment and a new segment which is
beyond the scope of the parser. This can be achieved by the following
strategies:

* The parser stops when it comes across an MSH, BHS or FHS segment,
  which marks the start of a new message.

* Since HL7 LLPs require every message to be embedded between SB and
  EB characters, the stream abstraction can signal start-of-message
  and end-of-message conditions on these characters. This would be
  like an implied end-of-file character (the eof condition) in binary
  files.
    This method works only for the outermost level of parsing, i.e.
  a message not within a batch or file, a batch not within a file,
  or a file.
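
The second strategy can be sketched as a small generator that turns a
character stream into message payloads. This is a Python illustration
only: framed_messages is a hypothetical name, and the values chosen
for SB (0x0B) and EB (0x1C) are assumptions, since these notes do not
state them.

```python
SB = "\x0b"  # assumed start-block character
EB = "\x1c"  # assumed end-block character


def framed_messages(chars):
    """Yield the text between each SB and the following EB."""
    buf = None                      # None means: outside any block
    for ch in chars:
        if ch == SB:
            buf = []                # start-of-message condition
        elif ch == EB:
            if buf is not None:
                yield "".join(buf)  # end-of-message condition
            buf = None
        elif buf is not None:
            buf.append(ch)          # characters outside a block are ignored


if __name__ == "__main__":
    data = "\x0bMSH|a\r\x1c\r\x0bMSH|b\r\x1c\r"
    print(list(framed_messages(data)))
```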

Delimiters are valid for exactly one message, one batch or one file.
Within such an entity, the delimiters may not be changed after they
have been set during the negotiation at the beginning. We can
distinguish the following abstraction levels in an HL7 transaction:

- A connection
	is the binding of two streams (xin and xout) to a source and a
	sink, which may be a media file, a pipe, a socket, a modem etc.
- A pair of streams
	is the abstraction of such a binding made in a connection.
- A transaction
	is the exchange of a message/response pair, a batch/response
	or a file/response pair over the pair of streams.
- A message, batch or file
	is a unit of parsing.

/*
 * General Rules for the Parser
 */

GENERAL

 1 Reading an object may succeed, fail, or take an exceptional exit.

 2 If reading fails, the stream must be backtracked.

 3 There are distinct levels of parsing. Levels are related to
   classes as shown by the following table.

       level  default   class
	      delimiter

	 0	none	Message or Group of Segments (SegStruc)
	-1	<CR>	Segment
	-2	|	Field
	-3	~	Repetition
	-4	^	Component
	-5	&	Subcomponent, Atom

 4 When a delimiter is found whose level is greater than or equal to
   the current level, reading stops for the current object.

 5 When reading of an object has stopped, everything up to (and including ???)
   the delimiter of the current level is consumed. This includes any
   delimiters of lower levels.
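
Rules 3 and 4 can be sketched as a scanner that compares delimiter
levels. This is a Python illustration under stated assumptions:
LEVELS uses the default delimiters from the table above, read_object
is a hypothetical name, and the stopping delimiter is left unconsumed
here because rule 5 leaves open whether it is included.

```python
# Default delimiters and their levels, taken from the table above.
LEVELS = {"\r": -1, "|": -2, "~": -3, "^": -4, "&": -5}


def read_object(text, pos, level):
    """Scan from pos until a delimiter whose level is >= level.

    Returns (raw_text, stop_pos). The stopping delimiter itself is not
    consumed; stop_pos points at it, or at the end of the text.
    """
    i = pos
    while i < len(text):
        # Ordinary characters get -infinity and therefore never stop the scan.
        if LEVELS.get(text[i], float("-inf")) >= level:
            break
        i += 1
    return text[pos:i], i


if __name__ == "__main__":
    line = "a&b^c|d\r"
    print(read_object(line, 0, -5))   # reading a subcomponent
    print(read_object(line, 0, -4))   # reading a component
    print(read_object(line, 0, -2))   # reading a field
```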

DATA OBJECTS

 6 Reading of data objects may never fail.

PRIMITIVE TYPES (ATOMS)

 7 If an atom cannot handle the characters it finds on the stream, it
   raises an exception (parse error).

 8 If an atom finds nothing but a delimiter, it unsets itself and
   succeeds.

 9 If an atom finds the null symbol (`""'), it nullifies itself and
   succeeds.

COMPOSITE TYPES

10 If a composite type has to stop reading due to a delimiter before
   it has read anything else, it unsets itself and succeeds.

11 If a composite type reads nothing but a null symbol (`""'), it
   nullifies itself and succeeds.

   Examples:

	|	not present
	""|	null
	^|	present (with all components not present)
	""^|	present (with 1st component null and 2nd not present)

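The four example cases under rules 8 to 11 can be reproduced with the
following Python sketch. It works on the raw text of one field (the
part before the field delimiter) rather than on a stream; State,
read_atom and read_composite are hypothetical names, and the parse
error of rule 7 is not modelled.

```python
from enum import Enum


class State(Enum):
    UNSET = "not present"
    NULL = "null"
    PRESENT = "present"


NULL_SYMBOL = '""'


def read_atom(raw):
    """Rules 8 and 9: nothing means unset, the null symbol means null."""
    if raw == "":
        return (State.UNSET, None)
    if raw == NULL_SYMBOL:
        return (State.NULL, None)
    return (State.PRESENT, raw)


def read_composite(raw, eoc="^"):
    """Rules 10 and 11, then one atom per component."""
    if raw == "":
        return (State.UNSET, [])
    if raw == NULL_SYMBOL:
        return (State.NULL, [])
    return (State.PRESENT, [read_atom(part) for part in raw.split(eoc)])


if __name__ == "__main__":
    for raw in ["", '""', "^", '""^']:
        print(repr(raw), read_composite(raw))
```
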
REPEATED DATA OBJECTS

12 If a repeated object has to stop reading due to a delimiter before
   it has read anything else, it unsets itself and succeeds.

...

GROUPS

XX