Reflecting hl7spec.dtd,v 1.7 1997/07/24 21:29:24
The popularity of SGML is growing in the HL7 community. I personally doubt whether it will be very useful as a transfer syntax for HL7 messages. However, SGML is definitely useful for editing the HL7 standard documents themselves. The popularity of SGML could be the trigger to finally abandon the unportable and unintelligible WinWord processing of HL7. The traditional way of treating the HL7 standard as a WordPerfect or WinWord document has always resulted in errors and inconsistencies. As the German user group, in translating HL7 into German, usually steps through every single message, segment, data item and code table, we always discover a lot of errors. Here I want to show concrete steps for applying SGML to the HL7 standard, which would in the end result in a much more consistent and useful standard.
The previous version (1.5) of this DTD contained a major error that has since been fixed.
Replacing WinWord with SGML facilitates the production of beautifully printed paper documents as well as a set of HTML files that can be browsed easily through extensive cross-referencing. Moreover, the portability of SGML facilitates the exchange and assembly of text fragments. The key difference from WYSIWYG is that text in SGML is marked up by logical content as opposed to appearance. In order to do this, it is necessary to define a DTD (document type definition) for the HL7 standard.
This is best done in a layered fashion, where each layer represents an aspect of the HL7 standard. On the first layer, the HL7 standard can be regarded as a book. Just like a book, the HL7 document is made up of chapters, sections, appendices, indexes etc. Moreover, a book has author(s) and editions.
However, HL7 is not just any book; it is a book that sets a standard. A standard document also has chapters, sections, etc., but in a more constrained manner. For instance, in ISO standards it is common for the first section to be named `scope', section two may be a list of normative references, section three may be a definition of terms, followed by a number of freely named sections. Also, in ISO standards, the (sub)sections are commonly named `clauses' and appendices are called `annexes'. The sections of a standard may have a different status, such as `normative' or `informative'. We all know HL7 as being qualified as `ballot draft' or `final'. The view of the document as a standard is represented by the second layer.
Finally, the third layer deals with elements that are specific to HL7. These are: messages, segments, fields and data elements, composite and primitive data types, and code tables.
Each of the three layers is defined by a document type definition (DTD). The major advantages of the layered structure are the reuse of predefined DTDs and their rendering, conformance to standards pertaining to the respective view of the document, and the flexibility to render the document in different ways. Rendering the original layer 3 SGML document occurs in three steps, as shown in the figure below.
                 +==============+
  LAYER 3 DTD    \ HL7 STANDARD / <-----> Databases,
  ==transform==   \            /   -----> Implementations
  LAYER 2 DTD      \ STANDARD /
  ==transform==     \        /
  LAYER 1 DTD        \ BOOK /
  ===format====       \    /
  HTML, PDF, RTF,      \  /
  TeX, ...              \/         -----> Printed/Online Docs
The first two steps are transformations of the DTD instance into an instance of the next lower layer's DTD. This sounds complicated, but it isn't. The key is that the transformation only needs to consider the elements that are specific to the current layer, which are then translated into elements of the next lower layer. All other elements remain unchanged.
For instance, a transformation from the HL7 layer (3) to the standard layer (2) will translate the definition of a message into a clause (a subsection) that contains a heading, a descriptive text, and the typical `abstract message definition' formatted as a table. A segment would be formatted as a clause with a similar opening, but one that contains the table of fields and a tail of sub-clauses, each of which contains a description of its respective data element.
The structure of the lower layers can be borrowed from other DTDs or style guides. For example, there is an early document type given in annex E of the ISO 8879:1986 standard. This DTD is known as the ISO general DTD (isogen). It defines most elements of a book. The Association of American Publishers has prepared a similar DTD (originally based on isogen) which, along with other DTDs for articles and serials, is now an ISO standard (12083:1993). Other layer 1 document types to be considered are DocBook by the Davenport Group, tuned for technical documentation, the OSF book, which serves a similar purpose, and the Text Encoding Initiative (TEI) DTD, which is tailored to support text-critical scholarly work.
The advantage of using a standard DTD over an ad hoc creation is that it provides a layer on which SGML documents can be exchanged electronically. If, for example, the HL7 standard is to be shared with other standards developing organizations (SDOs) at ANSI or ISO, this is easy if all SDOs share a commonly used layer 2 DTD.
Most important, however, is the higher probability that standard DTDs will be supported by SGML rendering applications of the present and future. As we'll see later, the most difficult task in exploiting the virtues of SGML is not to parse and validate the SGML document but to render it into a form suitable for distribution on paper or for online browsing. When there is a standard rendering method for a book DTD, using that DTD frees us from the task of developing that method from scratch.
The layer 1 and layer 2 DTDs define elements generally used for the markup of books and standard documents, respectively. The layer 3 DTD defines elements specific to the HL7 standard. Essentially, this is a meta model of HL7 expressed in terms of SGML. The meta model used here is based on substantial analysis and implementation work for HL7 that has been done in the ProtoGen/HL7 project.
The following drawing is an outline of the HL7 meta model as a class hierarchy.
Class
  |
  +--isa-- Classification
  |          +--isa-- CODE
  |
  +--isa-- Information
  |          +--isa-- DataType
  |          |          +--isa-- PRIMTYPE
  |          |          +--isa-- COMPOSITE
  |          +--isa-- Structure
  |                     +--isa-- SEGMENT
  |                     +--isa-- SegStruc
  |                                +--isa-- GROUP
  |                                +--isa-- MESSAGE
  |
  +--isa-- Interaction
             +--isa-- TRANSACTION

Associations other than isa:
  COMPOSITE, SEGMENT   has   DataType
  SegStruc             has   Structure
  TRANSACTION          has   MESSAGE
  data types           use   CODE
The leaf-level classes of the meta model are transaction, message, group, segment, composite data type, primitive data type, and code table.
The leaf-level classes can logically be grouped into other, abstract classes, all of which are eventually subsumed under a common base class. The class hierarchy can be expressed with SGML entities as shown below.
<!-- INTERACTION -->
<!ENTITY % class.transaction     "transaction">
<!ENTITY % class.interaction     "%class.transaction;">

<!-- INFORMATION -->
<!ENTITY % class.message         "message">
<!ENTITY % class.group           "group">
<!ENTITY % class.segment         "segment">
<!ENTITY % class.composite       "composite">
<!ENTITY % class.primtype        "primtype">
<!ENTITY % class.abc.type        "type" -- enabling subtype polymorphism -->
<!ENTITY % class.type            "%class.abc.type;| %class.primtype;| %class.composite;">
<!ENTITY % class.segstruc        "%class.group;| %class.message;">
<!ENTITY % class.structure       "%class.segment;| %class.segstruc;">
<!ENTITY % class.information     "%class.type;| %class.structure;">

<!-- CLASSIFICATION -->
<!ENTITY % class.codetab         "codetab">
<!ENTITY % class.classification  "%class.codetab;">

<!-- TOP LEVEL CLASS -->
<!ENTITY % class  "%class.interaction;| %class.information;| %class.classification;">
Classes can be rendered in different modes. A class is either defined or used. Each class is defined exactly once, where it is assigned a unique id by which it can be referenced from everywhere else in the standard. Each definition associates an id Y with an ordered set of items ( X1, X2, .., Xn ) which we write symbolically as:
Y := ( X1, X2, ... , Xn )
Normally the full definition of a class is to be displayed at the location in the document where the class is defined. However, defining and displaying are tasks that do not directly relate. For instance, it might be useful to define a class at some point in the text, where it is shown only partly, possibly only by its name, or not shown at all. The display of the full definition may be more appropriate at a different place. This is very easy to express and implement with SGML. The definition of a class in SGML is where the DEF attribute is assigned an ID value.
Whether defined or referred to, a class can be used in different modes. It can be displayed in full with annotations, only briefly, or not at all. Or it can be used within the definition of another class. The different modes of usage are selected by the SGML attribute `USE', which takes one of the values DISPLAY, HIDE, REFER, INSTANT or INLINE.
Another use of INSTANT is in message definitions that want to define a group (aka ``segment group,'' see above) only once. The defined group can then be used within the definitions of all similar messages that contain the same group. This is especially useful in chapters 3 and 6, where the insurance information is repeatedly defined in the group (IN1, [IN2], [IN3]).
The INLINE mechanism can be used in order to implement class inheritance. This HL7 DTD is thus ready for object oriented versions of HL7 specifications. Inheritance can be marked up in this DTD with an attribute `ISA' (read ``is a'') by which the parent class is referenced.
For instance, all messages that begin with (MSH, EVN) can be subsumed under an abstract EVN message. This message can be defined silently (using HIDE) as being made up of just MSH and EVN, and the concrete messages then inherit from it using ISA. This is a powerful mechanism that enhances consistency but can be completely hidden from the end reader of the text. It does not require the reader to be knowledgeable in the object-oriented method.
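The abstract EVN message just described might be marked up roughly as in the following sketch. It is only an illustration against the element and attribute names of this DTD: the IDs EVNMSG and A01 are made up here, the segments MSH, EVN, PID and PV1 are assumed to be defined elsewhere in the document with matching DEF values, the message content is abbreviated, and how a transformer merges the inherited places with the places of the inheriting message is left to that transformer.

```sgml
<!-- Abstract message: defined once and silently (USE=HIDE) -->
<message def="EVNMSG" use="hide">
  <name>event message</name>
  <place><segment ref="MSH"></segment></place>
  <place><segment ref="EVN"></segment></place>
</message>

<!-- Concrete message: inherits from EVNMSG via ISA and adds places -->
<message def="A01" isa="EVNMSG">
  <name>admit a patient</name>
  <place><segment ref="PID"></segment></place>
  <place><segment ref="PV1"></segment></place>
</message>

<!-- Elsewhere in the running text: refer to the message by name only -->
<message ref="A01" use="refer"></message>
```

The reader of the rendered text would only ever see the concrete message; the abstract one exists solely to keep the common opening of all such messages in one place.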
<!ENTITY % use.display  "display">
<!ENTITY % use.hide     "hide">
<!ENTITY % use.refer    "refer">
<!ENTITY % use.instant  "instant">
<!ENTITY % use.inline   "inline">
<!ENTITY % use  "%use.display;| %use.hide;| %use.refer;| %use.instant;| %use.inline;">

<!ENTITY % class.attr
  "use  (%use;)  #IMPLIED
   def  ID       #IMPLIED  -- signals that the class is defined --
   ref  IDREF    #IMPLIED  -- required in all other cases --
   isa  IDREF    #IMPLIED  -- implements class inheritance --">

<!ATTLIST (%class.interaction;)     %class.attr;>
<!ATTLIST (%class.message;)         %class.attr;>
<!ATTLIST (%class.segment;)         %class.attr;>
<!ATTLIST (%class.type;)            %class.attr;
          code    IDREF   #IMPLIED  -- associated code table --
          maxlen  NUMBER  #IMPLIED  -- only sensible with primtype -->
<!ATTLIST (%class.classification;)  %class.attr;>
Having followed this text up to this point, you might ask
where the concept of field and data element is reflected. The
rationale for not including a field as a class in the model is
as follows.
All classes in the given model are defined by a list of items. For example, segments are defined by a list of fields. An item in this list, however, is something different from the class that it refers to. E.g. a field instantiates a data type giving it some name (and interpretation) and specifying whether the data is to occur once, or repeatedly, etc.
This common schema is obscured by the fact that the items are named differently in the different classes. But what is named ``field'' in segments has the same function as a ``component'' in a composite data type. Moreover, messages and groups also contain items. These are called ``places'' in this DTD. Each item (place, field, component) instantiates exactly one class.
<!-- INTERACTION -->
<!ENTITY % item.request         "request"    -- used in transactions -->
<!ENTITY % item.reply           "reply"      -- used in transactions -->
<!ENTITY % item.speech-act      "%item.request;| %item.reply;">
<!ENTITY % item.interaction     "%item.speech-act;">

<!-- INFORMATION -->
<!ENTITY % item.place           "place"      -- used in segment structures -->
<!ENTITY % item.field           "field"      -- used in segments -->
<!ENTITY % item.component       "component"  -- used in composite types -->
<!ENTITY % item.information     "%item.place;| %item.field;| %item.component;">

<!-- CLASSIFICATION -->
<!ENTITY % item.value           "value"      -- used in code tables -->
<!ENTITY % item.classification  "%item.value;">

<!-- THE ABSTRACT ITEM -->
<!ENTITY % item  "%item.interaction;| %item.information;| %item.classification;">
Some (currently not all!) informational items have a notion of repeatability or optionality, which can generally be thought of as an occurrence qualifier. This is selected using the attribute `OCCUR' with the values:

OCCUR    MIN  MAX  ALSO KNOWN AS
======   ===  ===  =======================
one       1    1   mandatory
opt       0    1   optional
rep       1    n   repeatable
optrep    0    n   optional and repeatable

The exact boundaries can be selected using the MIN and MAX qualifiers.
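As a small illustration of the occurrence qualifiers, the fields of a segment might be marked up as in the sketch below. The segment ZOC, its field names and the bound of three repetitions are made up for this example, and the type ST is assumed to be defined elsewhere; how exactly MIN and MAX combine with OCCUR is up to the transformer.

```sgml
<segment def="ZOC">
  <name>occurrence example</name>
  <!-- OCCUR omitted: the declared default "one" applies (mandatory) -->
  <field><name>record id</name><type ref="ST"></field>
  <!-- optional, at most once -->
  <field occur="opt"><name>comment</name><type ref="ST"></field>
  <!-- at least once, may repeat -->
  <field occur="rep"><name>phone number</name><type ref="ST"></field>
  <!-- optional and repeatable, here bounded by MAX -->
  <field occur="optrep" max="3"><name>alias</name><type ref="ST"></field>
</segment>
```

Because the TYPE element is declared EMPTY, it is written without an end tag.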
The notion of maximal length only exists in data types and makes sense only in primitive data types, although HL7 used to define maximal lengths for composite types such as PN. The MAXLEN attribute has therefore been made an attribute of the TYPE classes. This allows constraints to be placed on the type at a specific point of use.
Code tables can be associated with any occurrence of a type: at its definition, at any point of use, and as a property of the item that instantiates the type. Whether pre-set associations are overridden or flagged as a consistency violation depends on the transformer.
Finally, as an item gives a certain interpretation to a class, it is also an entity in its own right, called a `data element' and assigned a number in the HL7 standard. There are only a few data elements that occur in several fields, for instance the `filler order number', which occurs in the OBR, ORC and FT1 segments. However, it seems that the HL7 standard has only a blurred awareness of the relationship between `field' and `data element'.
<!ENTITY % occur.one     "one">
<!ENTITY % occur.opt     "opt">
<!ENTITY % occur.rep     "rep">
<!ENTITY % occur.optrep  "optrep">
<!ENTITY % occur  "%occur.one;| %occur.opt;| %occur.rep;| %occur.optrep;">

<!ENTITY % item.occur.attr
  "occur  (%occur;)  %occur.one;
   min    NUMBER     #IMPLIED
   max    NUMBER     #IMPLIED">

<!-- ATTLIST (%item.interaction;) none -->

<!-- item.information -->
<!ATTLIST (%item.place;)      %item.occur.attr;>
<!ATTLIST (%item.field;)      %item.occur.attr;
          id    ID     #IMPLIED
          code  IDREF  #IMPLIED  -- associated code table -->
<!ATTLIST (%item.component;)  %item.occur.attr;
          code  IDREF  #IMPLIED  -- associated code table -->
<!ATTLIST (%item.classification;)
          id    ID     #IMPLIED  -- id of a coded value -->
The DTD elements are declared here. With the entities just defined, the element declaration becomes simple.
Each class, when defined, is opened by a name that can carry an optional abbreviation in its attribute `ABBR'. This name or the abbreviation is placed into the text whenever the class is referred to (USE=REFER). Descriptive text can follow the name. The rest of the class definition is the list of items.
<!ELEMENT (%class.transaction;) - o (name, descr?, %item.request;, descr?,
                                     %item.reply;, descr?)?>
<!ELEMENT (%class.group;)       - o (name?, descr?, (%item.place;)* )?>
<!ELEMENT (%class.message;)     - o (name,  descr?, (%item.place;)* )?>
<!ELEMENT (%class.segment;)     - o (name,  descr?, (%item.field;)* )?>
<!ELEMENT (%class.composite;)   - o (name,  descr?, (%item.component;)* )?>
<!ELEMENT (%class.codetab;)     - o (name,  descr?, (%item.value;)* )?>

<!-- this enables subclass polymorphism for type -->
<!ELEMENT %class.abc.type;      - o EMPTY>
Items can have an optional name, except for fields, where the name is required. If an optional name is left out, the name of the class that the item instantiates will be used instead. Descriptions of items can optionally follow the formal specifications. The layer 3 -> 2 transformer will typically place these descriptions into the text as separate subsubsections after the defining table of items.
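The naming rule can be illustrated with the following sketch. All IDs in it (ZAD, ZCY, ZEX) and all names are made up for this example, and the types ST and ZCY are assumed to be defined elsewhere in the document.

```sgml
<!-- Hypothetical composite type; component names are optional -->
<composite def="ZAD">
  <name>example address</name>
  <!-- named component -->
  <component>
    <name>street</name>
    <type ref="ST">
  </component>
  <!-- unnamed component: the name of the class ZCY is used instead -->
  <component>
    <type ref="ZCY">
  </component>
</composite>

<!-- In a segment, every field must carry its own name -->
<segment def="ZEX">
  <name>example segment</name>
  <field>
    <name>home address</name>
    <type ref="ZAD">
  </field>
</segment>
```

A DESCR element could follow each TYPE reference; its content depends on the text model of the underlying book DTD and is therefore left out of this sketch.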
<!ELEMENT (%item.speech-act;) - o (name?, (%class.message;),   descr?)>
<!ELEMENT (%item.place;)      - - (name?, (%class.structure;), descr?)>
<!ELEMENT (%item.field;)      - - (name,  (%class.type;),      descr?)>
<!ELEMENT (%item.component;)  - - (name?, (%class.type;),      descr?)>

1.2 LAYER 2 DTD: THE STANDARD LAYER
This is currently only a dummy, since there is no standard DTD that is based on a standard book layer.
<!ENTITY % iso.standard.base   "IGNORE" -- does not work -->
<!ENTITY % ieee.standard.base  "IGNORE" -- does not work -->
<!ENTITY % ansi.standard.base  "IGNORE" -- does not work -->

<![ %iso.standard.base; [
  <!ENTITY % ISOStandardDTD PUBLIC "-//ISO//DTD Standard V2.01//EN">
  %ISOStandardDTD;
]]>

<![ %ieee.standard.base; [
  <!ENTITY % IEEEStandardDTD PUBLIC "-//IEEE//DTD Standard V0.0//EN">
  %IEEEStandardDTD;
]]>

1.3 LAYER 1 DTD: THE BOOK LAYER
The layer 1 DTD views the HL7 standard as a book. There are several alternative DTDs that can be used here. While in this early stage of the project there were options to choose from, it is eventually necessary to decide on one of these DTDs. Here I chose the ISO book. The DTD had to be changed at two points in order to link the %class; into the content model of the appropriate elements. Get the changed ISO book DTD from here.
<!ENTITY % iso.book.base "INCLUDE">

<!-- a plug-in for the ISO 12083:1993 book -->
<![ %iso.book.base; [
  <!ENTITY % ISObook  PUBLIC "-//HL7//DTD ISO 12083:1993 Book modified for HL7//EN">
  <!ENTITY % ISOnum   PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN" >
  <!ENTITY % ISOpub   PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN" >
  <!ENTITY % ISOtech  PUBLIC "ISO 8879:1986//ENTITIES General Technical//EN" >
  <!ENTITY % ISOdia   PUBLIC "ISO 8879:1986//ENTITIES Diacritical Marks//EN" >
  <!ENTITY % ISOlat1  PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" >
  <!ENTITY % ISOlat2  PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN" >
  <!ENTITY % ISOamso  PUBLIC "ISO 8879:1986//ENTITIES Added Math Symbols: Ordinary//EN" >
  <!ENTITY % ISOgrk1  PUBLIC "ISO 8879:1986//ENTITIES Greek Letters//EN" >
  <!ENTITY % ISOgrk3  PUBLIC "ISO 8879:1986//ENTITIES Greek Symbols//EN" >

  %ISObook;
  %ISOnum; %ISOpub; %ISOtech; %ISOdia; %ISOlat1;
  %ISOlat2; %ISOamso; %ISOgrk1; %ISOgrk3;

  <!ENTITY % main          "(front, body, appmat?, back?)">
  <!ENTITY % main.include  "%i.float;">
  <!ENTITY % text.model    "%s.zz;">
  <!ENTITY % phrase.model  "%m.ph;">
]]>

1.4 GLUE CODE
The content models of names, descriptions and the main element depend on the underlying layer 2 and layer 1 DTDs. In the previous version of this DTD, the %class; was simply included into the main element and all of its sub-elements. However, this produced a recursion on which the whole layer 3 design broke down. It was therefore necessary to modify the ISO book DTD to include classes, as references and as definitions, precisely at the points where this is appropriate. For the ISO book, this is in the phrase model and in the section model.
<!ELEMENT name     - - (%phrase.model;)*>
<!ELEMENT descr    - - (%text.model;)*>
<!ELEMENT hl7spec  - - (%main;) +(%main.include;)>

2 INTELLIGENT PROCESSING OF THE HL7 SPECIFICATION
2.1 PAPER DOCUMENTS
The DTD thus created could be used for editing and releasing the HL7 standard in electronic form. As such, it is immediately useful. However, nobody really wants to read the standard in SGML. The conversion of SGML to LaTeX should be relatively straightforward on all platforms (WinDOS, UNIX, VMS, ...), as freely available software exists. Thus, beautiful documents could be printed on paper with consistent layout and content. There is also a range of commercial software that helps with this task (I recently found an advertisement for such a product in my snail mail box).
2.2 HTML BROWSEABLE DOCUMENTS
The DTD can be used to produce an HTML document. This is a translation within SGML from one DTD to another. However, information that was contained in the HL7 DTD is lost as it is translated into elements of layout. Therefore HTML is not suitable as the primary language of the HL7 specs. But having the specs browseable offers new opportunities to explore and understand the standard. The key is that cross references become easy to follow. Cross references can be built automatically using the elements of the HL7 DTD. For instance, whenever a class (message, segment, etc.) or any other controlled term is used, the reader can click to see its definition. Whenever a segment is named, the user can click through to its list of fields, moving back and forth across chapters, stopping over at glossaries and indices, etc. The HTML version of the HL7 specs that Al Stone has prepared from the original WinWord docs already gives a glimpse of what is possible. However, the connectivity of the standard by means of references could be made much tighter with SGML using the proposed DTD.
2.3 IMPLEMENTATIONS
An implementation of HL7 can be regarded as just another transformation of the specs in SGML into the specs in a target language. Whether the target is C++, CORBA IDL, or any intermediary format that is used in the generation of an implementation, all of this can be generated from the HL7 DTD elements of the third layer. This should be possible with the same tools that are used to transform between layers or that generate LaTeX or HTML. Therefore, the enormous amount of manual work of SIGOBT could be automated. ProtoGen/HL7 implementations would always be up to date with the latest (pre-)version of the standard. Moreover, those people who find SGML suitable as a message transfer syntax can derive their DTD from the HL7 specs in SGML! The real strength of SGML is that it is so multifunctional. As the specification of HL7 can be edited in a controlled and consistent manner, it can be used for many purposes including printing, browsing and implementing. There is little more that one could want to do with a standard.
3 TURNING IT ALL AROUND
So far I have outlined how an SGML DTD can be defined and how many possible ways of processing there are. But the HL7 specifications in SGML can in turn be derived from another form. For example, the formal elements of the HL7 specifications can be captured in a database. Frank Oemig of HL7 Germany has created an enormously useful database of HL7 that includes all the drafts and finals from 2.1 up to 2.3. With this database he can account for every single bit of formal specification that has changed between the versions. He has also prepared browseable HTML files that can be used to jump through the definitions of all messages, segments and tables of all versions since 2.1. Unfortunately, longer descriptive text must still be looked up in the paper documents, where following references is so tedious. SGML would solve this problem. The HL7 DTD can be regarded as a data model. An instance of that DTD can in turn be regarded as a database of the HL7 specifications of a certain version. It is possible to use SGML translators to feed the specs into a relational or object database. Frank Oemig has shown that it is relatively easy to return an SGML version of the data in the database. This would allow extensive checks for referential integrity before the standard is released.
4 EXPERIENCES MADE SO FAR
After the first draft of this proposal, the DTD was improved and is now workable. Based on the ISO 12083:1993 book DTD, it adds all the elements that are specific to HL7. The layer 2 DTD is, however, not yet integrated, as I am still looking for a suitable standard DTD. From now on, the corpus of the HL7 standard can be moved into SGML.
However, the problems often become visible only when a task is brought near its end. This is especially true with SGML. While documents can easily be coded, validated and parsed against the defined DTD, it is much more difficult to make the document functional. As SGML is a notation for the structure, or syntax, of the document, the semantics and pragmatics can neither be specified nor implemented in SGML alone. Tools for the transformation and formatting of SGML are either not widely available (platform dependent, not redistributable, expensive) or not general enough to be used for all the kinds of transformations suggested in this proposal. And each of the many tools uses a different specification language to define the transformations. Since the transformations are the semantic salt in the SGML soup, it is critical that they be formulated in a maintainable, portable and standard fashion.
The Document Style Semantics and Specification Language (DSSSL) is an ISO standard which addresses this problem. It is based on Scheme (a `light' dialect of LISP). Therefore good DSSSL implementations will be very powerful, as they are complete programming environments, tailored to SGML transformation and formatting, which inherit the power and flexibility of LISP. Jade is an emerging implementation of DSSSL that in its first pre-releases focused on formatting SGML documents in RTF, which is importable by Microsoft text processors. DSSSL implies a quite complex architecture consisting of front ends and back ends. The idea is that a `style sheet' that describes the formatting of instances of some DTD can be rendered in multiple different formats. However, an inspection of the current Jade shows that it is doubtful whether the back end issues can be made really transparent. Above all, it is a shortcoming of DSSSL that there is as yet no portable and distributable implementation that addresses the issue of intra-SGML transformation.
On the other hand, simple tools like SGMLSASP by Goldfarb and Clark are not sufficiently powerful to perform the restructuring of trees that is necessary in order to cope with HL7 layer 3. For instance, in definitions of segments, the field notes that pertain to the fields are to be rendered after the table that lists all the fields.
While the layered approach seemed to be the best choice, practice again reveals that there is no standard layer 2 or layer 1 DTD that is already supported by formatting and transformation applications. The TEI is convertible to RTF or LaTeX by some tools, DocBook is convertible to HTML using other tools, and the ISO book is not supported at all. Moreover, the existing DTDs do not really support layering, because in order to make distinct additions to content models the lower layer DTD has to be changed manually, yielding a derivative of the standard DTD but not the standard itself. This becomes difficult to maintain. Methods have to be developed by which existing DTDs can be extended without having to change the base DTD.
CONCLUSION
Before any real attempt at practically demonstrating the functionality of the DTD proposed herein can be made, there are open issues to be solved by other parties. It seems advisable to wait for the availability of DSSSL engines, whose strength should lie more in the transformation to other SGML formats or general character streams than in the formatting and layout of text on paper, which is easily possible with LaTeX.
Everyone who considers the application of SGML to HL7 should be excited by now about the whole new world of opportunities that SGML gives to the management and application of HL7. Ranging from editing through balloting, publishing and browsing up to implementing our HL7 standard, all of that is possible from a single and portable platform: SGML. The clearer this becomes, the more the lack of simple and consistent support for the transformation of DTD instances hurts.
For HL7 internal issues, I'd like to ask the SGML folks to consider stepping aside from the immediate application of SGML as a transfer syntax and to first promote the use of SGML for the specification of HL7. If this is done, the step to SGML as a transfer syntax is just a rather simple translation of an instance of the HL7-spec DTD into an HL7-encoding DTD, if the tools were available.
The HL7 spec DTD can be fetched from here
Send comments to the author:
Gunther Schadow <schadow@ukbf.fu-berlin.de>