V3DT conference call notes for Mon, Mar 22, 1999.

The HL7 version 3 data type task group held its twenty-second conference call on Monday, March 22, 1999, from 11:00 PM EST to 11:45 PM EST.

Attendees were:

Agenda items were:

  1. How close are we to completion?
  2. How are we going to do ITS mappings?
  3. We may want specific XML ITS for every data type.
  4. We need specific string literals for many data types.
  5. We need to clearly define implicit conversion rules for each data type.

How close are we to completion?

We have all data types on paper now. The things left to do are as follows:
  1. Calendar modulus expressions are only roughly sketched.
  2. A number of open issues need to be resolved.
  3. The probability distribution generic type needs review.
  4. Need XML ITS specifications for every data type.
  5. Need string literals for many data types.
  6. Need implicit conversion rules defined for each data type.

How are we going to do ITS mappings?

We will do an XML ITS mapping as a basis. ITS mappings beyond that will have to be prepared by people who focus on those other ITSs.

XML ITS for every data type.

We identified clusters of data types that can be considered together for the purpose of XML ITS definition.

Cluster 1 Person name / address / organization name.

assigned to Gunther.

Cluster 2 Code Value, Concept Descriptor & Co.

assigned to Gunther.

The easy stuff

assigned to Gunther.

Boolean, NoInformation, TII and TIL might go with XML attributes only, i.e. without any content elements.

Character String: no attributes, only data.

Binary Data/Multimedia Free Text

Generic Types

assigned to Mark Tucker

Nesting vs. Mixin style - Mark Tucker.

Quantities

assigned to Mike Henderson & Co.

We need specific string literals for many data types.

String literals are defined for three reasons: We will try to go with XML as far as possible and we will decide on whether or not to define separate string literals depending on the experience with XML representations.

We need to clearly define implicit conversion rules for each data type.

Tabled for later.

Appendix A: Mike Henderson's Homework:

Appendix B: Mark Tucker's Homework:

Appendix C: Gunther Schadow's Homework:

The Easy Stuff

Character String

A string is just characters, XML is based on Unicode just as our strings are, so there is nothing special to do for HL7 v3 strings. HL7 escape sequences are not defined, and there is no need to send raw bytes through a string, since we have a separate data type for this. Thus, an HL7 string is just any XML CDATA.

Examples:

<foo>A string as XML content element data is just characters</foo>
<foo bar="A string as XML attribute data is no different"/>

All the XML rules for white space and new line handling apply. However, two points are unclear:

  1. What do we do about consecutive white spaces?
  2. What do we do about literal new lines?

Binary Data

Binary data can be sent as literal characters in the default encoding, which is only recommended if the data actually is character data in that encoding (e.g. media type text/plain).

For binary data we must be able to set an encoding format. Normally we want to use base64 encoding, but some people might want hexadecimal or uu encoding (discouraged).

Anyway, we need at least an encoding attribute and maybe others (e.g. length of data chunk).

Two ways exist: mixin attributes or a nested element.

Suppose we want the data to be encoded explicitly in base64 in a component named FOO of type BIN. Mixin attributes would put the ENC=... attribute inside the FOO tag:

<FOO TY="BIN" ENC="base64">
  YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P
  7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA
  AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
</FOO>

If we want to avoid mixin attributes we can define an XML element for the BIN data type:

<FOO>
  <BIN ENC="base64">
    YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P
    7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA
    AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
  </BIN>
</FOO>

Now that looks cleaner to me. We would simply make "base64" the default ENCoding, so that normally we could write

<FOO>
  <BIN>
    YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P
    7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA
    AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
  </BIN>
</FOO>

Other ENCodings would be "8bit," "hex," "uu," "binhex" (for Apple Macintosh), etc. However, why shouldn't we simply say: no optionality, either use base64 or die!
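A receiver has to find the BIN attributes in either of the two styles above. The following is a minimal Python sketch, assuming the element and attribute names used in these examples (FOO, BIN, TY, ENC) and base64 as the default ENCoding; the helper `bin_payload` is hypothetical and not part of any agreed ITS.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

DEFAULT_ENC = "base64"  # proposed default ENCoding (assumption)


def bin_payload(component: ET.Element) -> Tuple[str, str]:
    """Return (encoding, character data) of the BIN value carried by a
    component element, accepting both the nested and the mixin style.
    Hypothetical helper; names follow the examples in these notes."""
    nested = component.find("BIN")
    if nested is not None:
        # Nested element style: <FOO><BIN ENC="...">data</BIN></FOO>
        carrier = nested
    elif component.get("TY") == "BIN":
        # Mixin style: <FOO TY="BIN" ENC="...">data</FOO>
        carrier = component
    else:
        raise ValueError(f"<{component.tag}> carries no BIN value")
    # A missing ENC attribute falls back to the default encoding.
    return carrier.get("ENC", DEFAULT_ENC), carrier.text or ""


mixin = ET.fromstring('<FOO TY="BIN" ENC="hex">4869</FOO>')
nested = ET.fromstring("<FOO><BIN>SGk=</BIN></FOO>")
print(bin_payload(mixin))   # ('hex', '4869')
print(bin_payload(nested))  # ('base64', 'SGk=')
```

In the nested style the lookup always happens on the BIN element; in the mixin style ENC shares the attribute namespace of the component element FOO.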

However, remember that we use BIN in connection with the free text (FTX) data type. For media type text/plain it would be nice to have an encoding that does not obscure the text. So there is real use for a different encoding. Call it "text". Encoding "text" means: use the default encoding of the surrounding message (either UTF-8 or the encoding set in the initial XML declaration).

<FOO>
  <BIN ENC="text">
    This is text data encoded in the default encoding used for this message. Most likely it is encoded in UTF-8.
  </BIN>
</FOO>

Text encoding uses the same XML escaping rules as usual XML text content, i.e. use "&lt;" if the data contains "<".

The remaining question is what TEXT encoding means for non-character data, i.e. how the bytes are constructed from the text. This is how it works:

  1. resolve all XML entity and character references, such as "&lt;";
  2. apply the XML standard rules for dealing with white space and new lines;
  3. encode the text in the default encoding (e.g., UTF-8) and the resulting byte string is the data.
Obviously this is not the preferred encoding for radiology images, but one can use TEXT encoding for any data whose bytes, read in the default encoding, form characters that XML allows, since the construction of the bytes is clearly defined this way.
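The decoding side of the encodings discussed so far can be sketched in Python. This assumes the ENC values "base64" (default), "hex" and "text", compared case-insensitively, and UTF-8 as the message's default encoding; `decode_bin` is a hypothetical helper. For "text", steps 1 and 2 are what a conforming XML parser already does (references resolved, line ends normalized); the sketch applies no further white-space handling, since that question is still open.

```python
import base64
import xml.etree.ElementTree as ET


def decode_bin(elem: ET.Element, default_charset: str = "utf-8") -> bytes:
    """Turn a <BIN> element into its byte string (hypothetical helper)."""
    enc = elem.get("ENC", "base64").lower()
    data = elem.text or ""
    if enc == "base64":
        # With validate=False (the default), b64decode discards characters
        # outside the base64 alphabet, so line breaks in the data are harmless.
        return base64.b64decode(data)
    if enc == "hex":
        # bytes.fromhex skips ASCII white space (Python 3.7 and later).
        return bytes.fromhex(data)
    if enc == "text":
        # Steps 1 and 2 were done by the XML parser: references such as
        # "&lt;" are resolved and line ends normalized. Step 3 encodes the text.
        return data.encode(default_charset)
    raise ValueError(f"unsupported ENC value {enc!r}")


print(decode_bin(ET.fromstring("<BIN>SGVs\nbG8=</BIN>")))          # b'Hello'
print(decode_bin(ET.fromstring('<BIN ENC="text">a &lt; b</BIN>')))  # b'a < b'
```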

Free Text

Free text (FTX) builds on the BIN data type whose XML ITS is given above. In fact, FTX reuses the BIN attributes directly so that we save one level of nesting (so to speak, FTX "inherits" from BIN).

<FOO>
  <FTX MEDIA="application/pdf" COMP="gzip">
    YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P
    7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA
    AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
  </FTX>
</FOO>
...
<BAR>
  <FTX MEDIA="text/plain" ENC="text">
    This is text that could appear exactly the same way in a character string. But only because CHARSET defaults to the default character encoding.
  </FTX>
</BAR>
...
<BAZ>
  <FTX MEDIA="text/plain" CHARSET="US-ASCII" ENC="base64">
    AAAgA3JJ0JnduMIMHTgAAAEABABEAA7SKVHX2zUftEduysr98BUEsBAhQAFAA
    2kt+pFl7SR7KTtwnyJSkCIaflI84L6PYOLckIwZua6MMVtjuGFNdyw7r+9h6W
    UuZDQuZG9jUEsFBAAAAAAGNoNG91dGxpbm3=
  </FTX>
</BAZ>

As you can see, the ENC attribute from BIN is used at the same level as the CHARSET or COMP attribute from FTX. In an implicitly typed world (such as the one assumed in the HIMSS demo), could we write the following?

<FOO TY="FTX" MEDIA="application/pdf" ENC="base64">
  YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P
  7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA
  AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
</FOO>
This reduces the level of nesting even more. Note that TY, MEDIA and ENC are attributes coming from very different sources, but all end up on the same element: TY comes from the enclosing component entity FOO, MEDIA comes from FTX and ENC comes from BIN. Though this is very brief, it makes me nervous, since attribute name conflicts are very hard to control.

Object Identifier

An ISO object identifier is simply a string of numbers and dots.

<FOO BAR="2.3.5.814.23.56.66.33324.55667.55.667.777">
...
<BAR TY="OID">2.3.5.814.23.56.66.33324.55667.55.667.777</BAR>

The question is, do we want to allow for human readable OIDs, such as:

<FOO BAR="they (2) theirs (3) them (5) our (814) mine (23) this (56)">
...
<BAR TY="OID">they (2) theirs (3) them (5) our (814) mine (23) this (56)</BAR>
This is clearly a question that is genuinely about string literals, outside of the XML question.

BTW we could define a genuine XML form of the OID:

<OID>
  <HOP N="2">they</HOP>
  <HOP N="3">theirs</HOP>
  <HOP N="5">them</HOP>
  <HOP N="814">our</HOP>
  <HOP N="23">mine</HOP>
  <HOP N="56">this</HOP>
</OID>
Do you like that? Pretty long if you ask me. I am going to consider only the number-and-dots form shown first.
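Both literal forms carry the same numbers, so a receiver could normalize either one to the number-and-dots form. A minimal Python sketch; `normalize_oid` is a hypothetical helper, and it assumes that each arc name is a simple word followed by the arc number in parentheses, exactly as in the example above.

```python
import re

# Number-and-dots form, e.g. "2.3.5.814"
DOTTED = re.compile(r"\d+(?:\.\d+)*")
# Human readable form, e.g. "they (2) theirs (3)": a word, then "(number)"
NAMED = re.compile(r"(?:[A-Za-z][\w-]*\s*\(\d+\)\s*)+")
ARC_NUMBER = re.compile(r"\((\d+)\)")


def normalize_oid(literal: str) -> str:
    """Return the number-and-dots form of an OID literal (hypothetical)."""
    text = literal.strip()
    if DOTTED.fullmatch(text):
        return text
    if NAMED.fullmatch(text):
        # Only the numbers identify the object; the names are labels.
        return ".".join(ARC_NUMBER.findall(text))
    raise ValueError(f"not an OID literal: {literal!r}")


print(normalize_oid("they (2) theirs (3) them (5) our (814) mine (23) this (56)"))
# 2.3.5.814.23.56
```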

Technical Instance Identifier

The TII builds on the OID defined above. Again, different styles are possible. First, the type-less attribute mixin style:

<ThingId TY="TII" ROOT="2.3.5.814.23.56" EXT="137-J-3484"/>

Next the explicit type element style, where attributes have a proper place:

<ThingId> <TII ROOT="2.3.5.814.23.56" EXT="137-J-3484"/> </ThingId>

Next, a style that uses a string literal and thus fits into a single attribute:

<Thing ThingId="137-J-3484@2.3.5.814.23.56"/>
Note that this creates a problem when the extension (EXT) itself contains an at sign ("@"). But it is nice and handy, so maybe it is worth the little additional trouble.
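The at-sign problem may be smaller than it looks: if the root is always an OID, it consists only of digits and dots and can never contain "@", so splitting at the last "@" stays unambiguous even when EXT contains one. A Python sketch under that assumption; `format_tii` and `parse_tii` are hypothetical names.

```python
import re
from typing import Tuple

OID = re.compile(r"\d+(?:\.\d+)*")


def format_tii(root: str, ext: str) -> str:
    """Build the EXT@ROOT string literal."""
    return f"{ext}@{root}"


def parse_tii(literal: str) -> Tuple[str, str]:
    """Split an EXT@ROOT literal into (ROOT, EXT).

    The split is at the last at sign, because an OID root cannot contain
    one; any earlier at signs belong to the extension.
    """
    ext, sep, root = literal.rpartition("@")
    if not sep or not OID.fullmatch(root):
        raise ValueError(f"not a TII literal: {literal!r}")
    return root, ext


print(parse_tii("137-J-3484@2.3.5.814.23.56"))  # ('2.3.5.814.23.56', '137-J-3484')
print(parse_tii(format_tii("2.3.5", "box@7")))  # ('2.3.5', 'box@7')
```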

Technical Instance Locator

Again, the different styles: (1) component tag mixin, (2) explicit type instance element, (3) literal

<ReferTo TY="TIL" PROTO="http" ADDR="//aurora.rg.iupui.edu/v3dt"/>
<ReferTo>
  <TIL PROTO="http" ADDR="//aurora.rg.iupui.edu/v3dt"/>
</ReferTo>
<Thing ReferTo="http://aurora.rg.iupui.edu/v3dt"/>
Special rules apply to the colon (":") here: the colon is never part of the protocol code. That way, by mere coincidence, the ReferTo attribute's value looks like a URL. However, it is not strictly a URL. The following examples would also be correct:
<Thing ReferTo="URL:http://aurora.rg.iupui.edu/v3dt"/>
<Doctor Phone="phone:+13178160516"/>
Which reminds me that we will have to be more specific on the "phone" (and "fax") protocols. The HL7 v3 Data Type Specification document lacks a table of allowed and recommended protocols.
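Because the colon is never part of the protocol code, the literal maps back to the PROTO and ADDR attributes by splitting at the first colon. A minimal Python sketch; `parse_til` is a hypothetical helper, and it deliberately does not check the protocol against a table of allowed protocols, since that table does not exist yet.

```python
from typing import Dict


def parse_til(literal: str) -> Dict[str, str]:
    """Split a TIL literal into PROTO and ADDR at the first colon."""
    proto, sep, addr = literal.partition(":")
    if not sep or not proto:
        raise ValueError(f"not a TIL literal: {literal!r}")
    return {"PROTO": proto, "ADDR": addr}


for lit in ("http://aurora.rg.iupui.edu/v3dt",
            "URL:http://aurora.rg.iupui.edu/v3dt",
            "phone:+13178160516"):
    print(parse_til(lit))
# {'PROTO': 'http', 'ADDR': '//aurora.rg.iupui.edu/v3dt'}
# {'PROTO': 'URL', 'ADDR': 'http://aurora.rg.iupui.edu/v3dt'}
# {'PROTO': 'phone', 'ADDR': '+13178160516'}
```

The first result carries the same PROTO and ADDR values as the explicit TIL element form.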

An issue with all those "easy" data types above: a Code Value comes as a string literal. This is nice; however, it does not say what to do if you want to be more explicit about the components of the Code Value (coding system, version, print name, etc.). This means that what has been said above does not work without a sophisticated string literal for CV.

Cluster 1 Person name / address / organization name.

Person Name

Here I only care about the non-RIM class part of person name. That is the LIST of Person Name Parts. Each part is simply a value and its optional classifier. The classifier is a SET of Code Value. Wow! This is gonna be a big one if we want to use vanilla XML representation of a set of CV:

<PART>Gunther
  <CLASSIFIER TY="SET">
    <EL TY="CV">given</EL>
    <EL TY="CV">birth</EL>
    <EL TY="CV">callme</EL>
  </CLASSIFIER>
</PART>
<PART>Schadow
  <CLASSIFIER TY="SET">
    <EL TY="CV">family</EL>
    <EL TY="CV">birth</EL>
    <EL TY="CV">unmarried</EL>
  </CLASSIFIER>
</PART>
It is pretty clear that we don't want this. So, first we need a short form for a set of Code Values (note: this set of code values is actually a kind of code phrase!):
<SET OF="CV" CSYS="HL7-3844" CSVER="0.9">
  <EL>given</EL>
  <EL>birth</EL>
  <EL>callme</EL>
</SET>
This tells us that it is a SET OF CVs and that all other attributes are the defaults for all the element (EL) CVs. The second step is to put all the elements into one chunk of data:
<SET OF="CV" CSYS="HL7-3844" CSVER="0.9"> given birth callme </SET>
which now allows us to put the entire set into an attribute:
<SET OF="CV" CSYS="HL7-3844" CSVER="0.9" EL="given birth callme"/>
where white space is the delimiter of the elements. Using white space as a delimiter is pretty standard XML style (cf. the IDREFS and NAMES attribute types of SGML). Alternatively, we can leverage the one-character short forms, which are very well suited for such a set of flags. However, we cannot rely on the single-character flags never running out, so we should still keep the white space:
<SET OF="CV" CSYS="HL7-3844" CSVER="0.9" EL="G B C"/>
Usually we will not need the CSYS and CSVER stuff, so we can shorten even more:
<SET OF="CV" EL="G B C"/>
We are making progress, aren't we? Now, in a person name variant we don't expect anything other than a SET OF CVs, so finally we can boil it down to the short form:
<NAME TY="PN">
  <A C="G B C">Gunther</A>
  <A C="F B U">Schadow</A>
</NAME>
<NAME TY="PN">
  <A C="G B C">Irma</A>
  <A C="G B I">C.</A>
  <A C="F M">Jongeneel</A>
  <A C="P W">e.g. </A>
  <A C="P VV">de </A>
  <A C="F B">Haas</A>
</NAME>
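All the SET forms above, from explicit EL children to the EL attribute, carry the same information. A Python sketch of a reader that accepts each of them; `CV` and `read_cv_set` are hypothetical names, and the sketch assumes, as stated above, that CSYS and CSVER on the SET are defaults that an individual EL child may override, and that codes never contain white space.

```python
import xml.etree.ElementTree as ET
from typing import FrozenSet, NamedTuple, Optional


class CV(NamedTuple):
    value: str
    csys: Optional[str] = None
    csver: Optional[str] = None


def read_cv_set(elem: ET.Element) -> FrozenSet[CV]:
    """Read <SET OF="CV"> in child-element, content, or EL-attribute form."""
    if elem.get("OF") != "CV":
        raise ValueError("expected a SET OF CV")
    csys, csver = elem.get("CSYS"), elem.get("CSVER")
    children = elem.findall("EL")
    if children:
        # <SET ...><EL>given</EL>...</SET>; each EL may override the defaults
        return frozenset(
            CV((el.text or "").strip(), el.get("CSYS", csys), el.get("CSVER", csver))
            for el in children
        )
    # <SET ... EL="given birth callme"/> or <SET ...> given birth callme </SET>
    tokens = elem.get("EL", elem.text or "").split()
    return frozenset(CV(code, csys, csver) for code in tokens)


short = read_cv_set(ET.fromstring('<SET OF="CV" EL="G B C"/>'))
print(sorted(cv.value for cv in short))  # ['B', 'C', 'G']
```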

The nice thing about this latter form is that if we run Irma's XML name verbatim through an HTML browser, we get this:

Irma C. Jongeneel e.g. de Haas

Which is a demonstration of exactly the reason for the new name type design: you can simply ignore all the tags and print the name as is. However, be aware that the browser only accidentally adhered to the white space rules. As soon as delimiters come into play, it would not do the right thing. For instance:

<NAME TY="PN">
  <A C="G B C">Irma</A>
  <A C="G B I">C.</A>
  <A C="F M">Jongeneel</A>
  <A C="D">-</A>
  <A C="P VV">de </A>
  <A C="F B">Haas</A>
</NAME>
comes out as " Irma C. Jongeneel - de Haas " where it should be "Irma C. Jongeneel-de Haas".
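One possible rendering rule that yields the intended output is sketched below in Python. It is an assumption, not an agreed rule: parts are joined with a single space, except that no space is added next to a delimiter part (classifier "D") or after a part whose text already ends in a space (such as the prefix "de "). `render_name` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def render_name(name: ET.Element) -> str:
    """Render <NAME TY="PN"> parts using a hypothetical spacing rule."""
    out = ""
    prev_is_delim = False
    for part in name.findall("A"):
        text = part.text or ""
        is_delim = "D" in (part.get("C") or "").split()
        # Separate parts by one space, unless a delimiter is involved or
        # the previous part (e.g. a prefix like "de ") already ends in one.
        needs_space = (bool(out) and not out.endswith(" ")
                       and not is_delim and not prev_is_delim)
        if needs_space:
            out += " "
        out += text
        prev_is_delim = is_delim
    return out


irma = ET.fromstring(
    '<NAME TY="PN"><A C="G B C">Irma</A><A C="G B I">C.</A>'
    '<A C="F M">Jongeneel</A><A C="D">-</A>'
    '<A C="P VV">de </A><A C="F B">Haas</A></NAME>'
)
print(render_name(irma))  # Irma C. Jongeneel-de Haas
```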

Address

The address is similar to the person name above. The only difference is that we have an outer container with the address purpose and the bad address flag.

<ADDRESS PURPOSE="RES" BAD="TRUE">
  <A R="HNR">1028</A>
  <A R="STR">Pinewood Ct</A>
  <A R="DEL"/>
  <A R="CTY">Indianapolis</A>
  <A R="DEL">,</A>
  <A R="STA">IN</A>
  <A R="DEL">-</A>
  <A R="ZIP">46240</A>
</ADDRESS>
However, there is more potential for shortening and simplification: since address part role codes are simple codes, not code phrases, we can use them as tag names rather than as attribute values!
<ADDRESS PURPOSE="RES" BAD="TRUE">
  <HNR>1028</HNR>
  <STR>Pinewood Ct</STR>
  <DEL/>
  <CTY>Indianapolis</CTY>
  <DEL>,</DEL>
  <STA>IN</STA>
  <DEL>-</DEL>
  <ZIP>46240</ZIP>
</ADDRESS>
And of course we can argue the content-vs.-attribute battle. What we cannot do is put HNR, STR, CTY, etc. as attributes of ADDRESS, since those parts may repeat (e.g. there are three DELs in the above example).

The real issue is the white space rules and how we handle the LIT part type. LIT is used for unclassified content and is the default, so we would like to avoid mentioning LIT. Instead of

<ADDRESS PURPOSE="RES" BAD="TRUE">
  <HNR>1028</HNR>
  <STR>Pinewood Ct</STR>
  <DEL/>
  <LIT>North side near 96th Street and College</LIT>
  <DEL/>
  <CTY>Indianapolis</CTY>
  <DEL>,</DEL>
  <STA>IN</STA>
  <DEL>-</DEL>
  <ZIP>46240</ZIP>
</ADDRESS>
we want
<ADDRESS PURPOSE="RES" BAD="TRUE">
  <HNR>1028</HNR>
  <STR>Pinewood Ct</STR>
  <DEL/>
  North side near 96th Street and College
  <DEL/>
  <CTY>Indianapolis</CTY>
  <DEL>,</DEL>
  <STA>IN</STA>
  <DEL>-</DEL>
  <ZIP>46240</ZIP>
</ADDRESS>
but in order to do so, we need to obey white space in the ADDRESS element content. That requires us, however, to refine white space rules, since all leading whitespace and all white space before and after tagged elements is still to be discarded.
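The refined rule can be sketched in Python: untagged text between the tagged parts becomes an implicit LIT part, while white space that only separates tags is dropped. This minimal sketch uses the tag-name form; `address_parts` is a hypothetical helper, and it assumes that trimming leading and trailing white space from untagged text is the intended rule.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def address_parts(address: ET.Element) -> List[Tuple[str, str]]:
    """Return (role, text) pairs; untagged text becomes an implicit LIT part."""
    parts: List[Tuple[str, str]] = []

    def add_untagged(text: Optional[str]) -> None:
        # White space before and after tagged parts is discarded; white
        # space inside the untagged text is kept.
        text = (text or "").strip()
        if text:
            parts.append(("LIT", text))

    add_untagged(address.text)        # text before the first child
    for child in address:
        parts.append((child.tag, child.text or ""))
        add_untagged(child.tail)      # text after each child
    return parts


addr = ET.fromstring(
    '<ADDRESS PURPOSE="RES"><HNR>1028</HNR> <STR>Pinewood Ct</STR> <DEL/>'
    ' North side near 96th Street and College <DEL/> <CTY>Indianapolis</CTY>'
    '</ADDRESS>'
)
for role, text in address_parts(addr):
    print(role, repr(text))
```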

Organization Name

Simple:

<OrgName TY="ON">
  <ONXV TYPE="L">Franklin Templeton Growth Fund, Inc.</ONXV>
  <ONXV TYPE="A">Templeton Growth</ONXV>
  <ONXV TYPE="D">TGF</ONXV>
  <ONXV TYPE="ST">TEPLX</ONXV>
</OrgName>
Any questions?

Cluster 2 Code Value, Concept Descriptor & Co.

Code Value

As noted above, most often we can simply use a string for a code value and be sure it will be interpreted using the default expected code system. The message is invalid if the string literal is not defined in that code system.

A full-blown CV would look like this:

<EventCode> <CV V="ORD1234-1" CS="HL7-2374" CSV="3.0" PN="CANCEL ORDER"/> </EventCode>
or in attribute mixin form:
<EventCode TY="CV" V="ORD1234-1" CS="HL7-2374" CSV="3.0" PN="CANCEL ORDER"/>

Code Phrase is next. Here we define a long form and a short form. The long form takes care of the possibility of using codes from different code systems in one phrase:

<CXPH> <CV V="274" CS="ICHC" PN="AUBURN"/> <CV V="23" CS="ICCM" PN="LIGHT"/> </CXPH>
but usually we will stick with codes from a single code system and version in one phrase, which allows us to compress:
<CXPH V="274 234" CS="ICHC" PN="AUBURN LIGHT"/>

Alternatively, the print name could also appear in the content position of a code value:

<CXPH> <CV V="274" CS="ICHC">AUBURN</CV> <CV V="23" CS="ICCM">LIGHT</CV> </CXPH>
but beware that sometimes the "replacement" and even the code "value" itself can be good candidates for content. I suggest going without any content.
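A reader for the code phrase forms above might expand each of them into one (value, code system, print name) triple per code. A Python sketch with a hypothetical helper name. Note the assumption built into the compressed form: the sketch splits V and PN on white space and pairs them by position, so it only works if neither codes nor print names contain white space (a print name like "CANCEL ORDER" would break it).

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Code = Tuple[Optional[str], Optional[str], Optional[str]]  # (V, CS, PN)


def expand_cxph(cxph: ET.Element) -> List[Code]:
    """Expand a <CXPH> in long or compressed form (hypothetical helper)."""
    cvs = cxph.findall("CV")
    if cvs:
        # Long form: the print name is in PN or, alternatively, in the content.
        return [(cv.get("V"), cv.get("CS"),
                 cv.get("PN") or (cv.text or "").strip() or None)
                for cv in cvs]
    # Compressed form: one code system for all codes, lists split on white space.
    values = cxph.get("V", "").split()
    names = cxph.get("PN", "").split()
    if names and len(names) != len(values):
        raise ValueError("V and PN must list the same number of items")
    cs = cxph.get("CS")
    return [(v, cs, names[i] if names else None) for i, v in enumerate(values)]


print(expand_cxph(ET.fromstring('<CXPH V="274 234" CS="ICHC" PN="AUBURN LIGHT"/>')))
# [('274', 'ICHC', 'AUBURN'), ('234', 'ICHC', 'LIGHT')]
```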

Now we are ready for the Concept Descriptor with Code Translations.

<HairColor TY="CD">
  the patient had a light auburn hair color
  <CXLT ID="T1" V="274 234" CS="ICHC" PN="AUBURN LIGHT"/>
  <CXLT ID="T2" ORG="T1" V="B038 G943" CS="PILS-A" PN="BROWN REDDISH"/>
</HairColor>
This shows that the original text is put into the content position. But remember that the original text is of type FTX, i.e. possibly multimedia. This case obviously converts a string to FTX text/plain. If we want to be explicit, here goes:
<HairColor TY="CD">
  <FTX MEDIA="text/plain">the patient had a light auburn hair color</FTX>
  <CXLT ID="T1" V="274 234" CS="ICHC" PN="AUBURN LIGHT"/>
  <CXLT ID="T2" ORG="T1" V="B038 G943" CS="PILS-A" PN="BROWN REDDISH"/>
</HairColor>

Finally, note that the "reference to code translation" is mapped into XML using the ID and IDREF attribute types: ID is of type ID and ORG is of type IDREF. That takes care of it, as long as the ID values are valid XML names (which cannot start with a digit). Forward references are allowed in XML, since an IDREF may point to an ID anywhere in the document, though they could be prevented here because cyclic translation paths are forbidden.
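Since each translation names at most one ORG, the translations of one concept descriptor form a forest, and checking them takes only a few lines. A Python sketch, assuming the CXLT element and its ID and ORG attributes as in the example; `check_translations` is a hypothetical helper. Like a validating XML parser does for ID and IDREF (but scoped to the one concept descriptor), it checks that IDs are unique and that every ORG resolves; it accepts forward references and rejects cycles.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def check_translations(cd: ET.Element) -> Dict[str, Optional[str]]:
    """Map each translation ID to its ORG, rejecting bad references."""
    origin: Dict[str, Optional[str]] = {}
    for cxlt in cd.findall("CXLT"):
        tid = cxlt.get("ID")
        if tid is None or tid in origin:
            raise ValueError(f"missing or duplicate ID {tid!r}")
        origin[tid] = cxlt.get("ORG")
    for tid, org in origin.items():
        # Forward references are fine: all IDs were collected first.
        if org is not None and org not in origin:
            raise ValueError(f"ORG {org!r} of {tid!r} does not resolve")
    for start in origin:
        # Follow ORG links back toward the original code; revisiting an ID
        # means the translation path is cyclic.
        seen = set()
        cur: Optional[str] = start
        while cur is not None:
            if cur in seen:
                raise ValueError(f"cyclic translation path through {cur!r}")
            seen.add(cur)
            cur = origin[cur]
    return origin


hair = ET.fromstring(
    '<HairColor TY="CD">the patient had a light auburn hair color'
    '<CXLT ID="T1" V="274 234" CS="ICHC" PN="AUBURN LIGHT"/>'
    '<CXLT ID="T2" ORG="T1" V="B038 G943" CS="PILS-A" PN="BROWN REDDISH"/>'
    '</HairColor>'
)
print(check_translations(hair))  # {'T1': None, 'T2': 'T1'}
```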


The next conference call is on Monday, March 29, 1999, 11:00 AM EST.

Agenda items for next time are:

  1. Review XML ITS experiments (homeworks).

Please have your homework done by Sunday, so that we have enough time to see each other's work.

regards

-Gunther Schadow