The HL7 version 3 data type task group held its twenty-second conference call on Monday, March 22, 1999, 11:00 PM EST to 11:45 PM EST.
Attendees were:
Agenda items were:
We will do an XML ITS mapping as a basis. ITS mappings beyond that will have to be prepared by people who focus on that other ITS.
We identified clusters of data types that can be considered together for the purpose of XML ITS definition.
assigned to Gunther.
assigned to Gunther.
assigned to Gunther.
Boolean, NoInformation, TII, and TIL might go with XML attributes only, i.e. without any content elements.
Character String: no attributes, only data.
Binary Data/Multimedia Free Text
assigned to Mark Tucker
Nesting vs. Mixin style - Mark Tucker.
A string is just characters, XML is based on Unicode just as our strings are, so there is nothing special to do for HL7 v3 strings. HL7 escape sequences are not defined, and there is no need to send raw bytes through a string, since we have a separate data type for this. Thus, an HL7 string is just any XML CDATA.
Examples:
A string as XML content element data is just characters
All the XML rules for white space and new line handling apply. However, two points are unclear:
Binary data can be just literal default encoded characters, which is only recommended if the data is supposed to be characters in default encoding (e.g. media type text/plain).
For binary data we must be able to set an encoding format. Normally we want to use base64 encoding, but some people might want hexadecimal or uu encoding (discouraged).
Anyway, we need at least an encoding attribute and maybe others (e.g. length of data chunk).
Two ways exist: mixin attributes or a nested element.
Suppose we want the data to be encoded explicitly in base64 in a component named FOO of type BIN. Mixin attributes would put the ENC=... attribute inside the FOO tag.
YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P 7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
If we want to avoid mixin attributes we can define an XML element for the BIN data type:
YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P 7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
Now that looks cleaner to me. We would simply make "base64" the default ENCoding, so that normally we could write
YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P 7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
Other ENCodings would be "8bit," "hex," "uu," "binhex" (for Apple Macintosh), etc. However, why shouldn't we simply say: no optionality, either use base64 or die!
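To make the two styles concrete, here is a minimal Python sketch that builds both forms for a component FOO of type BIN. The names FOO, BIN and ENC come from the discussion above; the exact markup of the original examples is not preserved in these minutes, so the layout produced here is an assumption, and the helper names `bin_mixin` and `bin_nested` are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def bin_mixin(component: str, data: bytes) -> str:
    """Mixin style: the ENC attribute sits directly on the component tag."""
    el = ET.Element(component, {"ENC": "base64"})
    el.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(el, encoding="unicode")


def bin_nested(component: str, data: bytes, explicit_enc: bool = False) -> str:
    """Nested style: a BIN element inside the component carries ENC.

    With base64 as the assumed default, ENC is only written on request.
    """
    outer = ET.Element(component)
    inner = ET.SubElement(outer, "BIN")
    if explicit_enc:
        inner.set("ENC", "base64")
    inner.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(outer, encoding="unicode")


print(bin_mixin("FOO", b"hello"))
print(bin_nested("FOO", b"hello", explicit_enc=True))
print(bin_nested("FOO", b"hello"))
```

In the nested form the ENC attribute has exactly one place to live, which is the point of the "cleaner" remark above.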
However, remember that we use BIN in connection with the free text (FTX) data type. For media type text/plain it would be nice to have a less obscuring encoding. So, there is real use for a different encoding. Call it "text". Encoding "text" means: use the default encoding of the surrounding message (either UTF-8 or the encoding set in the initial mark).
This is text data encoded in the default encoding used for this message. Most likely it is encoded in UTF-8.
Text encoding uses the same XML escaping rules as usual XML text content, i.e. use "&lt;" if the data contains "<".
The remaining question is, what does TEXT encoding mean for non-character data, i.e. how are bytes constructed from the text? This is how it works:
<
";
Free text (FTX) uses the BIN data type whose XML ITS is given above. Actually free text (FTX) reuses the BIN data type so that we save one level of nesting (so to speak: FTX "inherits" from BIN).
... YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P 7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA== ... This is text that could appear exactly the same way in a character string. But only because CHARSET defaults to the default character encoding. AAAgA3JJ0JnduMIMHTgAAAEABABEAA7SKVHX2zUftEduysr98BUEsBAhQAFAA 2kt+pFl7SR7KTtwnyJSkCIaflI84L6PYOLckIwZua6MMVtjuGFNdyw7r+9h6W UuZDQuZG9jUEsFBAAAAAAGNoNG91dGxpbm3=
As you can see, the ENC attribute from BIN is used at the same level as the CHARSET or COMP attribute from FTX. In an implicitly typed world (such as assumed in the HIMSS demo) could we write the following?
YOLckIwZua6MMVtjuGFNdyw7r+9h6W2kt+pFl7SR7KTtwnyJSkCIaflI84L6P 7SKVHX2zUftEduysr98BUEsBAhQAFAAAAAgA3JJ0JnduMIMHTgAAAEABABEAA AAAAAGNoNG91dGxpbmUuZDQuZG9jUEsFBA==
This has reduced the level of nesting even more. Note that TY, MEDIA and ENC are attributes coming from very different sources but all end up at the same SGML level. TY comes from the enclosing component entity FOO, MEDIA comes from FTX and ENC comes from BIN. Though it is very brief, I get nervous since attribute name conflicts are very hard to control.
An ISO object identifier is simply a string of numbers and dots.
... 2.3.5.814.23.56.66.33324.55667.55.667.777
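As a small illustration of the number-and-dots form, the following Python sketch checks and splits such a literal. It only enforces what is said above (digits separated by single dots); any further ISO constraints on the arcs are deliberately left out, and the function name `parse_oid` is made up for this example.

```python
import re

# Digits separated by single dots, nothing else.
_OID_PATTERN = re.compile(r"\d+(\.\d+)*")


def parse_oid(literal: str) -> tuple:
    """Return the arcs of a number-and-dots OID as a tuple of ints."""
    if not _OID_PATTERN.fullmatch(literal):
        raise ValueError(f"not a number-and-dots OID: {literal!r}")
    return tuple(int(arc) for arc in literal.split("."))


print(parse_oid("2.3.5.814.23.56"))
```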
The question is, do we want to allow for human readable OIDs, such as:
they (2) theirs (3) them (5) our (814) mine (23) this (56)
This is clearly a question that is genuinely about string literals, outside of the XML question.
BTW we could define a genuine XML form of the OID:
they theirs them our mine this
Do you like that? Pretty long if you ask me. I am going to consider only the number-and-dots form shown first.
Builds on the OID defined above. Again different styles are possible. First the type-less attribute mixin style:
Next the explicit type element style, where attributes have a proper place:
Next a style that would use a string literal and thus would fit into only one attribute
Note that this creates a problem when the extension (EXT) itself contains an at sign ("@"). But it is nice and handy, so maybe it is worth the little additional trouble.
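One way to live with the at-sign problem is sketched below in Python. It assumes, purely for illustration, that the literal is written as extension, "@", then the OID; the original example literal is not preserved in these minutes, so that ordering is an assumption. Because an OID in number-and-dots form can never contain "@", splitting at the last "@" stays unambiguous even when the extension contains one. If the order were the reverse, splitting at the first "@" would serve the same purpose.

```python
def split_instance_identifier(literal: str) -> tuple:
    """Split an assumed EXT@OID literal at the LAST at sign.

    Returns (extension, oid). The EXT@OID ordering is an assumption.
    """
    extension, sep, oid = literal.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' in instance identifier literal: {literal!r}")
    return extension, oid


print(split_instance_identifier("lab@north@2.3.5.814"))
```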
Again, the different styles: (1) component tag mixin, (2) explicit type instance element, (3) literal
Special rules apply for the colon (":") here. The colon is never part of the protocol code. That way, by mere coincidence, the ReferTo attribute's value appears like a URL. However, it is not strictly a URL. The following examples would also be correct:
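Since the colon is never part of the protocol code, a literal of this kind can be split at its first colon. The Python sketch below shows that rule only; the function name and the sample values are hypothetical and are not the examples from the call.

```python
def split_reference(literal: str) -> tuple:
    """Split a protocol:address literal at the FIRST colon.

    Everything before the first colon is the protocol code; the rest,
    including any further colons, is the address.
    """
    protocol, sep, address = literal.partition(":")
    if not sep or not protocol:
        raise ValueError(f"missing protocol code in {literal!r}")
    return protocol, address


print(split_reference("http://example.org/a:b"))
```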
Which reminds me that we will have to be more specific on the "phone" (and "fax") protocols. The HL7 v3 Data Type Specification document lacks a table of allowed and recommended protocols.
Issue in all those "easy" elements above: Code Value comes as a string literal. This is nice; however, it doesn't tell what to do if you want to be more explicit about the components of the Code Value (coding system, version, print name, etc.). This means that what has been said above does not work without a sophisticated string literal for CV.
Here I only care about the non-RIM class part of person name. That is the LIST of Person Name Parts. Each part is simply a value and its optional classifier. The classifier is a SET of Code Value. Wow! This is gonna be a big one if we want to use vanilla XML representation of a set of CV:
Gunther given birth callme Schadow family birth unmarried
It is pretty clear that we don't want this. So, first we need a short form for a set of Code Values (Note: this set of code values is actually a kind of code phrase!).
given birth callme
This tells us that it is a SET OF CVs and all other attributes are the defaults for all the element (EL) CVs. Second step is to put all the elements into one chunk of data.
given birth callme
which allows us now to put the entire set into an attribute
where white space is the delimiter of the elements. This is pretty standard XML style to use white space as delimiters (cf. the IDS attribute of SGML.) Alternatively we can leverage the one-character short forms which are very well suited for such a set of flags. However, we won't be able to rely on single character flags not to run out so we should still keep the white space.
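Reading such a white space delimited attribute value back into a set is straightforward. The Python sketch below uses the codes from the example above; the function name is made up for illustration.

```python
def parse_code_set(attribute_value: str) -> frozenset:
    """Split a white space delimited attribute value into a set of codes.

    Any run of spaces, tabs or line breaks counts as one delimiter.
    """
    return frozenset(attribute_value.split())


print(sorted(parse_code_set("given birth  callme")))
```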
And usually we will not need the CSYS and CSVER stuff. So we can shorten more:
Making progress, aren't we? Now, in a person name variant we don't expect anything else than SET OF=CVs. So finally we can boil down to the short form:
Gunther Schadow Irma C. Jongeneel e.g. de Haas
The nice thing about this latter form is that if we run Irma's XML name verbatim through an HTML browser, we get this:
Irma C. Jongeneel e.g. de Haas
Which is a demonstration of exactly the reason for the new name type design: you can simply ignore all the tags and print the name as is. However, be aware that the browser only accidentally adhered to the white space rules. As soon as delimiters come into play, it would not do the right thing. For instance:
comes out as "Irma C. Jongeneel - de Haas".
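The "ignore all the tags" idea can be tried outside a browser as well. In the Python sketch below the element and attribute names (PN, P, CL) are placeholders invented for this illustration, since the actual markup of the short form is not preserved in these minutes. Note that the result depends entirely on the white space that happens to sit between the tagged parts, which is the concern raised above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; PN, P and CL are placeholder names, not the agreed form.
NAME_XML = '<PN><P CL="given">Gunther</P> <P CL="family">Schadow</P></PN>'


def plain_name(xml_text: str) -> str:
    """Drop all tags and keep the character data in document order."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


print(plain_name(NAME_XML))
```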
The stuff for the address. It is similar to the person name above. The difference is simply that we have an outer container with address purpose and bad address flag.
1028 Pinewood Ct Indianapolis , IN - 46240
However, there is more potential for shortening and simplification. Since address part role codes are simple codes, not code phrases, we can use those as tag names, not as attribute values!
1028 Pinewood Ct Indianapolis ,IN -46240
And of course we can argue the content-vs.-attribute battle. What we cannot do is put HNR, STR, CTY, etc. as attributes of ADDRESS, since those things may repeat (e.g. there are two DELs in the above example).
The real issue is white space rules and how we handle the LIT part type. LIT is used for unclassified stuff and is the default. So we would like to not mention LIT. Instead of
1028 Pinewood Ct North side near 96th Street and College Indianapolis ,IN -46240
we want
1028 Pinewood Ct North side near 96th Street and CollegeIndianapolis ,IN -46240
but in order to do so, we need to obey white space in the ADDRESS element content. That requires us, however, to refine white space rules, since all leading whitespace and all white space before and after tagged elements is still to be discarded.
Simple:
Franklin Templeton Growth Fund, Inc. Templeton Growth TGF TEPLX
Any questions?
As noted above, most often we can simply use a string for a code value and be sure it will be interpreted using the default expected code system. The message is invalid if the string literal is not defined in the code system.
A full blown CV would look like this
or in attribute mixin form
Code Phrase is next. Here we are defining a long form and a short form. The long form takes care of the possibility of using codes from different code systems in one phrase.
but usually we will just stick with all codes in a phrase from one code system and version, which allows us to compress:
Alternatively, the print name could also appear in the content position of a code value
AUBURN LIGHT
but beware that sometimes "replacement" and even the code "value" itself can be good candidates for content. I suggest going without any content.
Now we are ready for the Concept Descriptor with Code Translations.
the patient had a light auburn hear color
This shows that the original text is put into the content position. But remember that original text is a FTX type, i.e. possibly multi media. This case obviously converts a string to FTX text/plain. If we want to be explicit, here goes:
the patient had a light auburn hear color
Finally, note that the "reference to code translation" is mapped into XML using the ID and IDREF attribute types. ID is ID and ORG is IDREF. That takes care of it. Forward references may even be allowed in XML (?) though they could be prevented here since cyclic translation paths are forbidden.
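Whether or not forward references are allowed, a receiver can verify that the ORG references never form a cycle. The Python sketch below assumes each code translation has been reduced to a pair of its ID and the ID named by its ORG attribute (or None when there is no origin); that flat dictionary and the function name are illustrative assumptions, not part of the specification.

```python
def has_cyclic_translation(origin_of: dict) -> bool:
    """Return True if following ORG references ever revisits an ID.

    origin_of maps each translation's ID to the ID named in its ORG
    attribute, or to None when the translation has no origin.
    """
    for start in origin_of:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return True
            seen.add(node)
            # A reference to an unknown ID simply ends the chain here.
            node = origin_of.get(node)
    return False


print(has_cyclic_translation({"c1": None, "c2": "c1", "c3": "c2"}))
```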
Next conference call is next Monday, March 29, 1999, 11:00 AM EST.
Agenda items for next time are:
Please have your homework done by Sunday, so that we have enough time to see each other's work.
regards
-Gunther Schadow