HL7 Version 3 Data Type Redesign Project - outdated by January 1999

Please note that this page was frozen in January 1999; it reflects material that may have been updated in the meantime. If you want to know the latest, please refer to the up-to-date index.

At its September 1998 meeting in San Diego, the HL7 Control/Query committee set up a task force to work out a redesign proposal for HL7 version 3 data types. The work product will be a proposal document that describes the new data types, the rationale for the design decisions, and hints and caveats on how to use the new types. We will also work out a meta model of data types so that this work can plug seamlessly into the other version 3 work maintained by Woody Beeler's tool set. We will also use UML and Rose to build information models.

Work document

I originally intended to write a proposal document that would be updated as we moved along in our discussions. This never really worked out, though. It is just too much work on too many details, and I can barely keep up with writing notes. The current document is thus only a very messy, incomplete first draft. Its main achievements are that it provides a table of contents to be used as a work list and that it includes the slides discussed in San Diego. Only a few sections are written as yet, and a lot of the text that is present will probably change drastically or be removed.

The document is available as PDF (requires Acrobat 3) or PS. The preface tells how the document is maintained and how you can make contributions.

FOR ANY UP TO DATE REVIEW OF WHAT WE DO, PLEASE CONSULT THE NOTES BELOW.

I was hoping to summarize all those notes in the work document shortly before the January HL7 meeting, but I am now a bit pessimistic that this will work out. We have still not finished going through everything once, and time is short.

Slides

All graphics accessible from here are: Copyright © 1998 Regenstrief Institute. All rights reserved.

First set of slides

A ZIP file of all slides can be downloaded.

Conference Call Notes and Schedule

  1. Thursday, October 8, 1998, 11:00 AM to 12:00 PM EDT.
  2. Thursday, October 15, 1998, 11:00 AM to 12:30 PM EDT.
  3. Thursday, October 22, 1998, 11:00 AM to 12:30 PM EDT.
  4. Thursday, October 29, 1998, 11:00 AM to 12:30 PM EST.
  5. Thursday, November 5, 1998, 11:00 AM to 12:30 PM EST. (incomplete)
  6. Monday, November 23, 1998, 11:00 AM to 12:30 PM EST.
  7. Monday, November 30, 1998, 11:00 AM to 12:30 PM EST.
  8. Monday, December 7, 1998, 11:00 AM to 12:30 PM EST.
  9. Monday, December 14, 1998, 11:00 AM to 12:30 PM EST.
  10. Monday, December 21, 1998, 11:00 AM to 12:30 PM EST.
  11. Monday, December 28, 1998, 11:00 AM EST. (canceled)
  12. Monday, January 4, 1999, 11:00 AM EST.
  13. Monday, January 11, 1999, 11:00 AM EST.
  14. Wednesday, January 13, 1999, 2:00 PM EST. (done, no notes)

The Report for Orlando '99 is here as HTML, PS, or PDF.

Please go to the recent index to find out what happened after Orlando '99.


Data Type Definition Meta Model

With special greetings to Woody Beeler, here is a meta model of the data type definitions.

List of Defined v3 Data Types

The following is an overview of the data types that we have defined so far. Note that since this is work in progress, different data types may have different statuses of confirmation (from proposed through tentative to largely agreed). Please see the notes for open issues and follow up on future developments.

Boolean
The boolean type is the domain of two-valued logic: a value is either true or false, tertium non datur, with all the rules everyone should know about logic. The boolean type is amazingly useful throughout all layers of abstraction, from the bit in a machine up to object-oriented data analysis.
No Information
A No Information value can occur in place of any other value to express that specific information is missing and how or why it is missing. This is like a NULL in SQL but with the ability to specify a certain flavor of missing information.
Character String
A character string is a primitive data type that contains Unicode characters. A single character is not considered an HL7 data type. Note that the string type is not limited to ASCII characters and none of the "escape" sequences of v2.3 are defined. Transmitting Unicode characters is considered an ITS layer issue and the application layer is not supposed to deal with the peculiarities of different character encodings.
Multimedia Enhanced Free Text
Free text may be anything from a few formatted characters to complex documents or images. This data type is defined similarly to the ED data type, which in turn is based on the MIME standard.
Technical Instance Identifier
Technical identifiers are unique and unravelable through the consistent and required use of the ISO OBJECT IDENTIFIER (OID).
Code Value
A code value is used to refer to technical concepts and is also the basic building block for constructing more complex concept descriptors for real world concepts.
Concept Descriptor
Concept descriptors are the way to refer to real world concepts (e.g. diagnoses, procedures, etc.). Just as with the old CE data type, one can specify a code from one coding system with its translation into another coding system. This data type is more general than the CE in that multiple Code Translations can be given, and their dependencies can be exactly specified. With Code Phrases, one single-axial code can be mapped to multiple codes of a multi-axial coding system and vice versa.
Technical Instance Locator
A technical instance locator is a reference to some technical thing (e.g., image, document, telephone, e-mail box, etc.). It is a generalization of the well-known URL concept.
Integer Number
Embodies the usual concept of integer numbers. Integers are used almost exclusively for counts or values derived from counts by addition and subtraction.
Floating Point Number
Embodies the abstract concept of real numbers. Floating point numbers have a built-in notion of precision in terms of the number of significant decimal digits.
Ratio of Quantities
A quotient of any two quantities. Quantities currently defined are
Measurement with Unit
This can be a physical measurement (with physical units) but also a currency (with monetary units).
Point in Time
A difference scale quantity in the physical dimension of time. Usual expressions of points in time are made based on calendars, which are quite complex "coordinate systems" for time. This is basically the old "TS" data type.
Calendar Modulus Expressions
Expressions of the form day-of-the-month, day-of-the-week, month-of-the-year, or hour-of-the-day all have a common structure (x of the y). This data type is not yet defined. We may end up with one or many data types to cover what was called TM (time) or "week day" in HL7.
Interval
Also called "range". A continuous subset of an ordered type. Intervals are expressed by boundaries of the base type. Boundaries may be undefined.
Uncertain Discrete Value using Probabilities
A discrete value and an associated probability for that value to apply in a given context.
Uncertain Value using Narrative Expressions of Confidence
A discrete value and a narrative expression of confidence for that value to apply in a given context. Those "narrative expressions" are keywords, such as "approximately", "probably", "likely", "slight chance of", etc.
Non-Parametric Probability Distribution
A collection of Uncertain Discrete Value using Probabilities to specify a probability distribution.
Parametric Probability Distribution
Contains mean, standard deviation and also a distribution type plus its parameters. This is useful, for example, to specify "precisely" the accuracy of a measurement or to specify results of clinical trials.
History
Generic data type that allows the history of some data element to be sent. A History is a list of History Items.
History Item
A History Item can be used wherever a validity time (effective date/time, expiry date/time) is an essential part of some data. Used primarily as the element of a History.
Annotated Information
Whenever a sender feels that "there is more to say" about a data element, the annotation structure can be sent that contains the data element together with some free form annotation. The annotation is meant to be interpreted by humans.
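
As an illustration of the No Information type listed above, here is a minimal Python sketch of a value that can stand in for any other value while recording why the information is missing. The flavor names (`NOT_ASKED`, `ASKED_BUT_UNKNOWN`, `NOT_APPLICABLE`) and the helper `describe` are hypothetical and chosen only for illustration; they are not taken from the proposal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Flavor(Enum):
    """Hypothetical flavors of missing information (illustration only)."""
    NOT_ASKED = "not asked"
    ASKED_BUT_UNKNOWN = "asked but unknown"
    NOT_APPLICABLE = "not applicable"


@dataclass(frozen=True)
class NoInformation:
    """Stands in for any other value and records why the value is missing."""
    flavor: Flavor


def describe(value: Union[int, str, NoInformation]) -> str:
    """Render a value, or explain why it is absent."""
    if isinstance(value, NoInformation):
        return f"no information ({value.flavor.value})"
    return str(value)


print(describe(72))
print(describe(NoInformation(Flavor.NOT_ASKED)))
```

Unlike a bare SQL NULL, two No Information values with different flavors compare as different, so a receiver can tell them apart.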

All Data Types by Category

The following three subsections list the above data types by category: (1) primitive, (2) composite, and (3) generic types.

List of (relatively) Primitive Types

Note: Primitive-ness and composite-ness are relative qualifications of data types. It all depends on how the type system is designed.

Boolean
The boolean type is the domain of two-valued logic: a value is either true or false, tertium non datur, with all the rules everyone should know about logic. The boolean type is amazingly useful throughout all layers of abstraction, from the bit in a machine up to object-oriented data analysis.
No Information
A No Information value can occur in place of any other value to express that specific information is missing and how or why it is missing. This is like a NULL in SQL but with the ability to specify a certain flavor of missing information.
Character String
A character string is a primitive data type that contains Unicode characters. A single character is not considered an HL7 data type. Note that the string type is not limited to ASCII characters and none of the "escape" sequences of v2.3 are defined. Transmitting Unicode characters is considered an ITS layer issue and the application layer is not supposed to deal with the peculiarities of different character encodings.
Binary Data
This is used only as part of Multimedia Enhanced Free Text. Binary data is just the raw data bits, with no tagging or typing whatsoever. Binary Data is mentioned here to distinguish it from Character String, so that the two are not confused.
ISO OBJECT IDENTIFIER (OID)
Used only by the Technical Instance Identifier
Integer Number
Embodies the usual concept of integer numbers. Integers are used almost exclusively for counts or values derived from counts by addition and subtraction.
Floating Point Number
Embodies the abstract concept of real numbers. Floating point numbers have a built-in notion of precision in terms of the number of significant decimal digits.
Point in Time
A difference scale quantity in the physical dimension of time. Usual expressions of points in time are made based on calendars, which are quite complex "coordinate systems" for time. This is basically the old "TS" data type.
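
To make the Floating Point Number entry above concrete, the following sketch shows one way a notion of precision in significant decimal digits could be carried along with a value. It assumes that precision is read off the decimal literal as written (trailing zeros count, leading zeros do not); that rule and the function name `significant_digits` are assumptions for illustration, not part of the proposal.

```python
from decimal import Decimal


def significant_digits(literal: str) -> int:
    """Count the significant decimal digits carried by a decimal literal.

    Assumption: every digit of the coefficient is significant, so
    trailing zeros count and leading zeros do not.
    """
    return len(Decimal(literal).as_tuple().digits)


# A decimal representation keeps the distinction between 1.2 and 1.20 ...
print(significant_digits("1.2"))
print(significant_digits("1.20"))

# ... whereas a binary float discards it.
print(float("1.2") == float("1.20"))
```

The point of the sketch is only that the precision has to travel with the number; a plain machine float cannot express it.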

List of (relatively) Composite Types

Note: Primitive-ness and composite-ness are relative qualifications of data types. It all depends on how the type system is designed.

Multimedia Enhanced Free Text
Free text may be anything from a few formatted characters to complex documents or images. This data type is defined similarly to the ED data type, which in turn is based on the MIME standard.
Technical Instance Identifier
Technical identifiers are unique and unravelable through the consistent and required use of the ISO OBJECT IDENTIFIER (OID).
Code Value
A code value is used to refer to technical concepts and is also the basic building block for constructing more complex concept descriptors for real world concepts.
Concept Descriptor
Concept descriptors are the way to refer to real world concepts (e.g. diagnoses, procedures, etc.). Just as with the old CE data type, one can specify a code from one coding system with its translation into another coding system. This data type is more general than the CE in that multiple Code Translations can be given, and their dependencies can be exactly specified. With Code Phrases, one single-axial code can be mapped to multiple codes of a multi-axial coding system and vice versa.
Technical Instance Locator
A technical instance locator is a reference to some technical thing (e.g., image, document, telephone, e-mail box, etc.). It is a generalization of the well-known URL concept.
Measurement with Unit
This can be a physical measurement (with physical units) but also a currency (with monetary units).
Ratio of Quantities
A quotient of any two quantities. Quantities currently defined are
Calendar Modulus Expressions
Expressions of the form day-of-the-month, day-of-the-week, month-of-the-year, or hour-of-the-day all have a common structure (x of the y). This data type is not yet defined. We may end up with one or many data types to cover what was called TM (time) or "week day" in HL7.
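
The Concept Descriptor entry above mentions multiple Code Translations whose dependencies can be specified. The sketch below shows one possible reading, in which each translation records the code it was derived from. The class and field names (`CodeValue`, `CodeTranslation`, `origin`, `derivation`) and the coding systems `SYSTEM-A`, `SYSTEM-B`, and `SYSTEM-C` are placeholders invented for this example; the actual structure is defined in the notes, not here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class CodeValue:
    code: str
    system: str  # name of the coding system (placeholder values below)


@dataclass(frozen=True)
class CodeTranslation:
    value: CodeValue
    origin: Optional[CodeValue] = None  # None marks the originally entered code


@dataclass
class ConceptDescriptor:
    translations: List[CodeTranslation] = field(default_factory=list)

    def in_system(self, system: str) -> Optional[CodeValue]:
        """Return the code given for `system`, if any."""
        for translation in self.translations:
            if translation.value.system == system:
                return translation.value
        return None

    def derivation(self, system: str) -> List[CodeValue]:
        """Follow origin links from the code in `system` back to the original."""
        origin_of = {t.value: t.origin for t in self.translations}
        chain: List[CodeValue] = []
        current = self.in_system(system)
        while current is not None and current not in chain:
            chain.append(current)
            current = origin_of.get(current)
        return chain


original = CodeValue("A1", "SYSTEM-A")
via_b = CodeValue("B7", "SYSTEM-B")
via_c = CodeValue("C3", "SYSTEM-C")
concept = ConceptDescriptor([
    CodeTranslation(original),
    CodeTranslation(via_b, origin=original),
    CodeTranslation(via_c, origin=via_b),
])
print([cv.system for cv in concept.derivation("SYSTEM-C")])
```

Recording the origin of each translation lets a receiver judge how far a given code is removed from what was originally entered, which a flat pair of code and alternate code cannot express.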

List of Generic Types

Generic types are used for "orthogonal issues", i.e. information of general interest that can apply to all data types or a subset of all data types. Type conversion rules are used to convert a generic type to a base type and vice versa. This means that no one is forced to implement all those more complex generic data types: they are available when you need them.

Interval
Also called "range". A continuous subset of an ordered type. Intervals are expressed by boundaries of the base type. Boundaries may be undefined.
Uncertain Discrete Value using Probabilities
A discrete value and an associated probability for that value to apply in a given context.
Uncertain Value using Narrative Expressions of Confidence
A discrete value and a narrative expression of confidence for that value to apply in a given context. Those "narrative expressions" are keywords, such as "approximately", "probably", "likely", "slight chance of", etc.
Non-Parametric Probability Distribution
A collection of Uncertain Discrete Value using Probabilities to specify a probability distribution.
Parametric Probability Distribution
Contains mean, standard deviation and also a distribution type plus its parameters. This is useful, for example, to specify "precisely" the accuracy of a measurement or to specify results of clinical trials.
History
Generic data type that allows the history of some data element to be sent. A History is a list of History Items.
History Item
A History Item can be used wherever a validity time (effective date/time, expiry date/time) is an essential part of some data. Used primarily as the element of a History.
Annotated Information
Whenever a sender feels that "there is more to say" about a data element, the annotation structure can be sent that contains the data element together with some free form annotation. The annotation is meant to be interpreted by humans.
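
As a sketch of how a generic type and its conversion rules might fit together, the following Python example models the Interval type over an ordered base type. Several points are assumptions made for illustration: boundaries are treated as inclusive, an undefined boundary is treated as "no limit on that side", and the conversion rules (`from_value` and `to_value`) are one hypothetical choice, not the rules agreed in the notes.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")  # any ordered base type, e.g. int or float


@dataclass(frozen=True)
class Interval(Generic[T]):
    low: Optional[T] = None   # None: boundary undefined
    high: Optional[T] = None  # None: boundary undefined

    def contains(self, value: T) -> bool:
        """Inclusive membership test; an undefined boundary never excludes."""
        above = self.low is None or self.low <= value
        below = self.high is None or value <= self.high
        return above and below

    @classmethod
    def from_value(cls, value: T) -> "Interval[T]":
        """Base type to generic type: a single value becomes a one-point interval."""
        return cls(low=value, high=value)

    def to_value(self) -> T:
        """Generic type to base type: only possible for a one-point interval."""
        if self.low is None or self.low != self.high:
            raise ValueError("interval does not reduce to a single value")
        return self.low


at_least_five = Interval(low=5)
print(at_least_five.contains(100))
print(Interval.from_value(7).to_value())
```

With conversions like these, a receiver that only implements the base type can still accept a one-point interval, and a sender that only has a plain value can still fill a field declared as an interval.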