HL7 Version 3 Data Type Redesign Project - outdated by January 1999
Please note that this page was frozen in January 1999. It reflects
material that may have been updated in the meantime. If you want to
know the latest, please refer to the up-to-date index.
The HL7 Control/Query committee, at its September 1998 meeting in San
Diego, set up a task force to work out a redesign proposal for HL7
version 3 data types. The work product will be a proposal document that
describes the new data types, the rationale for the design
decisions, and hints and caveats on how to use the new types. We will also
work out a meta model of data types so that this work can plug
seamlessly into the other version 3 work maintained by Woody Beeler's
tool set. We will also use UML and Rose to build information models.
Work document
I originally intended to write a proposal document that would be
updated as we moved along in our discussions. This never
really worked out, though. It is just too much work on too many
details. I can barely keep up with writing notes. The current document
is thus only a very messy, incomplete first draft. Its main achievements
are that it provides a table of contents to be used as a work list and that
it includes the slides discussed in San Diego. Only a few sections are
written as of yet, and a lot of the text that is present will probably
change drastically or be removed.
The document is available as PDF (requires Acrobat 3) or PS. The
preface tells how the document is maintained and how you can make
contributions.
FOR ANY UP TO DATE REVIEW OF WHAT WE DO, PLEASE CONSULT THE
NOTES BELOW.
I was hoping that I could summarize all those notes in the work
document shortly before the January HL7 meeting, but I am now a bit
pessimistic that this will work out. We have still not finished a
first pass over everything, and time is short.
Slides
All graphics accessible from here are Copyright © 1998 Regenstrief
Institute. All rights reserved.
First set of slides
A ZIP file of all slides can be downloaded.
Conference Call Notes and Schedule
- Thursday, October 8, 1998, 11:00 to 12:00 AM EDT.
- Thursday, October 15, 1998, 11:00 to 12:30 AM EDT.
- Thursday, October 22, 1998, 11:00 to 12:30 AM EDT.
- Thursday, October 29, 1998, 11:00 to 12:30 AM EST.
- Thursday, November 5, 1998, 11:00 to 12:30 AM EDT. (incomplete)
- Monday, November 23, 1998, 11:00 to 12:30 AM EST.
- Monday, November 30, 1998, 11:00 to 12:30 AM EST.
- Monday, December 7, 1998, 11:00 to 12:30 AM EST.
- Monday, December 14, 1998, 11:00 to 12:30 AM EST.
- Monday, December 21, 1998, 11:00 AM to 12:30 EST.
- Monday, December 28, 1998, 11:00 AM EDT. (canceled)
- Monday, January 4, 1999, 11:00 AM EDT.
- Monday, January 11, 1999, 11:00 AM EDT.
- Wednesday, January 13, 1999, 2:00 PM EST. (done, no notes)
The Report for Orlando '99 is here as HTML, PS, or PDF.
Please go to the recent index to find out
what happened after Orlando '99.
Data Type Definition Meta Model
With special greetings to Woody Beeler, here is a meta model of the data type definitions.
List of Defined v3 Data Types
The following is an overview of the data types that we have defined
so far. Note that since this is work in progress, different data types
may have different statuses of confirmation (from proposed through
tentative to broadly agreed). Please see the notes for open issues and
follow up on future developments.
-
Boolean
- The boolean type is the domain of two-valued logic: a value is either
true or false (tertium non datur), and all the stuff everyone should
know about logic applies. The boolean type is amazingly useful throughout
all layers of abstraction, from the bit in a machine up to
object-oriented data analysis.
-
No Information
- A No Information value can occur in place of any other value to
express that specific information is missing and how or why it is
missing. This is like a NULL in SQL but with the ability to specify a
certain flavor of missing information.
-
Character String
- A character string is a primitive data type that contains Unicode
characters. A single character is not considered an HL7 data
type. Note that the string type is not limited to ASCII characters and
none of the "escape" sequences of v2.3 are defined. Transmitting
Unicode characters is considered an ITS layer issue and the
application layer is not supposed to deal with the peculiarities of
different character encodings.
-
Multimedia Enhanced Free Text
- Free text may be anything from a few formatted characters to
complex documents or images. This data type is defined similar to the
ED data type that in turn is based on the MIME standard.
-
Technical Instance Identifier
- Technical identifiers are unique and unravelable through the
consistent and required use of the ISO OBJECT IDENTIFIER (OID).
-
Code Value
- A code value is used to refer to technical concepts and is also
the basic building block for constructing more complex concept
descriptors for real world concepts.
-
Concept Descriptor
- Concept descriptors are the way to refer to real world concepts
(e.g., diagnoses, procedures, etc.). Just as with the old CE data type,
one can specify a code from one coding system together with its
translation into another coding system. This data type is more general
than the CE in that multiple Code Translations can be given, and their
dependencies can be exactly specified. With Code Phrases, one single
axial code can be mapped to multiple codes of a multi-axial coding
system and vice versa.
-
Technical Instance Locator
- A technical instance locator is a reference to some technical
thing (e.g., image, document, telephone, e-mail box, etc.). It is a
generalization of the well-known URL concept.
-
Integer Number
- Embodies the usual concept of integer numbers. Integers are used
almost exclusively for counts or for values derived from counts by
addition and subtraction.
-
Floating Point Number
- Embodies the abstract concept of real numbers. Floating point
numbers have a built-in notion of precision in terms of the number of
significant decimal digits.
-
Ratio of Quantities
- A quotient of any two quantities. Quantities currently defined are
-
Measurement with Unit
- This can be a physical measurement (with physical units) but also
a currency (with monetary units).
-
Point in Time
- A difference scale quantity in the physical dimension of
time. Usual expressions of points in time are made based on calendars,
which are quite complex "coordinate systems" for time. This is
basically the old "TS" data type.
-
Calendar Modulus Expressions
- Expressions of the form day-of-the-month, day-of-the-week,
month-of-the-year, or hour-of-the-day all have a common structure
(x of the y). This data type is not yet defined. We may
end up with one or many data types to cover what was called TM (time)
or "week day" in HL7.
-
Interval
- Also called "range". A continuous subset of an ordered
type. Intervals are expressed by boundaries of the base
type. Boundaries may be undefined.
-
Uncertain Discrete Value using Probabilities
- A discrete value and an associated probability for that value to
apply in a given context.
-
Uncertain Value using Narrative Expressions of Confidence
- A discrete value and a narrative expression of confidence for that
value to apply in a given context. Those "narrative expressions" are
keywords, such as "approximately", "probably", "likely", "slight
chance of", etc.
-
Non-Parametric Probability Distribution
- A collection of Uncertain Discrete Values using Probabilities that
together specify a probability distribution.
-
Parametric Probability Distribution
- Contains mean, standard deviation, and also a distribution type plus
its parameters. This is useful, for example, to specify "precisely" the
accuracy of a measurement or to specify results of clinical trials.
-
History
- Generic data type that allows the history of some data element to be
sent. A History is a list of History Items.
-
History Item
- A History Item can be used wherever a validity time (effective
date/time, expiry date/time) is an essential part of some data. Used
primarily as the element of a History.
-
Annotated Information
- Whenever a sender feels that "there is more to say" about a data
element, the annotation structure can be sent that contains the data
element together with some free form annotation. The annotation is
meant to be interpreted by humans.
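To make the History and History Item entries above a bit more concrete, here is a minimal Python sketch. It is only an illustration, not the task force's definition: the names HistoryItem, effective, expiry, valid_at, and value_at are hypothetical, and treating a missing boundary as "open" and the expiry as exclusive are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class HistoryItem(Generic[T]):
    """A value plus its validity time (field names are hypothetical)."""
    value: T
    effective: Optional[datetime] = None  # None: no start stated (assumption)
    expiry: Optional[datetime] = None     # None: no end stated (assumption)

    def valid_at(self, when: datetime) -> bool:
        # Assumption: effective is inclusive, expiry is exclusive.
        after_start = self.effective is None or self.effective <= when
        before_end = self.expiry is None or when < self.expiry
        return after_start and before_end


# A History is a list of History Items.
History = List[HistoryItem[T]]


def value_at(history: List[HistoryItem[T]], when: datetime) -> Optional[T]:
    """Return the first value valid at the given time, or None."""
    for item in history:
        if item.valid_at(when):
            return item.value
    return None


# Made-up example data: two addresses, one after the other.
addresses = [
    HistoryItem("12 Elm St", datetime(1997, 1, 1), datetime(1998, 6, 1)),
    HistoryItem("7 Oak Ave", datetime(1998, 6, 1)),
]
print(value_at(addresses, datetime(1998, 1, 15)))  # 12 Elm St
```

Because HistoryItem is generic over the value it carries, the same structure can wrap any other data type without that type having to know about validity times.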
All Data Types by Category
The following three subsections list the above data types by
category:
(1) primitive,
(2) composite,
and (3) generic types.
List of (relatively) Primitive Types
Note: Primitive-ness and composite-ness are relative qualifications
of data types. It all depends on how the type system is designed.
-
Boolean
- The boolean type is the domain of two-valued logic: a value is either
true or false (tertium non datur), and all the stuff everyone should
know about logic applies. The boolean type is amazingly useful throughout
all layers of abstraction, from the bit in a machine up to
object-oriented data analysis.
-
No Information
- A No Information value can occur in place of any other value to
express that specific information is missing and how or why it is
missing. This is like a NULL in SQL but with the ability to specify a
certain flavor of missing information.
-
Character String
- A character string is a primitive data type that contains Unicode
characters. A single character is not considered an HL7 data
type. Note that the string type is not limited to ASCII characters and
none of the "escape" sequences of v2.3 are defined. Transmitting
Unicode characters is considered an ITS layer issue and the
application layer is not supposed to deal with the peculiarities of
different character encodings.
-
Binary Data
- This is used only as part of Multimedia Enhanced Free
Text. Binary data is just the raw data bits, with no tagging or typing
whatsoever. Binary data is mentioned here to distinguish it from
Character String and to make sure people do not
confuse Binary Data with Character String.
-
ISO OBJECT IDENTIFIER (OID)
- Used only by the Technical Instance Identifier.
-
Integer Number
- Embodies the usual concept of integer numbers. Integers are used
almost exclusively for counts or for values derived from counts by
addition and subtraction.
-
Floating Point Number
- Embodies the abstract concept of real numbers. Floating point
numbers have a built-in notion of precision in terms of the number of
significant decimal digits.
-
Point in Time
- A difference scale quantity in the physical dimension of
time. Usual expressions of points in time are made based on calendars,
which are quite complex "coordinate systems" for time. This is
basically the old "TS" data type.
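The No Information entry above compares the type to an SQL NULL that carries a reason. A minimal Python sketch of that idea follows. The flavor names are purely hypothetical, since this page does not enumerate the flavors, and NoInformation and describe are illustrative names rather than anything from the proposal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Flavor(Enum):
    # Hypothetical flavors, chosen only for illustration.
    NOT_ASKED = "not asked"
    ASKED_BUT_UNKNOWN = "asked but unknown"
    NOT_APPLICABLE = "not applicable"


@dataclass(frozen=True)
class NoInformation:
    """Stands in for a missing value and says how or why it is missing."""
    flavor: Flavor


def describe(value: Union[int, str, NoInformation]) -> str:
    """Render either a real value or a flavored 'missing' marker."""
    if isinstance(value, NoInformation):
        return f"missing ({value.flavor.value})"
    return str(value)


print(describe(72))                               # 72
print(describe(NoInformation(Flavor.NOT_ASKED)))  # missing (not asked)
```

Unlike a bare None, two NoInformation values with different flavors compare as different, so a receiver can tell "not asked" apart from "not applicable".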
List of (relatively) Composite Types
Note: Primitive-ness and composite-ness are relative qualifications
of data types. It all depends on how the type system is designed.
-
Multimedia Enhanced Free Text
- Free text may be anything from a few formatted characters to
complex documents or images. This data type is defined similar to the
ED data type that in turn is based on the MIME standard.
-
Technical Instance Identifier
- Technical identifiers are unique and unravelable through the
consistent and required use of the ISO OBJECT IDENTIFIER (OID).
-
Code Value
- A code value is used to refer to technical concepts and is also
the basic building block for constructing more complex concept
descriptors for real world concepts.
-
Concept Descriptor
- Concept descriptors are the way to refer to real world concepts
(e.g., diagnoses, procedures, etc.). Just as with the old CE data type,
one can specify a code from one coding system together with its
translation into another coding system. This data type is more general
than the CE in that multiple Code Translations can be given, and their
dependencies can be exactly specified. With Code Phrases, one single
axial code can be mapped to multiple codes of a multi-axial coding
system and vice versa.
-
Technical Instance Locator
- A technical instance locator is a reference to some technical
thing (e.g., image, document, telephone, e-mail box, etc.). It is a
generalization of the well-known URL concept.
-
Measurement with Unit
- This can be a physical measurement (with physical units) but also
a currency (with monetary units).
-
Ratio of Quantities
- A quotient of any two quantities. Quantities currently defined are
-
Calendar Modulus Expressions
- Expressions of the form day-of-the-month, day-of-the-week,
month-of-the-year, or hour-of-the-day all have a common structure
(x of the y). This data type is not yet defined. We may
end up with one or many data types to cover what was called TM (time)
or "week day" in HL7.
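As a rough illustration of how a Concept Descriptor could be composed from Code Values with several Code Translations, here is a small Python sketch. All class and field names (CodeValue, CodeTranslation, origin, in_system) are assumptions made for this example, and the coding systems and codes are made-up placeholders. Code Phrases are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class CodeValue:
    code: str
    system: str  # which coding system the code comes from (assumed field)


@dataclass(frozen=True)
class CodeTranslation:
    term: CodeValue
    # The code this one was translated from; None marks an original code.
    # This is one hypothetical way to record the dependency.
    origin: Optional[CodeValue] = None


@dataclass
class ConceptDescriptor:
    translations: List[CodeTranslation] = field(default_factory=list)

    def in_system(self, system: str) -> List[CodeValue]:
        """All codes this descriptor offers in the given coding system."""
        return [t.term for t in self.translations if t.term.system == system]


# Placeholder codes and systems, not real ones.
original = CodeValue("A123", "SYSTEM-A")
concept = ConceptDescriptor([
    CodeTranslation(original),
    CodeTranslation(CodeValue("B-9", "SYSTEM-B"), origin=original),
    CodeTranslation(CodeValue("C.42", "SYSTEM-C"), origin=original),
])
print(concept.in_system("SYSTEM-B"))
```

Recording the origin on each translation is what lets a receiver see which code was the original and which were derived from it.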
List of Generic Types
Generic types are used for "orthogonal issues", i.e., information of
general interest that can apply to all data types or a subset of all
data types. Type conversion rules are used to convert a generic type
to a base type and vice versa. This means no one is forced to
implement all those more complex generic data types: they are
available as you need them.
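One way to picture such conversion rules is the following Python sketch, which uses a simple annotation wrapper as the generic type. The names Annotated, promote, and demote are hypothetical, and the rule that converting back to the base type simply drops the extra information is an assumption made for this example, not something the task force has decided.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Annotated(Generic[T]):
    """A base value together with a free-form note for human readers."""
    value: T
    note: Optional[str] = None


def promote(value):
    """Base type to generic type: wrap the value with an empty note."""
    return value if isinstance(value, Annotated) else Annotated(value)


def demote(value):
    """Generic type to base type: keep the value, drop the note (assumption)."""
    return value.value if isinstance(value, Annotated) else value


# A receiver that only implements the base type can still use the data.
incoming = Annotated(120, "patient was agitated during measurement")
print(demote(incoming))       # 120
print(demote(promote(120)))   # 120
```

With rules like these, a sender may always use the richer form, and a receiver that does not implement it loses only the extra information, not the value itself.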
-
Interval
- Also called "range". A continuous subset of an ordered
type. Intervals are expressed by boundaries of the base
type. Boundaries may be undefined.
-
Uncertain Discrete Value using Probabilities
- A discrete value and an associated probability for that value to
apply in a given context.
-
Uncertain Value using Narrative Expressions of Confidence
- A discrete value and a narrative expression of confidence for that
value to apply in a given context. Those "narrative expressions" are
keywords, such as "approximately", "probably", "likely", "slight
chance of", etc.
-
Non-Parametric Probability Distribution
- A collection of Uncertain Discrete Values using Probabilities that
together specify a probability distribution.
-
Parametric Probability Distribution
- Contains mean, standard deviation, and also a distribution type plus
its parameters. This is useful, for example, to specify "precisely" the
accuracy of a measurement or to specify results of clinical trials.
-
History
- Generic data type that allows the history of some data element to be
sent. A History is a list of History Items.
-
History Item
- A History Item can be used wherever a validity time (effective
date/time, expiry date/time) is an essential part of some data. Used
primarily as the element of a History.
-
Annotated Information
- Whenever a sender feels that "there is more to say" about a data
element, the annotation structure can be sent that contains the data
element together with some free form annotation. The annotation is
meant to be interpreted by humans.
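The Interval entry above can be sketched as a generic type over any ordered base type. This is a minimal Python illustration only: the names Interval, low, high, and contains are hypothetical, and both the closed boundaries and the reading of an undefined boundary as "no limit on that side" are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Interval(Generic[T]):
    """A continuous subset of an ordered type, given by its boundaries."""
    low: Optional[T] = None   # None: boundary undefined
    high: Optional[T] = None  # None: boundary undefined

    def contains(self, x: T) -> bool:
        # Assumption: boundaries are inclusive, and an undefined
        # boundary places no limit on that side.
        above = self.low is None or self.low <= x
        below = self.high is None or x <= self.high
        return above and below


normal_range = Interval(3, 7)
print(normal_range.contains(5))        # True
print(normal_range.contains(8))        # False
print(Interval(low=3).contains(1000))  # True
```

The same class works for numbers, points in time, or any other base type that supports ordering comparisons, which is the sense in which Interval is generic.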