HL7 v3.0 Data Types Specification

*** report-0.9.html Wed Jul 14 01:12:17 1999
--- report.html Wed Jul 14 01:01:38 1999
***************
*** 1,16 ****
! HL7 v3.0 Data Types Specification - Version 0.9

Version 0.9

Gunther Schadow
--- 1,23 ----
! HL7 v3.0 Data Types Specification - Version 0.95

HL7 v3.0 Data Types Specification

Version 0.95

Gunther Schadow
***************
*** 30,42 ****
semantic units in the space of all possible data types. This redesign work is heavily based on experiences with HL7 v2.x.

! Data types are defined for (1) character strings and multimedia ! enabled free text; (2) codes and identifiers for concepts and ! instances both of the real world and of technical artifacts; (3) all ! kinds of quantities including integer and floating point numbers, ! physical measurements with units, and various kinds of time. Data types ! are classified (generalized) in various ways with respect to certain ! properties of interest.

A number of issues have been identified as equally applicable to many if not all data types. Intervals (of ordered types), uncertain
--- 37,49 ----
semantic units in the space of all possible data types. This redesign work is heavily based on experiences with HL7 v2.x.

! Data types are defined for (1) character strings and display data, ! which accommodates both character-based text and multimedia data; (2) ! codes and identifiers for concepts and instances both of the real ! world and of technical artifacts; (3) all kinds of quantities ! including integer and real numbers, physical measurements with units, ! and various kinds of time. Data types are classified (generalized) in ! various ways with respect to certain properties of interest.

A number of issues have been identified as equally applicable to many if not all data types. Intervals (of ordered types), uncertain
***************
*** 90,99 ****

2.1 Introduction

2.2 Character String !

2.3 Free Text

2.3.1 Multimedia Enabled Free Text

2.3.2 Binary Data

--- 97,106 ----

2.1 Introduction

2.2 Character String !

2.3 Display Data

2.3.1 Display Data

2.3.2 Binary Data

*************** *** 110,115 **** --- 117,125 ----

3.2 Technical Concepts and the Code Value +

3.2.1 State of a State Machine +

3.3 Real World Concepts
***************
*** 145,151 ****

4.1 Overview

4.2 Integer Number !

4.3 Floating Point Number

4.4 Ratio

4.5 Measurements
--- 155,161 ----

4.5 Measurements
***************
*** 158,165 ****

4.6 Time

4.6.1 Point in Time !

4.6.4 Calendar Modulus Expressions

--- 168,183 ----

4.6 Time

4.6.1 Duration of Time ! !

4.6.2 Point in Time ! !

4.6.3 Time Interval ! !

4.6.4 Periodic Time Points and ! Intervals ! !

4.6.5 Other Issues and Curiosities About Time

*************** *** 167,172 **** --- 185,194 ----

5.1 Interval +

5.1.1 Intervals as Sets - The Notion of Set Revisited +

5.2 General Annotations

5.3 The Historical Dimension
***************
*** 820,827 ****

Integer Number , !

! Floating Point Number ,

Physical Quantity
--- 842,849 ----

Integer Number , !

! Real Number ,

Physical Quantity
***************
*** 1161,1335 ****

1.2.6 The Meta Model

The following is a first draft of a meta model for the data type ! definitions in UML. Since all the concepts are described in the text ! above, this section does not have a lot of text. If you read this with ! an HTML browser, you can click on the class boxes in the diagram to ! find the description of the respective concepts embodied by that ! class. ! !

If you are not concerned with the overall methodology, maintenance ! and quality control of the HL7 v3 specification you can safely skip ! this section. ! !

Figure 2: The meta model of data type definitions.

Data Type

Every data type has a name and a description. The history attribute ! exists for compatibility with the current MDF meta modeling style. ! !

A data type may be defined as being "internal". An internal type is ! used only to define other composite data types. Internal types are not ! supposed to be directly used in messages. For example, we define a ! type Binary that contains pure raw data bits, ! and that is used only by Multimedia Enabled Free ! Text. ! !

A data type may be defined as being "generic". A generic type is a type whose complete ! specification is deferred until it is actually used in one way or ! another. The missing pieces (Generic_type_parameter) must be specified ! when used. This is what C++ knows as "templates".

Primitive Data Type

A primitive data type has only a textual specification of its ! semantics. The specification is separate from the inherited ! description attribute, because it is essential for a primitive data ! type to have a very careful (and likely long) specification that ! describes the exact semantics of such a type. [Perhaps we can replace ! this DescriptiveText with a pointer to the data type specification ! document.] ! ! !

Composite Data Type

A composite data type consists of ! one or more named and typed components. ! ! !

Data Type Component

A component of a composite data ! type is like a variable, i.e. it has a name and a type. The type ! can be declared to be included by reference instead of by value. This ! is useful if you know such a component mentions an instance that is ! already mentioned elsewhere in the message. In languages such as Java, ! where objects are always handled through references this does not make ! any difference. ! !
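In C++, unlike Java, the two inclusion modes look different in the declaration. Below is a minimal sketch that assumes `std::shared_ptr` as the stand-in for "by reference". The names `Person`, `EncounterByValue`, `EncounterByReference` and `sharesPhysician` are hypothetical and are not part of any HL7 definition.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical component type used only for illustration.
struct Person {
    std::string name;
};

// Component included by value: every encounter holds its own copy.
struct EncounterByValue {
    Person attendingPhysician;
};

// Component included by reference: several encounters in one message can
// point to the same Person instance that is mentioned elsewhere.
struct EncounterByReference {
    std::shared_ptr<const Person> attendingPhysician;
};

// True if both encounters refer to the very same Person object
// (identity, not merely equal field values).
inline bool sharesPhysician(const EncounterByReference& a,
                            const EncounterByReference& b) {
    return a.attendingPhysician != nullptr &&
           a.attendingPhysician == b.attendingPhysician;
}
```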

Most fields are declared as being of some specific type. However, ! when building generic types one sometimes ! wants to leave the type declaration of a field unspecified. Instead of ! leaving the type declaration completely unspecified, one can also ! constrain the allowable types to certain specific types. This covers the case where only ! some types are allowed for a given generic data type.

DTM Generalization

A data type may be categorized into possibly many generalizations. For instance, Integer Number might be classified as an ! Ordered Type, as a Discrete Type and as a ! Quantity. Generalizations are themselves data types. ! !

All the rules of inheritance known from the object-oriented method ! apply here. I.e. generalized types without attributes are called ! "abstract types" (all the above mentioned generalizations are ! abstract). You can never instantiate an abstract type. A ! specialization type of a non-abstract type inherits all the attributes ! of the parent. Specialized types can add additional attributes or can ! make further constraints on inherited attributes. ! ! !
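As a rough C++ analogy, and not the meta model's own notation, abstract generalizations can be written as interfaces without data members. Integer Number can then specialize several of them at once. All class and method names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Abstract generalizations: interfaces without data members, so (like the
// abstract types of the meta model) they cannot be instantiated.
struct OrderedType {
    virtual ~OrderedType() = default;
    // True if *this precedes other in the type's order.
    virtual bool lessThan(const OrderedType& other) const = 0;
};

struct DiscreteType {
    virtual ~DiscreteType() = default;
    // Moves to the next value in the discrete domain.
    virtual void advance() = 0;
};

struct Quantity {
    virtual ~Quantity() = default;
    virtual double magnitude() const = 0;
};

// Integer Number classified under all three generalizations at once.
class IntegerNumber final : public OrderedType, public DiscreteType, public Quantity {
public:
    explicit IntegerNumber(std::int64_t value) : value_(value) {}

    // Comparable only with other IntegerNumbers in this sketch; any other
    // OrderedType makes dynamic_cast throw std::bad_cast.
    bool lessThan(const OrderedType& other) const override {
        return value_ < dynamic_cast<const IntegerNumber&>(other).value_;
    }
    void advance() override { ++value_; }
    double magnitude() const override { return static_cast<double>(value_); }

private:
    std::int64_t value_;
};

static_assert(std::is_abstract<OrderedType>::value, "generalizations are abstract");
```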

Collection Data Type

A collection data type is a collection ! of one or many instances of a particular element type. The particular ! semantic variant of the collection data type is specified in the ! collection_type attribute.

The notion of a collection data type should once and forever ! supersede the traditional notion of "repeatability." [This means, the ! MDF meta model needs to be modified where it mentions "repeated" etc.] ! !

Collections are of one of the following types:

set   an unordered collection of unique element type instances.
bag   an unordered collection of element type instances. Instances may occur more than once in the bag.
list  an ordered collection of element type instances.
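The three collection semantics map loosely onto C++ standard containers. This sketch is for illustration only. `std::set` happens to keep its elements sorted, but that order carries no meaning for an HL7 set. The struct name `CollectionDemo` is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical illustration of the three collection semantics.
struct CollectionDemo {
    std::set<std::string> asSet;       // set: each element at most once
    std::multiset<std::string> asBag;  // bag: duplicates allowed, no order
    std::vector<std::string> asList;   // list: insertion order is significant

    void add(const std::string& element) {
        asSet.insert(element);
        asBag.insert(element);
        asList.push_back(element);
    }
};
```

If "B", "A" and "B" are added in that order, the set holds two elements, the bag holds three with "B" counted twice, and the list holds three in insertion order.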

Generic Type Parameter

This isn't actually a type, but a parameter of a generic type template. However, generic type ! parameters are used as if they were types in the definition ! of the enclosing generic type. For example, we define a generic type ! Interval on all types with a total order relation. In C++ this ! would look like: ! !

! template <class T> class Interval {
!   ...
!   enum LimitType limitType;
!   T lowLimit;
!   T highLimit;
!   ...
! };

Using DTM_Generalization we ! can define categories of data types and we can constrain the template ! parameters to one of those generalized ! types. ! !

With such a general type, it seems possible to declare the generic ! type Interval without using templates and template parameters:

! class Interval {
!   ...
!   enum LimitType limitType;
!   OrderedType lowLimit;
!   OrderedType highLimit;
!   ...
! };

! However, the two declarations are not equivalent. While the first one does ! not constrain the template parameter T to be an Ordered ! Type, the second declaration does not constrain lowLimit and ! highLimit to actually refer to the same specific type.
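In C++17 the two constraints can be combined. A single template parameter forces both limits to the same specific type, and a `static_assert` restricts that type to ones with an order relation. This is a minimal sketch. `has_less_than`, the `LimitType` enumerators and the `contains` method are assumptions made for illustration. The trait only checks that `operator<` exists, not that it is a total order; C++20 code would more likely use the `std::totally_ordered` concept instead.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical detection trait: true if const T values support operator<.
template <class T, class = void>
struct has_less_than : std::false_type {};

template <class T>
struct has_less_than<T, std::void_t<decltype(std::declval<const T&>() <
                                             std::declval<const T&>())>>
    : std::true_type {};

// Hypothetical enumerators; the original only names the type LimitType.
enum class LimitType { Closed, Open };

// One parameter T for both limits: they must be the same specific type.
template <class T>
class Interval {
    static_assert(has_less_than<T>::value,
                  "Interval requires an ordered element type");

public:
    Interval(T low, T high, LimitType limits = LimitType::Closed)
        : lowLimit_(std::move(low)), highLimit_(std::move(high)), limitType_(limits) {}

    bool contains(const T& x) const {
        if (limitType_ == LimitType::Closed)
            return !(x < lowLimit_) && !(highLimit_ < x);
        return lowLimit_ < x && x < highLimit_;
    }

private:
    T lowLimit_;
    T highLimit_;
    LimitType limitType_;
};
```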

This meta model allows both constraints to be expressed by using the ! Generic_type_parameter, which can be constrained using the association ! has_allowed_types.
--- 1183,1191 ----

1.2.6 The Meta Model

The Meta Model discussion has been deleted from this specification ! and can now be found in the HL7 version 3 Message Development ! Framework (MDF).
***************
*** 1366,1373 ****
rules should transfer the semantics of the data as well as possible. In particular, the rules should not merely be driven by the coincidence of representations. For instance, it makes no sense to ! cast an ICD-9 code 100.1 to a floating point number 100.1 just because ! their representation happens to be the same.

The easiest way to state the rule for type conversion is by using a conversion matrix such as the one exemplified in the following table. The rows
--- 1222,1229 ----
rules should transfer the semantics of the data as well as possible. In particular, the rules should not merely be driven by the coincidence of representations. For instance, it makes no sense to ! cast an ICD-9 code 100.1 to a real number 100.1 just because their ! representation happens to be the same.

The easiest way to state the rule for type conversion is by using a conversion matrix such as the one exemplified in the following table. The rows
***************
*** 1387,1393 ****
CodeTranslation ConceptDescriptor Integer ! Float PhysicalQuantity Ratio
--- 1243,1249 ----
CodeTranslation ConceptDescriptor Integer ! Real PhysicalQuantity Ratio
***************
*** 1399,1405 ****
promote to CodeValue first promote to CodeValue first if string is a valid integer literal ! if string is a valid floating point literal if string is a valid measurement literal if string is a valid ratio literal
--- 1255,1261 ----
promote to CodeValue first promote to CodeValue first if string is a valid integer literal ! if string is a valid real number literal if string is a valid measurement literal if string is a valid ratio literal
***************
*** 1473,1503 ****
none none N/A ! make a float from an int, precision is the number of all digits in the integer ! make a float first use as the numerator, set denominator to 1 ! Float ! use floating point literal convert to string first none none none none ! round the float to an int, caution: this may create pseudo-precision N/A use "1" (the unity) for unit use as the numerator, set denominator to 1 PhysicalQuantity ! use floating point literal convert to string first none none none none ! down-cast to float first return the value, may throw an exception if unit is not "1" N/A use as the numerator, set denominator to 1
--- 1329,1359 ----
none none N/A ! make a real from an int, precision is the number of all digits in the integer ! make a real first use as the numerator, set denominator to 1 ! Real ! use real number literal convert to string first none none none none ! round the real number to an int, caution: this may create pseudo-precision N/A use "1" (the unity) for unit use as the numerator, set denominator to 1 PhysicalQuantity ! use real number literal convert to string first none none none none !
down-cast to real first return the value, may throw an exception if unit is not "1" N/A use as the numerator, set denominator to 1
***************
*** 1509,1517 ****
none none none ! down-cast to float first ! convert numerator and denominator to floats and then build the quotient ! cast the ratio values to a float, make a new unit as the ratio of units (if any) N/A
--- 1365,1373 ----
none none none ! down-cast to real first ! convert numerator and denominator to real numbers and then build the quotient ! cast the ratio values to a real number, make a new unit as the ratio of units (if any) N/A
***************
*** 1578,1600 ****

The XML encoding designed in summer '98 and used in the '99 HIMSS demo, for example, uses an XML-attribute "TY" and mentions the data type as the value to the TY attribute. For instance, the following two ! MEIs for a simple integer number and a ratio of a float and an int could appear in a message.

! The receiver might expect foo to be a floating point ! value. Using the conversion rule convert ! numerator and denominator to floats and then build the ! quotient, the receiver can convert the type he has to the ! type he needs.

Mark Tucker's rule of minimal explicitness states that you only need to send TY attributes at a place where the actual type used
--- 1434,1455 ----

! The receiver might expect foo to be a real value. Using ! the conversion rule convert numerator and ! denominator to real numbers and then build the quotient, the ! receiver can convert the type he has to the type he needs.
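Several cells of the conversion matrix can be sketched in C++ as follows. The struct layouts and function names are hypothetical. Precision is tracked only as a digit count, which follows the rule that a real made from an integer has as many significant digits as the integer has digits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical in-memory forms; the specification does not prescribe them.
struct Real {
    double value;
    int precision;  // number of significant digits
};

struct Ratio {
    double numerator;
    double denominator;
};

// Integer -> Real: precision is the number of all digits in the integer.
Real integerToReal(std::int64_t i) {
    std::string digits = std::to_string(i);
    if (digits.front() == '-') digits.erase(0, 1);  // the sign is not a digit
    return Real{static_cast<double>(i), static_cast<int>(digits.size())};
}

// Integer -> Ratio: use the integer as numerator, set the denominator to 1.
Ratio integerToRatio(std::int64_t i) {
    return Ratio{static_cast<double>(i), 1.0};
}

// Ratio -> Real: convert numerator and denominator to reals, then build the
// quotient. Precision bookkeeping is omitted, so only the value is returned.
double ratioToReal(const Ratio& r) {
    return r.numerator / r.denominator;
}

// Real -> Integer: round to the nearest integer. Caution: the result may
// suggest more precision than the original value had (pseudo-precision).
std::int64_t realToInteger(const Real& r) {
    return static_cast<std::int64_t>(std::llround(r.value));
}
```

On the receiver side, `ratioToReal` corresponds to the rule quoted above for turning a ratio into the real number the receiver expects.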

Mark Tucker's rule of minimal explicitness states that you only need to send TY attributes at a place where the actual type used
***************
*** 1763,1769 ****

! Boolean
--- 1618,1624 ----
--- 1801,1809 ----
A No Information value can occur in place of any other value to express that specific information is missing and how or why it is missing. This is like a NULL in SQL but with the ability to specify a ! certain flavor of missing information. The No Information type extends ! the value domain of any other data type unless explicitly forbidden by ! domain constraints.
***************
*** 1968,2010 ****
the information is missing. For the time being we keep the list of possible flavors of null open for discussion. Reported numbers of different flavors of null values range between 1 (SQL) and 70 ! (reported by Angelo Rossi-Mori).
If No-Information flavors are to be used in a standard way, we will ! have to define a canonical systematization of flavors of null. ! !
For example, Stan Huff's CE proposal contains the following null ! values: ! !

! Boolean (BL)
***************
*** 1927,1933 ****
information was missing afterwards. We will factor this "update" component out into update semantics below. Here we only deal with the representation of incomplete ! information.
After having defined the Boolean, the type that underlies all information, we now define a data type called "No Information" as
--- 1782,1789 ----
information was missing afterwards. We will factor this "update" component out into update semantics below. Here we only deal with the representation of incomplete ! information. This means that NULL values no longer automatically carry ! the notion of "deleting" or "overwriting" with them.
After having defined the Boolean, the type that underlies all information, we now define a data type called "No Information" as
***************
*** 1945,1951 ****
A No Information value can occur in place of any other value to express that specific information is missing and how or why it is missing. This is like a NULL in SQL but with the ability to specify a ! certain flavor of missing information.

component name

component name
U unknown ! no information at all. I.e. nothing more is known about the ! circumstances of missing information. !
UASK asked but unknown ! the person asked could not supply the information ! (why?) !
NAV not available ! the person asked does have the information somewhere ! but it is not available right now (e.g. oh, I wrote down what ! the doctor said last time, but I didn't bring this piece ! of paper with me).
NA not applicable ! e.g. an answer to "gestational age" for a patient ! who is not pregnant. !
NASK not asked ! the person who should collect that information forgot to ! ask. !
The above example list makes no claim to be complete or ! sufficient and it does not attempt to systematize the many possible ! flavors of null. It serves here as an example to show what such ! flavors of null can comprise. Now that we have defined a fairly general ! data type for no information, and as we factored update semantics ! into its own method, this issue of a canonical taxonomy of null values ! is less important. In most cases, all that people need is a No ! Information value without the flavor component.
For example, consider that the patient's date of birth is requested and we don't know the date of birth because the patient does not remember
--- 1826,1900 ----
the information is missing. For the time being we keep the list of possible flavors of null open for discussion. Reported numbers of different flavors of null values range between 1 (SQL) and 70 ! (reported by Angelo Rossi-Mori). If No-Information flavors are to be ! used in a standard way, we have to define a canonical systematization ! of flavors of null. The following table lists a number of canonical ! null value flavors plus additional flavors of null which still need to ! be systematized.
NI no information
canonical ! This is the default null value. It simply says that there ! is no information whatsoever given in the context where the NI ! value occurs. The information may or may not be available ! elsewhere, it may or may not be applicable or known. The NI ! value cannot be interpreted any further.
NA not applicable
canonical ! The data element does not apply in a given context, e.g. an ! answer to "gestational age" for a patient who is not ! pregnant.
UNK unknown
canonical ! The information may be applicable, but is not known in the ! given context.
OTH other
canonical ! The information is known but cannot be expressed within the ! required constraints. This flavor is most often used when a concept needs to be ! coded but the code system does not provide for the appropriate ! concept. Many code systems have an "other" entry (also called ! "not otherwise specified"). Terminologies should not themselves ! contain "other" entries [Cimino ??]. The null value of the OTH ! flavor can and should replace those "other" ! codes. Note: this flavor is ! not itself a "not otherwise specified" code ! for null flavors.
NASK ! not asked ! the person who should ! collect that information forgot to ask. Needs further ! systematization.
ASKU ! asked but unknown ! the person asked could ! not supply the information (why?) Needs further ! systematization.
NAV ! not available ! the person asked does ! have the information somewhere but it is not available right now ! (e.g. oh, I wrote down what the doctor said last time, but ! I didn't bring this piece of paper with me). Such data ! elements might be updated soon. Needs further systematization.
NP ! not present
special ! ! The not present value is only meaningful within a message, ! not within a system's database. The not-present flavor must ! be replaced by the applicable default value at the receiving ! interface. If no other default value is specified, a No ! Information value with the default flavor no information is ! used.
In most cases, the No Information value with the default flavor ! no information is sufficient. So, if the flavors of null are ! deemed not useful for technical committees or implementors, they can ! simply assume no flavors to exist other than the default no ! information flavor (which would translate to an SQL NULL) and the ! special flavor not present which is only applicable for ! messages and is replaced by a default value at a receiving interface.
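One possible in-memory representation treats No Information as an extension of every value domain and applies the not-present rule at the receiving interface. This is a sketch under stated assumptions: `Nullable`, `NullFlavor` and `resolveNotPresent` are hypothetical names, and `std::variant` is just one way to model the domain extension.

```cpp
#include <cassert>
#include <optional>
#include <variant>

// Flavors from the table above (canonical and not yet systematized).
enum class NullFlavor { NI, NA, UNK, OTH, NASK, ASKU, NAV, NP };

// A value of T, or a No Information value of some flavor: the value domain
// of T extended by the null flavors.
template <class T>
using Nullable = std::variant<T, NullFlavor>;

// At the receiving interface, NP is replaced by the applicable default
// value if one is specified, otherwise by the default flavor NI.
template <class T>
Nullable<T> resolveNotPresent(const Nullable<T>& value,
                              const std::optional<T>& defaultValue = std::nullopt) {
    const NullFlavor* flavor = std::get_if<NullFlavor>(&value);
    if (flavor == nullptr || *flavor != NullFlavor::NP) {
        return value;  // an actual value or any other flavor passes through
    }
    if (defaultValue) {
        return *defaultValue;  // the applicable default value
    }
    return NullFlavor::NI;  // no default specified: plain No Information
}
```

Call it with an explicit type argument, for example `resolveNotPresent<int>(received, 0)`, so the default value can convert to `std::optional<int>`.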
For example, consider that the patient's date of birth is requested and we don't know the date of birth because the patient does not remember
***************
*** 2013,2039 ****

In this example instance notation we will use the symbol #null as equivalent to (NoInformation) ! without a flavor.
Note that No Information is formally a composite data type, although it has but one component. We will list No Information under the category "primitive" anyway, since it is so fundamental to our ! type system.
1.2.12 Update Semantics

Update semantics deals with the problem of what a receiver is ! supposed to do with information in the message. That information may ! be equal to prior information in the receiver's database, in which ! case no question arises. But what if the information is different?
We can categorize the modes of updates in the following taxonomy:
--- 1903,1967 ----

In this example instance notation we will use the symbol #null as equivalent to (NoInformation) ! with the implied default flavor no information.
Note that No Information is formally a composite data type, although it has but one component. We will list No Information under the category "primitive" anyway, since it is so fundamental to our ! type system. This is a very special data type anyway, since it will ! never be used in declaring attributes or data elements, but will ! rather extend every data type to provide for a consistent way to ! account for missing information. ! !
Note that extended Boolean logic (e.g., three-valued logic) is ! supported using the classic Boolean data type with the implied domain ! extension offered by the No Information values. The third value of ! three-valued logic would be the No Information value (of any flavor). ! The logic operators that apply in three-valued logic are defined in ! the following tables:
Definition of logic operators in three-valued logic

NOT   |
------+-------
true  | false
false | true
ni    | ni

AND   | true   false  ni
------+--------------------
true  | true   false  ni
false | false  false  false
ni    | ni     false  ni

OR    | true   false  ni
------+--------------------
true  | true   true   true
false | true   false  ni
ni    | true   ni     ni
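The operator tables correspond to Kleene's strong three-valued logic, which is also the logic SQL applies to NULL. Here is a minimal C++ sketch. The enum `Tri` is hypothetical, and any No Information flavor has been collapsed into the single value `NI`.

```cpp
#include <cassert>

// The two Boolean values plus No Information (ni), flavor discarded.
enum class Tri { False, True, NI };

constexpr Tri triNot(Tri a) {
    switch (a) {
        case Tri::True:  return Tri::False;
        case Tri::False: return Tri::True;
        default:         return Tri::NI;  // NOT ni = ni
    }
}

// false dominates AND; otherwise any ni makes the result ni.
constexpr Tri triAnd(Tri a, Tri b) {
    if (a == Tri::False || b == Tri::False) return Tri::False;
    if (a == Tri::NI || b == Tri::NI) return Tri::NI;
    return Tri::True;
}

// true dominates OR; otherwise any ni makes the result ni.
constexpr Tri triOr(Tri a, Tri b) {
    if (a == Tri::True || b == Tri::True) return Tri::True;
    if (a == Tri::NI || b == Tri::NI) return Tri::NI;
    return Tri::False;
}
```

The dominance rule in the comments reproduces every cell of the AND and OR tables, so the tables do not have to be stored explicitly.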

1.2.12 Update Semantics

Update semantics deals with the problem of what a receiver is ! supposed to do with information (or "no information") in a ! message. That information may be equal to prior information in the ! receiver's database, in which case no question arises. But what if the ! information is different?
We can categorize the modes of updates in the following taxonomy:
***************
*** 2116,2125 ****
should be part of the MEI meta model.
! It turns out that updating a list is the most difficult task to do, ! since positions are relevant in the list. The problem is concurrent ! updates; you never know exactly what the list looks like at the ! receiver's data base when your update message is being processed. For example, if you think the list is (LIST A B C) and you want to insert an element D to come before
--- 2044,2053 ----
should be part of the MEI meta model.
! It turns out that updating a list is the most difficult task ! to do, since positions are relevant in the list. The problem is ! concurrent updates; you never know exactly what the list looks like at ! the receiver's system when your update message is being processed. For example, if you think the list is (LIST A B C) and you want to insert an element D to come before
***************
*** 2344,2361 ****
terminals. Originally, those were control sequences separated from the normal text by a leading ASCII character number 27 ("escape"), hence the name "escape sequence". But escape sequences have since been used ! in many different styles. In C string literals, troff, TeX and RTF we ! see the backslash character (\) introducing escape ! sequences. Troff has a second kind of escape sequence started by a ! period at the beginning of a new line. HL7 version 2 also uses the ! backslash at the beginning and end of escape sequences. SGML uses angle brackets to enclose escape sequences (markup tags), but in addition there are other kinds of escape sequences in SGML opened with the ampersand or percent sign and closed with a semicolon (entity references).
From the many choices to encode formatted text HL7 traditionally ! used a few special escape sequences and troff-style formatting commands. Those HL7 escape sequences have the disadvantage that they are not very powerful and are somewhat arcane, or at least outdated by more recent developments. HTML has become the most widely deployed
--- 2272,2291 ----
terminals. Originally, those were control sequences separated from the normal text by a leading ASCII character number 27 ("escape"), hence the name "escape sequence". But escape sequences have since been used ! in many different styles. In C string literals, ! TROFF, ! TeX ! and RTF we see the backslash character (\) introducing ! escape sequences. TROFF has a second kind of escape sequence started ! by a period at the beginning of a new line. HL7 version 2 also uses ! the backslash at the beginning and end of escape sequences. SGML uses angle brackets to enclose escape sequences (markup tags), but in addition there are other kinds of escape sequences in SGML opened with the ampersand or percent sign and closed with a semicolon (entity references).
From the many choices to encode formatted text HL7 traditionally ! used a few special escape sequences and TROFF-style formatting commands. Those HL7 escape sequences have the disadvantage that they are not very powerful and are somewhat arcane, or at least outdated by more recent developments. HTML has become the most widely deployed
***************
*** 2394,2405 ****
attribute. There is hardly any rationale for such a decision at design time of the standard.
Thus, the irrationality and inflexibility of defining multiple ! data types for free text seems to outweigh the conceivable advantage ! that a special data type might accommodate the intrinsics of some ! special encoding formats in greater detail and accuracy. Thus, we ! define only one flexible data type for free text, that can support all ! the techniques for encoding appearance of free text.
2.1.4 From appearance of text to multimedial information
--- 2324,2335 ----
attribute. There is hardly any rationale for such a decision at design time of the standard.
Thus, the irrationality and inflexibility of defining multiple data ! types for free text seems to outweigh the conceivable advantage that a ! special data type might accommodate the intrinsics of some special ! encoding formats in greater detail and accuracy. Thus, we define only ! one flexible data type for free text, that can support all the ! techniques for encoding appearance of free text.
2.1.4 From appearance of text to multimedial information
***************
*** 2447,2464 ****
data in any free text field, and thus, that free text and multimedia data share the same data type. This is not hard to do since one flexible data type was already required to accommodate the different ! encodings of text formats.
2.1.5 Pulling the pieces together

In the previous exploration of the field of text, we distinguished string data elements, where the raw information ! of characters is sufficient, from free text, where there is use for formatting the text and for augmenting or even replacing the text with ! multimedia information. This means that there will be a string data ! type on the one hand, and a flexible data type that covers free text ! and multimedia data on the other.
2.2 Character String
--- 2377,2453 ----
data in any free text field, and thus, that free text and multimedia data share the same data type. This is not hard to do since one flexible data type was already required to accommodate the different ! encodings of text formats. We will call this data type "Display Data" ! and it is used for both free text and multimedia. Display Data will ! consist of a media descriptor code and the data itself. Applications ! will render the data differently depending on the media descriptor ! code.
Although it is technically convenient to merge character-based free ! text and multimedia data into one data type, the rationale for this ! decision is semantic, not technical. Both character-based free text ! and multimedia data are information sent primarily to human beings for ! their interpretation. This conforms to the meaning of the word "text" ! as explained by Webster's dictionary:
! Main Entry: text
! Pronunciation: 'tekst
! Function: noun
! Etymology: Middle English, from Middle French texte, from ! Medieval Latin textus, from Latin, texture, context, from ! texere to weave -- more at TECHNICAL
! Date: 14th century
! 1 a (1) : the original words and form of a written or ! printed work (2) : an edited or emended copy of an original ! work b : a work containing such text
2 a ! : the main body of printed or written matter on a page b ! : the principal part of a book exclusive of front and back ! matter c : the printed score of a musical ! composition
3 a (1) : a verse or passage of Scripture ! chosen especially for the subject of a sermon or for authoritative ! support (as for a doctrine) (2) : a passage from an ! authoritative source providing an introduction or basis (as for a ! speech) b : a source of information or ! authority
4 : THEME, ! TOPIC
5 a : the words of ! something (as a poem) set to music b : matter chiefly in ! the form of words that is treated as data for processing by ! computerized equipment <a text-editing ! typewriter>
6 : a type suitable for printing ! running text
7 : TEXTBOOK !
8 a : something ! written or spoken considered as an object to be examined, explicated, ! or deconstructed b : something likened to a text <the ! surfaces of daily life are texts to be explicated -- ! Michiko Kakutani> <he ceased to be a teacher as he became a ! text -- D. J. Boorstin> !
Semantically, our Display Data type remains text ! in the sense of Webster's definitions 5 b and ! 8. Clearly, word processor documents can contain images such as ! drawings or photographs. Modern documents can embed video sequences ! and animations as well. Dictation (audio) is the most important form ! of pre-written medical narratives. A scanned image of old medical ! records or of handwriting is certainly text. In this sense, almost ! everything can be text, which is also supported by the phenomenologic analysis given in the ! introduction.
2.1.5 Pulling the pieces together

In the previous exploration of the field of text, we distinguished string data elements, where the raw information ! of characters is sufficient, from "display data," where there is use for formatting the text and for augmenting or even replacing the text with ! multimedia information. This means that there will be a character string data type, and a display data type that covers character-based ! free text and multimedia data.
2.2 Character String
***************
*** 2473,2479 ****
--- 3209,3222 ----
+
***************
*** 3309,3317 ****
macromolecules)

! Character String
--- 2462,2468 ----

! Character String (ST)
***************
*** 2887,2911 ****
2.3 Free Text

To cope with the various encoding formats of appearance, there will ! be only one data type for free text. This type will have essentially ! two semantic components: It will (1) contain the free text data and ! (2) specify the application which can render that free text data. The ! application to render the data will be specified by a media type code, ! similar to the Internet MIME standard [cf. RFC 2046] or ! HL7 v2.3's ED data type. The only problem is what data type to use ! for the free text data.
Some formatted text could be defined on top of string data. Due to the backwards compatibility of Unicode to ASCII and ISO Latin-1, the ! simple typewriter-style formatting, the troff escape sequences that ! were used by HL7's old data type FT and HTML/SGML formatting is ! possible on top of Unicode strings. In addition to the string data, we ! have to indicate the formatting method that should be used by the ! receiver to render a given string correctly.
Most proprietary text formatting tools, however, do not fit in the character string, because those applications use their own
--- 2876,2903 ----
2.3 Display Data

To cope with the various encoding formats of appearance, there will ! be only one data type for both character-based free text and ! multimedia data. This type is called "Display Data" and will have ! essentially two semantic components: It will (1) contain the data ! component and (2) specify the application which can render that data. ! The application to render the data will be specified by a media type ! code, similar to the Internet MIME standard [cf. RFC 2046] or ! HL7 v2.3's ED data type. The only problem is what data type to use for ! the data component.
Some formatted text could be defined on top of string data. Due to the backward compatibility of Unicode with ASCII and ISO Latin-1, the ! simple typewriter-style formatting, the TROFF ! escape sequences that were used by HL7's old data type FT, and ! HTML/SGML formatting are all possible on top of Unicode strings. In ! addition to the string data, we have to indicate the formatting method ! that should be used by the receiver to render a given string ! correctly.
Most proprietary text formatting tools, however, do not fit in the character string, because those applications use their own *************** *** 2931,2976 ****
It therefore seem reasonable to define a data type for raw byte strings to complement the character string data type. The raw byte ! type would be used only by the data type for free text, though. There ! is hardly any use case for HL7 application domain Technical Committees ! to use byte string data types directly. !
Using byte strings instead of character strings for free text is not only a good idea for proprietory application data or multimedia data, but is also supported by a closer look to standards such as ! HTML, SGML or troff. While those formats are defined on a notion of ! characters instead of bytes, the applications that implement HTML, ! SGML or troff, have their own means to interpret byte streams as ! character encodings (e.g. HTML has a META element and XML ! defines the character set in its !XML header ! element. More traditional formatting with troff is not even able to ! handle the full abstraction of characters that comes with Unicode and ! thus is also based on byte strings rather than character strings. ! !
As a conclusion, we can uniformly define the free text / multimedia ! data type as the pair of media type selector and raw byte data. If the ! sender does not want to use any of the format options for free text ! but just wants to send the raw characters, he can indicate this with a ! special media type (text/plain). It seems justified to ! make the plain text media type the default. !
2.3.1 Multimedia Enabled Free Text
! The multimedia-enabled free text data type consists of the following ! components:
--- 2923,2981 ----
It therefore seems reasonable to define a data type for raw byte strings to complement the character string data type. The raw byte ! type would be used only by the Display Data ! type, though. There is hardly any use case for HL7 application domain ! Technical Committees to use byte string data types directly. !
Using byte strings instead of character strings for display data is not only a good idea for proprietary application data or multimedia data, but is also supported by a closer look at standards such as ! HTML, SGML or TROFF. While those formats are ! defined in terms of characters instead of bytes, the applications ! that implement HTML, SGML or TROFF have their ! own means to interpret byte streams as character encodings (e.g., HTML ! has a META element and XML defines the character set in ! its <?xml encoding=...?> processing ! instruction element). More traditional formatting with TROFF is not even able to handle the full abstraction ! of characters that comes with Unicode and thus is also based on byte ! strings rather than character strings. ! !
As a conclusion, we can uniformly define the display data type as ! the pair of media type selector and raw byte data. If the sender does ! not want to use any of the format options for display data but just ! wants to send the raw characters, he can indicate this with a special ! media type (text/plain). Since the display data type is ! most commonly used for character-based free text, the plain text media ! type is the default. ! !
2.3.1 Display Data
! ! Editorial Note: In previous releases ! of this draft specification this data type was called "Multimedia ! Enabled Free Text" or "Free Text" and was abbreviated "FTX." The name ! change to "Display Data" was strongly suggested because of ! considerable confusion caused by the term "text" being applied to multimedia ! data. In spite of the drastic name change, the functionality of this ! data type has not changed at all. !
! The display data type supports both character-based free text and ! multimedia data and consists of the following components:

! Free Text

! The free text data type can convey any data that is primarily meant to ! be shown to human beings for interpretation. Free text can be any kind ! of text, whether unformatted or formatted written language or other ! multi media data.

component name
*************** *** 2985,2998 **** --- 2990,3003 ---- *************** *** 3020,3027 ****

! Display Data (DD)

! The display data type can convey any data that is primarily meant to ! be shown to human beings for interpretation. Display data can be ! character-based free text, whether unformatted or formatted, as well ! as all kinds of multimedia data.

component name optional
defaults to text/plain ! used to select an appropriate method to render the free text data

data Binary Data required ! contains the free text data as raw bytes

compression optional
defaults to text/plain ! used to select an appropriate method to render the display data

data Binary Data required ! contains the display data as raw bytes

compression

Other components may be defined for certain media types. This ! serves as a way to map MIME media type "parameters" to this Free Text ! data type. An example is the charset component, which is a parameter of the MIME media type text/plain.
The media type descriptor of MIME
Other components may be defined for certain media types. This ! serves as a way to map MIME media type "parameters" to this Display ! Data type. An example is the charset component, which is a parameter of the MIME media type text/plain.
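The pair of media type selector and raw bytes, together with the optional components listed above, can be sketched as a small Python record. This is a hypothetical illustration and not HL7's definition. The class name `DisplayData`, the `as_text` helper, and the fallback to US-ASCII (the RFC 2046 default charset for text/plain) are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DisplayData:
    """Hypothetical sketch of Display Data: media type selector plus raw bytes."""

    data: bytes                        # required: the content as raw bytes
    media_type: str = "text/plain"     # defaults to text/plain
    compression: Optional[str] = None  # optional; no algorithm codes defined here
    charset: Optional[str] = None      # media type parameter, e.g. for text/plain

    def as_text(self) -> str:
        """Decode uncompressed text/plain data; other types need a renderer."""
        if self.media_type.lower() != "text/plain":
            raise ValueError(f"no text renderer for {self.media_type}")
        if self.compression is not None:
            raise NotImplementedError("decompression is not covered by this sketch")
        # Assumption: without a charset, fall back to US-ASCII (RFC 2046 default).
        return self.data.decode(self.charset or "us-ascii")


note = DisplayData(b"Patient resting comfortably.")
print(note.media_type, "|", note.as_text())
```

The renderer choice is driven only by `media_type`. That matches the idea that the data component itself is opaque bytes.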
The media type descriptor of MIME maintained by IANA. Any of the IANA defined media types is in principle allowed for use ! with the Free Text data type. But not all media types have the same status in this specification.
The following top level media types are currently defined by the IANA: --- 3046,3052 ---- data base maintained by IANA. Any of the IANA defined media types is in principle allowed for use ! with the Display Data type. But not all media types have the same status in this specification.
The following top level media types are currently defined by the IANA: *************** *** 3059,3125 **** behavioral or physical representation within a given domain" [RFC 2077]

-
This data type is called Free Text , and so it - seems strange, almost frightening, that the above list contain media - types like video, application, even - message. Should there not rather be one data type - only for written text, one for audio, one for image, one for - video, etc.? - -
The rationale that lead to the definition of the free text data - type is that free text is information sent from one human being to - another human being. The receiving human being will - if she has a - method to render and see the information - be able to interpret this - data. To understand the full range of meaning of the word "text" we - should have a look into Webster's - dictionary: -
-
- Main Entry: text
- Pronunciation: 'tekst
- Function: noun
- Etymology: Middle English, from Middle French texte, from - Medieval Latin textus, from Latin, texture, context, from - texere to weave -- more at TECHNICAL
- Date: 14th century
- 1 a (1) : the original words and form of a written or - printed work (2) : an edited or emended copy of an original - work b : a work containing such text
2 a - : the main body of printed or written matter on a page b - : the principal part of a book exclusive of front and back - matter c : the printed score of a musical - composition
3 a (1) : a verse or passage of Scripture - chosen especially for the subject of a sermon or for authoritative - support (as for a doctrine) (2) : a passage from an - authoritative source providing an introduction or basis (as for a - speech) b : a source of information or - authority
4 : THEME, - TOPIC
5 a : the words of - something (as a poem) set to music b : matter chiefly in - the form of words that is treated as data for processing by - computerized equipment <a text-editing - typewriter>
6 : a type suitable for printing - running text
7 : TEXTBOOK -
8 a : something - written or spoken considered as an object to be examined, explicated, - or deconstructed b : something likened to a text <the - surfaces of daily life are texts to be explicated -- - Michiko Kakutani> <he ceased to be a teacher as he became a - text -- D. J. Boorstin> -
-
- -
This multimedia data type remains to be text in the - sense of Webster's definitions 5 b and 8. Clearly, word - processor documents can contain images such as drawings or - photographs. Modern documents can embed video sequences and animations - as well. Dictation (audio) is the most important form of pre-written - medical narratives. A scanned image of old medical records or of - handwriting is certainly text. In this sense, almost everything can be - text, which is supported also by the phenomenologic analysis given in the - introduction. -
There are currently more than 160 different MIME media subtypes defined with the list growing quite fast. It makes no sense to list them all here. In general, all those types defined by the IANA may be --- 3064,3069 ---- *************** *** 3265,3274 ****

image/gif other GIF is a nice format that is supported by almost everyone. But it is patented, and the patent holder, Compuserve, has initiated nasty ! lawsuits in the past. No use to discourage this format, but we can not ! raise an encumbered format to a mandatory status.

image/jpeg mandatory
for high color images

image/gif other GIF is a nice format that is supported by almost everyone. But its LZW compression is patented, and the patent holder, Unisys, has initiated nasty ! lawsuits in the past [The GIF ! Controversy: A Software Developer's Perspective]. There is no reason to ! discourage this format, but we cannot raise an encumbered format to a ! mandatory status.

image/jpeg mandatory
for high color images

multipart deprecated This ! major media type depends on the MIME standard, the Free Text data type ! uses only want to use MIME multimedia type definitions, not the MIME ! message format

message deprecated This major media type this is used to encapsulate e-mail messages in --- 3257,3265 ---- macromolecules)

multipart deprecated This ! major media type depends on the MIME standard; the Display Data type ! uses only MIME multimedia type definitions, not the MIME message ! format.

message deprecated This major media type is used to encapsulate e-mail messages in *************** *** 3320,3333 **** and HL7 is not used for e-mail.

!
Constraints may be applied on the media types whenever a Free Text ! data type is used, whether at the time of HL7 message specification, or for a given application conformance statement, and even in the RIM. For instance, suppose the Image Management SIG will eventually define a class "Image". This class Image would conceivably contain an attribute, "image_data", declared ! as Free Text. The IMSIG certainly would not want to see written text or ! audio here, but only images (and maybe a video clip of a coronary angiography.) --- 3268,3281 ---- and HL7 is not used for e-mail.

!

Constraints may be applied to the media types whenever a Display ! Data type is used, whether at the time of HL7 message specification, for a given application conformance statement, or even in the RIM. For instance, suppose the Image Management SIG will eventually define a class "Image". This class Image would conceivably contain an attribute, "image_data", declared ! as Display Data. The IMSIG certainly would not want to see written text ! or audio here, but only images (and maybe a video clip of a coronary angiography). *************** *** 3339,3345 ****
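A receiving application or conformance checker could enforce a constraint like the "image_data" one by looking only at the top-level part of the media type. The following is a minimal Python sketch. The function name `media_type_allowed` and the `IMAGE_DATA_TYPES` set are hypothetical, chosen to mirror the example above.

```python
def media_type_allowed(media_type: str, allowed_top_level: frozenset) -> bool:
    """Return True if a 'type/subtype' string has an allowed top-level type."""
    top_level, sep, subtype = media_type.partition("/")
    if not sep or not top_level or not subtype:
        return False  # not a well-formed type/subtype pair
    # MIME type names are case-insensitive (RFC 2045), so compare in lower case.
    return top_level.lower() in allowed_top_level


# Hypothetical constraint for an "image_data" attribute: images, maybe video.
IMAGE_DATA_TYPES = frozenset({"image", "video"})

for candidate in ("image/jpeg", "video/mpeg", "text/plain", "audio/basic"):
    print(candidate, media_type_allowed(candidate, IMAGE_DATA_TYPES))
```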

! Binary Data
--- 3287,3293 ----

! Binary Data (BIN)
*************** *** 3350,3356 **** PRIMITIVE TYPE
!
The data component of the Free Text data type is not a character string but a block of raw bits. ASN.1 calls this an "octet-string," which is the same as a "byte-string." The important point is that the byte string would not be subject --- 3298,3304 ----
PRIMITIVE TYPE
!

The data component of the Display Data type is not a character string but a block of raw bits. ASN.1 calls this an "octet-string," which is the same as a "byte-string." The important point is that the byte string would not be subject *************** *** 3429,3436 ****

We will define a code for compression algorithms.

We recognized that there will be a reference data type defined to ! be used alternatively for huge data blocks. Should the free text type ! be allowed to be replaced by a reference, or should it contain a reference?

Video streams do not fit into a single message, an external stream --- 3377,3384 ----

We will define a code for compression algorithms.

We recognized that there will be a reference data type defined to ! be used alternatively for huge data blocks. Should the Display Data ! type be allowed to be replaced by a reference, or should it contain a reference?

Video streams do not fit into a single message, an external stream *************** *** 3729,3735 ****

!
Code Value

A code value is exactly one symbol in a code system. The meaning of the symbol is defined exclusively and completely by the code system --- 3677,3683 ----
!
Code Value (CV)

A code value is exactly one symbol in a code system. The meaning of the symbol is defined exclusively and completely by the code system *************** *** 3837,3843 ****
The above conversion rule allows to build concise messages with code values, just like the HL7 v2.x ID data type allowed one to do. !
3.2.1 Outstanding Issues

The code system obviously is by itself a technical concept identifier. If we are going to use the --- 3785,3791 ----
The above conversion rule allows to build concise messages with code values, just like the HL7 v2.x ID data type allowed one to do. !
Outstanding Issues

The code system obviously is by itself a technical concept identifier. If we are going to use the *************** *** 3911,3922 **** from the version id used by the other organization.

Unregistered local coding schemes have been the cause of a lot of ! trouble in the past. Laboratories, whose main concern is not HL7 update their code system ids quite frequently and without caring for backwards compatibility. This places a lot of burden on the shoulders of HL7 communication system managers. This burden would not be easier, but heavier, if every ideolectic coding scheme that changes ever so ! often would have be registered with HL7.
The answer could be to say that locally defined coding systems do not have any meaning outside the defining organization. Thus, there is --- 3859,3870 ---- from the version id used by the other organization.

Unregistered local coding schemes have been the cause of a lot of ! trouble in the past. Laboratories, whose main concern is not HL7, update their code system ids quite frequently and without caring for backward compatibility. This places a heavy burden on the shoulders of HL7 communication system managers. This burden would not become lighter, but heavier, if every idiolectic coding scheme that changes ever so ! often had to be registered with HL7.
The answer could be to say that locally defined coding systems do not have any meaning outside the defining organization. Thus, there is *************** *** 3931,3944 **** would be a digit. We can loosen this constraint a little bit by saying that every code system name starting with "99" be local.
3.3 Real World Concepts

The old CE data type and its interim proposed successors (with various names LCE/CWE and CE/CNE) were basically one pair of Code Value plus a free text string that could be ! used to convey the original text in an uncoded fashion.
The new data type for real world concepts is essentially a generalization the CE. The Concept Descriptor is defined as a --- 3879,4336 ---- would be a digit. We can loosen this constraint a little bit by saying that every code system name starting with "99" be local. + +
3.2.1 State of a State Machine
+ +
One particular kind of technical concept identifier will occur very + often in HL7 messages: state. Since the HL7 version 3 message design + methodology bases the definition of messages on State-Transition + models, the communication of state attributes will be standardized and + stylized. + +
The notion of a State of a State-Machine will not be defined here + in all detail, instead we refer to the HL7 Message Development + Framework, to the Unified Modeling Language Specification, and to a + vast amount of literature on that matter. Note that the study of + Automata (State-Transition-Models) is one of the oldest areas of + Computer Science and a basic part of computer literacy. + +
Objects have identity and state. Identity is fixed by an identifier + attribute of an object (or a reference to an object). An object is in + one and only one state at any time. The state is the total of all the + current values of attributes and all the current associations to other + objects. Thus, generally speaking, state is far more than could be + represented in one state variable; in other words, the state of an + object is everything but its identity. + +
A State-Transition model often focuses on certain distinguished + features of an object's possible states. Thus, in a narrower sense, + state variables explicitly capture those states of an object that are + defined in the State-Transition model of a class. Every state of a + State-Transition model stands for an entire class of actual states + that objects might go through in their life-cycle. + +
Many of the states defined by a State-Transition model will + constrain the attributes and associations that + must or must not exist for an object in that state. + +
In the following we will use the term joint state to talk + about the overall state of an object according to a State-Transition + model. Note that at any given time an object is in one and only one + joint state, independent of the details of the State-Transition model + (e.g., no matter whether parallel sub-state-machines or + nested states are used). + +
We will use the term partial state to refer to the + sub-states that a State-Transition model distinguishes + individually. An object can be in multiple partial states at the same + time. The total of all partial states that are effective for an + object at any given time is the joint state for that object at + that time. Note that, generally speaking, all properties of an object + can be considered partial states; however, here we call partial states + (proper) only those partial states that are defined in the + State-Transition model.
+ + + + + +
+
Figure 3: Example State-Transition model.
+ +
For a very simple State-Transition model in UML there may be no + difference between partial states and joint states. However, in UML + concurrent State-Machines partial states are different from joint + states. For example, an order may be in the states new, + in-progress and done, as shown in Figure 3. At the same + time any order may be active or on-hold. Suppose that + transitions to put an order on hold are considered independent of + the other three possible states of an order. In that case, the joint + state of the order is described by mentioning one partial state of + {new, in-progress, done} and one of the states + {active, on-hold}. The set of all possible joint states + would be the Cartesian product of the two sets of states:
+ +
+ + + + + + + +
new active
new on-hold
in-progress active
in-progress on-hold
done active
done on-hold
+
+ +
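The joint states listed above can be generated mechanically as that Cartesian product. Below is a short Python sketch using `itertools.product`. Encoding each joint state as a tuple of partial-state strings is an assumption made only for illustration.

```python
from itertools import product

# Partial states of the two parallel sub-state-machines (Figure 3 example).
PROGRESS_STATES = ("new", "in-progress", "done")
HOLD_STATES = ("active", "on-hold")

# A joint state picks exactly one partial state from each machine,
# so the joint states are the Cartesian product of the two sets.
JOINT_STATES = list(product(PROGRESS_STATES, HOLD_STATES))

for progress, hold in JOINT_STATES:
    print(progress, hold)
```

The six rows printed appear in the same order as the table above.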
There is another variation of the term "state" distinguished by + UML: composite state (or nested state) vs. simple + state. Composite states are more coarse-grained states that one + may want to distinguish because a transition may be applicable to each + of the component-states nested within the composite state. + +
For example, one may want to allow an order in either of the states + new and in-progress to be interrupted. So, one might + define another state, interrupted, and one transition from each + of the states new and in-progress. To express that there + is really no difference between new and in-progress for + the purpose of interrupting, one can define a super-state, e.g., + called not-done, to nest both new and + in-progress. Thus, only one "interrupt" transition would be + needed from the super-state. + +
State-Transition diagrams that use nested states are easier to read + and comprehend, since they provide abstractions and generalizations + and thus reduce the number of similar transitions. However, the + information about super-states does not need to be mentioned + explicitly, since it is always implied by its component states. In our + example, if either new or in-progress is effective, we + know that the super-state not-done is also effective. Thus, + explicit information about super-states is always redundant. + +
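Because super-states are implied by their component states, a receiver can always restore them from a nesting table. The following is a hypothetical Python sketch. The `PARENT` table encodes only the not-done example from the text, and `with_super_states` is an illustrative name.

```python
# Hypothetical nesting table: sub-state -> enclosing super-state.
PARENT = {"new": "not-done", "in-progress": "not-done"}


def with_super_states(flags: frozenset) -> frozenset:
    """Add every super-state implied by the given flags, following nesting chains."""
    result = set(flags)
    pending = list(flags)
    while pending:
        parent = PARENT.get(pending.pop())
        if parent is not None and parent not in result:
            result.add(parent)
            pending.append(parent)  # the super-state may itself be nested
    return frozenset(result)


print(sorted(with_super_states(frozenset({"new", "active"}))))
```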
Alternatives for designing a data type for state.
+ +
ISO 11404 (language-independent data types) defines a data type for + state. However, ISO defines the state as a simple enumeration of state + codes. Thus you could only communicate one symbol per joint state in a + variable of that type. If you have multiple parallel state machines, + in other words, if multiple partial states were effective at the + same time, you would need to precoordinate the list of parallel state + codes. + +
Precoordination of the table of state codes for any given + class has its merits. With a precoordinated code, you know that any + given value is actually legal. Conversely, with postcoordinated + codes, you do not know whether you have a legal combination unless you + explicitly test for it. In our example, with a precoordinated joint + state code you could be sure that no one could utter a state that at the same + time includes both in-progress and interrupted. + +
Precoordination, however, defers the burden to the time when the + information needs to be interpreted. A precoordinated code requires a + table that helps to separate the different partial states from the + joint state code. Even small changes to the State-Transition model may + require a number of joint state codes to be added to or removed from + the table. On the other hand, if the processing of those state codes + were in reality table-driven, there would be a lot of built-in + flexibility, since a table-driven processor should continue to work + properly as the driving table is updated. So, a precoordinated state + code with one entry per joint state is a good choice. + +
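Table-driven interpretation of a precoordinated code might look like the following Python sketch. The joint state codes such as `"N-A"` and `"P-H"` are invented for this example, because the text defines no actual code values.

```python
# Hypothetical precoordinated code table: one code per joint state.
JOINT_STATE_TABLE = {
    "N-A": frozenset({"new", "active"}),
    "N-H": frozenset({"new", "on-hold"}),
    "P-A": frozenset({"in-progress", "active"}),
    "P-H": frozenset({"in-progress", "on-hold"}),
    "D-A": frozenset({"done", "active"}),
    "D-H": frozenset({"done", "on-hold"}),
}


def partial_states(code: str) -> frozenset:
    """Look up the partial states of a joint state code.

    A code missing from the table raises KeyError, so only the
    precoordinated (legal) joint states are accepted.
    """
    return JOINT_STATE_TABLE[code]


def is_on_hold(code: str) -> bool:
    # The predicate is answered through the table, not by parsing the code.
    return "on-hold" in partial_states(code)


print(is_on_hold("P-H"), is_on_hold("N-A"))
```

When the State-Transition model changes, only `JOINT_STATE_TABLE` has to be updated. Predicates such as `is_on_hold` stay the same.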
The opposite of precoordination is + postcoordination, and thus we could define the state data type + as a vector of partial state codes. If the possible partial state codes + can be factored into multiple orthogonal axes, it makes sense to label + each of the components of that vector of partial states with some + descriptive name; in other words, to represent state as one record + of partial states. + +
A related alternative to representing the joint state in one + attribute of a record type would be to allow the state to be expressed + in multiple attributes. An example of this is Wayne Tracy's + Clinical_document_header class with the four attributes completion + status, availability status, authentication status, and storage + status. Wayne's approach is currently not conformant with the MDF + style; however, Wayne's approach existed before the MDF style and thus + has the honor of the elder, meaning it cannot simply be dismissed as + a style guide violation. Nevertheless, in the following I will stick to the + notion of a single state variable per object. + +
In a postcoordinated code for states the question arises what to do + with composite states. As noted above, composite states need + not be sent in a message since they are always implied by their + component states; thus, composite states are, strictly speaking, + redundant. However, just as mentioning the generalized composite + states in a State-Transition model simplifies the definition of the model, + having the generalized states on hand might simplify the processing of + state information. Indeed, if all a given application is interested in + is whether a super-state is effective, it is simpler to check for the + existence of that super-state flag in a collection of state flags, + rather than having to test for every possible sub-state flag. + +
In our example, the diagram says that the transition "interrupt" is + possible from the super-state not-done that encloses the + sub-states new and in-progress. It would be convenient + for an application to test whether not-done is among the set of + state flags in the state variable (one test), rather than to test + whether either new or in-progress is effective + (two tests). + +
The postcoordinated approach with explicit super-states also + simplifies seamless evolution. The following evolutionary developments + of State-Transition diagrams are supported:
+ +
+
Refinement of a state to include sub-states. This is probably + the most common development. The scenario is that some applications + will know earlier than others that the state not-done has + turned into a super-state containing new and + in-progress. Since the not-done state flag will + continue to be sent in the state variable, old applications continue + to work if they ignore the unknown state flags. Ignoring the unknown + state flags is quite natural, since one would rarely iterate over all + state flags in the state variable; rather, one tests whether + particular known state flags of interest are within the set. +
+ +
"Recoarsement" (antonym of "refinement"), i.e., turning a + super-state with sub-states into a state without sub-states. This is + probably quite rare. It could occur if we had an over-design in a + State-Transition model, providing features that nobody wants to use + and that cause more confusion than benefit. In this scenario, the + not-done state that had sub-states will turn into a state + without the sub-states. Since most (if not all) applications in this + scenario never asked for the sub-states and only tested for the + super-state, they will not even notice that the sub-states are no + longer defined in the model. +
+ +
Introduction of a super-state. In our example, suppose our + State-Transition diagram started without the not-done state and + two "interrupt" transitions were defined, one from each of new and + in-progress. The model would later be simplified to include + the state not-done with only one transition named "interrupt". + Note that the introduction of super-states is a very mild change, and + properly designed applications that conformed to the old model will + also be conformant to the new model. However, old applications + would not send the super-state flag explicitly in their state + variables, which could lead to problems with new applications that do + rely on that state flag being sent. +
+ +
Introduction of parallel sub-state-machines. In our example, + suppose our State-Transition model did not contain the active - + on-hold sub-state-machine. Adding the new + parallel states will introduce new state flags in the state variable, + but applications that do not depend on those states will just ignore + them. In the reverse direction, new applications that do handle the + parallel state-machine need to assume a default state active + if not otherwise mentioned.
+
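The evolution arguments above depend on receivers testing only for the flags they know. The following hypothetical Python sketch represents state variables as sets of flag strings. It shows an older receiver that still works when a newer sender adds refined and parallel flags, and a newer receiver that defaults to active when on-hold is absent.

```python
def old_app_can_interrupt(state: frozenset) -> bool:
    """Written against the older model: only the 'not-done' flag is known."""
    return "not-done" in state


def new_app_is_on_hold(state: frozenset) -> bool:
    """Newer model: 'active' is the default whenever 'on-hold' is absent."""
    return "on-hold" in state


# Newer sender: not-done refined into sub-states, plus a flag from the
# added active/on-hold machine. Unknown flags are simply never tested.
from_new_sender = frozenset({"not-done", "in-progress", "on-hold"})
# Older sender: knows neither the refinement nor the parallel machine.
from_old_sender = frozenset({"not-done"})

print(old_app_can_interrupt(from_new_sender), new_app_is_on_hold(from_old_sender))
```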
+ +
Conversely, the pre-coordinated status code would have changed + significantly with each of the above changes, and the kind of + flexibility we have with the post-coordinated code could be achieved + only with an intermediary table for interpretation and mapping between + message status codes and application status codes. + +
Some issues with UML lead me to recommend + a slightly undogmatic UML modeling style, which, however, is not a big + difference. In UML a transition from a super-state to one of its + internal sub-states is not defined. Rather, UML suggests using nested + initial pseudo-states. However, this requires explicitly mentioning + both states active and on-hold, which is really + redundant, because + active is considered just the negation of the on-hold + state and does not add any functionality or clarity to the model. + Evolution is easier if on-hold is simply added as a new + feature, with the default automatically being active if on-hold + is not mentioned. + +
Finally, another alternative is to use a post-coordinated state + code without mentioning super-states. At first glance, the + above-mentioned evolution paths seem to rely on the super-state information being + sent. However, one tiny step of indirection in the interpretation + of the state variable would open the same evolution path for the + minimal set of state flags. + +
Remember that states are essentially predicates or assertions about + objects. The named states, e.g., new will be used in predicate + statements such as: "if state is new + do stuff," or more formally: "if new(state) do + stuff." How would those predicate tests be implemented? + +
If we had a precoordinated state code, or if we had only one state + flag at a time, the program would ask whether the current state equals + some state to test for: + +
+ +
+ + If you have to test for the state not-done and it is not sent + explicitly, you need to do + +
+ +
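The two single-code tests described above are plain comparisons. Here is a minimal Python sketch, assuming the state variable carries exactly one flag string such as "new". The function names are illustrative.

```python
def is_new(state: str) -> bool:
    # One code at a time: a state predicate is a plain equality test.
    return state == "new"


def is_not_done(state: str) -> bool:
    # The super-state is never sent, so each sub-state is compared in turn.
    return state == "new" or state == "in-progress"


print(is_new("new"), is_not_done("done"))
```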
+ + If not-done is sent explicitly, the state variable can not be + just one code but a set of state flags. That is, the test would look + like + +
+ +
+ + If the state variable were a set and super-states, such as + not-done, were not mentioned, you would have + +
+ +
+ + or alternatively (with * being the intersection operator) + +
+ +
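With the state variable treated as a set of flags, the three variants just described can be sketched in Python: an explicit super-state, enumerated sub-states, and an intersection. Python writes the intersection operator as `&` rather than `*`. The function names are hypothetical.

```python
def not_done_explicit(state: frozenset) -> bool:
    # Super-state flag sent explicitly: a single membership test.
    return "not-done" in state


def not_done_implicit(state: frozenset) -> bool:
    # Super-state not sent: test each component sub-state.
    return "new" in state or "in-progress" in state


def not_done_intersection(state: frozenset) -> bool:
    # Same test as a non-empty intersection ("*" in the text, "&" in Python).
    return bool(state & {"new", "in-progress"})


sample = frozenset({"new", "active"})
print(not_done_explicit(sample), not_done_implicit(sample), not_done_intersection(sample))
```

`not_done_explicit` gives the wrong answer when a sender omits the super-state flag. That is the same interoperability risk raised above for newly introduced super-states.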
+ + Now, even if super-states were not mentioned explicitly, we could + use a table of constants that lets the application work the same no + matter whether super-states are mentioned explicitly or not: + +
+ +
+ + The advantage of this method is that your application code is + invariant to whether super-states are represented explicitly or not. + In addition, one can test for special state constellations such as + in-progress AND on-hold: + +
+ +
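Below is a hypothetical mask table for the two predicates just mentioned, written in Python. Each mask lists every flag that implies the condition, including the super-state itself, so the same test works whether or not a sender includes super-state flags. The mask names are assumptions of this sketch.

```python
# Hypothetical mask table; each mask includes the super-state flag itself.
MASKS = {
    "not-done": frozenset({"not-done", "new", "in-progress"}),
    "in-progress-and-on-hold": frozenset({"in-progress", "on-hold"}),
}


def any_of(state: frozenset, mask: frozenset) -> bool:
    """OR test: the intersection with the mask is non-empty."""
    return bool(state & mask)


def all_of(state: frozenset, mask: frozenset) -> bool:
    """AND test: the intersection equals the mask."""
    return (state & mask) == mask


print(any_of(frozenset({"new", "active"}), MASKS["not-done"]))
```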
+ + In conclusion, it seems very flexible to treat the state variable + uniformly as a set of state flags and to test for state flags + indirectly through intersections with "mask" sets, testing for a + non-empty intersection (OR) or equality with the mask (AND). + +
In the same way one can check whether the state variable + represents a legal state, e.g., test that either new + or in-progress is effective, but not both: + +
+ +
+ +
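The legality check can reuse the intersection idea by counting how many flags of a mask are effective. In this Python sketch, `PARALLEL_MACHINES` encodes the two sub-state-machines of the running example. The names are illustrative, and flags that belong to no machine (such as super-states) are simply ignored.

```python
# Partial states of each parallel sub-state-machine in the running example.
PARALLEL_MACHINES = (
    frozenset({"new", "in-progress", "done"}),
    frozenset({"active", "on-hold"}),
)


def exactly_one_of(state: frozenset, mask: frozenset) -> bool:
    """True if exactly one flag of the mask is effective."""
    return len(state & mask) == 1


def is_legal(state: frozenset) -> bool:
    """A joint state must pick exactly one partial state per parallel machine."""
    return all(exactly_one_of(state, machine) for machine in PARALLEL_MACHINES)


print(exactly_one_of(frozenset({"new", "active"}), frozenset({"new", "in-progress"})))
```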
The set operations shown in the above examples seem to require + special programming language support; in fact, however, they do + not. Sets in Pascal or Modula-2 are nothing but bit-fields, and the + intersection operator is nothing but the bit-AND operation on bit + fields. Thus this mechanism is easily implemented in any + programming language, such as C, BASIC, you-name-it. + +
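To illustrate the bit-field point, the same OR and AND mask tests can be written with plain integers, one bit per state flag. The bit assignments below are arbitrary choices made for this Python sketch.

```python
# One bit per state flag (bit positions chosen arbitrarily for this sketch).
NEW = 1 << 0
IN_PROGRESS = 1 << 1
DONE = 1 << 2
ACTIVE = 1 << 3
ON_HOLD = 1 << 4

# Masks are just bitwise ORs of flags.
NOT_DONE = NEW | IN_PROGRESS
IN_PROGRESS_AND_ON_HOLD = IN_PROGRESS | ON_HOLD


def any_of(state: int, mask: int) -> bool:
    """OR test: at least one masked bit is set (bit-AND is the intersection)."""
    return (state & mask) != 0


def all_of(state: int, mask: int) -> bool:
    """AND test: every masked bit is set."""
    return (state & mask) == mask


print(any_of(NEW | ACTIVE, NOT_DONE), all_of(IN_PROGRESS | ACTIVE, IN_PROGRESS_AND_ON_HOLD))
```

The same expressions carry over to C with one caveat. In C, `==` and `!=` bind more tightly than `&`, so the parentheses around `state & mask` are required rather than merely clarifying.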
To summarize the above discussion we have found: +
+
that a pre-coordinated state code enforces only legal states to + be communicated, but interpretation and evolution is difficult and + requires a table to interpret and map state codes to something the + application can handle;
+ +
that a redundant post-coordinated state code, that sends + super-state information is easy to handle and allows for smooth + evolution and interoperability between applications with a different + interest in the details of a state-machine;
+ +
that a post-coordinated state code that does not send + super-state information is even more flexible given that state + predicates are tested based on state "masks" that can be defined + in a simple table. +
+ +
that a pre-coordinated state code will always fit in a single + code value;
+ +
that a post-coordinated state code will rarely fit in a single + code value, and treating it as a set up front is a requirement for the + evolution rules discussed;
+ +
that a post-coordinated state code can alternatively be sent in + a record of state variables or in multiple state variables, in which + case the described flexibility of evolution and interpretation is + lost. [There are ways to consolidate multiple state variables in an + application, but they add complexity for the sole reason of having + multiple state variables in the RIM.]
+
+ +
No decision has been made yet. My proposal is to: +
+
Define a data type called "State" which makes the actual state + representation opaque to the application layer. I don't want to bother + the domain TCs with this "CV or SET" discussion.
+ +
Stick to the MDF rule of one state variable and try to persuade + Wayne that this would work for his part of the standard. However, wait + with the final decision until Wayne has agreed to the + harmonization proposal to merge his four state variables into + one. Wayne has the right of the elder here.
+ +
Use the non-redundant post-coordinated state representation and + propose that implementors test for states uniformly using + "masks". Alternatively, fall back to the redundant post-coordinated + representation if the opposition gets too nervous.
+
+ +
3.3 Real World Concepts

The old CE data type and its interim proposed successors (under the various names LCE/CWE and CE/CNE) were basically a pair of a Code Value and a display data string that could ! be used to convey the original text in an uncoded fashion.
The new data type for real world concepts is essentially a generalization of the CE. The Concept Descriptor is defined as a *************** *** 4005,4011 ****

! Concept Descriptor
--- 4397,4403 ----

! Concept Descriptor (CD)
*************** *** 4047,4053 **** original text ! Free Text --- 4439,4445 ---- original text ! Display Data *************** *** 4070,4076 ****

! Code Translation
--- 4462,4468 ----

! Code Translation (CDXL)
*************** *** 4151,4157 **** quality ! Floating
Point
Number
[0..1] --- 4543,4549 ---- quality ! Real
Number
[0..1] *************** *** 4178,4184 ****

! Code Phrase

--- 4570,4576 ----

! Code Phrase (CDPH)

*************** *** 4559,4564 **** --- 4951,4963 ---- of ways for people to abuse its power and hardly any idea about how to use the power properly. +
Note that from the SNOMED camp there is probably support for an + even more complex definition of the Code Phrase that would basically + be a keyword-value structure containing small conceptual + graphs. [cf. Spackman KA. Compositional concept representation using + SNOMED: towards further convergence of clinical terminologies. Proc + Annu Symp Comput Appl Med Care. 1998 Oct. p. 740-4.] +
3.4 Technical Instances
*************** *** 4747,4753 ****

! Technical Instance Identifier
--- 5146,5152 ----

! Technical Instance Identifier (TII)
*************** *** 4890,4896 **** !

Figure 3: The hierarchy of ISO Object Identifiers and how it could be used by HL7.

--- 5289,5295 ---- !

Figure 4: The hierarchy of ISO Object Identifiers and how it could be used by HL7.

*************** *** 5043,5060 ****
3.4.3 Technical Instance Locator
!
Another data type of technical instance identifiers is dereferenceable ! identifiers, or "locators". The Technical Instance Locator (TIL) is ! shaped similarly to a Universal Resource Locator (URL). That ! is, a TIL has the two components protocol and address, where ! the format of address would be determined only by the ! protocol. Telephone number, e-mail address, and the locator for the ! reference pointer type would be of this data type.

! Technical Instance Locator
--- 5442,5460 ----
3.4.3 Technical Instance Locator
!
Another kind of data type for technical instances is the Technical ! Instance Locator (TIL), which is a dereferenceable identifier, ! reference, or (technical) address. The Technical Instance Locator ! (TIL) is shaped similarly to a Uniform Resource Locator ! (URL). That is, a TIL has the two components protocol and ! address, where the format of the address is determined by the ! protocol. Telephone numbers, e-mail addresses, and the locator for an ! image reference pointer would be of this data type.
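A sketch of the TIL as a two-component value follows. It assumes a URL-like "protocol:address" string form. The class name and its two components follow the text. The string encoding and the `tel` protocol name are illustrative assumptions, not a normative HL7 representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TIL:
    """Technical Instance Locator: a protocol plus a protocol-specific address."""
    protocol: str
    address: str  # its format is determined by the protocol, not by the TIL itself

    @classmethod
    def parse(cls, text: str) -> "TIL":
        # Split only at the first colon; the address may contain more colons.
        protocol, sep, address = text.partition(":")
        if not sep or not protocol or not address:
            raise ValueError(f"not a protocol:address locator: {text!r}")
        return cls(protocol.lower(), address)

    def __str__(self) -> str:
        return f"{self.protocol}:{self.address}"


phone = TIL.parse("tel:+13176306962")
print(phone.protocol, phone.address)  # tel +13176306962
```

The TIL deliberately does not interpret the address. Validating a phone number or an e-mail address would be the job of whatever handles that particular protocol.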

! Technical Instance Locator (TIL)
*************** *** 5156,5162 **** :address "+13176306962") -
3.4.4 Outstanding Issues

We will still define as successor of the reference pointer (RP) to --- 5556,5561 ---- *************** *** 5164,5176 **** the thing that is referred to. This would also include an expiry date after which the locator can not be expected to be usable.
3.5 Real World Instances
!
We refer to things in the "real world" generally by giving them names. Assigning names to people, things and places is a public act: the more people know some name, the more people will later understand what is ! meant by some name. In archaic cultures, knowing the name of something meant having some power over it. Indeed, knowledge is power and without a name, we can not talk about things, we can barely think of things, and we can not collect knowledge about them. The record --- 5563,5586 ---- the thing that is referred to. This would also include an expiry date after which the locator can not be expected to be usable. +
The use of the TIL for phone numbers needs more explanation and + rationale. + +
The TIL may need to be wrapped in a History. + +
The TIL may need some "use code", to capture the qualifiers + "business", "home", "cellphone", etc. for phone numbers. How does this + "use code" generalize to other communication addresses? Why is it + needed? + +
3.5 Real World Instances
!
We generally refer to things in the "real world" by giving them names. Assigning names to people, things and places is a public act: the more people know some name, the more people will later understand what is ! meant by that name. In archaic cultures, knowing the name of something meant having some power over it. Indeed, knowledge is power and without a name, we can not talk about things, we can barely think of things, and we can not collect knowledge about them. The record *************** *** 5224,5244 **** locations tend to be extremely stable over a long period of time determines the structure of the address kind of names. Addresses determine locations by stepwise refinement of a scope (country - city ! - street - house - floor). Most scope-name has all the characteristics ! of names, i.e. arbitrarily assigned, non-descriptive, not ! unique. Apart from scope refinement all kinds of spacial descriptors ! can be part of an addres (e.g. right hand side, opposite side.)
3.5.1 Real World Instance Identifier
-
Note: This section is a proposal of the Data Type working - group and still needs to be negotiated with PAFM. -
External identifiers for real world people and things occur ! frequently. Examples for people identifiers are Social Security ! Number, Driver License Number, Passport Number, Individual Taxpayer Identification Number. Identifiers for organizations are, e.g., the federal identification number or the Employer Identification Number. The current approach in the RIM is to use the Stakeholder_identifier --- 5634,5652 ---- locations tend to be extremely stable over a long period of time determines the structure of the address kind of names. Addresses determine locations by stepwise refinement of a scope (country - city ! - street - house - floor). Most scope-names have all the ! characteristics of names, i.e., arbitrarily assigned, non-descriptive, ! not unique. Apart from scope refinement, all kinds of spatial ! descriptors can be part of an address (e.g. right hand side, opposite ! side, north, east, etc.)
3.5.1 Real World Instance Identifier

External identifiers for real world people and things occur ! frequently. Examples of person identifiers are Social Security Number ! (SSN), Driver License Number, Passport Number, Individual Taxpayer Identification Number. Identifiers for organizations are, e.g., the federal identification number or the Employer Identification Number. The current approach in the RIM is to use the Stakeholder_identifier *************** *** 5247,5262 ****
Here are some of those identifiers used in the U.S.
!
SSN used as a legal individual person identifier
ITIN (Individual Taxpayer Identification Number), like an SSN but ! issued by IRS for aliens not eligible for an SSN. !
EIN (employer identification number) used by IRS for organizations !
FIN (Federal Identification Number?) for corporations !
DLN (Driver License Number). U.S. driver licenses are issued by ! the states. Driver licenses in the U.S. are used as identity cards. !
The "Universal" (meaning "U.S.American") Health Identifier - if ! it will ever come. !
Health Care Provider Identification Number (?)
Passport Number
--- 5655,5683 ----
Here are some of those identifiers used in the U.S.
!
Social Security Number (SSN and ITIN) - for U.S. persons; ! !
Employer Identification Number (EIN) - for U.S. corporations; !
ITIN (Individual Taxpayer Identification Number), like an SSN but ! issued by IRS for aliens not eligible for an SSN; ! !
Driver License Number (DLN) - for U.S. residents; issued by ! the states and used as identity cards in the U.S. ! !
HIPAA Provider Identification Number - for U.S. healthcare providers; ! !
HIPAA "Universal" (meaning "U.S.American") Health Identifier - if ! it will ever come. ! !
Inventory Numbers - for desks, computers, and coffee makers in ! everyone's office ! !
Credit Card Numbers - for people and their CC accounts ! !
Medical Record Numbers - for a patient as the subject of a medical ! record !
Passport Number
*************** *** 5270,5665 **** pretty reliable person identifier. Banks and employers must collect the SSN of their customers and employees (resp.) for tax purposes. !
However, there are other such identification numbers, not issued ! for persons. Those numbers have basically the same semantics and the ! same requirements, except that those numbers might be assigned for ! real world instances other than people or organizations. Examples are things, such as devices and durable material (inventory numbers), lot numbers, etc. !
The public health / animal proposal, for example, has a concrete ! need for the following identification numbers: !
!
lip tattoo - horses !
leg tattoo - dogs !
ear tags - food animals !
microchips - all species !
breed registry number - dogs !
jockey club - thoroughbred horses !
quarterhorse association !
US trotting association !
Holstein association registry - cows !
!
Such real world instance identifiers are assigned not only by big ! organizations but also by smaller organizations. For example, ! virtually every organization puts tags with numbers on their ! inventory. !
Medical Record Numbers (MRN) as used in the world of Paper Medical ! Records are another example for such real world instance ! identifiers. Note that in the computer world, we would not need MRNs, ! since we could use Technical Instance ! Identifiers (TII) to refer to computerized medical ! records. However, Wes Rishel and I think that as a rule of thumb, TIIs ! should not be communicated through human middlemen in order to keep ! reliability in their correctness high. Thus, as long as MRNs are typed ! in by clerks and other people, one should separate them from TIIs. -
The basic structure of such a real world instance identifier is: !
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
value CharacterString the identifier value itself
validity period Interval OF PointInTime covers effective date and expiration, begin and end date/time, ! etc.
kind Code Value A rough classification telling you what kind of identifier ! this is (e.g. SSN, DLN, Passport, inventory, etc.) !
assigning authority ? An organization that has authority over and issued an identifier.
name space ? An organization may maintain ! different name spaces without necessarily creating organizational ! subdivisions. Thus one assigning authority may maintain multiple name ! spaces. !
!
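The field table above can be sketched as two small Python classes. This is a sketch under stated assumptions, not the RIM definition. `None` stands for an open bound, a simplification that cannot separate infinite from unknown dates. The `authority` field is a plain string placeholder, because its proper type is exactly the open question discussed next. The example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Interval of points in time; None means 'no bound' in this sketch."""
    low: Optional[date] = None
    high: Optional[date] = None

    def contains(self, when: date) -> bool:
        return ((self.low is None or self.low <= when)
                and (self.high is None or when <= self.high))


@dataclass(frozen=True)
class RWII:
    """Real World Instance Identifier with the fields listed in the table."""
    value: str                        # the identifier value itself
    validity_period: Interval         # effective date through expiration
    kind: str                         # rough classification, e.g. "SSN", "DLN"
    authority: str                    # placeholder; its proper type is undecided
    name_space: Optional[str] = None  # one authority may keep several name spaces

    def is_valid_at(self, when: date) -> bool:
        return self.validity_period.contains(when)


# Hypothetical example values, for illustration only.
dln = RWII("A123", Interval(date(1997, 1, 1), date(2001, 1, 1)), "DLN", "Indiana BMV")
print(dln.is_valid_at(date(1999, 7, 14)))  # True
```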
The main methodological question is how we represent the identifier ! assigning authority. This would usually be an organization, and hence ! an issuing authority would be represented by an association to the ! Organization class. This is basically what the Stakeholder_identifier ! class does in RIM 0.88. ! !
However, this is also a problem. We are able to carry quite a lot ! of information about the identifier assigning authority, which is ! good. But the structure is rather complex, which is bad. Particularly, ! while we all know that SSN, DLN, etc are issued by organizations, we ! do not care so much about that organization. The only thing we want to ! know is that a given number is an SSN. ! !
However, things become tricky if we try to shortcut. The problem is ! that SSN and DLN are valid in realms defined by the issuing ! authorities. For example, for a DLN we need to know the state. For an ! SSN in an international context, we need to know the country. ! !
With a mandatory link to an assigning authority, an Indiana driver's ! license would be represented as having the "Indiana Bureau of Motor ! Vehicles (BMV)" as an issuing authority. This is troublesome because ! someone in California might not know that there is a BMV in ! Indiana. The BMV, of course, is an affiliate of the state of Indiana, ! but communicating this as a super-organization may be too much. In ! international contexts, we would have to go once more through the ! stakeholder-affiliate loop so that the receiver can find out that ! Indiana is actually a part of the U.S. While this may be the correct ! solution, it seems to be rather impractical. !
The following principle options exist: !
!
Association with stakeholder (or organization) as the assigning ! authority. A clean but somewhat verbose, heavyweight way, as ! described. ! !
! ! ! !
Real World Instance Identifier (RWII)
value CharacterString
authority reference to Organization
! !
In this alternative we point to an Organization class ! instance from inside the data type. This is a weird construct that we ! have never seen before in the world of the RIM vs. Data Types ! dichotomy.
! !
The Organization as an assigning authority would itself have ! one or more RWIIs. Thus, one could represent the assigning authority ! recursively as an RWII. ! !
! ! ! !
Real World Instance Identifier (RWII)
value CharacterString
authority RWII
! !
This is a specific way to make the reference to an assigning ! authority Organization, i.e. by looking up the organization through ! its RWII.
! !
An OID for assigning authority, which structurally renders the ! RWII similar to the TII but with a very different ! semantics. ! !
! ! ! !
Real World Instance Identifier (RWII)
value CharacterString
authority ISO Object Identifier
! !
This alternative, while structurally similar to the TII, is in fact very different. The TII is ! supposed to be globally and dependably unique. This dependable ! uniqueness cannot be required of real world identifiers, which are ! often reported orally or on paper. Moreover, such numbers are often ! reused either accidentally (roll-over of counters) or voluntarily ! (old number considered outdated).
! !
The traditional way to represent assigning authority would be ! through a single "code" from some "master table"
! !
! ! ! !
Real World Instance Identifier (RWII)
value CharacterString
authority CharacterString
!
!
Options 3 and 4 are seemingly simple but they do lead to ! practicability problems: They don't scale. The OID is pseudo-unique ! and not meaningful (e.g. what is the OID of the state of Indiana?) In ! both options 3 and 4 you have to interpret the authority part from ! some unknown table or directory. This would not be a real problem if ! RWIIs were only such official things as SSN, ITIN, EIN, FIN, DLN, ! etc. But the traditional medical record numbers are assigned ! locally. Also inventory numbers for devices are assigned locally. ! !
Options 2 through 4 use various schemes of foreign keys to refer ! to organizations, which violates the MDF rule that foreign keys must ! be turned into explicit associations. Alternative 1 is in principle ! open to whether or not foreign keys are used, but if data types are ! considered different from RIM classes, the question is how such an ! association from a data type to a RIM class could be made.
Regardless of whether the MDF deprecates foreign keys, this identifier ! data type "wants to be a foreign key" (as Mark Tucker puts it.) ! Indeed, this data type embodies the fact that we use "keys" in order ! to refer to things across (foreign) models. ! !
Mark Tucker further offered the following "trick" to make ! alternative 4 usable and - to a certain extent - interoperable: ! People could use local codes for assigning authorities within their ! usual communication horizon, assuming that master tables would be ! synchronized. For outside communication, a "row" of such a master ! table could just be included in the message. This master table row ! would be used to map "strings" to "things". ! !
This allows for very short forms of identifiers, which is ! good. Conversely, representing assigning authority as an Organization ! instance (alternative 1) would lead to ugly, lengthy messages. ! !
However, two problems arise: ! !
It is not guaranteed that the strings for assigning authorities ! would be unique within a message. ! !
How would we represent this "master file" construct? ! !
The Stakeholder hierarchy basically is such a master file ! structure. Thus the question is why we would represent associations to ! "master" stuff differently for this data type than for all other RIM ! classes? ! !
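The following sketches Mark Tucker's "master table row" trick. It assumes that a message may carry zero or more rows, each mapping a short authority symbol to organization details. When no row is shipped, the receiver falls back to its own synchronized master table. The row layout (a `symbol` key plus free-form details) and the function names are hypothetical. The sketch also makes the uniqueness problem explicit: a symbol that appears with conflicting rows in one message is rejected.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]  # hypothetical master-table row: {"symbol": ..., other details}


def index_rows(rows: List[Row]) -> Dict[str, Row]:
    """Index the rows shipped in a message by symbol, rejecting conflicting duplicates."""
    index: Dict[str, Row] = {}
    for row in rows:
        symbol = row["symbol"]
        if symbol in index and index[symbol] != row:
            raise ValueError(f"authority symbol {symbol!r} is ambiguous in this message")
        index[symbol] = row
    return index


def resolve_authority(symbol: str, message_rows: List[Row],
                      local_table: Dict[str, Row]) -> Optional[Row]:
    """Map a "string" to a "thing": prefer rows in the message, else the local table."""
    shipped = index_rows(message_rows)
    if symbol in shipped:
        return shipped[symbol]
    return local_table.get(symbol)  # None if this receiver does not know the symbol
```

Ambiguity is only detected, never repaired. Deciding what to do with an ambiguous message is left to the receiving application.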
There is no easy way out of this dilemma, which suggests putting ! this Real World Instance Identifier "data type" as a class directly ! into the RIM. This allows the "data type" to associate with other ! classes, such as organization. From this "data type" we can define ! CMETs and we can implement those on ITSs however we like, i.e. we do ! not have to rely on a stereotypic automatism to derive lengthy ITS ! representations when a short form would be more economical and more ! pleasing to the "look and feel" of the message. ! !
There are a number of RIM changes pending that need discussion and a ! joint vote with PAFM and CQ at the upcoming HL7 meeting (Toronto.) ! Figure 4 shows the structure around ! Stakeholder_identifier as of RIM 0.88. ! !
! ! !
Figure 4: Stakeholder_identifier as of RIM 0.88t.
!
The changes in detail are as follows:
!
PAFM
! !
PAFM (Richard Ohlmann) suggested passing stewardship of the ! Stakeholder_identifier class over to Control/Query. ! !
Rationale: this class will undergo a broadening of scope. PAFM ! therefore no longer has to take the burden of maintaining this class ! for everyone else. That's what Control/Query is for.
! !
CQ
! !
Rename class "Stakeholder_identifier" to ! "Real_world_instance_identifier". ! !
Rationale: to signify the broadening of this class's scope.
! !
Rename attribute "id" to "value" in order to disambiguate this ! attribute from a technical instance identifier. ! !
Assign Data Type Character String (ST) to the attribute "value". ! !
Rename Attribute: "effective_dt" to "validity_period". ! !
Assign Data Type: "Interval of Point in Time" to attribute ! validity_period. ! !
Delete Attribute: termination_dt ! !
Rationale: the two attributes effective_dt and termination_dt were ! used to signify the validity period of the identifier. A period of ! time can more properly (and more compact) be represented by the new ! data type Interval of Point in Time. This allows for infinite as well ! as unknown begin and termination dates.
! !
Delete Attribute: issued_dt ! !
Rationale: it is unclear why the date of issue differs from the effective ! date. I see no use case for it (PAFM folks: please confirm ! or defend!)
! !
Delete Attribute: qualifying_information_txt. ! !
Rationale: the use of this attribute is in part taken over by ! "namespace". Where it is not handled through namespace, different ! assigning authorities should be used. This prevents the same ! information from being representable in different ways.
! !
Rename class: "Identifier_assigning_authority" to ! "Identifier_namespace"
! !
Definition: A list of identifiers owned and managed by an ! organization stakeholder. An organization that manages a name space is ! an identifier assigning authority.
! !
Remove all attributes.
! !
Rationale: This is no longer a role-class. Nobody could define the ! use case of the old role-class and the begin/end time attributes. It ! seems to have been created as a modeling stereotype that was not useful ! in practice.
! !
Add attribute "name" of type Character String (ST).
! !
Definition: The name of a namespace is a symbol that might be used ! as a short form for the namespace in messages. This accommodates the ! practice that assigning authorities are just kept in a table of ! symbols, without attaching any real information about the ! organization.
! !
Change role-names and multiplicities as shown in Figure 5.
! !
PAFM
! !
Move Attribute: citizenship_country_cd from Person to Stakeholder. ! !
Rationale: in an international use context of HL7 it is necessary ! to keep track of the "citizenship" of organizations as well as of ! individual persons.
!
Rename Attribute: "citizenship_country_cd" to "citizenship_cd". !
Rationale: A shorter name is easier to read, write, speak and memorize.
!
Delete Attribute: "nationality_cd" !
Rationale: The difference between citizenship and nationality is ! unclear, did not exist in HL7 v2.x, and thus, can be deleted.
!
PAFM
!
The following are suggestions for simplification of the stakeholder ! affiliation loop. These changes are not essential to the Control Query ! related requirements. Nevertheless, since the stakeholder affiliation ! loop would be used by all of Control/Query's "customers," we have an ! interest in keeping it as unencumbered as possible. !
!
Move Attribute: "family_relationship_cd" from Stakeholder_affiliate ! to Stakeholder_affiliation. !
Reroute Association: from Stakeholder_affiliation ! "secondary_participant" to attach directly at Stakeholder. !
Delete Class: Stakeholder_affiliate. !
Rationale: This additional relationship class on the "secondary" ! leg of stakeholder affiliate was primarily a modeling stereotype of ! little known practical use. The family relationship can just as well be ! carried by the stakeholder affiliation class where applicable. This ! leads to a model that is simpler to use and simpler to understand ! while maintaining the same level of expressiveness and explicitness.
! !
Delete Association loop "subdivision" at Organization. ! !
Rationale: this subdividing of organizations is a kind of ! "affiliation" relationship, which would also be expressed by the ! "Stakeholder_affiliation" class. There should be only one way of ! expressing affiliations (including ! subdivision). Stakeholder_affiliation.family_relationship_cd should ! have a value reserved for subdivision of organizations. Note that ! affiliation_type code is to express the "purpose" of a particular ! affiliation (e.g. emergency contact), while family_relationship is the ! durable relationship between stakeholders throughout all purposeful ! affiliations.
! !
Others
! !
21.New Association: classes that would have a real world ! instance identifier, such as, "Durable_medical_equipment" should be ! associated to the Real_world_instance_identifier class. To exemplify ! that the new class can be used not only to identify stakeholders but ! also things and animals.
! !
We can also reuse this data type in order to put the identifiers ! for stakeholders in their proper place in the model, instead of ! pushing them all up into the highest level of the hierarchy, i.e. the ! Stakeholder class.
!
!
The following diagram shows the effect of the proposed changes. ! !
! !
Figure 5: The Stakeholder_Identifier has become the "Real ! World Instance Identifier" and is thus useful for other things, such ! as the inventory number of medical devices.

!
This is basically a stepwise RIM change as would be required for ! Harmonization. We will discuss this with PAFM and other affected ! technical committees at the next HL7 meeting (Toronto). --- 5691,6053 ---- pretty reliable person identifier. Banks and employers must collect the SSN of their customers and employees (resp.) for tax purposes. !
While many such identifiers are assigned to people and ! organizations, what characterizes those numbers is not what they are ! assigned to, but who assigns them, how they are assigned, and how they ! are used. There is a need for such numbers to be assigned to real ! world instances other than people or organizations. Examples are things, such as devices and durable material (inventory numbers), lot numbers, etc. !
The following challenges exist for exchanging real world instance ! identifiers: !
!
"Communication Horizon" - if you communicate an identification ! number in-house, there is usually good understanding and no ambiguity. ! For inter-institutional communication there is possible ambiguity in ! the primary identifiers and the secondary identifiers for assigning ! authorities. !
Information about assigning authorities is relevant or irrelevant ! depending on the scope of a message. !
Systematizing identifier types and usage in an international ! context is difficult. !
!
Organizations as assigning authorities
!
The following kinds of organizations assign real world instance ! identifiers: !
!
National governmental agencies (e.g., SSN, HCFA provider ID) !
State/Province governmental agencies (e.g., DLN) !
Professional organizations (e.g. AMA) !
Insurers, Banks, Credit Card Companies (e.g., Kaiser, BC/BS, VISA ! for their customers) !
Health provider organizations (e.g., Hospital Medical Record ! Numbers, Inventory Numbers.) !
Departments and other sub-organizations (e.g., special MRN rings for ! stat assignments.) !
non-formal units or task forces within an ! organization (e.g. clinical trial enrollment number). !
!
Considering health provider organizations (as the main users of HL7 ! messages,) we can distinguish three general cases where the assigning ! authority is treated slightly differently:
!
National and state agencies' numbers are "well known," e.g. nobody ! ever wants to see the address and phone # of the U.S. Social Security ! Administration (SSN) or the Indiana Bureau of Motor Vehicles (DLN) in ! an HL7 message.
! !
Moreover, the identifier types themselves are an "institution" much ! more important than the assigning authorities. For example, the SSN ! data field will oftentimes contain Individual Taxpayer Identification ! Numbers (ITIN) that are compatible with SSNs but are assigned by the IRS ! rather than the SSA. The distinction between SSN and ITIN is tricky ! and mostly irrelevant for HL7 users.
! !
Professional organizations are usually treated as "well known." E.g., ! if you have a doctor's medical license number valid for the U.S., you ! don't need to communicate the details of the issuing organization ! (e.g. AMA.)
! !
Insurers, Banks and Credit Card Companies are "third party" ! organizations that are external to health provider organizations. This ! means, most HL7 messages will want to add some minimal information ! about the assigning authority as an organization because those third ! party organizations are neither "well known" nor do they belong to any ! one provider organization.
! !
Provider organizations and their sub-units. These are the issuers ! of the vast majority of numbers communicated in everyday messaging. ! For all "in-house" messages, the assigning authority is the same or ! closely related to the HL7 user. So, there is no need to communicate ! much information about that organization.
! !
For external communication, however, the assigning organization needs ! to be identified in more detail. Generally, the less routinely ! messages are sent to a particular external recipient, the more detailed ! information about assigning authorities is appreciated.
! !
Finally there are cases where the same organization assigns different ! numbers of the same type. For example, patient identifiers are issued ! for routine care, but the same health care organization runs several ! clinical trials where patients get separate identifiers or enrollment ! numbers. Thus, the same organization that runs different trials will ! want to build partitions of the overall set of assigned identifiers ! (sub-namespaces.)
!
!
Identifier types and their use
!
We intuitively know that there are different types of identifiers and ! that we want to keep track of the identifier type. The first ! identifier type that comes to mind in a U.S. context is the Social ! Security Number (SSN). This example shows two difficulties that any ! "typology" of identifiers runs into and must deal with: !
!
Semantics (meaning) and pragmatics (use) of one type of identifier ! may be completely different and not even related. For example, the ! meaning of the SSN is that it identifies every U.S. person's social ! security record. But only in an estimated 5% of all use ! cases is the SSN related to a person's social security matters. Much more often ! (40%), the SSN is used as a person's taxpayer identification number ! (by the IRS or by withholding agents, such as employers, banks, or ! mutual fund/IRA services.) Most health provider organizations use the ! SSN as a pretty good national person identifier (40%). In addition, ! all kinds of companies collect SSNs from their customers for various ! purposes.
! !
Identifier type concepts do not easily translate between ! different realms (e.g. countries.) Take Social Security Numbers (SSN) ! for example: most countries that have a nationally organized social ! security system will have social security numbers. However, as noted ! above, the purpose of collecting SSNs in the U.S. health care industry ! is not social security, but person identification. Germany has SSNs ! too, but nobody uses the German SSN as a general person identifier. ! German SSNs are exclusively used in communications with the German ! social security administration about genuine social security ! issues.
!
!
The same case can be made for the Driver License Number. In Europe, ! driver licenses are primarily used as a certification to run a motor ! vehicle, and thus in 90% of the cases shown to police officers and ! highway patrols. In the U.S. the situation is completely different: ! here, more than 50% of driver license checks occur in bars and night ! clubs to gain entrance and to be served alcoholic beverages. Another ! 20% of driver licenses are shown when people write checks. Another 20% ! fall on miscellaneous identity checks, while in less than 10% of the ! cases a traffic policeman will be the one to see your driver license. ! Clearly, in the U.S. driver licenses are identity cards. In Europe, ! people have government-issued identity cards. However, the numbers ! are much less often recorded. ! !
In conclusion, designing a terminology of "identifier types" is ! difficult and has to account for the difference between what an ! identification number is and what it is used for. ! !
Naively, one would like to post-coordinate identifier type and ! country/state code; however, as noted above, an (SSN, US) is something ! completely different from an (SSN, DE), which means that identifier ! type and country are not really orthogonal. The better approach ! therefore seems to be to assign separate identifier types for each ! type and country of identifier, that is, to pre-coordinate the ! identifier type code. Thus the U.S. SSN would be uniquely identified ! and no other country's SSN would be assigned to the same type. An ! example of a completely pre-coordinated identifier type code is shown ! in the following table.
Examples of a pre-coordinated terminology of identifier types
code type country state issuer notes
001 SSN US national person identifier
002 DLN US AL Alabama
003 DLN US AK Alaska
004 DLN US AZ Arizona
... ... ... ... ...
053 DLN US WI Wisconsin
054 med. license US AMA License for U.S. certified Internists.
008 med. license DE BW LGM Baden-Württemberg
009 med. license DE BA LGM Bayern
010 med. license DE B LGM Berlin
... ... ... ... ... ...
024 med. license DE SWH LGM Schleswig-Holstein
011 citizen + id DE the number on the + ID card (German "Personalausweis.")
012 citizen id DK
013 citizen id FR
... ... ... ... ... ...
123 patient-id any any any medical record number, requires issuing auth.
124 inventory any any any inventory number, requires issuing auth.
+ +
However, there is a downside to pre-coordinated non-hierarchical codes + with meaningless identifiers. While these codes comply with the + currently touted "good vocabulary practices," the administrative + systems that will be using those codes will not be able to make much + use of those identifier types. The problem is most obvious when it + comes to U.S. driver licenses or German medical licenses. These are + issued at the state level (by sub-national governmental agencies.) + Therefore, there are 50 codes for U.S. driver licenses and 16 codes + for German medical licenses. While this detail is rarely needed, the + simple test "is this a driver license?" becomes much more difficult + than with a simple code "DLN" with the state post-coordinated. + +
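The difference in effort for the driver license test can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the specification: the table excerpt and function names are hypothetical, the excerpt uses the standard U.S. postal abbreviations, and an identifier type is modeled either as a meaningless pre-coordinated code or as a post-coordinated (type, state) pair.

```python
from typing import Dict, Optional, Tuple

# Hypothetical excerpt of a pre-coordinated identifier type table:
# meaningless code -> (type, country, state)
PRECOORDINATED: Dict[str, Tuple[str, str, Optional[str]]] = {
    "001": ("SSN", "US", None),
    "002": ("DLN", "US", "AL"),  # Alabama
    "003": ("DLN", "US", "AK"),  # Alaska
    "004": ("DLN", "US", "AZ"),  # Arizona
    "053": ("DLN", "US", "WI"),  # Wisconsin
}


def is_driver_license_pre(code: str) -> bool:
    """Pre-coordinated (hypothetical helper): the code itself says nothing,
    so every check needs the full table (50 entries for U.S. licenses alone)."""
    entry = PRECOORDINATED.get(code)
    return entry is not None and entry[0] == "DLN"


def is_driver_license_post(identifier_type: Tuple[str, Optional[str]]) -> bool:
    """Post-coordinated (hypothetical helper): the type code alone answers
    the question; the state is a separate qualifier."""
    type_cd, _state = identifier_type
    return type_cd == "DLN"


print(is_driver_license_pre("003"))           # True, but only via the lookup table
print(is_driver_license_post(("DLN", "AK")))  # True, directly from the type code
```

The pre-coordinated variant stays correct only as long as the lookup table is complete and current on every system that performs the test.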
These are the issues that need to be considered when defining + the terminology for identifier types. While they are not a core part + of this harmonization proposal, they do affect the current information + model design and this extended documentation is necessary for the + record. + +
Definition in the Information Model
+ +
The definition of the Real World Instance Identifier (RWII) is + based on a class of the same name in the HL7 Reference Information + Model (RIM.) This is so because there is an association between the + RWII and an organization as an "assigning authority" of the + identifier. This presents a methodological challenge: the RWII should + be available as a data type but the data type is associated with an + information model class. + +
The Unified Modeling Language correctly makes no difference between + an attribute's data type and a class; any class can be used as a data + type for an attribute. The HL7 Modeling and Methodology Committee has + decided to accept the notion of a "DMET", that is, a Common Message + Element Type (CMET) usable in the RIM as a data type. That way we + avoid a large bundle of associations connecting every other class + to the RWII class. The following figure shows the new structure of the + RIM as of June 1999. ! !
Figure 5: The ! Real_world_instance_identifier as an information model class. "Users" ! of this class may not associate with it but refer to the RWII DMET ! as a data type, as shown in the Stakeholder class' "real_id" ! attribute.
!
Definition of the DMET
!
The DMET definition of the RWII data type is as follows: !
! ! Real World Instance Identifier (RWII) DMET !
! ! An identifier for a "real world instance". A real world instance is ! any person, organization, provider, patient, device, animal, or any ! other thing that some organization recognizes and assigns an ! identifier to. Examples are Social Security Number, Driver License ! Number, Inventory Number, HCFA Provider ID, Medical Record Number. ! Typically, real world instance identifiers are assigned and reused ! outside of HL7 communication. These identifiers tend to be less ! reliable than Technical Instance Identifiers that are assigned and ! maintained exclusively by HL7 communication systems. Other classes ! use this class not by associations but by declaring attributes of type ! "RWII." !
component name type/domain optionality description
value_txt Character String mandatory ! The character string value of the identifier. For example, the ! character string "123-45-6789" for a U.S. Social Security Number. !
type_cd Code Value mandatory ! A code representing the type of identifier. For example, codes to ! represent the US National Provider ID, US National Payor ID, US Health ! Care ID, medical record number, social security number. !
qualifier_txt Character String conditional ! Information used to limit the applicability of a real world instance ! identifier, such as the state or province in which the identifier is ! valid. Use and interpretation depends on the type_cd. !
valid_tmr Interval of Point ! in Time optional ! The time range in which the identifier is valid. May be undefined on ! either side (effective or expiration). !
assigned_by Organization (RIM class CMET) conditional ! The assigning authority of the identifier if not implicit in the ! type_cd. The Organization CMET used here is likely to be very ! terse. !

!
While the value_txt is always a mandatory part of a real world ! identifier, the qualifier_txt must, may, or must not be valued ! depending on the identifier type_cd. This is independent of whether a ! precoordinated or a postcoordinated identifier type coding scheme is ! used. As the above table suggests, there is no way to completely ! precoordinate identifier type codes when the issuer organizations are ! not "well known" (e.g., providers, insurers.) ! !
For example, the state of the U.S. driver license is either ! precoordinated in the identifier type_cd or it is post-coordinated in ! the qualifier_txt. The qualifier_txt can be used for patient ! identifiers to allow issuing authorities to maintain multiple ! namespaces (e.g., for multiple clinical trials.) ! !
The actual use of the real world instance identifier should not be ! coded in the type_cd but should be given implicitly by ! establishing many more attributes in many classes that have the data ! type RWII (a DMET.) For example, rather than pushing all stakeholder ! identifiers up to the highest level, the Stakeholder class should have ! attributes only for identifiers such as SSN, EIN, ITIN, passport ! number, person id. Medical record numbers (patient id) should be ! declared as an attribute of the Patient class. Provider license ! numbers should be declared in the Individual_health_care_provider ! class, etc. ! !
The identifier issuing authority is a conditional component of the ! real world instance identifier. The organization will not be ! mentioned in a message for "well known" issuers (e.g., SSN, DLN, etc.) ! The organization will be mentioned by a brief object stub for in-house ! communication. For third-party organizations and for inter-enterprise ! communication, there will be more information given for the issuing ! organization. ! !
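The conditional components described above can be illustrated with a small Python sketch. It is an assumption-laden illustration only: the post-coordinated type codes "SSN", "DLN" and "MRN", the per-type rules, and the flattening of valid_tmr into two optional dates and of the Organization CMET into a plain string are all hypothetical simplifications, not part of this specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical per-type rules: qualifier_txt is required, optional, or forbidden.
QUALIFIER_RULE = {"SSN": "forbidden", "DLN": "required", "MRN": "optional"}
# Hypothetical "well known" issuers, for which assigned_by stays implicit.
WELL_KNOWN_ISSUER = {"SSN", "DLN"}


@dataclass(frozen=True)
class RWII:
    value_txt: str
    type_cd: str
    qualifier_txt: Optional[str] = None
    valid_from: Optional[date] = None   # start of valid_tmr; None = undefined
    valid_to: Optional[date] = None     # end of valid_tmr; None = undefined
    assigned_by: Optional[str] = None   # stand-in for a terse Organization CMET

    def problems(self) -> List[str]:
        """Return the violated constraints (an empty list if there are none)."""
        errs = []
        if not self.value_txt:
            errs.append("value_txt is mandatory")
        rule = QUALIFIER_RULE.get(self.type_cd, "optional")
        if rule == "required" and self.qualifier_txt is None:
            errs.append(f"qualifier_txt is required for {self.type_cd}")
        if rule == "forbidden" and self.qualifier_txt is not None:
            errs.append(f"qualifier_txt is not allowed for {self.type_cd}")
        if self.type_cd not in WELL_KNOWN_ISSUER and self.assigned_by is None:
            errs.append(f"assigned_by is required for {self.type_cd}")
        if (self.valid_from is not None and self.valid_to is not None
                and self.valid_from > self.valid_to):
            errs.append("valid_tmr ends before it starts")
        return errs


print(RWII("123-45-6789", "SSN").problems())     # []
print(RWII("D1234567", "DLN", "WI").problems())  # []
print(RWII("4711", "MRN").problems())            # ['assigned_by is required for MRN']
```

Keeping the rules in tables keyed by type_cd mirrors the point made above: whether qualifier_txt or the issuing authority must be present depends on the identifier type, not on the RWII structure itself.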
Finally, it must be noted that technical instance identifiers (TII) ! are a much more economical structure to identify patients and things in ! HL7 messages for routine use. After external identifiers (RWIIs) have ! been exchanged once, follow-up messages can generally make do with ! TIIs. ! !
Medical Record Numbers (MRN) as used in the world of ! Paper Medical Records are another example of such real world instance ! identifiers. Note that in the computer world, we would not need MRNs, ! since we could use Technical Instance ! Identifiers (TII) to refer to computerized medical ! records. However, Wes Rishel and I think that as a rule of thumb, TIIs ! should not be communicated through human middlemen in order to keep ! their correctness highly reliable. Thus, as long as MRNs are typed ! in by clerks and other people, one should separate them from TIIs. !
*************** *** 5755,5761 ****

! Postal and Residential Address
--- 6143,6149 ----

! Postal and Residential Address (AD)
*************** *** 5815,5821 ****

! Address Part
--- 6203,6209 ----

! Address Part (ADXP)
*************** *** 6270,6276 **** Word.
!
Hopkins R. Strategic short study: names and numbers as identifiers. CEN TC251. Available as PDF or Word. --- 6658,6664 ---- Word.
!
Hopkins R. Strategic short study: names and numbers as identifiers. CEN TC251. Available as PDF or Word. *************** *** 6294,6312 ****
Data Type Specification for Person Name
!
Earlier discussions included class person name and person name ! variant, but we found the requirement to model person name as a RIM ! class. What we did not realize is that, similar to the stakeholder id, ! our RIM class already exists, it only needs to be polished. ! !
The RIM class Person_name will be developed from the class ! Person_alternate_name of RIM 0.88 jointly with PAFM. A person may have ! multiple instance of the person name class, reflecting the multiple ! names the person is or was known by. ! !
Within this RIM class, there is a code that indicates what purpose ! a given name is to be used for. Most people in the world will have one ! name that is currently used. --- 6682,6700 ----
Data Type Specification for Person Name
!
The Person_name is a RIM class as of June 1999. This class is ! correctly associated with the class Person and the multiplicities of ! this association allow one person to have multiple names. A second ! association ("is_used_by") to the class Stakeholder allows a person ! name to be scoped to some organization (or even another individual ! person.) ! !
Within this RIM class Person_name, there is an attribute that ! indicates what purpose a given name is to be used for ("reason_cd"). ! Most people in the world will have one name that is currently ! used. The following table is the Control Query recommendation to PAFM ! for a mandatory vocabulary for Person_name.reason_cd. We also suggest ! renaming this attribute to "purpose_cd".
Name Purpose Codes
*************** *** 6332,6350 ****
Name Purpose Codes
! Note that name purpose codes apply to an entire name that usually consists of several of the name parts described below.
There is also a way to specify the validity time of a name. !
This class also contains a representation of a single name variant ! as a list of person name parts that may or may not have semantic tags. ! !
Those RIM changes will have to be discussed jointly with CQ and ! PAFM at the Toronto meeting in April 1999. We will seek definite ! closure on the issue in Toronto after which Harmonization will be but ! a formal issue, since all relevant parties will have agreed to one ! proposal.
--- 6720,6733 ----
!
Note that name purpose codes apply to an entire name that usually consists of several of the name parts described below.
There is also a way to specify the validity time of a name. !
This class also contains an attribute "nm" which contains a single ! name variant as a list of person name parts that may or may not have ! semantic tags. This person name data type (PN) is defined as follows:
*************** *** 6372,6378 ****

! Person Name Part
--- 6755,6761 ----

! Person Name Part (PNXP)
*************** *** 6405,6410 **** --- 6788,6801 ----

+
Note that the Person Name (PN) data type is different from the + Person_name class. The data type is not a CMET or DMET of the class + but is used by the class as the data type of one of its + attributes. The naming overlap is to indicate that this HL7 version 3 + PN data type is the successor of the HL7 version 2 PN data type, while + the Person_name class can be understood as the successor of the + version 2 XPN data type. +
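A rough Python sketch of the PN data type as a flat, ordered list of name parts, each tagged with a set of classifiers, is shown here. The class and function names are hypothetical, and the classifier strings ("given", "family", "middle", "invisible") are illustrative stand-ins for the classifier table's entries rather than their official codes. The check for "middle" follows the rule stated for that classifier in this version: at most once in the entire name, and only on the second given name part.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PersonNamePart:
    value: str
    classifiers: FrozenSet[str] = frozenset()  # hypothetical classifier strings


# One name variant: an ordered, flat (non-nested) list of name parts.
PN = List[PersonNamePart]


def middle_tag_ok(name: PN) -> bool:
    """'middle' may appear at most once, and only on the second given name part."""
    tagged = [i for i, part in enumerate(name) if "middle" in part.classifiers]
    if not tagged:
        return True
    given = [i for i, part in enumerate(name) if "given" in part.classifiers]
    return len(tagged) == 1 and len(given) >= 2 and tagged[0] == given[1]


def display(name: PN) -> str:
    """Render the parts in order, skipping parts classified as invisible."""
    return " ".join(p.value for p in name if "invisible" not in p.classifiers)


robert = [
    PersonNamePart("Robert", frozenset({"given"})),
    PersonNamePart("H.", frozenset({"given", "middle"})),
    PersonNamePart("Dolin", frozenset({"family"})),
]
print(display(robert))        # Robert H. Dolin
print(middle_tag_ok(robert))  # True
```

Because the list is flat, rendering and validation are simple linear scans; no nested structure has to be traversed.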
*************** *** 6507,6513 **** ! !
Name Part Classifiers
invisible 0 (zero) Indicates that a name part is not normally shown. For instance, traditional maiden ! names are not normally shown. Middle names may be invisible too.

weak W Used only for prefixes and suffixes (affixes). A weak affix has a weaker association --- 6898,6913 ----
invisible 0 (zero) Indicates that a name part is not normally shown. For instance, traditional maiden ! names are not normally shown. "Middle names" may be invisible too.
middle MIN Emphasizes that ! a name part is "the middle name" in the classic U.S. American ! First-Middle-Last name scheme. This classifier may only appear once in ! the entire name and may only be ascribed to the second given ! name part. No other use is permitted. Note that this tag is optional ! and completely redundant since the second of two given names can ! always be assumed to be "the middle name". It has been adopted only ! to satisfy public demand.

weak W Used only for prefixes and suffixes (affixes). A weak affix has a weaker association *************** *** 6673,6679 **** the data type definition. Not that nesting is a bad idea per se. However, since the nesting depth appears to be limited to three levels, the generality of nesting seems to not outweigh the ! wimplicity of a simple linear list.
There are other ramifications though, such as prefixes that consist of more than one part such as in French "Eduard de l'Aigle". Here "de --- 7073,7079 ---- the data type definition. Not that nesting is a bad idea per se. However, since the nesting depth appears to be limited to three levels, the generality of nesting seems to not outweigh the ! simplicity of a simple linear list.
There are other ramifications though, such as prefixes that consist of more than one part such as in French "Eduard de l'Aigle". Here "de *************** *** 6878,6905 **** distinct name forms that we decided to treat as separate Person names without trying to relate those name parts across the variants. -
The following is the first example of a complete Person Name - structure. -
Bob Dolin, Robert Dolin, or Robert H. Dolin
! (SET ! (Person_name ! :value (PN (PersonNamePart :value "Bob" :classifiers (SET given nick)) (PersonNamePart :value "Dolin" :classifiers (SET family)))) ! (Person_name ! :value (PN (PersonNamePart :value "Robert" :classifiers (SET given)) (PersonNamePart :value "Dolin" :classifiers (SET family)))) ! (Person_name ! :value (PN (PersonNamePart :value "Robert" :classifiers (SET given)) (PersonNamePart :value "H." --- 7278,7300 ---- distinct name forms that we decided to treat as separate Person names without trying to relate those name parts across the variants.
Bob Dolin, Robert Dolin, or Robert H. Dolin
! (PN (PersonNamePart :value "Bob" :classifiers (SET given nick)) (PersonNamePart :value "Dolin" :classifiers (SET family))) ! ! (PN (PersonNamePart :value "Robert" :classifiers (SET given)) (PersonNamePart :value "Dolin" :classifiers (SET family))) ! ! (PN (PersonNamePart :value "Robert" :classifiers (SET given)) (PersonNamePart :value "H." *************** *** 6908,6913 **** --- 7303,7309 ---- :classifiers (SET family)))
+ we did not classify the person name variants here, since this would open up another can of worms. It almost seems like there is a gradual scale of formality which tells which of the various person names to *************** *** 7349,7355 **** or "deed poll". There is considerable overlap with the unmarried name classifier and the other classifiers of Axis 2. Consequently we had to relax the notion that axis 2 classifiers need ! to be mutual exclusive. Initials --- 7745,7751 ---- or "deed poll". There is considerable overlap with the unmarried name classifier and the other classifiers of Axis 2. Consequently we had to relax the notion that axis 2 classifiers need ! to be mutually exclusive. Initials *************** *** 7426,7461 **** unless there is any significant objection we can just stick to a v2.3-like solution. ! ! ! ! ! ! ! ! ! Organization Name (ON) ! ! ! A collection of organization name variants. ! ! SET OF Organization Name Variant ! ! ! ! --- 7822,7837 ---- unless there is any significant objection we can just stick to a v2.3-like solution. ! ! Organization Name Variant ! This type is not used outside of the Organization Name data type. Organization ! Names are regarded as a collection of organization name variants each ! used in different contexts or for a different purpose. component name *************** *** 7467,7473 **** --- 7843,7849 ---- *************** *** 7476,7492 **** ! Organization Name Variant (ON) ! A name for an organization. (What else is there to say?) component name Code Value optional ! A type code indicates what an organization name is to be used for. Examples are: alias, legal, stock-exchange. value Code Value optional ! A code indicating what an organization name is to be used for. Examples are: alias, legal, stock-exchange. value mandatory ! This contains the actual name data as a simple character string. ! ! ! ! --- 7852,7881 ----
mandatory ! The actual name data as a simple character string.

+
Note: this has changed. In a previous draft the + Organization Name (ON) was a set of Organization Name Variants + (ONXV) with no additional information. It is therefore simpler to + define ON in parallel with PN as representing one name variant and let + PAFM handle the rest in the RIM. +
Note: a harmonization request to PAFM is required + for the Organization class to +
+
delete attribute: Organization.organization_name_type_cd
+
Rationale: Attribute duplicates the ON.type component of the + Organization name data type.
+ +
rename attribute: Organization.organization_nm to "nm"
+
Rationale: Name does not conform to the MDF style guide as it + repeats the name of its class.
!
assign data type: Organization.nm : SET<ON>
!
*************** *** 7525,7555 **** can not be considered exact.
Most computer programming languages distingush between the two data ! types integer and floating point number. Some know rationals and ! complex numbers. Whereas HL7 v2.x had only one data type for numbers, ! HL7 v3 will distinguish between interger and floating point. This distinction is suggested not just by technological considerations ! (both are implemented quite differently). !
The main reason for distinguishing integer and floating point ! numbers is about semantics. Integer numbers are exact results of ! counting and enumerating. In natural science and real life, integer ! numbers are rather rare. Measurements, estimations, and many ! scientific computations have floating point numbers as their results, ! imprecise real numbers. Measurements are but approximations to the ! quantitative phenomena of nature.
There are other distinguished quantitative phenomena that can be partially described by numbers but which have a meaning beyond numbers. Among such quantitative phenomena are physical measurements with units of measure, money, and real time as measured by calendars. !
This specification defines data types for integer and floating ! point numbers, for physical measurements, money, and calendars. There ! are many more quantitative phenomena that we may or may not define ! data types for in the future. Examples for those we will define are ! vectors, waveforms, and possibly matrices. We will probably not ! consider complex numbers, except if a concrete use case appears.
4.2 Integer Number
--- 7914,7944 ---- can not be considered exact.
Most computer programming languages distinguish between the two data ! types integer and real (floating point) number. Some also support rationals ! and complex numbers. Whereas HL7 v2.x had only one data type for ! numbers, HL7 v3 will distinguish between integer and real. This distinction is suggested not just by technological considerations ! (both are implemented quite differently). !
The main reason for distinguishing integer and real numbers is ! about semantics. Integer numbers are exact results of counting and ! enumerating. In natural science and real life, integer numbers are ! rather rare. Measurements, estimations, and many scientific ! computations have real numbers as their results, imprecise real ! numbers. Measurements are but approximations to the quantitative ! phenomena of nature.
There are other distinguished quantitative phenomena that can be partially described by numbers but which have a meaning beyond numbers. Among such quantitative phenomena are physical measurements with units of measure, money, and real time as measured by calendars. !
This specification defines data types for integer and real numbers, ! for physical measurements, money, and calendars. There are many more ! quantitative phenomena that we may or may not define data types for in ! the future. Examples for those we will define are vectors, waveforms, ! and possibly matrices. We will probably not consider complex numbers, ! except if a concrete use case appears.
4.2 Integer Number
*************** *** 7557,7563 **** ! --- 8233,8239 ---- ! *************** *** 7900,7906 **** defines a generalization of rational numbers, the Ratio. A ratio is any quotient of two quantities. Those can be two integers, in which case we have an exact rational number. But the quotient can be built ! as well from floating point values, or physical measurements or any combination thereof.
Note that the ratio has the semantics of a quotient. The ratio data --- 8300,8306 ---- defines a generalization of rational numbers, the Ratio. A ratio is any quotient of two quantities. Those can be two integers, in which case we have an exact rational number. But the quotient can be built ! as well from real number values, or physical measurements or any combination thereof.
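A minimal Python sketch of this idea follows, restricted to integer and real (float) operands and leaving physical measurements aside; the class and method names are hypothetical. When both parts are integers the quotient is an exact rational number; as soon as a real number is involved, the quotient is only an approximation.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union

Number = Union[int, float]


@dataclass(frozen=True)
class Ratio:  # hypothetical name
    # In this sketch numerator and denominator are stored as given (not reduced).
    numerator: Number
    denominator: Number

    def quotient(self) -> Union[Fraction, float]:
        if isinstance(self.numerator, int) and isinstance(self.denominator, int):
            return Fraction(self.numerator, self.denominator)  # exact rational
        return self.numerator / self.denominator  # approximate real


print(Ratio(2, 6).quotient())    # 1/3
print(Ratio(1.0, 4).quotient())  # 0.25
```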
Note that the ratio has the semantics of a quotient. The ratio data *************** *** 7911,7917 ****

! Integer Number (Integer, IN)
--- 7946,7952 ----

! Integer Number (INT)
*************** *** 7617,7635 **** !
4.3 Floating Point Number
--- 8006,8031 ---- ! !
4.3 Real Number (was: Floating Point Number)
! !
Note: can we change the name at the ! last minute? I realized too late that calling it "Floating Point ! Number" is incorrect, since that name refers to a particular ! computer-representation of a number. I would now much rather call it ! "Real".

! Floating Point Number (Float, FPN)
! Floating point numbers are approximations for real numbers. Floating ! point numbers occur whenever quantities of the real world are measured or estimated or as the result of calculations that include other ! floating point numbers.

component name
*************** *** 7649,7672 ****

! Real Number (was: Floating Point Number, FPN)
! A data type that approximates real numbers to a certain precision. ! Real numbers occur whenever quantities of the real world are measured or estimated or as the result of calculations that include other ! real numbers.

component name Integer Number required ! The precision of the floating point number in terms of the number of significant decimal digits.

Semantic components vs. representational components
! A floating point number has the semantic components value ! and precision, however, this does not necessarily mean ! that any representation of a floating point number will be a structure ! of two distinct components. Especially, since we do not specify a data ! type for true real numbers of infinite precision, the value component is not of an existing data type.
Precision
!
The precision of a floating point number is defined here as the number of decimal digits. According to Robert S. Ledley [Use of computers in biology and medicine, New-York, 1965, p. 519ff]: "A number composed of n significant figures is --- 8045,8074 ----
Integer Number required ! The precision of the real number in terms of the number of significant decimal digits.

Semantic components vs. representational components
!
A real number has the semantic components value and ! precision; however, this does not necessarily mean that ! any representation of a real number will be a structure of ! two distinct components. In particular, since it is not possible to ! define a data type for true real numbers of infinite precision, the value component is not of an existing data type. +
Rather than being representational components, "value" and + "precision" are semantic properties of the data type that can be evaluated on the application layer. These + properties must be kept invariant throughout all ITS + implementations. This is especially an issue if binary floating point + numbers, such as IEEE 754, are used. +
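The following Python sketch, built on the standard decimal module, shows why value and precision have to be preserved together. The class name Real and the rule that every digit of the decimal coefficient counts as significant (including trailing zeros of an integer literal) are assumptions made for illustration. Passing the literal through an IEEE 754 binary float keeps the value but silently drops a trailing significant zero.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Real:  # hypothetical name
    value: Decimal
    precision: int  # number of significant decimal digits

    @classmethod
    def from_literal(cls, text: str) -> "Real":
        value = Decimal(text)
        if not value.is_finite():
            raise ValueError("infinity carries no precision in this sketch")
        # Decimal drops leading zeros ("0.0012" -> coefficient 12) but keeps
        # trailing ones ("12.30" -> coefficient 1230), so the number of
        # coefficient digits is taken as the number of significant digits.
        return cls(value, len(value.as_tuple().digits))


exact = Real.from_literal("12.30")
via_float = Real.from_literal(repr(float("12.30")))  # IEEE 754 round trip
print(exact.precision, via_float.precision)  # 4 3
print(exact.value == via_float.value)        # True: same value...
print(exact == via_float)                    # False: ...but precision was lost
```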
Precision
!
The precision of a real number is defined here as the number of decimal digits. According to Robert S. Ledley [Use of computers in biology and medicine, New-York, 1965, p. 519ff]: "A number composed of n significant figures is *************** *** 7723,7732 **** are well known in the medical profession. However, these statistical methods are quite complex, and exact probability distributions are often unknown. Therefore, we want to keep those separate from a basic ! data type of floating point numbers. However, floating point numbers ! are approximations to real numbers and we want to account for this ! approximative nature by keeping a basic notion of precision in terms ! of significant digits right in the floating point data type.
In many situations, significant digits are a sufficient estimate of the uncertainty, but even more important, we must account for --- 8125,8134 ---- are well known in the medical profession. However, these statistical methods are quite complex, and exact probability distributions are often unknown. Therefore, we want to keep those separate from a basic ! data type of real numbers. However, a data type for real numbers can ! only be an approximation to true real numbers and we want to account ! for this approximative nature by keeping a basic notion of precision ! in terms of significant digits right in the real number data type.
In many situations, significant digits are a sufficient estimate of the uncertainty, but even more important, we must account for *************** *** 7738,7766 ****
No fixed arbitrary limits on value range
!
No arbitrary limit is imposed on the range or precision of floating ! point numbers. Thus, theoretically, the capacity of any binary representation is exceeded, whether 32 bit, 64 bit, or 128 bit size. Domain committees should not limit the ranges and precision of ! floating point numbers only to make sure the numbers fit into current ! data base technology. Designers of Implementable Technology ! Specifications (ITS) should be aware of the possible capacity limits ! of their target technology. ! !
The infinity of floating point numbers is represented as a special ! value. The representation of floating point numbers is up to the ! ITS. In our instance notation we use the special symbol ! #finf for positive infinity (Aleph₁), ! #nfinf for negative infinity (- ! Aleph₁.) Note that #nfinf = - ! #finf.
Constraints on value ranges

In cases where limits on the value range are suggested semantically by the application domain, the committees should specify those ! limits. For example, probabilities should be expressed in floating ! point numbers between 0 and 1.
Although we do not yet have a formalism to express constraints, we should not hesitate to document those constraints informally. We will --- 8140,8167 ----
No fixed arbitrary limits on value range
!
No arbitrary limit is imposed on the range or precision of real ! numbers. Thus, theoretically, the capacity of any binary representation can be exceeded, whether of 32, 64, or 128 bit size. Domain committees should not limit the ranges and precision of ! real numbers only to make sure the numbers fit into current data base ! technology. Designers of Implementable Technology Specifications (ITS) ! should be aware of the possible capacity limits of their target ! technology. ! !
Infinity is represented as a special value. The ! representation of real numbers is up to the ITS. In our instance ! notation we use the special symbol #finf for positive ! infinity (+∞), #nfinf for negative ! infinity (-∞.) Note that #nfinf ! = - #finf.
Constraints on value ranges

In cases where limits on the value range are suggested semantically by the application domain, the committees should specify those ! limits. For example, probabilities should be expressed in real numbers ! between 0 and 1.
Although we do not yet have a formalism to express constraints, we should not hesitate to document those constraints informally. We will *************** *** 7769,7791 ****
ITS Presentation and Literals
!
We allow floating point numbers to be represented by character ! string literals containing signs, decimal digits, a decimal point and exponents. An ITS for XML will most likely use the string literal to ! represent floating point numbers. Other ITSs, such as for CORBA, might ! choose to represent floating point numbers by variable length bit ! strings or by choices of either a native (IEEE) floating point format ! or a special long floating point format. ! !
Decimal floating point numbers can be represented in a standard ! way, so that only significant digits appear. This standard ! representation always starts with an optional minus sign and the ! decimal point, followed by all significant digits of the mantissa ! followed by the exponent. Thus 123000 is represented as ! ".123e6" to mean .123 × 10⁶; 0.000123 is ! represented as ".123e-3" to mean .123 × ! 10^-3; and -12.3 is represented as "-.123e2". ! to mean -.123 × 10².
The reason why we define decimal literals for data types is to make the data human readable. To render the value 12.3 as --- 8170,8191 ----
ITS Presentation and Literals
!
We allow real numbers to be represented by character string ! literals containing signs, decimal digits, a decimal point and exponents. An ITS for XML will most likely use the string literal to ! represent real numbers. Other ITSs, such as for CORBA, might choose to ! represent real numbers by variable length bit strings or by choices of ! either a native (IEEE 754) floating point format or a special long ! floating point format. ! !
Decimal real numbers can be represented in a standard way, so that ! only significant digits appear. This standard representation always ! starts with an optional minus sign and the decimal point, followed by ! all significant digits of the mantissa, followed by the exponent. Thus ! 123000 is represented as ".123e6" to mean .123 × ! 10⁶; 0.000123 is represented as ".123e-3" to ! mean .123 × 10⁻³; and -12.3 is represented as ! "-.123e2" to mean -.123 × 10².
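The standard representation can be produced mechanically. The Python sketch below uses the standard decimal module; the function name is hypothetical, zero and infinities are deliberately left out, and treating trailing zeros of a plain integer literal as not significant is an assumption chosen to reproduce the example 123000 → ".123e6".

```python
from decimal import Decimal, InvalidOperation


def standard_literal(text: str) -> str:  # hypothetical helper
    """Render a decimal literal as [-].<significant digits>e<exponent>."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a decimal literal: {text!r}") from None
    if not value.is_finite():
        raise ValueError("infinities have no standard decimal literal")
    sign, digit_tuple, exponent = value.as_tuple()
    digits = list(digit_tuple)
    # Assumption: in a plain integer literal such as "123000" the trailing
    # zeros are not significant.
    if "." not in text and "e" not in text.lower():
        while len(digits) > 1 and digits[-1] == 0:
            digits.pop()
            exponent += 1
    if digits == [0]:
        raise ValueError("zero is not covered by this sketch")
    # value = int(digits) * 10**exponent = .digits * 10**(len(digits) + exponent)
    mantissa = "".join(str(d) for d in digits)
    return f"{'-' if sign else ''}.{mantissa}e{len(digits) + exponent}"


print(standard_literal("123000"))    # .123e6
print(standard_literal("0.000123"))  # .123e-3
print(standard_literal("-12.3"))     # -.123e2
print(standard_literal("12.30"))     # .1230e2
```

The last line shows that a trailing zero after the decimal point is kept, so the literal still carries four significant digits.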
The reason why we define decimal literals for data types is to make the data human readable. To render the value 12.3 as *************** *** 7833,7839 ****
::= sign digits | digits
float ::= mantissa e exponent | mantissa ::= sign digits | digits
real ::= mantissa e exponent | mantissa

! Ratio
--- 8311,8317 ----

! Ratio (RTO)
*************** *** 7943,7949 ****
A Quantity is a generalization of the following data types:

Integer Number !
Floating Point Number
PhysicalQuantity
MonetaryAmount
Ratio (recursively) --- 8343,8349 ----
A Quantity is a generalization of the following data types:

Integer Number !
Real Number
PhysicalQuantity
MonetaryAmount
Ratio (recursively) *************** *** 7983,7989 ****

! Physical Quantity
--- 8383,8389 ---- ! !

! Physical Quantity (PQ)
*************** *** 7997,8003 **** description

value Floating Point Number required The magnitude of the quantity measured in terms of the unit. --- 8397,8403 ---- description

value Real Number required The magnitude of the quantity measured in terms of the unit. *************** *** 8101,8107 **** ! ! + + + + + + + + + + + + + + + + + + +

! Monetary Amount
--- 8501,8507 ---- ! !

! Monetary Amount (MO)
*************** *** 8115,8121 **** description

value Floating Point Number required The magnitude of the monetary amount in terms of the currency unit. --- 8515,8521 ---- description

value Real Number required The magnitude of the monetary amount in terms of the currency unit. *************** *** 8255,8261 **** not essentially quantitative items.
Of course, you can count tablets (like you can count all kinds of ! things), of course, a tablet, as a physical body does have volume, length, width, and depth. But the essence of a tablet is its form and not any specific kind of quantity. Conversely the essence of a meter is a certain amount of length, the essence of a --- 8655,8661 ---- not essentially quantitative items.
Of course, you can count tablets (like you can count all kinds of ! things); of course, a tablet, as a physical body, does have volume, length, width, and depth. But the essence of a tablet is its form and not any specific kind of quantity. Conversely, the essence of a meter is a certain amount of length, the essence of a *************** *** 8308,8320 ****
4.6 Time
!
4.6.1 Point in Time

! Point in Time
--- 8708,8804 ----
4.6 Time
The treatment of dates and times has always been somewhat of a sticky + issue in most data type specifications. The problem is that humans + usually keep time using calendars, which are traditional and quite + complex numerical and ordinal constructs. The western world today uses + the Gregorian calendar consistently and keeps time using the Universal + Coordinated Time system. However, as the developed post-industrial + western world becomes more and more aware of inter-cultural issues, + calendar systems other than the Gregorian calendar are increasingly + recognized. For example, Java has adopted a separation between a data + type for points in time (the class java.util.Date) + and an abstract type for various calendars (java.util.Calendar), + of which the Gregorian/Julian calendar was the first to be implemented (java.util.GregorianCalendar). + IBM has made + additional calendar classes available for Buddhist, Hebrew, Islamic + and Japanese Imperial calendars. + +
While we believe that the western calendar will remain the + dominant calendar in healthcare, we want to keep the separation of an + abstract concept of time from calendar-dependent time notations. While + we do not specify support for any calendar other than the Gregorian calendar in + this document, we assume that local HL7 user groups can do so in a + compatible way. + + This specification clearly distinguishes the following concepts + related to time: + +
+
Point in Time +
One specific point on the real time axis.
+ +
Interval of Point in Time +
A continuous period of time between one start and one end + time. Start and end times may be infinite or unspecified, which allows + the expression of, e.g., open-ended periods.
+ +
Duration of Time +
A duration is a quantity of time with no particular start and + end time just as a length is a quantity without a particular place in + space. Durations of time are nothing else than a Physical Quantity in the dimension of + time.
+ +
Periodic Points or Intervals of Time
+
There are a number of time expressions we use to specify + periodic events. Those may be as simple as "every day at 08:00" or as + complex as "the second Sunday in May."
+ +
Arbitrary sets of Time
+
In scheduling recurrent events one may want to build arbitrary + sets of points in time or sets of intervals of time, recurrent in + possibly very complex patterns. Not yet supported in all generality.
+
+
Figure 6: Time-related phenomena recognized by this + data type specification.
+ +
We do not consider event- or activity-related concepts to be within the + scope of time. For example, such expressions as "at the hour of sleep" + or "before meal" or "since a particular accident" are not genuinely + concepts of time. Events and activities are naturally related to time, + but are still quite different from time. + + + +
4.6.1 Time Durations
+ +
Some recently developed type systems define a special data type for + durations (for instance, the one developed by M. Stonebraker for + the POSTGRES object-relational database project). The Arden Syntax + also has such a concept. In this v3 data type model, however, time + durations are but a special case of a physical quantity. Durations of + time are nothing else than measurements in the dimension of time. Thus + those durations have the units 1 s, 1 min, 1 hr, 1 d, 1 wk, 1 mo, 1 a, + etc. + + !
4.6.2 Point in Time

! Point in Time (TS) (also called "time stamp")
*************** *** 8327,8341 ****

!
The natural time scale is, almost like the temperature scales (Celsius or Fahrenheit), an interval scale ! (aka. difference scale). While the Celsius temperature scale ! defines a zero point at the freezing point of water and a standard ! degree as 1/100 of the boiling point of water, the Christian calendar ! defines the zero point at the birth of Christ, and the basic unit of ! time as the second. There are obvious problems with the determination ! of the zero point of the Christian calendar, but the principle is the ! same.
Zero points on the natural time axis are chosen arbitrarily, and called the "epoch". --- 8811,8835 ----

!
We conceptualize time in a naive sense as one universal, continuous ! and even dimension, just like a fourth Euclidean dimension in ! our coordinate system of events (space and time). This notion of time ! is naive in the same sense as our everyday notion of space is naive ! given the theory of relativity. However, this notion reasonably ! approximates "reality" for all purposes of earthly health care. ! !
This natural time scale is, much like the temperature scales (Celsius or Fahrenheit), an interval scale ! (also known as a difference scale). Such interval scales are ! characterized by an arbitrary choice of the origin (zero point). ! While the Celsius temperature scale defines a zero point at the ! freezing point of water and a standard degree as 1/100 of the interval between the freezing and boiling ! points of water, the modern western calendar (Gregorian calendar ! combined with Universal Coordinated Time) defines the zero point ! at the birth of Christ and the basic unit of time as the second. There ! are obvious problems with the determination of the zero point of the ! Christian calendar, and there have really been two Christian calendars ! (Julian and Gregorian), but the principle is the same.
Zero points on the natural time axis are chosen arbitrarily, and called the "epoch". *************** *** 8343,8396 ****
Many data type specifications for point in time are based on an epoch. Examples for epochs are: 1/1/1970 00:00:00 UCT on Unix, 1/1/1980 00:00:00 UCT on MS DOS, 12/31/1959 00:00:00 EST in the ! Regenstrief MRS, 10/15/1582 00:00:00 UCT in CORBA's COAS. Basic ! durations are seconds, milliseconds, microseconds, or nanoseconds measured from that epoch. This way of representing time is very ! simple. Although it is not easily human readable, it is very easy to ! compute with those standardized time values.
Traditionally the even flow of time is "convoluted" in many cycles defined by calendars. Such cycles are years, months, days, hours, minutes, seconds. Those cycles are not synchronized. Traditionally ! calendars have been define based on astronomical phenomena, however, calendar years, months and days are not attached directly to astronomical phenomena. The closest fit is the calendar day to the solar day, but the calendar month is definitely not the same as a lunar (synodal) month. !
Humans communicate points in time as calendar ! expressions. Calendars are quite complex constructs which are ! dependent on culture. Bali, for example, is said to uses 6 different ! calendars. ! !
To account for the calendar problem, the basic Java library defines ! two classes: java.util.Date and ! java.util.Calendar. Date is defined as a ! point in universal coordinated time of the form epoch/duration ! (Java's epoch is 1.1.1900 00:00:00 UTC). Calendar is a ! generalization of a GregorianCalendar an potentially ! other calendars. !
It is quite difficult to convert a calendar expression into an epoch/duration form. There are not just leap days (Feb. 29) added to leap years, but also leap seconds (added to leap days). The algorithms to determine leaps is difficult (leap year) or non-existent (leap ! second). The latter are taken from tables published in Astronomical ! Almanacs. But fortunately, conversion is done by most operating ! systems or the basic Java library. ! !
Calendar expressions are for humans to understand and are therefore ! represented as character string literals. The semantic components of a ! calendar expression may be different from the components identifiable ! in a particular surface form. ! !
Quite solid standards for expressions in the Gregorian calendar are ! HL7 v2.3's TS data type, and ISO 8601 (adopted in Europe as EN 28601). ! ASN.1's (ISO 8824) GeneralizedTime is a restricted form ! of ISO 8601. HL7's TS format is used by ASTM 1238 as well and lives on ! in ANSI HISPP MSDS CDT's DateTime format. Although HL7's ! TS format and ISO 8601 are similar, they also have considerable ! differences.
For HL7 v3 it seems worthwhile to consider adopting ISO 8601 [more about --- 8837,8939 ----
Many data type specifications for point in time are based on an epoch. Examples for epochs are: 1/1/1970 00:00:00 UCT on Unix, 1/1/1980 00:00:00 UCT on MS DOS, 12/31/1959 00:00:00 EST in the ! Regenstrief MRS, 10/15/1582 00:00:00 UCT in CORBA's COAS. Typical ! granularities are seconds, milliseconds, microseconds, or nanoseconds measured from that epoch. This way of representing time is very ! simple: all that is needed is a counter that counts the ticks of a ! clock since the epoch. Although this is not easily human readable, it ! is very easy to compute with those standardized time values. ! !
Translations between epoch-granularity-counter systems (clocks) are ! simple linear translations between coordinate systems. ! !
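To illustrate such a linear translation, here is a small Python sketch under stated assumptions: a clock is modeled as an (epoch, seconds-per-tick) pair, exact arithmetic uses `fractions.Fraction`, and the millisecond clock based at the MS-DOS epoch is hypothetical, chosen only to show a change of both epoch and granularity at once.

```python
from datetime import datetime, timedelta, timezone
from fractions import Fraction

# A clock is an epoch plus a granularity (seconds per tick).
UNIX_SECONDS = (datetime(1970, 1, 1, tzinfo=timezone.utc), Fraction(1))
# Hypothetical clock: milliseconds counted from the MS-DOS epoch, taken as UTC here.
DOS_MILLIS = (datetime(1980, 1, 1, tzinfo=timezone.utc), Fraction(1, 1000))


def translate(ticks, source, target):
    """Linear map from source ticks to target ticks: scale, then shift."""
    (src_epoch, src_gran), (dst_epoch, dst_gran) = source, target
    # Whole seconds from the target epoch to the source epoch.
    offset = Fraction((src_epoch - dst_epoch) // timedelta(seconds=1))
    return (ticks * src_gran + offset) / dst_gran


# 1980-01-01 is 315532800 s after the Unix epoch, i.e. tick 0 on the DOS clock.
print(translate(315532800, UNIX_SECONDS, DOS_MILLIS))  # 0
print(translate(1000, DOS_MILLIS, UNIX_SECONDS))       # 315532801
```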
Even though clocks are based on some granularity, one can ! conceptually base a clock on a scale of real numbers, so that ! theoretically the time is measured continuously in any unit of elapsed ! time from the epoch. For example, given an epoch of January 1 1996, ! one can specify points in time such as July 9 1999 2:45 PM simply as ! about 1285.6146 days (30854.75 hours). Obviously the granularity is unbounded, that is, given ! a precise measuring method one can specify the time exactly to the ! millisecond, nanosecond, picosecond, and more. In the common floating ! point registers for real numbers on computers, the absolute precision is ! reduced with greater distance from zero, which is just what one would ! want, given that the epoch is reasonably near the present time. ! !
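The following Python sketch measures the example in this way, assuming naive (zone-less) datetimes and the standard `datetime` module; `elapsed` is a hypothetical helper name. With the January 1 1996 epoch, July 9 1999 2:45 PM comes out at 1285 days and 14.75 hours, i.e. 30854.75 hours or about 1285.6146 days.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1996, 1, 1)  # the example epoch; time zones are ignored here


def elapsed(t: datetime, unit: timedelta) -> float:
    """Real-valued time since EPOCH, measured in the given unit."""
    return (t - EPOCH) / unit


t = datetime(1999, 7, 9, 14, 45)  # July 9 1999, 2:45 PM
print(elapsed(t, timedelta(hours=1)))  # 30854.75
print(elapsed(t, timedelta(days=1)))   # about 1285.6146
```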
So, a representation of time based on an epoch and a Physical ! Quantity would be all that is needed. Indeed, since this ! representation of time comes closest to our conceptualization of real ! time, an Implementable Technology Specification (ITS) may choose such ! a representation for time. This ITS-independent data type ! specification, however, concerns itself with time representation only ! for two reasons: ! !
!
To define literals used in examples and constraints, and ! electively used by an ITS. ! !
To understand and account for the fact that time expressions are ! often aligned to calendars. !
! !
Obviously the epoch-duration form of a point in time value is not ! very useful for a literal expression of time. Though astronomers use a ! simple counting of days elapsed since noon, Monday, January 1, 4713 ! B.C. on the Julian calendar, in the health care world we are not used ! to looking at time in this way. So, our literal expression of time must ! be based on our calendar. ! ! !
Calendars

Traditionally the even flow of time is "convoluted" in many cycles defined by calendars. Such cycles are years, months, days, hours, minutes, seconds. Those cycles are not synchronized. Traditionally ! calendars have been defined based on astronomical phenomena; however, calendar years, months and days are not attached directly to astronomical phenomena. The closest fit is that of the calendar day to the solar day, but the calendar month is definitely not the same as a lunar (synodic) month. !
Figure 7 below visualizes a calendar as a trajectory summed up from ! four such cyclical movements: year, month, day and hour. Imagine a ! clock that measures those cycles, but where the hands are not all ! stacked on a common axis; instead, each hand is attached to the end of ! the hand measuring the next larger cycle. ! ! ! ! ! !
!
Figure 7: A calendar "rolls" the time axis into a ! complex convolute according to the calendar periods year (blue), month ! (yellow), day (green), hour (red), etc. The cycles need not be ! aligned, for example, the week (not shown) is not aligned to the ! month.
! !
After rolling the time axis into the traditional cycles, a calendar ! expresses time as a sequence of integer counts of cycles, e.g., for ! year, month, day, hour, etc. !
Because of the complex and often uneven relationship between the ! cycles, it is quite difficult to convert a calendar expression into an epoch/duration form. There are not just leap days (Feb. 29) added to leap years, but also leap seconds (inserted at the end of a day, usually June 30 or December 31). The algorithms to determine leaps are difficult (leap years) or non-existent (leap ! seconds). Leap seconds, for example, are determined sporadically and ! published as tables in Astronomical Almanacs. But fortunately, ! conversion is done by most operating systems, or by other available ! modules, such as the Java core library. ! !
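While leap seconds can only be looked up in published tables, the Gregorian leap-year rule can be written down directly. A minimal Python sketch (hypothetical function name), cross-checked against the standard library's `calendar.isleap`:

```python
import calendar


def is_gregorian_leap_year(year: int) -> bool:
    # Every 4th year, except century years, except every 4th century year.
    # No comparable rule exists for leap seconds; they come from published tables.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Cross-check against the standard library for a few centuries.
assert all(is_gregorian_leap_year(y) == calendar.isleap(y) for y in range(1582, 2401))
print([y for y in (1900, 1996, 1999, 2000) if is_gregorian_leap_year(y)])  # [1996, 2000]
```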
Literal Expressions for Point in Time
! !
Quite solid standards for literal expressions of points in time ! based on the western calendar are HL7 v2.3's TS data type, and ISO ! 8601 (adopted in Europe as EN 28601). ASN.1's (ISO 8824) ! GeneralizedTime is a restricted form of ISO 8601. HL7's ! TS format is used by ASTM 1238 as well and lives on in ANSI HISPP MSDS ! CDT's DateTime format. Although HL7's TS format and ISO ! 8601 are similar, they also have considerable differences.
For HL7 v3 it seems worthwhile to consider adopting ISO 8601 [more about *************** *** 8402,8408 **** YYYY-MM-DDThh:mm:ss ! the dashes between the date components, the colons between the time components and the "T" between date and time components may, according to ISO 8601, as well be omitted. The omission of those characters brings about a form very similar to ASN.1 or HL7's TS. The --- 8945,8951 ---- YYYY-MM-DDThh:mm:ss ! The dashes between the date components, the colons between the time components and the "T" between date and time components may, according to ISO 8601, as well be omitted. The omission of those characters brings about a form very similar to ASN.1 or HL7's TS. The *************** *** 8417,8423 ****
The W3C is considering a subset of ISO 8601 for adoption. W3C's subset requires the "T" between date and ! time.
Useful features of ISO 8601 that are not part of HL7's TS type are so called "ordinal dates" of the form --- 8960,8967 ----
The W3C is considering a subset of ISO 8601 for adoption. W3C's subset requires the "T" between date and ! time. The W3C schema working group, however, is using full featured ! ISO 8601 with all options.
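As a sketch of how close the two surface forms are, the Python functions below (hypothetical names) convert a complete TS-style basic timestamp into the extended ISO 8601 form with the "T" and back again. They deliberately ignore reduced precision, fractional seconds and time zone offsets.

```python
import re

# Full-precision basic form YYYYMMDDhhmmss, without fractions or time zone.
_BASIC = re.compile(r"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})")


def basic_to_extended(ts: str) -> str:
    """Insert the dashes, colons and the "T" that the W3C subset requires."""
    m = _BASIC.fullmatch(ts)
    if m is None:
        raise ValueError(f"not a YYYYMMDDhhmmss literal: {ts!r}")
    year, month, day, hour, minute, second = m.groups()
    return f"{year}-{month}-{day}T{hour}:{minute}:{second}"


def extended_to_basic(iso: str) -> str:
    """Drop the separators again; only safe for literals without a zone offset."""
    return iso.replace("-", "").replace(":", "").replace("T", "")


print(basic_to_extended("19990709144500"))       # 1999-07-09T14:45:00
print(extended_to_basic("1999-07-09T14:45:00"))  # 19990709144500
```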
Useful features of ISO 8601 that are not part of HL7's TS type are so called "ordinal dates" of the form *************** *** 8434,8444 **** of a year, or (3) the week of the year plus the day of the week.
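Python's standard `datetime.date` can produce all three forms: `isocalendar()` yields the ISO week-numbering year, week and weekday, and `timetuple().tm_yday` the day of the year. The helper below is a hypothetical illustration; note that near New Year the week-numbering year can differ from the calendar year.

```python
from datetime import date


def iso_8601_forms(d: date) -> dict:
    """The same day as calendar date, ordinal date, and week date (extended format)."""
    iso_year, week, weekday = d.isocalendar()  # weekday: Monday = 1 ... Sunday = 7
    return {
        "calendar": f"{d.year:04d}-{d.month:02d}-{d.day:02d}",
        "ordinal": f"{d.year:04d}-{d.timetuple().tm_yday:03d}",
        "week": f"{iso_year:04d}-W{week:02d}-{weekday}",
    }


print(iso_8601_forms(date(1999, 7, 9)))
# {'calendar': '1999-07-09', 'ordinal': '1999-190', 'week': '1999-W27-5'}
```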
Moreover, ISO 8601 allows omission of more significant components ! (the delimiter dash, colon, or "T" must occur in those cases). This ! changes the semantics of the expression from a point in time to a ! calendar modulo expression. For example "---2" means every Tuesday, ! but subtle variations may have big impact on the meaning: "-W-2" means ! Tuesday "of the current week" (whatever this means).
Both, HL7's TS and ISO 8601 handle time zones through offsets of the form "+hh:mm" or "-hh:mm" relative to UTC. TS adds a "Z" in front --- 8978,8989 ---- of a year, or (3) the week of the year plus the day of the week.
Moreover, ISO 8601 allows omission of more significant components ! (the delimiter dash, colon, or "T" must occur in those ! cases). This changes the semantics of the expression from a point in ! time to a periodic point in time. For example "---2" means every ! Tuesday, but subtle variations may have big impact on the meaning: ! "-W-2" means Tuesday "of the current week," which is a relative point ! in time.
Both, HL7's TS and ISO 8601 handle time zones through offsets of the form "+hh:mm" or "-hh:mm" relative to UTC. TS adds a "Z" in front *************** *** 8455,8531 **** most TS expressions compatible with ISO 8601 expression. Notably the "Z" should be used in the ISO 8601 style (i.e. only for UTC). !
4.6.2 Time Durations
! !
Some recently developed type systems define a special data type for ! durations (e.g. for instance the one developed by M. Stonebreaker for ! the POSTGRES object-relational data base project) The Arden syntax ! also knows such a concept. In this v3 data type model, however, time ! durations are but a special case of a physical quantity. Durations of ! time are nothing else than measurements in the dimension of time. Thus ! those durations have the units 1 s, 1 min, 1 hr, 1 d, 1 wk, 1 mo, 1 a, ! etc. !
4.6.3 Other issues and curiosities about Time
!
"I got sick at my birthday, about 20 years ago," is an expression ! that we might want to capture. One possible representation for this ! time would be "yyyy0219" if my birthday is February 19th and if yyyy ! is constrained to this year - yyyy is approximately 20 ! years. If from another source we gather that I got sick in "1976", but ! don't know the exact month and day, then we can conclude that I got ! sick in "19760219", because 1998 - 1976 = 22. This seems a somewhat ! rare use case, but definitely worth considering. !
"I got that cough in spring," might lead us to adjust probabilities ! for pollen allergy. The season of the year is of interest in ! epidemiology. Bob Dolin, in his JAMIA Article on Modeling the ! temporal complexities of symptoms, suggests accounting for ! "season" in time expressions. The difficulty here is that seasons ! depend on the geographical latitude and we can not infer the season ! from the month of the year. January is Summer in Australia, South ! Africa, Chile, and Argentinia while northern folks assume that January ! is the worst part of the Winter. Moreover, at the equator there are ! not the usual four seasons, however, in tropical regions, there is the ! Monsun season, which may be considered one of two seasons, or a fifth ! season. I propose to defer season as part of a point in time ! expression until the use and the implications become more clear. !
Noteworthy references on time expressions are CEN TC251's ENV 12381 ! Health care informatics; time standards for health care specific ! problems and the ARDEN Syntax. Those two standards not only ! define relations and operators on time values but also on events and ! episodes which are related in time. -
Relative times of the semantics NOW + duration offset stick out as - the most prominent feature defined by those and other time related - standards. We might thus consider the ability to specify relative - time. Some conventions use expressions like "t-1" to mean - "yesterday". Relative time expressions are of the data type point in - type, but the exact value depends on a parameter (the actual time) - specified elsewhere. !
4.6.4 Calendar Modulus Expressions

A modulus is the remainder of an integer division. For example, 12 ! modulo 7 is 5. If we have the time defined as epoch + duration in ! days, we can tell the day of the week of any date if we know the day ! of the week of the epoch. For instance, let our epoch be January 1 ! of 1582 (when the Gregorian calendar was introduced) was a Monday. We ! can easily tell the weekday of January 31 1582: the offset from the ! epoch is 30 days. A week has seven days, 30 modulo 7 is 2. Monday ! plus two days is Wednesday. The same way we can tell that the date ! epoch + 151840 days (some time in 1998) is a Thursday. !
Other such modulus expressions exist in calendars, all of which ! have the form:
! --- 9000,9166 ---- most TS expressions compatible with ISO 8601 expression. Notably the "Z" should be used in the ISO 8601 style (i.e. only for UTC). !
Furthermore the HL7 v2 TS format is very uniform and concise, which ! makes it suitable to be used as a model for literal expressions of ! other calendar times. Any new calendar to be defined needs to specify ! the calendar cycles, and their position and number of digits in the ! literal expression. In order to disambiguate literals from different ! calendars, the literal needs to be tagged by a calendar type ! code. This calendar type code will be prepended with a colon. The ! calendar type code for the western (Gregorian) calendar is "GREG" and ! need not be mentioned since it is the default. ! ! ! !
4.6.3 Time Interval
! !
A time interval is the continuous and uncountably infinite set of ! time points between a low bound and a high bound. The time interval is ! defined using the generic Interval data type ! defined further below. Either boundary can in principle be ! infinite, but we rarely need an infinite ! boundary; rather, a boundary may be unknown but known to be ! finite. An interval can be specified incompletely by having an ! unspecified low or high boundary. For example, an Employment has a ! start date but need not have a fixed end date yet, in which case the ! high boundary is left unspecified. An interval also has a width, which ! is the difference between the high and low boundaries. The width of the ! interval can be specified independently if neither the high boundary ! nor the low boundary is fixed. ! !
The literal expression of an interval of time is defined by the ! generic literal expression for interval and point in time. Examples ! are: !
!

unit₁ of the unit₂

day of the week

month of the year
19980309-20000308 March 9 1998 to March 8 2000
<=20000308 until March 8 2000 with unspecified begin date
>=19980309 from March 9 1998 with unspecified end date
[26.58 h] unspecified boundaries but width of 26 hours 35 minutes is ! known
!
!
The width of an interval is specified using the data type for the ! difference of two points in time, which is a duration, and which in ! turn is expressed as a physical quantity. ! !
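The Python sketch below parses only the three day-precision literal forms from the table (not the bracketed width-only form) and computes the width as a `timedelta`, i.e. as a duration. The function names and the use of `None` for an unspecified boundary are assumptions made for illustration.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Tuple

Interval = Tuple[Optional[date], Optional[date]]  # None = unspecified boundary


def parse_day(literal: str) -> date:
    return datetime.strptime(literal, "%Y%m%d").date()


def parse_interval(literal: str) -> Interval:
    """Handles only 'low-high', '<=high' and '>=low' with day precision."""
    if literal.startswith("<="):
        return None, parse_day(literal[2:])
    if literal.startswith(">="):
        return parse_day(literal[2:]), None
    low, high = literal.split("-")
    return parse_day(low), parse_day(high)


def width(interval: Interval) -> Optional[timedelta]:
    """The width is the difference high - low, i.e. a duration."""
    low, high = interval
    if low is None or high is None:
        return None  # unknown unless the width is given separately
    return high - low


print(width(parse_interval("19980309-20000308")))  # 730 days, 0:00:00
print(parse_interval("<=20000308"))                 # (None, datetime.date(2000, 3, 8))
```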
4.6.4 Periodic Time
! !
There are a number of phenomena that can be described neither by a ! duration, nor by a point in time, nor by one specific time interval. The ! following are important examples in HL7: ! !
! !
Office hours, for example: ! ! ! ! ! . !
Monday to Wednesday 08:00 to 16:00,
Thursday 08:00 to 21:00,
Friday 08:00 to 12:00,
closed on holidays
!
! !
"Snowbirds," e.g., people who periodically change ! addresses. For example, people who live in Minneapolis between May 15 and ! September 15 and in Phoenix, Arizona from September 15 to May 15. !
! !
Phone numbers to use on workdays from 08:00 to 17:00 and ! another phone number for evenings, weekends, and holidays.
! !
Medication schedules. For example, Amoxicillin 3 times a day ! for 10 days, or Cumarin 1 tablet Monday and Friday, and 1/2 tablet on ! Wednesday and on Sunday.
! !
Other schedules, e.g., home health care visits every other day ! in the morning, and every second Saturday in the afternoon.
!
! ! !
The industry has developed nifty ways to specify these phenomena ! based on the western calendar. While this is intuitive for humans, it ! cannot easily be translated between calendars. Therefore we invest ! some effort here in looking for a conceptualization of these phenomena ! that is as independent of a particular calendar as possible. ! !
Periodic phenomena in physics are likened to waves, which in turn ! are described by rotations. The rate of rotation is described either ! by the frequency f (full rotations per unit of time), the period ! T, or the angular velocity ω. ! !
In addition to the rate of rotation, there is an offset angle ! φ, called the phase. The concept of phase becomes clear if you ! imagine two wheels rotating with the same frequency, where the first ! wheel started rotating a short time before the second wheel. Thus, ! the first wheel is always ahead of the second wheel's position by some ! rotational angle. We can measure phase either as that angle or as the ! amount of time elapsed between the start of the first wheel and the ! start of the second wheel. ! !
Many periodic events are most naturally specified with a ! frequency. For example, in "amoxicillin 3 times a day" the periodic ! event "give amoxicillin" is timed by a frequency f = 3 /d. A ! frequency can be interpreted in two ways: either as an exact timing of ! the recurring event, such that it is distributed evenly, or as a ! statistical timing, such that the individual events occur at variable ! intervals but on average occur at the specified frequency. The latter ! is the typical case in medication scheduling. ! !
If the intervals between the recurrences are not even, and if they ! need to be specified precisely, the frequency and phase are not ! enough. For example, if we want the 3 events per day scheduled at 7:00, ! 11:30, and 17:00, we use the time of day to specify when the event ! is to occur. But what exactly is time of day, or similar ! phenomena such as day of the week, month of the year or ! week of year? Obviously those expressions are closely related ! to the cycles of the calendar. ! !
On a simple digital calendar expression according to the HL7 TS ! format, such as "199907091956", we intuitively know what time of day ! and the like is: those expressions come about if we delete the ! high-order digits from the left. For example "yyyymmdd1956" ! stands for some day at 7:56 PM. ! !
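A minimal Python sketch of this idea treats the letters in a pattern such as "yyyymmdd1956" as wildcards for the deleted digits and checks a full TS literal against it. The `matches` helper is hypothetical and assumes both strings have the same precision.

```python
def matches(ts: str, pattern: str) -> bool:
    """Letters in the pattern stand for deleted (unspecified) digits."""
    return len(ts) == len(pattern) and all(
        p.isalpha() or p == c for p, c in zip(pattern, ts)
    )


print(matches("199907091956", "yyyymmdd1956"))  # True: some day at 7:56 PM
print(matches("199907101956", "yyyymmdd1956"))  # True: another day, same time
print(matches("199907091957", "yyyymmdd1956"))  # False
```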
Since a calendar divides the even flow of time into cycles and ! counts full cycles in integer number, we are reminded of congruences, ! modulus and remainders.
The modulo operation yields the remainder of an integer division. For example, 12 ! modulo 7 is 5. A congruence is similar to an equation based on an ! equality operation that partitions the set of integer numbers into ! remainder classes. For example 5 ≡ 12 (mod 7), but also 12 ≡ 19 (mod ! 7). Such a congruence is like the integer variation of a rotation ! described above, where the modulus (e.g., 7) is the period and the ! remainder is the phase. ! !
If we have the time defined as epoch + duration in days, we can ! tell the day of the week of any date if we know the day of the week of ! the epoch. For instance, let our epoch be Monday January 1 of 1996. ! We can easily tell the weekday of July 9 1999. The duration between ! the epoch and the example date is 1285 days. Since the week is a ! cycle with period 7 days, we take 1285 mod 7 = 4. With Monday = ! 0, Tuesday = 1, ..., Friday = 4, this is a Friday. !
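The same computation as a short Python sketch, assuming the Monday epoch January 1 1996; the helper and list names are hypothetical. Python's `%` with a positive modulus never returns a negative remainder, so dates before the epoch fall into the right remainder class as well.

```python
from datetime import date

EPOCH = date(1996, 1, 1)  # a Monday
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday(d: date) -> str:
    # Remainder of the days since the epoch modulo the 7-day week.
    return WEEKDAYS[(d - EPOCH).days % 7]


print((date(1999, 7, 9) - EPOCH).days)  # 1285
print(weekday(date(1999, 7, 9)))        # Friday
print(weekday(date(1995, 12, 31)))      # Sunday
```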
Other such congruences can be constructed, all of which have the ! form:
! *************** *** 8539,8570 ****

remainder of the modulus

day of the week

month of the year

!
Obviously, unit₁ must be less than ! unit₂. All those units are defined by the calendar ! and may be slightly different from related units defined for time ! durations. For instance, the average Julian month is 30.4375 days, but ! a calendar month varies between 28 and 31 days. Thus the modulo ! expression "month of the year" must be made available by the calendar ! and can not easily be calculated using the average month. ! !
How do we express complex modulo expressions that are not provided ! by the calendar? Things like "every other Tuesday" come to mind. We ! could tell whether or not a certain date is an every other ! Tuesdays by testing the the equation:
! date modulo ( 2 x 7 ) = 1; given that 0=Monday, 1=Tuesday, ...
while every Tuesday would be:
! date modulo 7 = 1; given that 0=Monday, 1=Tuesday, ...
!
We decided to ponder on the calendar modulo expressions for some ! time before coming back to it. --- 9174,9806 ----
!
All those units are defined by the calendar and aligned to the ! calendar and thus are different from related units defined as averages ! for time durations. For instance, the average Julian month is 30.4375 ! days, but a calendar month varies between 28 and 31 days. Thus the ! congruence expression "month of the year" must be made available by ! the calendar and can not easily be calculated using the average month. ! !
We can form more complex congruence expressions that are not ! provided by the calendar. For example, "every other Tuesday" is ! described by the congruence
! days since epoch ≡ 1 (mod 2 × 7); given that 0=Monday, 1=Tuesday, ...
while every Tuesday would be:
! days since epoch ≡ 1 (mod 7); given that 0=Monday, 1=Tuesday, ... !
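Both congruences can be tested directly. In this Python sketch (hypothetical helper name) the epoch is assumed to be Monday January 1 1996, so day 0 is a Monday and day 1 a Tuesday; note that the choice of epoch decides which Tuesdays count as "every other" one.

```python
from datetime import date, timedelta

EPOCH = date(1996, 1, 1)  # Monday, so day 0 = Monday, day 1 = Tuesday


def congruent(d: date, remainder: int, modulus: int) -> bool:
    """Test: days since epoch ≡ remainder (mod modulus)."""
    return (d - EPOCH).days % modulus == remainder


january = [EPOCH + timedelta(days=n) for n in range(31)]
print([d.day for d in january if congruent(d, 1, 7)])      # [2, 9, 16, 23, 30]
print([d.day for d in january if congruent(d, 1, 2 * 7)])  # [2, 16, 30]
```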
! !
The congruences are a first step towards understanding what happens ! when we delete digits from the left of our calendar time ! expressions. Since the calendars are constructs of integer numbers, ! congruences are the way to express the calendar cycles. The ! disadvantage of this is that with one such modulus expression we can ! only express recurring points within one such cycle. What if we ! want to specify recurring ranges, say, something like: every ! Monday from 9:00 to 15:00? ! !
We can use a period and a phase instead of a modulus and a ! remainder. In fact, periodic continuous functions in analysis are the ! counterpart of congruences in number theory. Both period and phase ! can be measured in elapsed time, and are not restricted to ! integers. Monday 0:00 would be: !
! period = 1 week
! phase = 0 days !
! assuming that our week-cycles start at a Monday. In order to get to ! Monday at 9:00 we need to increase our phase slightly by 3/8 = 0.375 ! days. Thus Monday 9:00 is !
! period = 1 week
! phase = 0.375 days !
! Thus we can move the phase within the one week period to any time we ! like. For example, Thursday 12:00 noon would be !
! period = 1 week
! phase = 3.5 days !
! This specifies a periodic point in time. If we want a periodic ! interval of time, such as Mondays 9:00 to 15:00, we can specify the ! phase as an interval: !
! period = 1 week
! phase = [0.375; 0.625] days !
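For periods of fixed length such as the week, the phase of a given moment is simply the elapsed time since the epoch modulo the period. The Python sketch below assumes naive datetimes, ignores leap seconds, and uses the Monday epoch January 1 1996 so that phase 0 is Monday 0:00; the helper names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1996, 1, 1)  # Monday 0:00, so a weekly phase of 0 days is Monday 0:00
WEEK = timedelta(weeks=1)


def phase(t: datetime, period: timedelta) -> timedelta:
    """Elapsed time since the start of the cycle that contains t."""
    return (t - EPOCH) % period


def in_phase_interval(t, period, low, high) -> bool:
    """Closed interval [low; high] of phases within each period."""
    return low <= phase(t, period) <= high


monday_9_to_15 = (timedelta(days=0.375), timedelta(days=0.625))
print(phase(datetime(1999, 7, 8, 12), WEEK))                               # 3 days, 12:00:00
print(in_phase_interval(datetime(1999, 7, 5, 10), WEEK, *monday_9_to_15))  # True
print(in_phase_interval(datetime(1999, 7, 6, 10), WEEK, *monday_9_to_15))  # False
```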
! This works for other cycles just as well. For example, the 5th day of ! the month 0:00 is given as !
! period = 1 month
! phase = 5 days !
! And the entire 5th day of the month from 0:00 to 23:59:59.9999... is ! given using the right-open interval [5; 6[ !
! period = 1 month
! phase = [5; 6[ days !
! The time of day 8:00 to 9:00 is given as !
! period = 1 day
! phase = [8; 9[ hours !
! and the entire February, the second month of the year, is given as !
! period = 1 year
! phase = [1; 2[ month !
! because we count from January = 0, February = 1.
! !
Note that the ordinal numbers of the months and the weekdays ! depend on the selection of the epoch. A particularly useful epoch is ! January 1 1996 0:00 if one wants to start counting months and weekdays ! from 0-11 and 0-6 respectively, since January 1 1996 is a Monday. ! !
The snowbird's address, which is different between April 15 and ! September 15 than over the winter, can be expressed as !
! period = 1 year
! phase = [3.5; 8.5[ month
! --> Minneapolis
! period = 1 year
! phase = [8.5; 3.5[ month
! --> Arizona !
! The telephone numbers, which are different from 9:30 to 19:45 (office) and ! from 20:30 to 9:00 (home), can be expressed as: !
! period = 1 day
! phase = [9.5; 19.75] hour
! --> office: 1 317 630 7960
! period = 1 day
! phase = [20.5; 9] hour
! --> home: 1 317 816 0516 !
! Note that the bounds of the periodic interval need not be in order, ! since the period is cyclic. ! !
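A phase interval whose lower bound exceeds its upper bound simply wraps past the end of the period. The Python sketch below (hypothetical helper name, naive datetimes, the same Monday epoch as before) checks the office and home intervals at a few hours of the day; at 20:00 neither number applies, as in the example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1996, 1, 1)
DAY = timedelta(days=1)


def in_cyclic_interval(t, period, low, high) -> bool:
    """Closed phase interval on a cycle; if low > high it wraps past the period's end."""
    p = (t - EPOCH) % period
    if low <= high:
        return low <= p <= high
    return p >= low or p <= high  # e.g. 20:30 ... midnight ... 9:00


office = (timedelta(hours=9.5), timedelta(hours=19.75))
home = (timedelta(hours=20.5), timedelta(hours=9))

for hour in (8, 12, 20, 23):
    t = datetime(1999, 7, 9, hour)
    print(hour, in_cyclic_interval(t, DAY, *office), in_cyclic_interval(t, DAY, *home))
# 8 False True
# 12 True False
# 20 False False
# 23 False True
```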
This method can explain what a "time of day" and a "day of the ! week" are, independent of any particular calendar. The challenge of ! these period/phase expressions, however, is that the calendar units of ! year, month, day, hour, and minute are variable due to leap years, ! 28-29-30-31 days per month, and due to UCT leap seconds. So, the ! timing needs to be aligned to our calendar "grid", which makes the ! evaluation of those expressions subject to the same difficulty that we ! find translating a time given as epoch-duration into a calendar ! expression. In the example "5th day of every month", the period 1 ! month differs in duration from month to month, so, to calculate a ! series of points in time in an epoch-duration form, the calendar must ! be repeatedly asked for how many days are in any given cycle. ! !
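The calendar dependence shows up in a small Python sketch that generates "the 5th day of every month" by stepping month by month and asking the standard `calendar.monthrange` for the length of each month being stepped over. The helper name is hypothetical, and a real implementation would also have to handle times of day and time zones.

```python
import calendar
from datetime import date, timedelta


def fifth_of_each_month(year: int) -> list:
    """Step from one 5th-of-month to the next by asking the calendar
    how long the month being stepped over is."""
    point = date(year, 1, 5)
    points = [point]
    for month in range(1, 12):
        days_in_month = calendar.monthrange(year, month)[1]
        point += timedelta(days=days_in_month)
        points.append(point)
    return points


points = fifth_of_each_month(1999)
gaps = [(b - a).days for a, b in zip(points, points[1:])]
print(gaps)  # [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30]
```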
Periodic Times as Sets
! !
More complex periodic times can be expressed based on the simple ! period/phase model. This is easy if we realize that periodic points in ! time and periodic intervals of time are special kinds of sets. Those ! sets are not continuous, meaning that they have "holes" between the ! repeating points or intervals of time. Also, those sets are ! infinite, as the periodic time will be defined along the entire time ! axis from prehistoric past to distant future. ! !
We can combine sets using the operations union (∪) and intersection (∩) to form the complex ! specification of business hours shown in the introductory example: !
Monday to Wednesday 08:00 to 16:00,
Thursday 08:00 to 21:00,
Friday 08:00 to 12:00,
! !
! ( (period = 1 week; phase = [0; 3[ day) ! intersection ! (period = 1 day; phase = [8; 16] hour) ) ! union
! ( (period = 1 week; phase = [3; 4[ day) ! intersection ! (period = 1 day; phase = [8; 21] hour) ) ! union
! ( (period = 1 week; phase = [4; 5[ day) ! intersection ! (period = 1 day; phase = [8; 12] hour) ) !
! If we were open every second Saturday from 8:30 to 14:00, we could ! "add" the term: !
! union ! ( (period = 2 week; phase = [5; 6[ day) ! intersection ! (period = 1 day; phase = [8.5; 14] hour) ) !
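The set algebra above can be prototyped by representing each infinite periodic set by its membership test. In this hypothetical Python sketch, `periodic`, `union`, and `intersection` are illustrative names, not a defined API. The epoch is assumed to be Monday, January 1, 1996, so Monday is day 0 and Saturday is phase [5; 6[ day. The 2-week cycle of the Saturday term is also counted from that epoch, and that choice decides which Saturdays are included.

```python
from datetime import datetime, timedelta
from typing import Callable

# A (possibly infinite) set of points in time, represented by its membership test.
TimeSet = Callable[[datetime], bool]

EPOCH = datetime(1996, 1, 1)  # a Monday: day-of-week phases count Monday = 0
WEEK, DAY, HOUR = timedelta(weeks=1), timedelta(days=1), timedelta(hours=1)


def periodic(period: timedelta, low: timedelta, high: timedelta,
             high_closed: bool = False) -> TimeSet:
    """period/phase set: phase in [low; high[ (or [low; high] if high_closed)."""
    def member(t: datetime) -> bool:
        phase = (t - EPOCH) % period  # timedelta modulo is never negative here
        below = phase <= high if high_closed else phase < high
        return (low <= phase and below) if low <= high else (phase >= low or below)
    return member


def union(*sets: TimeSet) -> TimeSet:
    return lambda t: any(s(t) for s in sets)


def intersection(*sets: TimeSet) -> TimeSet:
    return lambda t: all(s(t) for s in sets)


business_hours = union(
    intersection(periodic(WEEK, 0 * DAY, 3 * DAY),                     # Mon-Wed
                 periodic(DAY, 8 * HOUR, 16 * HOUR, high_closed=True)),
    intersection(periodic(WEEK, 3 * DAY, 4 * DAY),                     # Thu
                 periodic(DAY, 8 * HOUR, 21 * HOUR, high_closed=True)),
    intersection(periodic(WEEK, 4 * DAY, 5 * DAY),                     # Fri
                 periodic(DAY, 8 * HOUR, 12 * HOUR, high_closed=True)),
)

# Every second Saturday (phase [5; 6[ day in a 2-week cycle), 8:30 to 14:00.
second_saturdays = intersection(
    periodic(2 * WEEK, 5 * DAY, 6 * DAY),
    periodic(DAY, 8.5 * HOUR, 14 * HOUR, high_closed=True),
)
opening_hours = union(business_hours, second_saturdays)
```

With this epoch, Saturday, July 17, 1999 falls in an "open" fortnight and July 24, 1999 does not.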
! !
A few special cases are quite hard to express in our simple ! period/phase forms. For example, take another Snowbird who is out of ! Arizona between April 15 and October 15, and suppose we need to be ! exact to the minute with this timing. Since April has 30 days, ! April 15 cuts the month in half; the phase for April 15 is 3.5 ! months, which starts the Minneapolis season early in the morning of ! April 15 at 0:00 midnight. For October 15, we would like to use the ! phase 9.5; however, October has 31 days, so this yields October 15 at 12:00 noon as the end of ! the Minneapolis period. For February 15, a phase of 1.5 would come out ! one entire day early: February 14, 0:00 midnight in non-leap years and ! 12:00 noon in leap years. ! !
The problem is that the calendar month is not constant, so we ! are sometimes off by a few days if we choose the month as the unit of the phase of ! our periodic time expressions. If we chose the day of the year, we ! could spread our error of one day over the entire year, but we would ! not be exact to the hour, let alone to the minute. We could, however, combine ! our periodic time expressions to successively narrow down the ! exact start and end of the period: !
! ( (period = 1 year; phase = [104; 288[ day) ! intersection ! (period = 1 month; phase = [14; 15[ day) ! union ! (period = 1 year; phase = [105; 287[ day) ) !
! This means we first select an approximate range starting at April 15 ! in non-leap years and April 14 in leap years, and ending at October 16 ! in non-leap years and October 14 in leap years. That way we know that ! our maximal range is covered every year. Then we cut out the precise ! day, which is the entire 15th day of every month (phase [14; 15[, counting days from 0). This ! removes all times from the range except the 15th day of every month from ! April to October. The final expression "adds" the range from April 16 ! to October 15 (non-leap years) and April 15 to October 14 (leap years) ! back in. ! !
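Python's `datetime` module can check the one-day shift between leap and non-leap years that the expression above works around. This minimal sketch assumes that days of the year are counted from 0, as in the phases above. The helper names are hypothetical.

```python
from datetime import date, timedelta


def day_of_year0(d: date) -> int:
    """0-based day of the year (January 1 = 0), as used for the phases above."""
    return d.timetuple().tm_yday - 1


def date_of_day0(year: int, n: int) -> date:
    """Inverse mapping: the calendar date whose 0-based day of the year is n."""
    return date(year, 1, 1) + timedelta(days=n)


for year in (1999, 2000):  # 1999 is a non-leap year, 2000 a leap year
    print(year, day_of_year0(date(year, 4, 15)), date_of_day0(year, 104), date_of_day0(year, 288))
# 1999 104 1999-04-15 1999-10-16
# 2000 105 2000-04-14 2000-10-15
```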
Another case where our period/phase expressions are difficult to ! use is Mother's Day, i.e. the second Sunday in May. Since the week ! cycle is not aligned to the month and year cycles, we have a hard time ! figuring out the second Sunday of a month. The following ! period/phase expression describes Mother's Day: !
! ( (period = 1 year; phase = [4; 5[ month) ! intersection ! (period = 1 month; phase = [7; 14[ day) ! intersection ! (period = 1 week; phase = [6; 7[ day) ) !
! First we restrict the scope to the entire month of May (Jan = 0, ..., May ! = 4), then we narrow it to the second 7-day period of the month (days 8 to 14), ! which always contains the second Sunday of the month, since the first ! Sunday falls on one of days 1 to 7. Finally we take the week ! cycle and choose Sunday. This selects the second Sunday in May. ! !
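Outside the period/phase machinery, the same three-way intersection is easy to evaluate for a single year. This sketch uses a hypothetical helper, `mothers_day`. It intersects May with days 8 to 14 and then with Sunday, using Python's `date.weekday()`, which counts Monday as 0.

```python
from datetime import date

SUNDAY = 6  # date.weekday() counts Monday = 0 ... Sunday = 6


def mothers_day(year: int) -> date:
    """Second Sunday in May as an intersection of three periodic sets:
    the month of May, days 8 to 14 of the month, and Sundays."""
    days_8_to_14_of_may = (date(year, 5, day) for day in range(8, 15))
    # Exactly one of seven consecutive days is a Sunday.
    return next(d for d in days_8_to_14_of_may if d.weekday() == SUNDAY)
```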
The last two examples show that our period/phase expressions are ! sometimes quite difficult to use. Note that our method did not fail in ! these examples; it was only difficult to apply. However, as yet we ! have no rigorous mathematical proof that there could never be an ! expression in some calendar that our method could not cover using a ! finite term of period/phase expressions combined by union and ! intersection operations. ! ! !
Literal Expressions for Periodic Times
! !
The period/phase expressions for periodic times are ! conceptualizations of periodic times that are independent of any ! representation of calendar dates. Although such periodic times are ! meant to be aligned to some calendar, our conceptualization is ! still independent of any particular calendar, and thus can be used with ! our Gregorian calendar as well as with the Hebrew, Islamic, Japanese ! Imperial, or other calendars. ! !
Alas, these expressions are not suited to be displayed to or entered by ! humans, and they require considerable thought to be implemented ! correctly. Therefore a good literal expression is even more important ! for periodic times than it is for simple points in time. ! !
Most of the nifty methods the industry has come up with to specify the ! timing of periodic events are based on some representation (not on an ! abstract model such as our period/phase congruences). Yet there seems ! to be no one representation that is widely accepted as a standard. ! [This statement is informed by non-exhaustive research of the topic ! on the Internet. The most prominent such standards seem to be the UNIX "crontab" file format, and the work by the ! IETF Working Group on Calendaring and Scheduling, RFC 2445 "Recurrence ! Rule".] ! !
It is therefore justified to craft a representation for periodic ! times for HL7 that is not based on any one such standard. HL7 has its ! own legacy that has proven useful. In particular, the time stamp (TS) ! literal representation for points in time has the merit of being ! simpler and more uniform than the many optional forms suggested by ! ISO 8601. ! !
Our representation of periodic times will be based on the literal ! form of points in time. The general approach is that one can use a TS ! pattern to specify points in time conforming to a certain set of ! periodic points in time. The simplest such pattern removes ! digits from the left, yielding forms similar to the old HL7 TM data ! type. Nevertheless, we will take ideas from existing specifications ! where appropriate. ! !
For example, where "199907121555" is a point in time (precise to ! the minute), "1555" is just the time-of-day "component" of that ! time. This maps to period = 1 day, phase = 15.9167 hour (15 + 55/60). ! !
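The mapping from time-of-day digits to a phase in hours can be sketched in a few lines. This sketch assumes the input is precise to the minute, with no seconds or time-zone suffix. The function name `time_of_day_phase` is hypothetical.

```python
def time_of_day_phase(ts: str) -> float:
    """Phase, in hours, of the time-of-day component of a TS literal.

    Accepts a full TS precise to the minute ("199907121555") or just the
    trailing HHMM digits ("1555"); the period is 1 day.
    """
    hhmm = ts[-4:]  # the last four digits are hour and minute
    return int(hhmm[:2]) + int(hhmm[2:]) / 60


print(round(time_of_day_phase("199907121555"), 4))  # 15.9167
```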
Note that the expression "1550" could just as well be a TS precise to ! the year, since there is no way to distinguish what the values of the ! digits are. So, if we allow leftmost digits to disappear, we need ! to either replace them by some other character to fix the position of ! the remaining digits (e.g., "########1550"), or we need to tag the ! remaining digits to indicate what they are (e.g., "H1550", where "H" ! stands for hour, the value of the following pair of digits). ! !
The crontab file format also defines ! periodic points in time using a pattern approach on the calendar, and ! crontab uses a positional identification of the meaning of the ! components. For each of the calendar cycles (minute of hour, hour of ! day, day of month, month of year, and day of week), crontab allows one to ! specify individual values, ranges, and step values. ! !
The HL7 TQ data type, on the other hand, used a tagged approach, e.g. ! "Q2H" for every two hours. However, this did not allow one to specify ! the 55th minute of every hour. ! !
The following is the EBNF specification of the literal expression ! for periodic points and intervals of time: !
!
!
A period identifier is a short one- or two-letter code for a ! calendar cycle. Period identifiers come in three forms: (1) continuous, ! (2) ordinal, and (3) implicit. A continuous period is measured from ! some initial date (e.g., the epoch or an order start date) and is not ! bound to the larger calendar cycles. For example, if something is to ! happen strictly every other day regardless of whether months are 30 or ! 31 days long, one would use a continuous period. Continuous periods are ! formed using the letter C before the period identifier ! (e.g., CD for continuous day). ! !
An ordinal period identifier is aligned to the larger calendar ! cycles. For example, if something is to happen on every odd day of the ! month (1, 3, 5, ..., 27, 29, (31)), an ordinal period is used. Ordinal ! periods are specified using two period identifiers, one for the period ! in which to count and another for the larger period to which we want to ! align (e.g., DM ordinal day of the month, ! DW ordinal day of the week). Ordinal periods are counted ! from either 0 or 1 depending on the customs of the calendar. For ! example, in the western calendar the day of the month and the month of the ! year are usually counted from 1, while the hour of the day and the minute of ! the hour are counted from 0. ! !
Implicit periods are those identified by a one-letter ! period code, which stands for whichever sense, ! continuous or ordinal, is customary for that period. For example, the year is counted ! continuously because there is no larger cycle (except for decimal ! multiples such as the decade and the century, which are not real ! calendar cycles). Weeks are usually counted in a continuous way ! (i.e. not aligned to the calendar year), while most other calendar ! cycles are aligned to each other (month-day-hour-minute-second). !
! !
Period Identifiers in the Gregorian (western) Calendar
implicit two-letter meaning starts with digits
Y CY year 0 4
M MY month of the year 1 (January) 2
CM month (continuous) 0
W CW week (continuous) 0
WY week of the year 1 2
D DM day of the month 1 2
CD day (continuous) 0
DY day of the year 1 3
J DW day of the week 1 (Monday) 1
H HD hour of the day 0 2
CH hour (continuous) 0
N NH minute of the hour 0 2
CN minute (continuous) 0
S SN second of the minute 0 2 with '.' and decimals
CS second (continuous) 0
!
! !
Examples
Paradigmatic examples
M09 September
MY09 September (using explicit ordinal ! two letter code)
M0915 September 15
M091516 September 15 at 4 PM
M09151630 September 15 at 4:30 PM
M0915163044.12 September 15 at 4:30:44.12 PM
M01,03,07 January, March, and July
M/2 every even month
M/2%1 every odd month
M04-09 April to September
M04-09/2 every second month from April to September
J6 Saturday
J1,3,4 Monday, Tuesday, Thursday
J/2 Tuesday, Thursday, Saturday
J/2%1 Monday, Wednesday, Friday, Sunday
J1-5 Monday to Friday
J1-5/2%1 Monday, Wednesday, Friday
W/2 every other week
W/2 J6 every other Saturday
WY20 the 20th week of the year
WM2 the second week of the month
DY128 the 128th day of the year
WM2 J6 Saturday of the 2nd week of the ! month
M05 WM2 J6 Saturday of the 2nd week of May
M05 DM8-14 J7 Mother's Day
Examples from above
W/2 J2 every other Tuesday
J2 every Tuesday
J1 H0000 Monday 0:00
J1 H0900 Monday 9:00
J4 H1200 Thursday 12:00 noon
J1 H0900-1500 Mondays 9:00 to 15:00
D050000 5th day of the month at 0:00
D05 entire 5th day of the month from 0:00 to 24:00
H0800-0900 time of day 8:00 to 9:00
M02 entire month of February
! M0415 --> Minneapolis ! M0915 --> Arizona The Snowbird's address, which differs between April 15 and ! September 15 from the one used over the winter
! H0930-1945 --> office ! H2030-0900 --> home telephone numbers from 9:30 to 19:45 (office) and from 20:30 to ! 9:00 (home)
J1-3 H0800-1600 + ! J4 H0800-2100 + ! J5 H0800-1200 business hours shown in the introductory example
W/2 J6 H0830-1400 every other Saturday from 8:30 to 14:00
M04150000-10142400 Snowbird who is out of Arizona between April 15 0:00 and ! October 14 24:00 exact to the minute
!
! !
The literal expression provides a short human-comprehensible ! notation for periodic time points and intervals. A computer can ! translate the literal notation into the period/phase notation ! relatively easily. The Snowbird example is much simpler and more ! precise to state in the literal form than in the period/phase ! form. Mother's Day, however, still comes with the same ! difficulty. Note that the second Saturday of the month is not the same ! as the Saturday of the second ordinal week of the month (e.g., if the month ! starts on a Sunday, the second Saturday is in the third ordinal week ! of the month). ! !
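The EBNF grammar is not reproduced in this part of the text, so the following Python sketch rests on an assumption: it infers the rule for `/n` (keep values congruent to 0 modulo n) and `%k` (keep values congruent to k instead) from the examples table. For instance, J/2 gives Tuesday, Thursday, Saturday, and J/2%1 gives Monday, Wednesday, Friday, Sunday. The sketch handles only single-field J and M components. It does not handle combined components such as M0915 or time ranges such as H0900-1500. The names `FIELDS`, `COMPONENT`, and `expand` are hypothetical.

```python
import re

# Hypothetical value ranges for two implicit period identifiers, taken from
# the period identifier table: J = day of the week (1 = Monday), M = month of the year.
FIELDS = {"J": (1, 7), "M": (1, 12)}

COMPONENT = re.compile(
    r"^(?P<id>[A-Z]+)"              # period identifier, e.g. J or M
    r"(?P<values>[0-9,\-]*)"        # values: 6 | 01,03,07 | 04-09 | (empty = all)
    r"(?:/(?P<step>\d+)"            # optional step, e.g. /2
    r"(?:%(?P<offset>\d+))?)?$"     # optional offset of the step, e.g. %1
)


def expand(component: str) -> list[int]:
    """Expand one single-field literal component into the values it selects."""
    m = COMPONENT.match(component)
    if m is None or m["id"] not in FIELDS:
        raise ValueError(f"unsupported component: {component!r}")
    first, last = FIELDS[m["id"]]
    if m["values"]:
        selected = set()
        for item in m["values"].split(","):
            low, _, high = item.partition("-")  # "04-09" or a single "6"
            selected.update(range(int(low), int(high or low) + 1))
    else:
        selected = set(range(first, last + 1))  # no values: the whole cycle
    if not all(first <= v <= last for v in selected):
        raise ValueError(f"value out of range in {component!r}")
    if m["step"]:
        step, offset = int(m["step"]), int(m["offset"] or 0)
        selected = {v for v in selected if v % step == offset}
    return sorted(selected)
```

For example, `expand("J1-5/2%1")` selects Monday, Wednesday, and Friday (`[1, 3, 5]`).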
Besides defining periodic cycles, a calendar defines irregular ! events, such as holidays. For office hours, one sometimes needs to ! refer to holidays. Even though there are rules to determine holidays ! for a calendar, some involving even more cycles (e.g., Easter and the ! phases of the moon), we cannot expose these rules in our time ! expressions. Rather, we need a shorthand way to refer to ! holidays. We do so using the period identifiers starting with 'J'. The ! following codes are defined for the Gregorian calendar: ! !
Period Identifiers for Holidays
JH holiday
JH'EAS' the Easter holiday
JB regular business day (Monday to Friday ! excluding holidays)
JE regular weekend (Saturday and Sunday ! excluding holidays)
!
! !
For example, opening hours may be every second Saturday from 8:30 ! to 14:00 if that Saturday does not fall on a holiday: ! "W/2 JE J6 H0830-1400". ! !
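The JB and JE identifiers can be read as membership tests over a locally maintained holiday table. This minimal sketch assumes a hypothetical holiday set for 1999 containing the JL4 and JL41 days (July 4, 1999 fell on a Sunday). The function names are illustrative and are not part of the specification.

```python
from datetime import date

SATURDAY = 5  # date.weekday(): Monday = 0 ... Sunday = 6

# Hypothetical local holiday table; holiday codes are defined locally.
HOLIDAYS_1999 = {
    date(1999, 7, 4),  # JL4: 4th of July (a Sunday in 1999)
    date(1999, 7, 5),  # JL41: Monday after the 4th of July weekend
}


def is_regular_business_day(d: date, holidays: set[date]) -> bool:
    """JB: Monday to Friday, excluding holidays."""
    return d.weekday() < SATURDAY and d not in holidays


def is_regular_weekend(d: date, holidays: set[date]) -> bool:
    """JE: Saturday and Sunday, excluding holidays."""
    return d.weekday() >= SATURDAY and d not in holidays
```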
Holidays can be named using a code in single quotes. Such holiday ! codes are highly localized and should be defined locally. Holiday ! rules can involve a second non-business day on the Friday before or the ! Monday after a holiday that falls on a weekend. Those days should ! also be coded, so that the complexity of holiday rules is not ! exposed in the literal expression. For the U.S., the following holiday ! table is defined: ! !
Codes for U.S. Holidays !
XME Christmas Eve
XMS Christmas
NEW New Year
GFR Good Friday
EAS Easter
PEN Pentecost
PRE Presidents' Day
MEM Memorial Day
MEM5 Friday before Memorial Day Weekend
MEM1 Monday after Memorial Day Weekend
JL4 4th of July
JL45 Friday before 4th of July Weekend
JL41 Monday after 4th of July Weekend
LBR Labor Day
LBR5 Friday before Labor Day Weekend
LBR1 Monday after Labor Day Weekend
!
! !
Other Friday-before and Monday-after codes may need to be added, ! but can be constructed in the stereotypical way by appending a 5 for ! Friday and a 1 for Monday. Other countries and, as in Germany, ! different states or provinces will have other holidays. Most western ! countries share the major Christian holidays, which need not be ! redefined. ! ! ! !
4.6.5 Other Issues and Curiosities About Time
! !
"I got sick on my birthday, about 20 years ago," is an expression ! that we might want to capture. One possible representation for this ! time would be "yyyy0219" if my birthday is February 19th and if yyyy ! is constrained such that (this year - yyyy) is approximately 20 ! years. If from another source we gather that I got sick in "1976", but ! don't know the exact month and day, then we can conclude that I got ! sick on "19760219", because 1998 - 1976 = 22. This seems to be a somewhat ! rare use case, but definitely worth considering. ! !
"I got that cough in spring," might lead us to adjust probabilities ! for pollen allergy. The season of the year is of interest in ! epidemiology. Bob Dolin, in his JAMIA article on Modeling the ! temporal complexities of symptoms, suggests accounting for ! "season" in time expressions. The difficulty here is that seasons ! depend on the geographical latitude, and we cannot infer the season ! from the month of the year. January is summer in Australia, South ! Africa, Chile, and Argentina, while northern folks assume that January ! is the worst part of the winter. Moreover, at the equator there are ! not the usual four seasons; however, in tropical regions, there is the ! monsoon season, which may be considered one of two seasons, or a fifth ! season. Rather than referring to the season symbolically, one should ! attempt to capture an uncertain date with a variance of about two ! months. ! !
Noteworthy references on time expressions are CEN TC251's ENV 12381 ! Health care informatics; time standards for health care specific ! problems and the Arden Syntax. These two standards define relations and ! operators not only on time values but also on events and ! episodes which are related in time. +
Relative times with the semantics NOW + duration offset stand out as + the most prominent feature defined by those and other time-related + standards. We might thus consider the ability to specify relative + time. Some conventions use expressions like "t-1" to mean + "yesterday". Relative time expressions are of the data type point in + time, but the exact value depends on a parameter (the actual time) + specified elsewhere. However, the use case of relative time in data + communication and data storage seems unclear, since one needs to + fix NOW at some point. *************** *** 8591,8596 **** --- 9827,9833 ---- implicit type conversion. This method virtually "overlays" extended types on top of the base types. +
5.1 Interval
*************** *** 8598,8608 **** --- 9835,9845 ----

! Interval
! Generic data type that can express a ranges or intervals of values. An interval is a set of consecutive values of any totally ordered data type. An interval is thus a continuous subset of its base data type.
*************** *** 8618,8624 **** --- 9855,9861 ---- *************** *** 8657,8662 **** --- 9894,9913 ---- boundary. For a boundary to be closed, a finite boundary must be provided, i.e. unspecified or infinite boundaries are always open. + + + +

! Interval (IVL)
! Generic data type to express a range of values of the base data type. An interval is a set of consecutive values of any totally ordered data type. An interval is thus a continuous subset of its base data type.
OrderedType Any ordered type can be the basis of an interval. It does not matter ! whether the base type is discrete or continuous or whether any algebraic operators are defined for that type.

OrderedType Any ordered type can be the basis of an interval. It does not matter ! whether the base type is discrete or continuous, or whether any algebraic operators are defined for that type.

width dif(T) required
+ mostly derived + For base types with a difference operation, the width is the + difference between high and low boundary. When both boundaries are + known, width is a derived value. When one boundary and the width are + known, the other boundary is also known. When no boundary is known, + the width may still be known. For example, one may know that an activity + takes about 30 minutes without needing to know when that + activity is carried out. For a pure ordinal base type without a + difference operation, the width is the cardinality of the interval. +
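The width rule can be sketched for a numeric base type that supports subtraction. This is an illustrative Python dataclass, not the normative IVL definition. It leaves out boundary closedness, and the class name is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interval:
    """Numeric interval in which any two of low, high, width determine the third.

    Boundary closedness is omitted; only the width rule is illustrated.
    """
    low: Optional[float] = None
    high: Optional[float] = None
    width: Optional[float] = None

    def __post_init__(self) -> None:
        if self.low is not None and self.high is not None:
            derived = self.high - self.low
            if self.width is not None and self.width != derived:
                raise ValueError("width contradicts the boundaries")
            self.width = derived                  # both boundaries: width is derived
        elif self.low is not None and self.width is not None:
            self.high = self.low + self.width     # one boundary + width
        elif self.high is not None and self.width is not None:
            self.low = self.high - self.width
        # Only the width known (e.g. "takes about 30 minutes"): boundaries stay None.
```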

Ranges or intervals of values are most abundant as ranges of *************** *** 8689,8695 **** circumstances (e.g. an order scheduled to begin at 3:15 and end at 4 o'clock);
! one single unknown value supposed lie within the range of values given (e.g. a measurement which turns out to be off the lower absolute limit and therefore can be reported only as a range with an upper boundary); --- 9940,9946 ---- circumstances (e.g. an order scheduled to begin at 3:15 and end at 4 o'clock);
! one single unknown value supposed to lie within the range of values given (e.g. a measurement which turns out to be off the lower absolute limit and therefore can be reported only as a range with an upper boundary); *************** *** 8742,8747 **** --- 9993,10018 ---- boundary is an infinity or unknown, the interval can not be closed at that boundary. +
In order to treat incomplete information uniformly, we must + accommodate the case where only the width of an interval is known + whereas both boundaries are unknown. Otherwise we would force one case + of incomplete information to be represented by a different data type, + and thus a different dependent attribute, which would force the + constraints of dependency between the interval and its width to be + handled outside the data type. This would violate the rule of encapsulation. + +
The fact that the width is kept as a component of the interval + illustrates once more that data type components in this specification + are semantic components and not components of any + particular representation. This means that if a representation of an + interval is based on low and high boundary, the width will only be + made explicit in the exceptional case where both (!) boundaries are + undefined. Another representation may be based on low boundary and + width, in which case the high boundary will only be sent in the + exceptional case where low boundary and width are undefined. Every + representation will have to deal with one such exceptional case + though. +
Although, we do distinguish between surface form and semantic components with intervals as with any other data type, we specify a character string literal form for interval expressions that is tuned *************** *** 8800,8806 ****

[n,m] [n; m]
(Interval :low n --- 10071,10077 ----

n - m [n; m]
(Interval :low n *************** *** 8809,8817 **** --- 10080,10124 ---- :highClosed #true)

n -< m [n; m[ (high open)
+ (Interval :low n + :lowClosed #true + :high m + :highClosed #false) +
n >- m ]n; m] (low open)
+ (Interval :low n + :lowClosed #false + :high m + :highClosed #true) +
[ w ]
+ (Interval :width w) +
+
Note that the column headed "interval form" does not define + literals. Note also that literal forms of multiple different data + types are not designed to be intermixed in a single expression. If + they are, the literals need to be tagged by the data type. +
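A receiver might recognize the literal forms in the table above with a few regular expressions. This sketch is not a normative grammar. It assumes numeric boundaries only, tolerates optional whitespace around the operators, and uses a hypothetical `Ivl` record.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ivl:
    """Parsed interval; unspecified boundaries are None and therefore open."""
    low: Optional[float] = None
    high: Optional[float] = None
    low_closed: bool = False
    high_closed: bool = False
    width: Optional[float] = None


NUM = r"-?\d+(?:\.\d+)?"
FORMS = [
    (re.compile(rf"^\s*({NUM})\s*-<\s*({NUM})\s*$"), True, False),  # n -< m  ->  [n; m[
    (re.compile(rf"^\s*({NUM})\s*>-\s*({NUM})\s*$"), False, True),  # n >- m  ->  ]n; m]
    (re.compile(rf"^\s*({NUM})\s*-\s*({NUM})\s*$"), True, True),    # n - m   ->  [n; m]
]
WIDTH_ONLY = re.compile(rf"^\s*\[\s*({NUM})\s*\]\s*$")              # [ w ]   ->  width w


def parse_interval(text: str) -> Ivl:
    m = WIDTH_ONLY.match(text)
    if m:
        return Ivl(width=float(m[1]))
    for pattern, low_closed, high_closed in FORMS:
        m = pattern.match(text)
        if m:
            return Ivl(float(m[1]), float(m[2]), low_closed, high_closed)
    raise ValueError(f"not an interval literal: {text!r}")
```

A width-only literal leaves both boundaries unspecified, so both are reported as open.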
As always, various constraints can be made on data types. I.e., the components of the interval data structure can be constrained to certain allowable values or combinations of values only. As a notable *************** *** 8819,8824 **** --- 10126,10165 ---- value would have to have an unknown (or infinite) boundary at one side. + +
5.1.1 Intervals as Sets - The Notion of Set Revisited
+ +
Intervals are continuous sets of elements of the base data + type. Thus intervals have a relationship with set-collections. Finite discrete intervals can be + converted into an enumerated set-collection. We thus have to revisit + our notion of set as defined initially. A set is no longer just an + enumerated collection of discrete unordered elements. The various + kinds of sets are described by the following taxonomy: +
+ +
+
set-collection (finite, discrete, enumerated set) +
interval (continuous ordered subset) +
+
finite countable interval (e.g., integers 1-3) +
unbounded infinite countable interval (e.g., all integers) +
partially bounded infinite countable interval (e.g., integers > 3) +
totally bounded infinite uncountable interval (e.g., real 0.0 - 1.0) +
+
periodic point in time (sparse, infinite, discrete, ordered subset + of point in time) +
periodic interval of time (sparse, infinite, partially continuous, + ordered subset of point in time), alternatively: set of intervals of + point in time. +
set derived from other sets through set operations (union, + intersection.) +
+ +
At this point all of the above-mentioned kinds of sets are defined, + except for the general derived set that is specified as a set-algebra + term over other sets.
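Converting a finite discrete interval into a set-collection only requires honoring the closed or open boundaries. Here is a minimal sketch for integer intervals. The function name is hypothetical.

```python
def enumerate_interval(low: int, high: int, low_closed: bool = True,
                       high_closed: bool = True) -> set[int]:
    """Convert a finite interval of integers into an enumerated set-collection.

    Only finite, bounded intervals of a discrete type can be enumerated;
    unbounded ones (e.g., integers > 3) have to stay intervals.
    """
    start = low if low_closed else low + 1
    stop = high + 1 if high_closed else high  # range() excludes its stop value
    return set(range(start, stop))
```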
+ *************** *** 8828,8853 **** comments. Up until now, there is no such construct for HL7 version 3. The NTE segment was a very useful construct to communicate information that can not be communicated otherwise. NTE segments ! usually contain free text, meant to be shown to human users. Th v2 NTE ! segments had the disadvantage that they would occur only at certain ! places in the message. A comment in an NTE segement was scoped to ! parts of the message structure, however, the scope could not be narrowed down to the level of a single data element or component.
The following generic type for annotations can be overlayed over a value of any other data type. An implicit conversion rule exists that will convert any annotated T to a T at the ! receiver side. !

! Annotated Information
! Generic data to give allow arbitrary free text annotations for any ! message element instance.

GENERIC TYPE --- 10169,10196 ---- comments. Up until now, there is no such construct for HL7 version 3. The NTE segment was a very useful construct to communicate information that can not be communicated otherwise. NTE segments ! usually contain display data, meant to be shown to human users. The v2 ! NTE segments had the disadvantage that they would occur only at ! certain places in the message. A comment in an NTE segment was scoped ! to parts of the message structure; however, the scope could not be narrowed down to the level of a single data element or component.
The following generic type for annotations can be overlayed over a value of any other data type. An implicit conversion rule exists that will convert any annotated T to a T at the ! receiver side.
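The overlay can be sketched as a generic wrapper plus the receiver-side conversion. Python has no implicit conversions, so this sketch makes the rule an explicit call. The names `AnnotatedValue` and `to_plain` are hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class AnnotatedValue(Generic[T]):
    """Annotated T: the information itself plus a display-data note for humans."""
    value: T
    note: str


def to_plain(x: Union[T, AnnotatedValue[T]]) -> T:
    """Receiver-side conversion from annotated T to T: drop the note, keep the value."""
    return x.value if isinstance(x, AnnotatedValue) else x
```

Because the note must not change the meaning of the value, dropping it at the receiver loses nothing that the value itself depends on.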
!

! Annotated Information (ANT)
! Generic data type that allows arbitrary display data annotations for any ! message element instance. An annotation cannot change the ! meaning of the annotated value and must not be used when the value ! would be wrong without the annotation.

GENERIC TYPE *************** *** 8875,8887 **** The information itself.

note ! Free Text required ! The annotation as free text to be eventually displayed to a user or administrator.

--- 10218,10230 ---- The information itself.

note ! Display Data required ! The annotation as display data to be eventually displayed to a user or administrator.

*************** *** 8897,8903 **** to human users. For instance, a lab value might be sent annotated, in which case the medical record user interface program might shows a little marker in the respective cell of the flowsheet. When the user ! clicks on that mark, a text box pops up that displays the free text annotation.
However, annotations in version 2 NTEs were sometimes used like a --- 10240,10246 ---- to human users. For instance, a lab value might be sent annotated, in which case the medical record user interface program might shows a little marker in the respective cell of the flowsheet. When the user ! clicks on that mark, a text box pops up that displays the display data annotation.
However, annotations in version 2 NTEs were sometimes used like a *************** *** 8941,8948 **** elements, we should drive a use case analysis from there suggesting improvements to the RIM. -
Disclaimer: we will get back to this as an open issue. -
5.3 The Historical Dimension
--- 10284,10289 ---- *************** *** 8965,8971 ****

! History
--- 10306,10312 ----

! History (HIST)
*************** *** 9001,9007 **** !

! History Item
--- 10342,10348 ---- !

! History Item (HXIT)
*************** *** 9034,9058 **** The information itself.

validity period Interval<Point in Time> required ! The time interval the given information was, is, or is expected to be ! valid. The interval can be open or closed infinite or undefined on ! either side.

-
When no validity period is known, it does not make sense to send a - history item for the information, therefore, both components are - required. However, an interval can be defined open and undefined or - infinite on both sides. This should not be done unless in a case where - infinite or undefined validity periods are semantically justified. - - -
5.4 Uncertainty of Information
--- 10375,10392 ---- The information itself.

validity period General Set<Point in Time> required ! The set of times at which the given information was, is, or is expected to be ! valid. This set of times can be a simple interval of time or a periodic ! point or interval of time for cyclic events. The interval can be open ! or closed, infinite or undefined, on either side.

5.4 Uncertainty of Information
*************** *** 9150,9156 **** and is responsible for them. This shows that there is not one correct proability that would "objectively" qualify any given statement. !
When this newly drafted value-probability-pair is communicates further along to someone else, the sender may or may not quote both of his input-statements plus his own conclusion. In any case, the receiver of that information would again penalize and combine what he --- 10484,10490 ---- and is responsible for them. This shows that there is not one correct proability that would "objectively" qualify any given statement. !
When this newly drafted value-probability-pair is communicated further along to someone else, the sender may or may not quote both of his input-statements plus his own conclusion. In any case, the receiver of that information would again penalize and combine what he *************** *** 9183,9188 **** --- 10517,10558 ---- this data type model, since attribution is modeled properly in the RIM. +
New issues that the editor believes are much more important
+ +
A much more important open issue is the relationship between sets, + bags, intervals, and periodic sets on the one hand and uncertainty on the other. It appears as if the + general notion of a set can be used where multiple possible values + exist without any particular probability distribution. This would + translate to the uniform probability distribution over the set. The + question is whether the data type definitions for probability + distributions should be better aligned with the notion of sets. + +
A second related issue is the fact that we sometimes want to use a + probability distribution (parametric or non-parametric) in order to + describe a frequency distribution. Sometimes laboratory observations + on population samples are reported in such a "consolidated" way using + histograms. Although the distinction between "probability" and + "frequency" is blurry, the wording in this specification may need to be + changed to invite the probability constructs to be used for + frequencies as well. + +
A third related issue is whether we want to support other "weights" + of certainty and importance that have become well-known in the + decision support community. Examples are the weights of logistic + regression and neural nets, all kinds of plausibility measures + (Dempster-Shafer possibilities, Fuzzy membership functions, + Shortliffe's certainty factors, etc.), and the heuristic numbers used + in Internist-I/QMR (evoking strength, frequency, import), or Medcin + and others. + +
A fourth related issue is that probability distributions, and + especially parametric probability distributions, can be used to + describe distributions of quantities other than probabilities. For + example, a probability distribution "multiplied with" a flow rate may + describe the setting of a ventilator. Should we extend our definition + to embrace quantities that are neither probabilities nor frequencies + nor any other uncertainty measure? + *************** *** 9194,9200 ****

! Uncertain Discrete Value using Probabilities (UDV-P).
--- 10564,10570 ---- ! !

! Uncertain Discrete Value using Probabilities (UDVP).
*************** *** 9227,9234 **** The value to which a probability is assigned.

probability Floating Point ! Number
0.0 to 1.0. required The probability assigned to the value. --- 10597,10603 ---- The value to which a probability is assigned.

probability Real Number
0.0 to 1.0. required The probability assigned to the value. *************** *** 9286,9292 ****

! Non-Parametric Probability Distribution
--- 10655,10661 ----

! Non-Parametric Probability Distribution (NPPD)
*************** *** 9324,9333 ****

-
Type cast rules allow conversion between and uncertain discrete - value using probabilities and non-parametric probability distribution - and vice versa. -
The values in a discrete probability distribution are generally considered alternatives. It is understood that only one of the possible alternative values may truly apply. Because we may not know --- 10693,10698 ---- *************** *** 9361,9366 **** --- 10726,10739 ---- green and blue (or magenta, yellow, and cyan in subtractive color-mixing). +
Type cast rules allow conversion from a single uncertain + discrete value using probabilities to a non-parametric probability + distribution and vice versa. +
A bag-collection can be cast to a non-parametric probability + distribution, where the probability for each item of the bag is the + quotient of the count of that item divided by the size of the bag. +
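The two casts described above can be sketched as follows. This is a minimal illustration, not the specification's definition: the class `UDVP` and the functions `bag_to_nppd`, `udvp_to_nppd` and `nppd_to_udvp` are hypothetical names, a bag is modeled as a Python `Counter`, and an NPPD as a plain list of UDVPs. The reverse cast (NPPD to a single UDVP) is assumed to be defined only for a one-element NPPD.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable


@dataclass(frozen=True)
class UDVP:
    """Uncertain discrete value: one value and the probability assigned to it."""
    value: Hashable
    probability: float  # 0.0 to 1.0


def bag_to_nppd(bag: Counter) -> list[UDVP]:
    """Cast a bag to a non-parametric probability distribution (list of UDVPs).

    Each distinct item gets probability count(item) / size(bag).
    """
    size = sum(bag.values())
    if size == 0:
        raise ValueError("cannot cast an empty bag")
    return [UDVP(item, count / size) for item, count in bag.items()]


def udvp_to_nppd(u: UDVP) -> list[UDVP]:
    """A single UDVP becomes a one-element NPPD."""
    return [u]


def nppd_to_udvp(nppd: list[UDVP]) -> UDVP:
    """Assumption: only a one-element NPPD can be cast back to a single UDVP."""
    if len(nppd) != 1:
        raise ValueError("NPPD must contain exactly one UDVP")
    return nppd[0]


nppd = bag_to_nppd(Counter(["A", "A", "B", "C"]))
print([(u.value, u.probability) for u in nppd])
# [('A', 0.5), ('B', 0.25), ('C', 0.25)]
```

Because every item's probability is its share of the bag, the resulting probabilities always sum to 1.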
An example of a discrete probability distribution would be a differential diagnosis produced by a decision support system. For instance, for a patient with chest discomfort, it might find the following *************** *** 9429,9435 ****

! Parametric Probability Distribution
--- 10802,10808 ---- --- 10832,10838 ---- Any ordered type (anything that is unambiguously mapped to numbers) can be the basis of an uncertain quantity. Examples are Integer Number, ! Real Number, and PhysicalQuantity. *************** *** 9541,9547 **** ! --- 10914,10920 ---- ! *************** *** 9564,9570 **** observed. ! --- 10937,10943 ---- observed. ! *************** *** 9587,9593 **** occurs. ! --- 10960,10966 ---- occurs. ! *************** *** 9710,9719 **** distribution. ! ! --- 11083,11092 ---- distribution. ! ! *************** *** 9812,9821 **** ! ! --- 11185,11194 ---- ! ! *************** *** 9837,9846 **** the curve. ! ! --- 11210,11219 ---- the curve. ! ! *************** *** 9959,9966 ****
It would be awesome if we could define and implement an algebra for uncertain quantities. However, the little statistical understanding ! that I have tells me that it is a non-trivial task to tell the ! distribution type and parameter from a sum, or product of two distributions or from the inverse of a distribution. --- 11332,11339 ----
It would be awesome if we could define and implement an algebra for uncertain quantities. However, the little statistical understanding ! that I have tells me that it is a non-trivial task to know the ! distribution type and parameters of the sum or product of two distributions, or of the inverse of a distribution. *************** *** 9972,9978 ****
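For contrast, one of the few cases where such an algebra is easy is the sum of two independent normally distributed quantities: the result is again normal, and both means and variances add. In general the distribution type is not preserved (the sum of two independent uniform distributions on [0, 1] is triangular, and the product of two independent normals is not normal), which illustrates why a general algebra is hard.

```latex
X \sim N(\mu_1, \sigma_1^2),\quad Y \sim N(\mu_2, \sigma_2^2),\quad X, Y \text{ independent}
\;\Longrightarrow\;
X + Y \sim N\!\left(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2\right)
```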

! Parametric Probability Distribution (PPD)
*************** *** 9459,9465 **** Any ordered type (anything that is unambiguously mapped to numbers) can be the basis of an uncertain quantity. Examples are Integer Number, ! Floating Point Number, and PhysicalQuantity.

n > 1

p probability of success Float p between 0 and 1

n > 1

p probability of success Real p between 0 and 1

p probability of success Float p between 0 and 1

p probability of success Real p between 0 and 1

p probability of success Float p between 0 and 1

p probability of success Real p between 0 and 1

alpha Float alpha > 0

beta Float beta > 0

E mean

alpha Real alpha > 0

beta Real beta > 0

E mean

µ mean of the resulting normal distribution Float

sigma standard deviation Float

E mean of the original skewed distribution

µ mean of the resulting normal distribution Real

sigma standard deviation Real

E mean of the original skewed distribution

alpha Float alpha > 0

beta Float beta > 0

E mean T

alpha Real alpha > 0

beta Real beta > 0

E mean T

! Uncertain Value using narrative expressions of confidence
--- 11345,11351 ---- ! --- 11364,11370 ---- ! *************** *** 10049,10055 **** not.)
Only in cases where no numeric probabilities are available ! (e.g. coding of narratives) is should the narrative expressions of confidence be used. --- 11422,11428 ---- not.)
Only in cases where no numeric probabilities are available ! (e.g. coding of narratives) should the narrative expressions of confidence be used. *************** *** 10064,10070 ****

! Boolean
A boolean value is the domain of two valued logic: either true or false tertium non datur and all the stuff everyone should --- 11437,11443 ----

! Boolean (BL)
A Boolean value belongs to the domain of two-valued logic: either true or false, tertium non datur, and all the stuff everyone should *************** *** 10073,10079 **** oriented data analysis.
! No Information
A No Information value can occur in place of any other value to express that specific information is missing and how or why it is --- 11446,11452 ---- oriented data analysis.
! No Information (NULL)
A No Information value can occur in place of any other value to express that specific information is missing and how or why it is *************** *** 10081,10087 **** certain flavor of missing information.
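A minimal sketch of how a No Information value can stand in place of a value of any type. The class `NoInformation`, the alias `MaybeMissing` and the function `render` are hypothetical names; the flavor is modeled as free text, and the example flavors ("unknown", "not asked") are illustrative assumptions, not the flavors defined by this specification.

```python
from dataclasses import dataclass
from typing import TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class NoInformation:
    """Marker that stands in for a missing value of any type."""
    flavor: str  # hypothetical free-text reason, e.g. "unknown" or "not asked"


# A field declared as type T may carry either a T or a NoInformation marker.
MaybeMissing = Union[T, NoInformation]


def render(value: "MaybeMissing[int]") -> str:
    """Format a value, saying how or why it is missing when it is absent."""
    if isinstance(value, NoInformation):
        return f"no information ({value.flavor})"
    return str(value)


print(render(42))                          # 42
print(render(NoInformation("not asked")))  # no information (not asked)
```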
! Character String
A character string is a primitive data type that contains Unicode characters. A single character is not considered an HL7 data --- 11454,11460 ---- certain flavor of missing information.
! Character String (ST)
A character string is a primitive data type that contains Unicode characters. A single character is not considered an HL7 data *************** *** 10091,10119 **** application layer is not supposed to deal with the peculiarities of different character encodings. !
! Multimedia Enabled Free Text !
Free text may be anything from a few formatted characters to complex documents or images. This data type is defined similar to the ! ED data type that in turn is based on the MIME standard.
! Technical Instance Identifier
Technical instance identifiers are unique and unravelable through the consistent and required use of the ISO OBJECT IDENTIFIER (OID).
! Technical Instance Locator
A technical instance locator is a reference to some technical thing (e.g., image, document, telephone, e-mail box, etc.) It is a generalization of the well-known URL concept.
! Postal and Residential Address
This Address data type is used to communicate postal addresses and --- 11464,11492 ---- application layer is not supposed to deal with the peculiarities of different character encodings. !
! Display Data (DD) (was: Free Text, FTX) !
Display Data may be anything from a few formatted characters to complex documents or images. This data type is defined similarly to the ! HL7 v2.3 ED data type, which in turn is based on the MIME standard.
! Technical Instance Identifier (TII)
Technical instance identifiers are unique and unravelable through the consistent and required use of the ISO OBJECT IDENTIFIER (OID).
! Technical Instance Locator (TIL)
A technical instance locator is a reference to some technical thing (e.g., image, document, telephone, e-mail box, etc.). It is a generalization of the well-known URL concept.
! Postal and Residential Address (AD)
This Address data type is used to communicate postal addresses and *************** *** 10123,10129 **** HREF="#AddressPart">Address Parts.
! Person Name
This type used in the RIM class Person_name that will be developed --- 11496,11502 ---- HREF="#AddressPart">Address Parts.
! Person Name (PN)
This type is used in the RIM class Person_name that will be developed --- 11507,11513 ---- culturally.
! Organization Name
A collection of organization name --- 11507,11513 ---- culturally.
! Organization Name (ON)
A collection of organization name *************** *** 10143,10156 **** purpose or at a different time.
! Code Value
A code value is used to refer to technical concepts and is also the basic building block for construcing more complex concept descriptors for real world concepts.
! Concept Descriptor
Concept descriptors are the way to refer to real world concepts (e.g. diagnoses, procedures, etc.). Just as with the old CE data type --- 11516,11529 ---- purpose or at a different time.
! Code Value (CV)
A code value is used to refer to technical concepts and is also the basic building block for constructing more complex concept descriptors for real world concepts.
! Concept Descriptor (CD)
Concept descriptors are the way to refer to real world concepts (e.g. diagnoses, procedures, etc.). Just as with the old CE data type *************** *** 10163,10191 **** a multi axial codeing system and vice versa.
! Integer Number
Embody the usual concept of integer numbers. Integers are used almost only for counts or values derived from counts by addition and subtraction. !
! Floating Point Number !
Embody the abstract concept of real numbers. Floating point ! numbers have a built-in notion of precision in terms of the number of ! significant decimal digits.
! Ratio of Quantities
A quotient of any two quantities. Quantities currently defined are

Integer Number !
! Floating Point Number
Physical Quantity --- 11536,11564 ---- a multi-axial coding system and vice versa.
! Integer Number (INT)
Embody the usual concept of integer numbers. Integers are used almost exclusively for counts or values derived from counts by addition and subtraction. !
! Real Number (REAL) !
Embody the abstract concept of real numbers. Real numbers have a ! built-in notion of precision in terms of the number of significant ! decimal digits.
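To make the "built-in notion of precision" concrete, here is a minimal sketch that keeps the number of significant decimal digits next to the value. The class `Real` and its `parse` method are hypothetical; it relies on Python's `decimal` module, which preserves trailing zeros. Counting trailing zeros of an integer literal such as "1200" as significant is an assumption of this sketch.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Real:
    """A real number carrying its precision as significant decimal digits."""
    value: Decimal
    precision: int

    @classmethod
    def parse(cls, literal: str) -> "Real":
        d = Decimal(literal)
        # Decimal stores the coefficient without leading zeros but keeps
        # trailing zeros, so "0.0050" has 2 digits and "1.50" has 3.
        # Assumption: trailing zeros of integers ("1200") count as significant.
        return cls(d, len(d.as_tuple().digits))


a, b = Real.parse("1.5"), Real.parse("1.50")
print(a.value == b.value, a.precision, b.precision)  # True 2 3
```

The two values compare equal numerically, yet "1.50" asserts one more significant digit than "1.5".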
! Ratio of Quantities (RTO)
A quotient of any two quantities. Quantities currently defined are:

Integer Number !
! Real Number
Physical Quantity *************** *** 10199,10205 ****

! Physical Quantity
A physical measurement with units. --- 11572,11578 ----

! Physical Quantity (PQ)
A physical measurement with units. *************** *** 10209,10279 ****
An amount of money in a certain currency unit.
! Point in Time
A difference scale quantity in the physical dimension of time. Usual expressions of points in time are made based on calendars, which are quite complex "coordinate systems" for time. This is basically the old "TS" data type. !
! Calendar Modulus Expressions !
Expression of the form day-of-the-month, or day-of-the-week, ! month-of-the-year, hour-of-the-day, all have a common structure ! (x of the y). This data type is not yet defined. We may ! end up with one or many data types to cover what was called TM (time) ! or "week day" in HL7.
! Interval
Also called "range". A continuous subset of an ordered type. Intervals are expressed by boundaries of the base type. Boundaries may be undefined.
! Annotated Information
Whenever a sender feels that "there is more to say" about a data element, the annotation structure can be sent that contains the data element together with some free form annotation. The annotation is meant to be interpreted by humans. !
History
Generic data type that allows the history of some data element to be sent. A History is a list of History Items.
! History Item
A History Item can be used wherever a validity time (effective date/time, expiry data/time) is essential part of some data. Used primarily as the element of a History.
! Uncertain Discrete Value using Probabilities
A discrete value and an associated probability for that value to apply in a given context.
! Non-Parametric Probability Distribution
A collection of Uncertain Discrete Value using Probabilities to specify a probability distribution.
Parametric ! Probability Distribution
Contains mean, standard deviation and also a distribution type plus its parameters. This is useful, for example, to specify "precisely" the accuracy of a measurement or to specify results of clinical trials.
! Uncertain Value using Narrative Expressions of Confidence
A discrete value and a narrative expression of confidence for that value to apply in a given context. Those "narrative expressions" are --- 11582,11656 ----
An amount of money in a certain currency unit.
! Point in Time (TS)
A difference scale quantity in the physical dimension of time. Usual expressions of points in time are made based on calendars, which are quite complex "coordinate systems" for time. This is basically the old "TS" data type. !
! Periodic Point in Time !
A sparse set of points in time that describes periodic ! events, e.g., every Friday morning at 8 o'clock. ! !
! Periodic Interval of Time ! !
A sparse set of time intervals that describes periodic ! events with a duration, e.g., every Friday morning from 08:00 to ! 09:30.
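A minimal sketch of the Friday-morning example, expanding a weekly periodic interval of time into concrete occurrences within a date range. The function `weekly_occurrences` and its parameters are hypothetical; the specification's own representation of periodic time is not shown here. Passing a zero duration turns the same sketch into a periodic point in time ("every Friday at 8 o'clock").

```python
from datetime import date, datetime, time, timedelta
from typing import Iterator


def weekly_occurrences(weekday: int, start: time, duration: timedelta,
                       first_day: date, last_day: date
                       ) -> Iterator[tuple[datetime, datetime]]:
    """Yield (begin, end) for each weekly occurrence in [first_day, last_day].

    weekday follows date.weekday(): Monday == 0 ... Friday == 4.
    A zero duration models a periodic point in time.
    """
    # Advance to the first day in range that falls on the requested weekday.
    day = first_day + timedelta(days=(weekday - first_day.weekday()) % 7)
    while day <= last_day:
        begin = datetime.combine(day, start)
        yield begin, begin + duration
        day += timedelta(days=7)


# Every Friday from 08:00 to 09:30, between 14 and 31 July 1999.
for begin, end in weekly_occurrences(4, time(8, 0), timedelta(minutes=90),
                                     date(1999, 7, 14), date(1999, 7, 31)):
    print(begin, end)
```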
! Interval (IVL)
Also called "range". A continuous subset of an ordered type. Intervals are expressed by boundaries of the base type. Boundaries may be undefined.
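A minimal sketch of an interval over any ordered type, with `None` standing for an undefined boundary. The class name `Interval` is hypothetical, and treating defined boundaries as inclusive (a closed interval) is an assumption; this section does not say whether boundaries are open or closed.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Interval(Generic[T]):
    """A continuous subset of an ordered type T.

    None stands for an undefined boundary. Assumption: defined
    boundaries are inclusive (closed interval).
    """
    low: Optional[T] = None
    high: Optional[T] = None

    def contains(self, x: T) -> bool:
        if self.low is not None and x < self.low:
            return False
        if self.high is not None and x > self.high:
            return False
        return True


normal_range = Interval(3.5, 5.0)  # both boundaries defined
at_least_18 = Interval(low=18)     # upper boundary undefined
print(normal_range.contains(5.0), at_least_18.contains(90))  # True True
```

Because only `<` and `>` are used, the same class works for integers, reals, dates, or any other ordered type.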
! Annotated Information (ANT)
Whenever a sender feels that "there is more to say" about a data element, the annotation structure can be sent that contains the data element together with some free form annotation. The annotation is meant to be interpreted by humans. !
History (HIST)
Generic data type that allows the history of some data element to be sent. A History is a list of History Items.
! History Item (HXIT)
A History Item can be used wherever a validity time (effective date/time, expiry date/time) is an essential part of some data. It is used primarily as the element of a History.
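A minimal sketch of a History as a list of History Items, each carrying an optional effective and expiry date, plus a lookup of the value valid at a given date. The names `HistoryItem`, `valid_at` and `value_at` are hypothetical; using dates rather than date/times and treating the expiry date as exclusive are assumptions of this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class HistoryItem(Generic[T]):
    """A value with its validity time; None means that boundary is open."""
    value: T
    effective: Optional[date] = None
    expiry: Optional[date] = None  # assumption: the expiry day itself is excluded

    def valid_at(self, when: date) -> bool:
        if self.effective is not None and when < self.effective:
            return False
        if self.expiry is not None and when >= self.expiry:
            return False
        return True


def value_at(history: list[HistoryItem[T]], when: date) -> Optional[T]:
    """Return the value of the first History Item valid at `when`, if any."""
    for item in history:
        if item.valid_at(when):
            return item.value
    return None


history = [
    HistoryItem("12 Oak St", date(1990, 1, 1), date(1998, 6, 1)),
    HistoryItem("5 Elm Ave", date(1998, 6, 1)),
]
print(value_at(history, date(1995, 1, 1)))   # 12 Oak St
print(value_at(history, date(1999, 7, 14)))  # 5 Elm Ave
```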
! Uncertain Discrete Value using Probabilities (UDVP)
A discrete value and an associated probability for that value to apply in a given context.
! Non-Parametric Probability Distribution (NPPD)
A collection of Uncertain Discrete Values using Probabilities that specifies a probability distribution.
Parametric ! Probability Distribution (PPD)
Contains mean, standard deviation and also a distribution type plus its parameters. This is useful, for example, to specify "precisely" the accuracy of a measurement or to specify results of clinical trials.
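A minimal sketch of how a parametric probability distribution can state the accuracy of a measurement "precisely". The class `PPD` and the type code `"normal"` are hypothetical; only the normal distribution is handled, using `statistics.NormalDist` from the Python standard library. The other distribution types and their parameters are not modeled.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class PPD:
    """Parametric probability distribution: a distribution type plus parameters."""
    distribution_type: str  # hypothetical type code; only "normal" handled here
    mean: float
    standard_deviation: float

    def probability_between(self, low: float, high: float) -> float:
        """Probability that the true value lies in [low, high]."""
        if self.distribution_type != "normal":
            raise NotImplementedError(self.distribution_type)
        dist = NormalDist(self.mean, self.standard_deviation)
        return dist.cdf(high) - dist.cdf(low)


# A measurement of 5.0 with a standard deviation of 0.2:
m = PPD("normal", 5.0, 0.2)
print(round(m.probability_between(4.8, 5.2), 4))  # 0.6827
```

About 68% of the probability lies within one standard deviation of the mean, which is the kind of statement a bare value with a "±" cannot carry unambiguously.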
! Uncertain Value using Narrative Expressions of Confidence (UVN)
A discrete value and a narrative expression of confidence for that value to apply in a given context. Those "narrative expressions" are *************** *** 10281,10283 **** --- 11658,11662 ---- chance of", etc.
+ +

! Uncertain Value using narrative expressions of confidence (UVN)
*************** *** 9991,9997 **** description

T Any data type that is allowed here, discrete or continuous.
description

T Any data type that is allowed here, discrete or continuous.

set	an unordered collection of unique element type instances.
bag	an unordered collection of element type instances. Instances may ! occur more than once in the bag.
list	an ordered collection of element type instances.
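The three collection kinds can be illustrated with Python built-ins. This is only an analogy, not the specification's definition: `frozenset` behaves like a set, `collections.Counter` like a bag, and `list` like a list. The sample data are made up.

```python
from collections import Counter

findings = ["fever", "cough", "fever"]

as_set = frozenset(findings)  # unordered, unique: {"fever", "cough"}
as_bag = Counter(findings)    # unordered, duplicates counted: fever 2, cough 1
as_list = list(findings)      # ordered, duplicates kept in sequence

# Order matters only for the list; multiplicity matters for bag and list.
print(as_set == frozenset(["cough", "fever"]))         # True
print(as_bag == Counter(["cough", "fever", "fever"]))  # True
print(as_list == ["cough", "fever", "fever"])          # False
```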