V3DT conference call minutes for Mon, Nov 30, 1998.

The HL7 version 3 data type task group held its seventh conference call on Monday, November 30, 1998, from 11:00 AM to 12:30 PM EST.

Attendees were:

Joann Larson,
Mike Henderson,
Stan Huff,
Mark Shafarman,
Greg Thomas,
Mark Tucker,
Robin Zimmerman,
Gunther Schadow.

Agenda items were:

reconsider the rational data type (should we define "ratio of floating point numbers" instead?)
Measurements (physical quantities)
Currency (monetary quantities)
Time
Ordinals

QUANTITIES AND NUMBERS (continued)

AGENDA ITEM 1: reconsider the rational data type (should we define "ratio of floating point numbers" instead?)

In the last conference call we defined a data type Rational Number as the composite of two Integer Numbers, the numerator and the denominator. Because we did not define rational numbers for their exactness but for their common representation, we have now withdrawn that decision and defined a data type "Ratio (of two quantities)" instead.

DATA TYPE "RATIO"

Note that this proposed data type has evolved from Rational Number. Some notes and comments there may still apply.

Ratio
A ratio quantity is a quantity that comes about through division of a numerator quantity by a denominator quantity. Ratios occur in laboratory medicine as "titers", i.e., the maximal dilutions at which an analyte can still be detected.
component name	type/domain	optionality	description
numerator	Quantity	required; default is 1	The numerator quantity.
denominator	Quantity	required; must not be zero; default is 1	The denominator quantity.

NOTES

What is a Quantity anyway?

A Quantity is a generalization of the following data types:

Integer Number
Floating Point Number
... other quantitative data types defined in this and the following conference(s).

This is a good example of why our data type model should use inheritance, which allows us to assert generalization/specialization relationships between types. We will discuss the formal data type meta-model shortly, as a wrap-up of our work.
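
A minimal Python sketch of that generalization, assuming class names taken from the type names above. The helper to_float is hypothetical and only serves to check the "must not be zero" rule for the denominator; a Ratio is itself a Quantity whose components may be any Quantity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Quantity(ABC):
    """Generalization of all quantitative data types (sketch)."""

    @abstractmethod
    def to_float(self) -> float:
        """Magnitude as a float; used here only for the zero check."""


@dataclass(frozen=True)
class IntegerNumber(Quantity):
    value: int

    def to_float(self) -> float:
        return float(self.value)


@dataclass(frozen=True)
class FloatingPointNumber(Quantity):
    value: float

    def to_float(self) -> float:
        return self.value


@dataclass(frozen=True)
class Ratio(Quantity):
    # Both components default to 1, as in the component table.
    numerator: Quantity = field(default_factory=lambda: IntegerNumber(1))
    denominator: Quantity = field(default_factory=lambda: IntegerNumber(1))

    def __post_init__(self) -> None:
        if self.denominator.to_float() == 0:
            raise ValueError("denominator must not be zero")

    def to_float(self) -> float:
        return self.numerator.to_float() / self.denominator.to_float()


titer = Ratio(IntegerNumber(1), IntegerNumber(128))
print(titer.to_float())  # 0.0078125
```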

The use of ratios

HL7 v2.3 defined the data type "structured numeric" (SN) for various purposes. Among those purposes was to cater to the need to express ratios, which often occur as titers in laboratory medicine. A titer is the maximal dilution at which an analyte can still be detected. Typical values of titers are "1:32", "1:64", "1:128", etc. Powers of 1/2 or 1/10 are common. Sometimes titer results are written inconsistently, either as a ratio or only as the denominator. Nevertheless, these are rational numbers. Although rational numbers are mathematically exact, titers are the result of a measurement process, which can never be exact.

Thus, in theory, a titer of 1:128 could be reported as 0.0078125. However, no one would understand that result. One could recover the original ratio by taking the inverse of 0.0078125 = 78125/10000000, which is 10000000/78125 = 128, but to do that, the receiver would have to know that the given number is to be represented as a ratio of the form 1/n.
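
The recovery step can be sketched in Python with exact fractions. This assumes the value arrives as a decimal string (so no binary floating point rounding creeps in) and that the receiver already knows it stands for 1/n; titer_from_decimal is a hypothetical helper, not part of any HL7 specification.

```python
from fractions import Fraction


def titer_from_decimal(value: str) -> str:
    """Recover a '1:n' titer from a decimal string, assuming it encodes 1/n."""
    ratio = Fraction(value)  # exact and reduced: 78125/10000000 -> 1/128
    if ratio.numerator != 1:
        raise ValueError(f"{value} is not of the form 1/n")
    return f"1:{ratio.denominator}"


print(titer_from_decimal("0.0078125"))  # 1:128
```

Without the out-of-band knowledge that the value is 1/n, the same code would reject perfectly valid decimals, which is exactly the problem described above.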

Currently we do not know of any other use of rational numbers except for titers. We do know, however, that blood pressure values, commonly reported as 120/80 mm Hg, are not ratios, but rather a composite of systolic and diastolic blood pressure.

AGENDA ITEM 2: Measurements (physical quantities)

All versions of HL7 v2.x defined the data type "Composite Quantity with Unit" (CQ). This data type, however, was not normally used in measurement observations (OBX). Instead, in an OBX you would send a numerical result (value type NM) and send the units in a separate OBX field. Moreover, units used to have different code tables depending on whether the CQ type or the OBX mechanism was used. We want to clean this up. Defining a data type for measurements (or "dimensioned quantities") seems so natural that many other standardization groups have adopted (reinvented) this two-component data type over and over again.

CEN TC251, WG 1, PT 26's first working document Health Informatics; Electronic Healthcare Record Communication; Part 1: Extended Architecture in table 25 [p. 52f] defines a type "quantity" as "A measurement expressed as a numeric value and unit of measurement" with the two component structure (value, unit).

The current draft 5 of CORBAmed's Clinical Observation Access Service (COAS) specifies a "MeasurementElement" that basically contains value and unit, although its structure is slightly different.

DATA TYPE "MEASUREMENT"

Measurement
A measurement is a dimensioned quantity expressing the result of a measurement act. It consists of a value and a unit.
component name	type/domain	optionality	description
value	Floating Point Number	required	The magnitude of the quantity measured in terms of the unit.
unit	Concept Descriptor	required	The unit, which is a real world concept.
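
A minimal Python sketch of the two-component structure. The Concept Descriptor type is reduced here to a hypothetical stand-in holding only a code and a coding system identifier; "example-units" is a placeholder, not a real coding system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptDescriptor:
    """Hypothetical minimal stand-in: a code plus the coding system it is from."""
    code: str
    coding_system: str


@dataclass(frozen=True)
class Measurement:
    """A dimensioned quantity: a magnitude expressed in terms of a unit."""
    value: float
    unit: ConceptDescriptor

    def __str__(self) -> str:
        return f"{self.value} {self.unit.code}"


duration = Measurement(90.0, ConceptDescriptor("s", "example-units"))
print(duration)  # 90.0 s
```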

NOTES:

Units are mathematical structures, quite different from other vocabularies. Armed with a little bit of mathematics, dealing with units is much simpler than dealing with the usual medical concepts. Units are hard to attack with semantic networks, but easy to deal with in simple algebraic structures. A detailed description can be found here, as soon as I am permitted to distribute final drafts of my pending JAMIA article (in about one month).

Existing codes for units of measure are:

ISO 2955 (1983)
ANSI X3.50 (1986)
HL7 ISO+/ANSI+ (Clem McDonald), equals ASTM 1238 (Clem), equals HISPP MSDS CDT (Dean Bidgood, Wayne Tracy, based on Clem's ISO+).
I will submit a Unified Code for Units of Measure to either ANSI X3.50, ISO TC12, or as an HL7-defined code (probably maintained by Regenstrief, similar to LOINC). My code is much more complete, does not suffer from ambiguities, and has precise semantics.

Regardless of what coding system HL7 ends up recommending (or mandating) we will be able to accommodate this in the above defined structure.

Not all physical kinds of quantities (or dimensions) are applicable in every use of the measurement data type. Subsets of units of measure are defined through the semantics of units and could be specified in any of three ways:

with a special code for kinds of quantities,
with a special expression language (similar to the units code itself),
with a paradigmatic unit to which a given unit must be convertible.

Ad. 1: An example of a special code for kinds of quantities is the "property" code of LOINC, e.g., "TIME" for time durations (measured, for example, in seconds).

Ad. 2: An example of a special expression language is the way dimensions are commonly specified: "T" for time, "L" for length, "LT^-1" for velocity, "LT^-2" for acceleration, and "LT^-2M" for force.
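
A minimal Python sketch of the algebra behind such dimension expressions, assuming a dimension is represented as a mapping from base dimension symbol to exponent. Multiplying quantities adds exponents, and raising to a power multiplies them; the helper names are hypothetical.

```python
from typing import Dict

# A dimension maps base dimension symbols to exponents,
# e.g. {"L": 1, "T": -2} for acceleration ("LT^-2").
Dimension = Dict[str, int]

LENGTH: Dimension = {"L": 1}
TIME: Dimension = {"T": 1}
MASS: Dimension = {"M": 1}

_DISPLAY_ORDER = ["L", "T", "M"]  # order used in the examples above


def multiply(a: Dimension, b: Dimension) -> Dimension:
    """Multiplying quantities adds the exponents of their dimensions."""
    result = dict(a)
    for symbol, exponent in b.items():
        result[symbol] = result.get(symbol, 0) + exponent
    return {s: e for s, e in result.items() if e != 0}


def power(a: Dimension, n: int) -> Dimension:
    """Raising a quantity to the n-th power multiplies every exponent by n."""
    return {s: e * n for s, e in a.items() if e * n != 0}


def format_dimension(d: Dimension) -> str:
    """Write a dimension in the compact notation used above."""
    def rank(symbol: str) -> int:
        if symbol in _DISPLAY_ORDER:
            return _DISPLAY_ORDER.index(symbol)
        return len(_DISPLAY_ORDER)
    return "".join(s if d[s] == 1 else f"{s}^{d[s]}" for s in sorted(d, key=rank))


velocity = multiply(LENGTH, power(TIME, -1))
acceleration = multiply(velocity, power(TIME, -1))
force = multiply(MASS, acceleration)
print(format_dimension(force))  # LT^-2M
```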

Ad. 3: If an attribute "encounter duration" is defined as a measurement, then one could give the paradigmatic unit "s" (second) in the definition of that attribute, meaning that every value of this attribute must be convertible to seconds. This would be true for all measurements with units such as minute, hour, day, and many more.
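
A minimal Python sketch of the paradigmatic-unit check, assuming a small hypothetical unit table that records each unit's factor relative to its base unit and its dimension. A value is acceptable if its unit has the same dimension as the paradigmatic unit; the factor then does the conversion.

```python
from typing import Dict, Tuple

Dimension = Dict[str, int]

# Hypothetical unit table: code -> (factor relative to the base unit, dimension).
UNITS: Dict[str, Tuple[float, Dimension]] = {
    "s": (1.0, {"T": 1}),
    "min": (60.0, {"T": 1}),
    "h": (3600.0, {"T": 1}),
    "d": (86400.0, {"T": 1}),
    "m": (1.0, {"L": 1}),
}


def convertible(unit: str, paradigm: str) -> bool:
    """A unit is acceptable if it has the same dimension as the paradigm."""
    return UNITS[unit][1] == UNITS[paradigm][1]


def convert(value: float, unit: str, paradigm: str) -> float:
    """Express a value given in `unit` in terms of the paradigmatic unit."""
    if not convertible(unit, paradigm):
        raise ValueError(f"{unit!r} is not convertible to {paradigm!r}")
    return value * UNITS[unit][0] / UNITS[paradigm][0]


# An "encounter duration" constrained by the paradigmatic unit "s":
print(convert(2, "h", "s"))   # 7200.0
print(convertible("m", "s"))  # False
```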

Our participants from Kaiser brought up the issue of medication units of application (e.g., tablet, capsule, vial, spray, etc.). My (Gunther's) strong opinion is that those are not units of measure, because they are not quantities. While a metre is inherently a quantity (approx. 3.28 feet), a tablet or vial has no magnitude by itself. A given tablet, vial or spray may have properties, such as strength or volume, but those are different for every tablet, vial or spray under consideration. Conversely, a metre does not have different quantitative properties; it is essentially a quantity. Tablet, vial, and spray are not essentially quantitative items. Of course, you can count tablets (as you can count all kinds of things), and of course a tablet, as a physical body, does have volume, length, width, and depth. But the essence of a tablet is its form and not any specific kind of quantity. Conversely, the essence of a metre is a certain amount of length, the essence of a second is a certain amount of time, and the essence of a dollar is a certain amount of money (see below). Not every kind of object is a candidate unit.

Stan Huff did not agree with this strong position. He gave the example of International Units (i.U.) as units that do not have a fixed magnitude associated with them.

International Units are arbitrary units defined for each analyte by an international organization (the World Health Organization). Examples are i.U. for penicillin, insulin, streptokinase, urokinase, and other medications, and i.U. are also defined for many enzymes, hormones and antibodies. The rationale for those units is twofold:

these are functional units that measure a certain biochemical function rather than a specific molecule, because many slightly different molecules can carry out the same biochemical function;
the measurement process has so many parameters, all of which need to be standardized, that it is not possible to come up with comparable units standardized across all analytes.

The units U (= 1 µmol/min) and katal (= 1 mol/s) of catalytic activity are meant to be standardized for all enzymes. However, the measurement conditions still need to be standardized, because 1 katal of Phosphofructokinase measured at pH 7.4, 37 degrees Celsius, in a Ringer solution, with this much ADP and no 1,2-Bisphosphoglycerate present, is quite different from 1 katal of the same analyte measured at pH 7.5, 28 degrees Celsius, in plain water with only that much ADP present.
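
Since U and katal measure the same kind of quantity, converting between them is only a fixed factor; the standardization problem lies in the measurement conditions, not in the units themselves:

```latex
1\,\mathrm{U} = 1\,\frac{\mu\mathrm{mol}}{\mathrm{min}}
  = \frac{10^{-6}\,\mathrm{mol}}{60\,\mathrm{s}}
  \approx 1.667 \times 10^{-8}\,\frac{\mathrm{mol}}{\mathrm{s}}
  \approx 16.67\,\mathrm{nkat}
```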

International Units are still essentially quantitative concepts (they are defined for no other purpose than measuring quantities). This is quite different with tablets, vials, and sprays.

The order/results committee will have to work out the specifics on the relationship between units of application and units of measures in its information model.

AGENDA ITEM 3: Currencies (monetary quantities)

Expressions of monetary amounts are of the same abstract form as physical quantities, i.e., a composite of a value and a unit (the currency unit). As with physical quantities, this composite can be regarded as a product (multiplication) of the value and the unit. As with physical units, we have submultiples of currency units (e.g., dollar and cent, pound and penny, mark and pfennig, rupee and paisa, etc.). Currencies appear to be just another dimension of measured quantities.

Expressions of monetary units and physical units may be mixed, as in price expressions such as 5 U.S. dollars (USD) per milliliter (price) or 20 USD per hour (salary). This is another reason to treat monetary units the same way as physical units.

The BIG difference between monetary and physical units is that, while the "exchange rates" of physical units are stable over many decades, the value of monetary units is negotiated differently each day in different places of the world. While an international inch is exactly 2.54 centimeters (since 1959), a U.S. dollar (USD) may be 1.795 deutsche mark (DEM) today and 1.659 DEM tomorrow. The same USD may be worth 1.795 DEM in New York and 1.801 DEM in Frankfurt (Germany) at the same time. This suggests handling currencies differently from physical quantities.

Current codes for units of measure do not include monetary units, although I could add a chapter to my UCUM specification to include them.

The way this would work is that we would define an eighth base unit in addition to the seven existing base units. This would probably be the U.S. dollar (unless there is a United Nations world currency unit, similar to what the ECU is in Europe), or one troy ounce of gold, traditionally used as the monetary standard by the International Monetary Fund.
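
A minimal Python sketch of why mixed price expressions come for free once a currency unit is just another algebraic factor. It assumes amounts are (value, unit-exponent map) pairs; the symbols "USD", "mL", "h" and the helper times are illustrative only, not taken from any unit code.

```python
from typing import Dict, Tuple

Units = Dict[str, int]        # unit symbol -> exponent
Amount = Tuple[float, Units]  # (value, unit expression)


def times(a: Amount, b: Amount) -> Amount:
    """Multiply values and add unit exponents; cancelled units disappear."""
    units = dict(a[1])
    for symbol, exponent in b[1].items():
        units[symbol] = units.get(symbol, 0) + exponent
    return a[0] * b[0], {s: e for s, e in units.items() if e != 0}


price = (5.0, {"USD": 1, "mL": -1})   # 5 USD per milliliter
volume = (3.0, {"mL": 1})
salary = (20.0, {"USD": 1, "h": -1})  # 20 USD per hour
hours = (8.0, {"h": 1})

print(times(price, volume))  # (15.0, {'USD': 1})
print(times(salary, hours))  # (160.0, {'USD': 1})
```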

Lexically, the currency units would be treated just like any other unit. Semantically, however, their value would be taken from a dynamic table, which could be an interface connecting directly to Wall Street or to any bank regarded as authoritative in a given realm.
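
A minimal Python sketch of that contrast: a physical conversion factor is a constant, while a currency conversion needs a lookup keyed by place and day. The rate table is hypothetical; the rates are the illustrative figures quoted above, but the dates attached to "today" and "tomorrow" are made up. Decimal is used because money should not be subject to binary rounding.

```python
from datetime import date
from decimal import Decimal
from typing import Dict, Tuple

# Physical conversion factors are fixed once and for all.
CM_PER_INCH = Decimal("2.54")  # exact since 1959

# Hypothetical dynamic rate table keyed by (from, to, place, day). In practice
# it would be fed by an authoritative source; the dates here are invented.
RATES: Dict[Tuple[str, str, str, date], Decimal] = {
    ("USD", "DEM", "New York", date(1998, 11, 30)): Decimal("1.795"),
    ("USD", "DEM", "New York", date(1998, 12, 1)): Decimal("1.659"),
    ("USD", "DEM", "Frankfurt", date(1998, 11, 30)): Decimal("1.801"),
}


def inches_to_cm(inches: Decimal) -> Decimal:
    """A static conversion: no context needed."""
    return inches * CM_PER_INCH


def convert_currency(amount: Decimal, src: str, dst: str,
                     place: str, day: date) -> Decimal:
    """A dynamic conversion: the rate depends on where and when."""
    try:
        rate = RATES[(src, dst, place, day)]
    except KeyError:
        raise LookupError(f"no {src}->{dst} rate for {place} on {day}") from None
    return amount * rate


print(convert_currency(Decimal("100"), "USD", "DEM", "New York", date(1998, 11, 30)))  # 179.500
```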

Constraining measurements for certain attributes to only allow monetary quantities would be done just like measurements are constrained for other attributes (e.g., time).

Mark Shafarman anticipated some pushback from HL7 members who would fear that monetary units would get lost among the physical units. Both Marks tended to favor defining a separate type for monetary quantities. I (Gunther) was open to either solution. Stan Huff favored combining monetary units and physical units. The consensus shifted towards including monetary quantities in measurements, which leverages the advantages mentioned above (simplicity of the approach, easy price expressions).

ISO 4217 is an international code for currency units. Although the standard text itself is copyrighted and available only for big bucks, the values themselves are freely usable and are listed here. This code covers only the "major" currency unit of each country, e.g., U.S. dollar but not cent, British pound but not penny, German mark but not pfennig, Indian rupee but not paisa, etc. This shouldn't be a major problem, since most currency submultiples are worth 1/100 of the major unit (yes, the British moved to a decimal system as well; there is no "shilling" any more, which was 1/20 of a pound sterling).
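
A minimal Python sketch of handling submultiples with only major units coded, assuming a small hypothetical table of minor-unit decimal places per ISO 4217 code. The table is needed because "most" is not "all": the Japanese yen, for instance, has no minor unit in use.

```python
from decimal import Decimal

# Hypothetical excerpt: decimal places of the minor unit per currency code.
MINOR_UNIT_DIGITS = {"USD": 2, "GBP": 2, "DEM": 2, "INR": 2, "JPY": 0}


def from_minor_units(count: int, currency: str) -> Decimal:
    """Express an amount counted in minor units (cents, pence, ...) in major units."""
    return Decimal(count).scaleb(-MINOR_UNIT_DIGITS[currency])


print(from_minor_units(1999, "USD"))  # 19.99
print(from_minor_units(500, "JPY"))   # 500
```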

AGENDA ITEM 4: Time

We mentioned four general kinds of time-related quantities:

Point in time,
Interval (with a certain begin time and end time),
Duration (not fixed to a certain point in time),
Things like: time of day, day of week, week of year, week of month, every other Tuesday, etc.

Intervals of time are just like intervals of any other quantity. (e.g. 0 to 1, 1/16/1998 to 1/18/1998.) We will define a generic data type for intervals of all quantities in our next conference call.

Duration of time is a measurement of time, just as length is a measurement of space. No special data type is required.

DATA TYPE "POINT IN TIME"

Point in Time
A point in time is a scalar defining a point on the axis of natural time. This naive concept of an absolute time scale is not concerned with the relativity of time, which is important in astrophysics and cosmology.
PRIMITIVE TYPE [see text]

NOTES:

The natural time scale is, much like the temperature scales (Celsius or Fahrenheit), an interval scale (aka. difference scale). While the Celsius temperature scale defines its zero point at the freezing point of water and a standard degree as 1/100 of the difference between the freezing and boiling points of water, the Christian calendar defines the zero point at the birth of Christ, and the basic unit of time as the second. There are obvious problems with the determination of the zero point of the Christian calendar, but the principle is the same.

Zero points on the natural time axis are chosen arbitrarily and are called the "epoch".

Many data type specifications for point in time are based on an epoch. Examples of epochs are: 1/1/1970 00:00:00 UTC on Unix, 1/1/1980 00:00:00 UTC on MS-DOS, 12/31/1959 00:00:00 EST in the Regenstrief MRS, and 10/15/1582 00:00:00 UTC in CORBA's COAS. Basic durations are seconds, milliseconds, microseconds, or nanoseconds measured from that epoch. This way of representing time is very simple. Although it is not easily human-readable, it is very easy to compute with such standardized time values.
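
A minimal Python sketch of epoch-based time, treating the Unix and MS-DOS epochs as UTC as listed above and ignoring how each system actually packs its values. Re-basing from one epoch to another is a single subtraction; like POSIX time, Python's datetime ignores leap seconds.

```python
from datetime import datetime, timezone

# Two of the epochs listed above, both taken as UTC for this sketch.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DOS_EPOCH = datetime(1980, 1, 1, tzinfo=timezone.utc)


def seconds_since(epoch: datetime, instant: datetime) -> int:
    """Represent an instant as whole seconds elapsed since the given epoch."""
    return int((instant - epoch).total_seconds())


# 3652 days (ten years plus the leap days of 1972 and 1976), in seconds.
OFFSET = seconds_since(UNIX_EPOCH, DOS_EPOCH)


def unix_to_dos_seconds(unix_seconds: int) -> int:
    """Re-base a count of seconds from the Unix epoch onto the 1980 epoch."""
    return unix_seconds - OFFSET


call_start = datetime(1998, 11, 30, 16, 0, tzinfo=timezone.utc)  # 11:00 AM EST
print(OFFSET)                                 # 315532800
print(seconds_since(UNIX_EPOCH, call_start))  # 912441600
```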

Traditionally, the even flow of time is "convoluted" into many cycles defined by calendars. Such cycles are years, months, days, hours, minutes, and seconds. Those cycles are not synchronized. Traditionally, calendars have been defined based on astronomical phenomena; however, calendar years, months and days are not attached directly to astronomical phenomena. The closest fit is the calendar day to the solar day, but the calendar month is definitely not the same as a lunar (synodic) month.

Humans communicate points in time as calendar expressions. Calendars are quite complex entities which depend on culture. Mark Shafarman reports that Bali uses 6 (?) different calendars (officially ?).

To account for the calendar problem, the basic Java library defines two classes: java.util.Date and java.util.Calendar. Date is defined as a point in Coordinated Universal Time of the form epoch/duration (Java's epoch is 1.1.1970 00:00:00 GMT, and the duration is counted in milliseconds). Calendar is an abstract generalization of GregorianCalendar and, potentially, other calendars.

It is quite difficult to convert a calendar expression into an epoch/duration form. There are not just leap days (Feb. 29) added in leap years, but also leap seconds (inserted at the end of June or December when needed). The rule for leap years is somewhat involved, and there is no rule at all for leap seconds; these are taken from tables published in astronomical almanacs. Fortunately, conversion is done by most operating systems or by the basic Java library.
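
The leap year part can be sketched in Python. This hypothetical helper counts whole days from the Unix epoch to a Gregorian date; like most operating system clocks it simply ignores leap seconds, which would have to come from a published table.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_since_1970(year: int, month: int, day: int) -> int:
    """Whole days from 1970-01-01 to a Gregorian date; leap seconds are ignored."""
    if year < 1970:
        raise ValueError("this sketch only handles dates from 1970 on")
    month_lengths = [31, 29 if is_leap_year(year) else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    days = sum(366 if is_leap_year(y) else 365 for y in range(1970, year))
    days += sum(month_lengths[:month - 1])  # full months before this one
    return days + (day - 1)


print(is_leap_year(1900), is_leap_year(2000))  # False True
print(days_since_1970(1998, 11, 30))           # 10560
```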

Calendar expressions are for humans to understand and are therefore represented as character string literals. The semantic components of a calendar expression may be different from the components identifiable in a particular surface form.

Quite solid standards for expressions in the Gregorian calendar are HL7 v2.3's TS data type, and ISO 8601 (adopted in Europe as EN 28601). ASN.1's (ISO 8824) GeneralizedTime is based on ISO 8601 (with some constraints). HL7's TS format is, of course, used by ASTM 1238 as well and lives on in ANSI HISPP MSDS CDT's DateTime format. Although HL7's TS format and ISO 8601 are similar, they also have considerable differences.

For HL7 v3 it seems worthwhile to consider adopting ISO 8601. However, ISO 8601 has some "features" that may be considered disadvantages. First of all, ISO 8601 has too many unnecessary alternatives. While a somewhat canonical date/time form is

YYYY-MM-DDThh:mm:ss

the dashes between the date components, the colons between the time components, and the T between date and time components may as well be omitted. The omission of those characters brings about a form very similar to ASN.1's GeneralizedTime or HL7's TS. The way of handling precision in HL7 v2.3's TS (after v2.2) is to leave out the less significant digits as required. However, without the T between date and time, this would be ambiguous with certain other valid ISO 8601 forms. ISO 8601 allows omission of the T by mutual agreement and only if no ambiguities are introduced, a clause that is usually hard to enforce (and therefore harmful) in standards.
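
What precision by truncation means can be sketched in Python: a TS value with its less significant digits left out denotes the whole interval those digits could fill. This is a minimal sketch assuming plain digit strings from YYYY to YYYYMMDDHHMMSS; it does not handle fractional seconds, time zone offsets, or an explicit degree-of-precision component, and ts_interval is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Size of the least significant unit for day, hour, minute and second precision.
_STEPS = {8: timedelta(days=1), 10: timedelta(hours=1),
          12: timedelta(minutes=1), 14: timedelta(seconds=1)}


def ts_interval(ts: str) -> Tuple[datetime, datetime]:
    """Return the half-open interval [start, end) covered by a truncated TS value."""
    if not ts.isdigit() or len(ts) not in (4, 6, 8, 10, 12, 14):
        raise ValueError(f"unsupported TS value: {ts!r}")
    # Omitted (less significant) components default to their smallest value.
    fields = [int(ts[0:4])]
    for pos, default in ((4, 1), (6, 1), (8, 0), (10, 0), (12, 0)):
        fields.append(int(ts[pos:pos + 2]) if len(ts) > pos else default)
    start = datetime(*fields)
    if len(ts) == 4:    # year precision
        end = start.replace(year=start.year + 1)
    elif len(ts) == 6:  # month precision: months vary in length
        end = (start.replace(year=start.year + 1, month=1) if start.month == 12
               else start.replace(month=start.month + 1))
    else:
        end = start + _STEPS[len(ts)]
    return start, end


start, end = ts_interval("199811")
print(start.isoformat(), end.isoformat())  # 1998-11-01T00:00:00 1998-12-01T00:00:00
```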

The W3C is considering a subset of ISO 8601 for adoption. W3C's subset requires the T between date and time.

Useful features of ISO 8601 that are not part of HL7's TS type are so-called "ordinal dates" and "week dates" of the form

YYYY-DDD
YYYY-Www
YYYY-Www-D

These allow one to specify a date as (1) the day of the year, (2) the week of the year, or (3) the week of the year plus the day of the week.
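
A minimal Python sketch producing the ordinal and week forms for the date of this call, using only the standard datetime module. Note that the week form must use the ISO week-numbering year, which can differ from the calendar year near January 1.

```python
from datetime import date

d = date(1998, 11, 30)  # the date of this conference call

# Ordinal date YYYY-DDD: day of the year.
ordinal = f"{d.year:04d}-{d.timetuple().tm_yday:03d}"

# Week date YYYY-Www-D: ISO week-numbering year, week, and day (1 = Monday).
iso_year, iso_week, iso_weekday = d.isocalendar()
week_date = f"{iso_year:04d}-W{iso_week:02d}-{iso_weekday}"

print(ordinal)    # 1998-334
print(week_date)  # 1998-W49-1

# The week date converts back to the same calendar date (Python 3.8+).
print(date.fromisocalendar(iso_year, iso_week, iso_weekday) == d)  # True
```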

Moreover, ISO 8601 allows omission of more significant components (the delimiter dash, colon, or T must occur in those cases). This changes the semantics of the expression from a point in time to a calendar modulo expression. For example "---2" means every Tuesday, but subtle variations may have a big impact on the meaning: "-W-2" means Tuesday "of the current week" (whatever this means).

Both HL7's TS and ISO 8601 handle time zones through offsets of the form "+hh:mm" or "-hh:mm" relative to UTC. TS adds a "Z" in front of the time zone suffix, while ISO 8601 uses the "Z" to mean UTC specifically (thus, in ISO 8601, an offset expression following the Z would be contradictory).
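
A minimal Python sketch of the ISO 8601 reading of offsets and "Z", using the start time of this call. Python's strptime accepts both "Z" and "+hh:mm" for the %z directive since Python 3.7; two values with different offsets compare equal when they denote the same instant.

```python
from datetime import datetime, timezone

FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # %z accepts "Z" and "+hh:mm" (Python 3.7+)

local = datetime.strptime("1998-11-30T11:00:00-05:00", FORMAT)  # EST
utc = datetime.strptime("1998-11-30T16:00:00Z", FORMAT)         # Z = UTC

print(local == utc)                    # True: the same instant
print(local.astimezone(timezone.utc))  # 1998-11-30 16:00:00+00:00
```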

Other useful features are missing in ISO 8601, however. These include the concept of significant digits available in TS, where you can say "198" to mean any year from 1975 to 1985.

It seems justified for HL7 to stick with its own tradition of the TS data type, but with some slight changes that render most TS expressions compatible with ISO 8601 expressions. Notably, the "Z" should be used in the ISO 8601 style (i.e., only for UTC).

Other issues and curiosities

"I got sick on my birthday, some 20 years ago," is an expression that we might want to capture. One possible representation for this time would be "yyyy0219" if my birthday is February 19th, with yyyy constrained such that this year minus yyyy is approximately 20. If from another source we gather that I got sick in "1976", but don't know the exact month and day, then we can conclude that I got sick on "19760219", because 1998 - 1976 = 22, which is close to 20. This seems a somewhat rare use case, but definitely worth considering.

"I got that cough in spring" might lead us to adjust probabilities for pollen allergy. The season of the year is of interest in epidemiology. Bob Dolin, in his article on "modeling the temporal complexities of symptoms," suggests accounting for "season" in time expressions. The difficulty here is that seasons depend on geographical latitude, so we cannot infer the season from the month of the year alone. January is summer in Australia, South Africa, Chile, and Argentina, while we northern people assume that January is the worst part of winter. Moreover, at the equator there are not the usual four seasons; however, in tropical regions there is the monsoon season, which may be considered one of two seasons, or a fifth season. I propose to defer season as part of a point in time expression until its use and implications become clearer.

Noteworthy references on time expressions are CEN TC251's ENV 12381 Health care informatics; time standards for health care specific problems and the Arden Syntax. Those two standards not only define relations and operators on time values but also on events and episodes which are related in time.

Relative times with the semantics NOW + duration offset stand out as the most prominent feature defined by those and other time-related standards. We might thus consider the ability to specify relative time. Some conventions use expressions like "t-1" to mean "yesterday". Relative time expressions are of the data type point in time, but the exact value depends on a parameter (the actual time) specified elsewhere.
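
A minimal Python sketch of resolving such expressions, assuming a hypothetical day-granularity syntax "t", "t-n", "t+n" and that the reference time is supplied as a parameter, as described above.

```python
import re
from datetime import date, timedelta

# Hypothetical syntax: "t" = reference day, "t-1" = the day before, "t+7" = a week later.
RELATIVE = re.compile(r"^t(?:([+-])(\d+))?$")


def resolve(expression: str, now: date) -> date:
    """Turn a relative time expression into a point in time, given 'now'."""
    match = RELATIVE.match(expression)
    if match is None:
        raise ValueError(f"not a relative time expression: {expression!r}")
    sign, count = match.groups()
    offset = timedelta(days=int(count) if count else 0)
    return now - offset if sign == "-" else now + offset


today = date(1998, 11, 30)
print(resolve("t-1", today))  # 1998-11-29
print(resolve("t+7", today))  # 1998-12-07
```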

Calendar modulus expressions

A modulus is the remainder of an integer division. For example, 12 modulo 7 is 5. If we have the time defined as epoch + duration in days, we can tell the day of the week of any date if we know the day of the week of the epoch. For instance, let our epoch be January 1, 1582 (in the Julian calendar; the Gregorian calendar was introduced later that year), which was a Monday. We can easily tell the weekday of January 31, 1582: the offset from the epoch is 30 days. A week has seven days, and 30 modulo 7 is 2. Monday plus two days is Wednesday. In the same way we can tell that the date epoch + 151840 days (October 2, 1997, in the Gregorian calendar) is a Thursday.
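
A minimal Python sketch of the weekday computation, assuming the Monday epoch described above. As a cross-check it uses Python's date type, which follows the proleptic Gregorian calendar; in that calendar the Julian date January 1, 1582 is January 11, 1582.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_from_offset(days_since_epoch: int, epoch_weekday: int = 0) -> str:
    """Day of the week = (weekday of the epoch + offset) modulo 7; 0 = Monday."""
    return WEEKDAYS[(epoch_weekday + days_since_epoch) % 7]


print(weekday_from_offset(30))      # Wednesday (January 31, 1582)
print(weekday_from_offset(151840))  # Thursday

# Cross-check against Python's proleptic Gregorian calendar.
epoch = date(1582, 1, 11)                 # = Julian January 1, 1582
print(epoch.weekday())                    # 0, i.e. Monday
print(epoch + timedelta(days=151840))     # 1997-10-02
```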

Other such modulus expressions exist in calendars, all of which have the form:

unit₁	of the	unit₂

day	of the	week
month	of the	year
day	of the	month
week	of the	year
day	of the	year
hour	of the	day
minute	of the	hour
second	of the	minute

Obviously, unit₁ must be smaller than unit₂. All those units are defined by the calendar and may be slightly different from related units defined for time durations. For instance, the average Julian month is 30.4375 days, but a calendar month varies between 28 and 31 days. Thus the modulo expression "month of the year" must be provided by the calendar and cannot easily be calculated using the average month.
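
A minimal Python sketch of why the average month does not work, using a hypothetical naive estimator: dividing the day of the year by 30.4375 days misplaces dates near month boundaries, while the calendar gives the right month directly.

```python
from datetime import date

AVERAGE_JULIAN_MONTH = 365.25 / 12  # 30.4375 days


def naive_month_of_year(d: date) -> int:
    """Estimate the month from the day of the year and the average month length."""
    day_of_year = d.timetuple().tm_yday
    return int((day_of_year - 1) // AVERAGE_JULIAN_MONTH) + 1


d = date(1999, 3, 1)  # day 60 of a common year
print(naive_month_of_year(d), d.month)  # 2 3
```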

How do we express complex modulo expressions that are not provided by the calendar? Things like "every other Tuesday" come to mind. We could tell whether or not a certain date is an "every other Tuesday" by testing the equation:

date modulo ( 2 x 7 ) = 1; given that 0=Monday, 1=Tuesday, ...

while every Tuesday would be:

date modulo 7 = 1; given that 0=Monday, 1=Tuesday, ...
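
A minimal Python sketch of both tests, assuming dates are given as day offsets from an epoch that falls on a Monday. The remainder chosen for the two-week cycle fixes which of the two sets of Tuesdays is meant: 1 selects one set, 8 the other.

```python
def is_tuesday(days_since_epoch: int) -> bool:
    # Epoch is a Monday: offset 0 = Monday, 1 = Tuesday, ...
    return days_since_epoch % 7 == 1


def is_every_other_tuesday(days_since_epoch: int) -> bool:
    # Two-week cycle: only Tuesdays in even-numbered weeks after the epoch match.
    return days_since_epoch % (2 * 7) == 1


matches = [n for n in range(0, 43) if is_every_other_tuesday(n)]
print(matches)  # [1, 15, 29]
```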

We decided to ponder on the calendar modulo expressions for some time before coming back to it.

The next conference call is next Monday, December 7, 1998, at 11:00 AM EST.

Agenda items are:

generic data type for intervals (aka. "ranges"),
generic data type for uncertainty (aka. "probability distributions"),
held-over items: ordinals,
"modulo" time expressions.

regards

-Gunther Schadow

