Conflicts and Ambiguities in Unit Symbols

Even though this work is primarily about the semantics of units, communication cannot do without a syntax. We have to distinguish communication among computers from communication involving humans. Ideally, for communication between computers one would define a syntax that reflects the semantic structure one to one and that is easy to implement. If humans are involved, the syntax must also be understandable to them. Traditionally, most computer systems do not make much sense of units other than to present them to humans. On the other hand, data interchange protocols in healthcare (HL7, ASTM 1238) stick to human readable message formats. Therefore, we cannot design a syntax for units from scratch.

Three standards for the notation of units are of interest: ISO 2955, ANSI X3.50 (together with their reception by HL7 version 2.3 and ASTM 1238), and the European standard ENV 12435 (CEN).

ISO 2955 is a standard notation for units that suits the limited character sets available on most computers. Its scope is SI units [ISO 1000]; however, its general approach can be used for other systems of units as well. It gives both a case sensitive and a case insensitive notation, neither of which requires Greek letters or mathematical symbols such as the fraction bar or superscripts. It is therefore useful for communication among computers, and the case sensitive variant of ISO 2955 in particular remains readable for humans.

ANSI X3.50 is similar to ISO 2955 but is mainly concerned with Anglo-American units that are not covered by ISO, yet are needed to meet requirements in the United States. Certainly, a data interchange standard has to face reality and must not dictate a different terminology to health care practitioners. Thus, any globally useful code for units of measure must unify the metric system and traditional units. This is also true for other non-ISO units that are commonly used in healthcare, such as Torricelli's unit of pressure 1 mm Hg.

One problem of all standard code tables is that they are prone to name conflicts and ambiguities. Neither ISO 2955 nor ANSI X3.50 is free of such conflicts. Conflicts arise from the way units are constructed using decimal prefixes.

In the metric system a simple unit consists of an optional prefix symbol and a terminal unit symbol (in the following called unit atom) written side by side. ENV 12435, ISO 2955 and ANSI X3.50 follow this practice. Because the prefix is not delimited from the unit atom, the computer must analyze a simple unit lexically, i.e., by finding a match among all possible combinations of prefixes and atoms. Such a combination of codes is prone to ambiguities. For instance, the Pascal (1 Pa) is indistinguishable from pico-Ampère (1 pA) in a case insensitive representation, which is why ISO 2955 assigned the rather unusual notation PAS to the Pascal.
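The lexical analysis described above can be illustrated with a minimal sketch. The tables below are hypothetical and heavily abbreviated; they are not the actual code lists of any of the standards, and the function name `parses` is our own. The naive table deliberately uses PA for the Pascal to show the ambiguity that the notation PAS avoids.

```python
# Hypothetical, abbreviated case insensitive tables (not the real ISO 2955 lists).
PREFIXES = {"": "", "P": "pico", "M": "milli", "K": "kilo"}
NAIVE_ATOMS = {"A": "ampere", "PA": "pascal", "M": "metre"}
FIXED_ATOMS = {"A": "ampere", "PAS": "pascal", "M": "metre"}


def parses(symbol, prefixes, atoms):
    """Return every (prefix name, atom name) split of symbol found in the tables."""
    found = []
    for i in range(len(symbol)):
        prefix, atom = symbol[:i], symbol[i:]
        if prefix in prefixes and atom in atoms:
            found.append((prefixes[prefix], atoms[atom]))
    return found


# Two readings in the naive table: the symbol is ambiguous.
print(parses("PA", PREFIXES, NAIVE_ATOMS))
# One reading each once the Pascal is written PAS.
print(parses("PA", PREFIXES, FIXED_ATOMS))
print(parses("PAS", PREFIXES, FIXED_ATOMS))
```

A symbol is unambiguous under this sketch exactly when `parses` returns a single entry; an empty list means the symbol is not a simple unit of the table at all.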

In our analysis of the actual and potential ambiguities among simple units we identified the following types:

Type I: double unit atom

Ambiguous unit atoms. We find an example even within the case sensitive variant of ISO 2955, which uses a both for the year (from Latin annus) and for the are (= 100 m2). This is the most severe problem, but it is easily detected.

Type II: metric vs. metric

Two valid prefix-atom combinations (including an atom without prefix) produce the same name. Valid combinations are those where the unit atom is ``metric,'' that is, where the unit may take a prefix. Note that there are non-metric units within the ISO system. For instance, day (d), hour (h), minute (min), and the various degrees do not get a decimal prefix. A type II conflict exists within the case insensitive version of ISO 2955 for PEV, which can be peta-volt or pico-electronvolt. This is an error in the code system that needs to be resolved.
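Conflicts of this type can be found mechanically by generating every valid prefix-atom combination and grouping the results by symbol. The following is a minimal sketch under stated assumptions: the tables are a small hypothetical excerpt chosen to reproduce the PEV example (with PE for peta and P for pico), and the function name `collisions` is ours.

```python
from collections import defaultdict
from itertools import product

# Hypothetical excerpt of case insensitive prefix and metric atom tables.
PREFIXES = {"": "", "P": "pico", "PE": "peta", "K": "kilo"}
METRIC_ATOMS = {"V": "volt", "EV": "electronvolt", "M": "metre"}


def collisions(prefixes, atoms):
    """Map each symbol that has more than one reading to all of its readings."""
    by_symbol = defaultdict(list)
    for (p, p_name), (a, a_name) in product(prefixes.items(), atoms.items()):
        by_symbol[p + a].append(p_name + a_name)
    return {sym: names for sym, names in by_symbol.items() if len(names) > 1}


print(collisions(PREFIXES, METRIC_ATOMS))
```

Because every atom in the table is metric, each reported collision is between two valid combinations and therefore of type II.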

Type III: non-metric vs. non-metric

Invalid combinations of non-metric units with prefixes, where one can be just an unprefixed atom, collide with each other. For example, nmi as the nautical mile collides with nano-mile. Since mile is a non-metric unit atom, for which a prefix is forbidden, no real ambiguity exists.

Type IVa: metric vs. nonmetric

An (invalid) combination of a non-metric unit with a prefix collides with a valid unit atom. For instance, again within ISO 2955 (case sensitive), we find cd for candela while a centi-day is not explicitly ruled out. Taking into account that the metric property does not hold for the unit day, this ambiguity is resolvable without changing the code. Type IVa conflicts can also be resolved without evaluating the metric property: a precedence rule that binds the most characters to the atom would match the candela before the centi-day.
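Both ways of resolving a type IVa conflict can be sketched in a few lines. The tables are a hypothetical excerpt in the case sensitive notation, and the function names are ours; this is an illustration under those assumptions rather than a complete parser.

```python
# Hypothetical excerpt; each atom maps to (name, is_metric).
PREFIXES = {"c": "centi", "m": "milli", "k": "kilo"}
ATOMS = {"cd": ("candela", True), "d": ("day", False), "m": ("metre", True)}


def splits(symbol):
    """Yield (prefix, atom) splits found in the tables, longest atom first."""
    for i in range(len(symbol)):
        prefix, atom = symbol[:i], symbol[i:]
        if atom in ATOMS and (prefix == "" or prefix in PREFIXES):
            yield prefix, atom


def resolve_by_metric(symbol):
    """Keep only readings in which a prefixed atom is metric."""
    return [(PREFIXES.get(p, ""), ATOMS[a][0])
            for p, a in splits(symbol) if p == "" or ATOMS[a][1]]


def resolve_by_precedence(symbol):
    """Ignore the metric property and take the reading with the longest atom."""
    for p, a in splits(symbol):
        return PREFIXES.get(p, ""), ATOMS[a][0]
    return None


print(resolve_by_metric("cd"))
print(resolve_by_precedence("cd"))
```

Both functions read cd as the candela. They differ on a symbol such as kd: the metric check rejects it, whereas the precedence rule alone still accepts kilo-day, because it never consults the metric property.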

Type IVb: nonmetric vs. metric

A non-metric unit atom collides with a valid prefix-atom combination. Examples are FT for the foot and PT for the pint, which collide with femto-tesla and pico-tesla, respectively. There is no way to resolve the conflict without changing the code. A reversed precedence rule that would bind the most characters to the prefix would cause the valid metric combination to be hidden.

Type V: nonmetric other

An invalid combination of a non-metric atom with a prefix collides with a metrically valid prefix-atom combination. This type exists for completeness only; we discovered no type V conflicts. If electronvolt were not a metric unit atom, the PEV conflict would be of type V.
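The typology above can be summarized as a small lookup over pairs of colliding readings. The sketch below reflects our reading of the definitions, not a normative algorithm: each reading is reduced to whether it carries a prefix (P) or is unprefixed (U) and whether its atom is metric (M) or non-metric (N). The names `kind` and `conflict_type` are hypothetical.

```python
def kind(prefix, is_metric):
    """Encode a reading as P/U (prefixed or unprefixed) plus M/N (metric or not)."""
    return ("P" if prefix else "U") + ("M" if is_metric else "N")


# Keys are sorted pairs of reading kinds.
TYPES = {
    ("PM", "PM"): "II", ("PM", "UM"): "II",
    ("PN", "PN"): "III", ("PN", "UN"): "III",
    ("PN", "UM"): "IVa",
    ("PM", "UN"): "IVb",
    ("PM", "PN"): "V",
}


def conflict_type(a, b):
    """Classify a collision between two readings given as (prefix, is_metric)."""
    key = tuple(sorted((kind(*a), kind(*b))))
    # Two unprefixed readings mean the same atom symbol is used twice: type I.
    return TYPES.get(key, "I")


print(conflict_type(("PE", True), ("P", True)))   # PEV: peta-volt vs. pico-electronvolt
print(conflict_type(("", False), ("n", False)))   # nmi: nautical mile vs. nano-mile
print(conflict_type(("", True), ("c", False)))    # cd: candela vs. centi-day
print(conflict_type(("", False), ("F", True)))    # FT: foot vs. femto-tesla
```

Passing `("PE", True)` and `("P", False)`, i.e. treating electronvolt as non-metric, yields type V, in line with the remark above.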