Even though this work is primarily about the semantics of units, communication is not possible without a syntax. We have to distinguish communication among computers from communication involving humans. Ideally, for communication between computers one would define a syntax that reflects the semantic structure one to one and that is easy to implement. If humans are involved, the syntax must also be understandable to them. Traditionally, most computer systems do not make much sense of units other than to present them to humans. Moreover, data interchange protocols in healthcare (HL7, ASTM 1238) stick to human-readable message formats. Therefore, we cannot design a syntax for units from scratch.
Three standards for the notation of units are of interest: ISO 2955, ANSI X3.50 (together with their reception by HL7 version 2.3 and ASTM 1238), and the European standard ENV 12435 (CEN).
ISO 2955 is a standard notation for units that suits the limited character sets available on most computers. Its scope is SI units [ISO 1000]; however, its general approach can be used for other systems of units as well. It gives both a case sensitive and a case insensitive notation, neither of which requires Greek letters or mathematical symbols such as the fraction bar or superscripts. It is therefore useful for communication among computers, while the case sensitive variant of ISO 2955 in particular retains good readability for humans.
ANSI X3.50 is similar to ISO 2955 but is mainly concerned with Anglo-American units that are not covered by ISO, yet are needed to meet requirements in the United States. Certainly, a data interchange standard has to reflect reality and must not dictate a different terminology to health care practitioners. Thus, any globally useful code for units of measure must unify the metric system and traditional units. This is also true for other non-ISO units that are commonly used in healthcare, such as Torricelli's unit of pressure 1 mm Hg.
One problem of all standard code tables is that they are prone to name conflicts and ambiguities. Neither ISO 2955 nor ANSI X3.50 is free of such conflicts. Conflicts arise from the way units are constructed using decimal prefixes.
In the metric system a simple unit consists of an optional prefix
symbol and a terminal unit symbol (in the following called unit atom)
written side by side. ENV 12435, ISO 2955 and ANSI X3.50 follow this
practice. Because the prefix is not delimited from the unit atom, the
computer must analyze a simple unit lexically, i.e., by finding a
match among all possible combinations of prefixes and atoms. Such a
combination of codes is prone to ambiguities. For instance, the pascal
(1 Pa) is indistinguishable from the pico-ampere (1 pA) in a case
insensitive representation, which is why ISO 2955 assigned the rather
unusual notation PAS to the pascal.
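The lexical analysis described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the tables `PREFIXES` and `ATOMS` are small hypothetical excerpts rather than the code tables of any of the standards, the function name `parse_simple_unit` is invented for this sketch, and the case insensitive variant is modelled simply by folding everything to upper case.

```python
# Hypothetical, heavily abbreviated tables of case sensitive symbols.
PREFIXES = {"p": "pico", "m": "milli", "k": "kilo"}
ATOMS = {"A": "ampere", "Pa": "pascal", "m": "metre", "V": "volt"}


def parse_simple_unit(text, case_sensitive=True):
    """Return every (prefix name or None, atom name) reading of `text`."""
    # Assumption: a case insensitive notation is modelled by upper-casing.
    fold = (lambda s: s) if case_sensitive else str.upper
    target = fold(text)
    readings = []
    # First possibility: the whole string is an atom without a prefix.
    for atom, atom_name in ATOMS.items():
        if fold(atom) == target:
            readings.append((None, atom_name))
    # Second possibility: a prefix written directly in front of an atom.
    for prefix, prefix_name in PREFIXES.items():
        for atom, atom_name in ATOMS.items():
            if fold(prefix + atom) == target:
                readings.append((prefix_name, atom_name))
    return readings


print(parse_simple_unit("Pa"))                        # one reading
print(parse_simple_unit("pA"))                        # one reading
print(parse_simple_unit("PA", case_sensitive=False))  # two readings
```

With case preserved, each of the two strings has exactly one reading. Once case is folded, the same string matches both the pascal and the pico-ampere, which is the kind of conflict that a distinct code for the pascal avoids.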
In our analysis of the actual and potential ambiguities among simple units we identified the following types:
Type I. Ambiguous unit atoms. We find an example even within the case
sensitive variant of ISO 2955, which uses a
for the year (from Latin
Type II. Two valid prefix-atom combinations (including an atom without
prefix) produce the same name. Valid combinations are those where the
unit atom is ``metric.'' We call ``metric'' the property of a unit
that it may take a prefix. Note that there are non-metric units within
the ISO system. For instance, day (d), hour (h), minute (min), and the
various degrees do not get a decimal prefix. A type II conflict exists
within the case insensitive version of ISO 2955 for PEV, which can be
peta-volt or pico-electronvolt. This is an error in the code system
that needs to be resolved.
Type III. Invalid combinations of non-metric units with prefixes,
where one can be just an unprefixed atom, collide with each other. For
example, nmi as the nautical mile collides with nano-mile. Since mile
is a non-metric unit atom, for which a prefix is forbidden, no real
ambiguity exists.
Type IVa. An (invalid) combination of a non-metric unit with a prefix
collides with a valid unit atom. For instance, again within ISO 2955
(case sensitive), we find cd for candela, while a centi-day is not
explicitly ruled out. Taking into account that the metric property
does not hold for the unit day, this ambiguity is resolvable without
changing the code. Type IVa conflicts can also be resolved without
evaluating the metric property: a precedence rule that binds the most
characters to the atom would match the candela before the centi-day.
Type IVb. A non-metric unit atom collides with a valid prefix-atom
combination. Examples are FT for the foot and PT for the pint, which
collide with femto-tesla and pico-tesla, respectively. There is no way
to resolve the conflict without changing the code. A reversed
precedence rule that would bind the most characters to the prefix
would cause the valid metric combination to be hidden.
Type V. An invalid combination of a non-metric atom with a prefix
collides with a metrically valid prefix-atom combination. This type
exists for completeness only; we discovered no type V conflicts. If
electronvolt were not a metric unit atom, the PEV conflict would be of
type V.
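The effect of the metric property and of the precedence rule on the cd, nmi, FT and PEV examples above can be sketched as follows. The tables are hypothetical excerpts assembled only to reproduce those examples (they mix case sensitive and case insensitive codes for brevity), and the names `ATOMS`, `PREFIXES`, `readings` and `longest_atom` are assumptions made for this illustration, not part of any standard.

```python
# Hypothetical excerpt: atom code -> (name, is_metric).
ATOMS = {
    "cd": ("candela", True), "d": ("day", False),
    "nmi": ("nautical mile", False), "mi": ("mile", False),
    "FT": ("foot", False), "T": ("tesla", True),
    "V": ("volt", True), "EV": ("electronvolt", True),
}
# Hypothetical excerpt: prefix code -> name.
PREFIXES = {"c": "centi", "n": "nano", "F": "femto", "P": "pico", "PE": "peta"}


def readings(code, metric_only=True):
    """Return all (prefix code, atom code) splits of `code`.

    The empty prefix stands for an atom without a prefix. With
    metric_only=True, a prefix is accepted only in front of a metric atom.
    """
    found = []
    for prefix in [""] + list(PREFIXES):
        atom = code[len(prefix):]
        if code.startswith(prefix) and atom in ATOMS:
            if prefix == "" or ATOMS[atom][1] or not metric_only:
                found.append((prefix, atom))
    return found


def longest_atom(code):
    """Precedence rule: bind the most characters to the atom."""
    candidates = readings(code, metric_only=False)
    return max(candidates, key=lambda r: len(r[1])) if candidates else None


print(readings("cd"), longest_atom("cd"))  # candela either way
print(readings("nmi"))                     # nano-mile is filtered out
print(readings("FT"))                      # foot and femto-tesla both remain
print(readings("PEV"))                     # both prefixed readings remain
```

In this sketch the metric property alone removes the centi-day and nano-mile readings, and the precedence rule reaches the same result for cd without consulting it. For FT and PEV two readings survive the metric filter, consistent with the statement that these conflicts cannot be resolved without changing the code.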