KotUniL = SI Units + Kotlin. Part Three: When only one unit test is enough
This is the third and final article in a series devoted to KotUniL, a Kotlin library for working with physical and other dimensional quantities.
In this article we will see how the fundamental scientific structure of SI units and specific features of Kotlin have affected the design of our library.
These are the articles of this series:
KotUniL = SI Units + Kotlin. Part One: Introduction to KotUniL
KotUniL = SI Units + Kotlin. Part Two: Advanced Features
KotUniL = SI Units + Kotlin. Part Three: When only one unit test is enough (this)
What kind of world do we live in?
Let’s start by remembering the basics. The physical world in which we live is very successfully described by formulas involving physical quantities. A physical quantity is characterized by a numerical value and dimensionality. (This dimensionality should not be confused with the geometric dimensionality of our space.)
For the so-called base physical quantities, people have agreed on units. In the beginning these were physical artifacts called standards: the standard meter, the standard kilogram, etc.
One can compare any other quantity of the same kind with a unit of physical quantity and express their ratio as a number. So a physical quantity is a numerical value (a relation to a chosen unit) and an indication of the type of quantity within the chosen system of physical quantities (see International System of Units).
There are several systems of physical quantities, but all of them are conceptually close to each other and finite, i.e. they consist of a small number of dimensions or, in terms more familiar to programmers, types.
Dimensions or types are usually denoted by capital Latin letters. For example, the classical system of dimensions, on which the SI system is based, contains seven dimensions: LMTIΘNJ (length, mass, time, etc.).
It turns out that all physical information can be measured, calculated, predicted, etc. as a set of “dimensional quantities”: pairs of numbers and dimensions. The dimensionality is written (if we work with LMTIΘNJ) in the form $L^{p_1} M^{p_2} T^{p_3} \dots$, where $p_i$ is the degree (exponent) of the i-th dimension.
For example, the dimensionality of acceleration is $L^{1} T^{-2}$, and the acceleration of free fall of a body at the surface of the Earth is 9.8 $\mathrm{m/s^2}$.
In other words, we live in a world in which any objective physical information is represented as a point in the eight-dimensional space R*D, where R is the set of real numbers holding the values of physical quantities, and D is the space of seven-dimensional vectors of real numbers holding the exponents of the corresponding dimensions.
Looking ahead, I note that nothing prevents us from “lengthening” these vectors by adding our own dimensional quantities, such as a price in euros or the subjective attractiveness of a partner measured on a ten-point scale.
But for simplicity we will focus on the space of seven-dimensional vectors.
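The pair "number plus dimension vector" can be sketched in a few lines of Kotlin. This is only an illustration under my own assumptions: the names `Dimension`, `Quantity` and `ACCELERATION` are hypothetical and are not taken from KotUniL, and integer exponents are used for simplicity even though the text allows real ones.

```kotlin
// Hypothetical sketch; KotUniL's real classes may be organized differently.
// Exponents are stored in the order L, M, T, I, Θ, N, J.
// Assumption: integer exponents only (the text allows real exponents).
data class Dimension(val exponents: List<Int>) {
    init {
        require(exponents.size == 7) { "expected 7 exponents (LMTIΘNJ)" }
    }
}

// A dimensional quantity: a numerical value paired with a dimension vector.
data class Quantity(val value: Double, val dimension: Dimension)

// Acceleration has dimension L^1 T^-2.
val ACCELERATION = Dimension(listOf(1, 0, -2, 0, 0, 0, 0))

fun main() {
    val freeFall = Quantity(9.8, ACCELERATION) // 9.8 m/s^2
    println(freeFall)
}
```

Adding a custom dimension such as a currency would, in this sketch, simply mean a longer exponent list.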
What is the mathematical structure of the space of dimensions?
When physicists do some mathematical work on physical quantities, they work differently with numerical values of quantities and their dimensions. Working with numerical values in our context is trivial and uninteresting. Let us focus on the question of how operations on physical quantities are mapped to operations on dimensions.
The basic set of dimension space (in the SI system) is vectors of length 7. The elements of a vector are real numbers. Each element (element number) is associated once and for all with a certain physical dimension.
Comparison operations and the classical arithmetic operations are defined over this set, but in a very specific way.
The order relation is defined only between vectors of the same dimension. We can compare two velocity values, but we cannot compare a speed with a distance.
The operations + and - on the pair D*D have a very limited domain: they are defined only if d1 == d2 (on the so-called diagonal of the product D*D).
The result of the operations is also very unusual:
d1 + d2 = d1 - d2 = d1 = d2
For example, by adding distances or subtracting lengths, we always remain in dimension $L^{1}$.
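Note that this alone does not yet give a group structure: the operation is partial and has no neutral element. The group structure appears with multiplication, discussed next.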
Raising a physical quantity whose dimension has exponent m to the power n multiplies the exponents (m·n); taking the n-th root divides the first by the second (m/n).
Multiplication and division operations are defined everywhere, i.e. over any pairs of elements from D.
Multiplication and division are mutually inverse. However, unlike with numbers, there is no zero element by which division is forbidden: it is possible to divide by any dimension.
Multiplying dimensions adds the corresponding elements of their vectors, and dividing subtracts the elements of the divisor from those of the dividend.
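The rules above for the dimension vectors can be written down as a small Kotlin sketch. The class `Dimension` and the constants `LENGTH` and `TIME` are hypothetical names chosen for this illustration, not KotUniL's API, and exponents are restricted to integers here, so roots are left out.

```kotlin
// Hypothetical sketch of the arithmetic on dimension vectors (order: L, M, T, I, Θ, N, J).
// Assumption: integer exponents only, so taking roots is not modeled.
data class Dimension(val exponents: List<Int>) {
    init {
        require(exponents.size == 7) { "expected 7 exponents (LMTIΘNJ)" }
    }

    // + and - are defined only on the "diagonal": both dimensions must be equal,
    // and the result is that same dimension.
    operator fun plus(other: Dimension): Dimension {
        require(this == other) { "cannot add $this and $other" }
        return this
    }

    operator fun minus(other: Dimension): Dimension = this + other

    // Multiplication adds exponents, division subtracts them.
    operator fun times(other: Dimension) =
        Dimension(exponents.zip(other.exponents) { a, b -> a + b })

    operator fun div(other: Dimension) =
        Dimension(exponents.zip(other.exponents) { a, b -> a - b })

    // Raising to a power multiplies every exponent by n.
    fun pow(n: Int) = Dimension(exponents.map { it * n })
}

val LENGTH = Dimension(listOf(1, 0, 0, 0, 0, 0, 0))
val TIME = Dimension(listOf(0, 0, 1, 0, 0, 0, 0))

fun main() {
    val acceleration = LENGTH / TIME.pow(2) // L^1 T^-2
    println(acceleration.exponents)
    println(runCatching { LENGTH + TIME }.isFailure) // adding length to time is rejected
}
```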
With respect to multiplication and division, our space of physical dimensions is exactly an Abelian group: the operation is associative and commutative, the zero vector (a dimensionless quantity) is the neutral element, and every dimension has an inverse. It is not, however, an algebraic field in the strict sense, because the addition described above is only a partial operation with no neutral element of its own.
If I am wrong, I ask the mathematicians among the readers to correct me.
I should also mention the application of functions such as sine or integral to physical quantities. Here it is simple: they are applied only to the numerical values and do not change the dimension vector.
We need this algebraic structure, with its oddly defined arithmetic operations, not only for the sake of curiosity but also for quite pragmatic reasons. Here is why.
I dare to formulate a lemma:
Lemma: All physical formulas (at least those needed in practice :-) can be represented as a directed computational graph in which each node has one parent and one or two children and is interpreted as one of the operations +, -, *, /, ^ (power) applied to its children.
We omit the use of functions like sine or integral in formulas because they are defined over dimensionless quantities.
The magic of programming languages is that compilers know how to build these computational graphs. And some languages, such as Kotlin, allow you not only to define new objects but also to define over them the operations that apply to ordinary numbers: the same +, -, *, / (Kotlin has no ^ operator, so a power is written as a function call).
This means that we can then use the newly defined objects in formulas the same way we are used to using numbers.
Of course, not all syntactically correct formulas make physical sense. But this is another problem that must be solved when implementing objects and operations of the world being modeled (in our case, the world of physical quantities).
The operations used in formulas, as already noted, can be correct or incorrect from the point of view of use of dimensions. The fact that a formula is correct does not depend at all on the numerical values of physical variables used in it. In other words, in terms of dimensions, a formula is either always correct or always incorrect. If this fundamental property of dimensions can be implemented in the code, then to check the correctness of the formula from the point of view of dimensions, one unit test “running” on it will be enough.
Looking ahead, I would like to say that it seems to me that KotUniL has achieved this.
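To make the "one unit test" idea concrete, here is a minimal sketch under my own assumptions. The class `Q` and the helpers `metres`, `seconds`, `goodFormula` and `badFormula` are hypothetical and only track the L, M and T exponents; this is not how KotUniL is implemented. The point is that the dimension check succeeds or fails regardless of the numbers fed in.

```kotlin
// Hypothetical minimal model: a value plus the exponents of L, M and T only.
data class Q(val value: Double, val dim: List<Int>) {
    operator fun plus(o: Q): Q {
        // Addition is allowed only for equal dimensions.
        require(dim == o.dim) { "dimension mismatch: $dim vs ${o.dim}" }
        return Q(value + o.value, dim)
    }

    operator fun times(o: Q) = Q(value * o.value, dim.zip(o.dim) { a, b -> a + b })

    operator fun div(o: Q) = Q(value / o.value, dim.zip(o.dim) { a, b -> a - b })
}

fun metres(v: Double) = Q(v, listOf(1, 0, 0))
fun seconds(v: Double) = Q(v, listOf(0, 0, 1))

// Dimensionally correct: velocity * time + distance.
fun goodFormula(v: Q, t: Q, s0: Q): Q = v * t + s0

// Dimensionally wrong: velocity + distance * time.
fun badFormula(v: Q, t: Q, s0: Q): Q = v + s0 * t

fun main() {
    val v = metres(3.0) / seconds(1.0)
    // One run with arbitrary values is enough to decide dimensional correctness.
    println(runCatching { goodFormula(v, seconds(2.0), metres(1.0)) }.isSuccess)
    println(runCatching { badFormula(v, seconds(2.0), metres(1.0)) }.isFailure)
}
```

Changing the numerical arguments changes the computed value but never the outcome of the dimension check, which is why a single test run per formula is sufficient in this model.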
The Splendor and Poverty of Kotlin and DSL Design
So, I wanted to develop a special Domain Specific Language (DSL) for manipulating physical and other dimensional quantities.
Functionally, it should let us manipulate dimensional quantities using formulas that look as close as possible to the formulas in technical articles and documentation.
DSLs can be standalone and embedded into programming languages. We want to create an embedded subset of Kotlin without using preprocessors and generators. In other words, all our DSL constructs must be syntactically correct Kotlin constructs with their own specific semantics.
Of course, this restriction strongly “clips the wings” in the language design.
But let’s try to come up with such constructions, which will be understandable to people with technical background and at the same time will be accepted by Kotlin’s compiler.
Let’s start with the simplest school formulas:
x = 1 m
y = 1.5 m
s = x * y
This seems very simple and familiar from fifth grade.
But let’s take a closer look at the expression
x = 1 m
What does this really mean? How can this be implemented in a programming language?
In fact, this notation means using a function of two variables
f : R × N -> M(L)
where
R is the set of real numbers,
N is the set of names of physical units,
M(L) is the set of physical quantities of length.
Of course, there are functions in the Kotlin language, but if instead of
x = 1 m
we write something like
f(1, "m")
it won’t make much sense to technical people.
Kotlin has infix functions, but they must have exactly two operands: a receiver on the left and a single parameter on the right.
That is, you can write an infix function m and use it in the form:
2 m 5
but you can’t write an infix function m to use it as
1 m
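The limitation is easy to see in a tiny sketch. The infix function `m` below is hypothetical, and what it returns (a pair of its two operands) is purely illustrative.

```kotlin
// Hypothetical infix function named m; pairing the operands is only for illustration.
infix fun Int.m(other: Int): Pair<Int, Int> = Pair(this, other)

fun main() {
    val pair = 2 m 5   // compiles: receiver on the left, one argument on the right
    println(pair)
    // val x = 1 m     // does not compile: the right-hand argument is missing
}
```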
In the end, it seems we are left with one option: extension functions or, to avoid the parentheses, extension properties. Then the example
x = 1 m
y = 1.5 m
s = x * y
can be written in our language as:
x = 1.m
y = 1.5.m
s = x * y
Or we can write at once
s = 1.m * 1.5.m
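A minimal sketch of how such a notation can be made to compile is shown below. It assumes an extension property on `Number`; the types `Length` and `Area` are hypothetical stand-ins and not KotUniL's actual classes.

```kotlin
// Hypothetical types; KotUniL's real classes may look different.
data class Length(val metres: Double)
data class Area(val squareMetres: Double)

// Extension property: lets us write 1.m or 1.5.m without parentheses.
val Number.m: Length
    get() = Length(this.toDouble())

// Multiplying two lengths yields an area.
operator fun Length.times(other: Length) = Area(metres * other.metres)

fun main() {
    val x = 1.m
    val y = 1.5.m
    val s = x * y
    println(s)
    println(1.m * 1.5.m) // the same thing written at once
}
```

Declaring the property on `Number` is a design choice of this sketch: it covers both integer and floating-point literals with a single declaration.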
About prefixes, suffixes and infixes in programming
I tried to organize my thoughts about functions, operations, arity/valences and prefixes, infixes and suffixes. You might find it interesting, too.
These may not be exact definitions, but the way I see it is this.
We start from the fundamental notion of a mapping from one set to another.
The number of parameters that form the input of a mapping is called its arity. Accordingly, we speak of nullary, unary, binary, ternary, etc. mappings. Sometimes the term valence is used instead of arity.
The term “arity” is most commonly used with respect to functions and their arguments.
By definition, functions are single-valued mappings. Some functions with a well-defined result are commonly referred to as operations.
As I understand it, the separation of some functions into a special category of operations has mainly historical reasons. In particular, plus, minus, multiplication, etc., familiar from school, are usually called operations.
At school we were taught to write down functions in bracket notation:
y = f(x1, x2, …xn)
Binary functions can be written in a notation called infix:
y = x1 f x2
Familiar arithmetic operations are almost always written this way, for example
y = x1 + x2
z = a*y
Unary operations can be written in brackets
y = f(x)
Or it can be written in the prefix notation
y = f x
or in postfix notation
y = x f
Well-known examples of postfix operations are calculating percent (20%) or factorial (5!).
Examples of prefix operations implemented in programming languages are the logical negation not a or !a
and the increment:
y = ++x.
The increment also has a postfix variant, y = x++, with slightly different semantics. (Compound assignment such as y += 2, by contrast, is an infix operation.)
Arithmetic operations in prefix, infix and postfix notation are (pre)defined in most programming languages.
The delicate question is whether it is possible to define your own such operations.
It turns out that language creators are more sympathetic to infix operations than to prefix and postfix ones. In Kotlin, for example, you can define your own infix functions, but not your own prefix or postfix functions; only a fixed set of predefined operators such as unary minus or ++ can be overloaded (see here).
So, we have made do without postfix functions in KotUniL for now.
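The difference between the three cases in Kotlin can be sketched as follows. The class `Metres` and the function `scaledBy` are hypothetical names for this illustration; `unaryMinus` and `plus` are the standard Kotlin operator conventions.

```kotlin
// Hypothetical type used only to illustrate Kotlin's operator conventions.
data class Metres(val value: Double) {
    // Predefined prefix operator: enables -x.
    operator fun unaryMinus() = Metres(-value)

    // Predefined infix operator: enables x + y.
    operator fun plus(other: Metres) = Metres(value + other.value)

    // User-named infix function: enables x scaledBy 1.5.
    infix fun scaledBy(factor: Double) = Metres(value * factor)
}

fun main() {
    val a = Metres(2.0)
    println(-a)
    println(a + a)
    println(a scaledBy 1.5)
    // There is no way to declare a new postfix word so that "2 metres" compiles.
}
```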
In programming, it is customary to talk about types or classes, objects or instances. It’s not customary to talk about half an instance, but in real life you see it all the time.
For example, you can buy 500 grams of wine. If you drink 200 grams of it, you have 300 grams left over.
To meet these requirements we transgress the restrictions of the fundamentalists of object-oriented programming and implement arithmetic operations on dimensional values:
val rest = 1.5.kg - 200.g
In this expression we have three different instances of the same type.
And multiplication or division results in a composite type that is different from the two original types. In my library, I call it Expression, although it is always a product, as the above theory suggests.
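The subtraction above can be reproduced with a small sketch. The class `Mass` and the extension properties `kg` and `g` are hypothetical and assume that every mass is stored internally in kilograms; KotUniL's own representation may differ.

```kotlin
// Hypothetical sketch: every mass is stored internally in kilograms.
data class Mass(val kilograms: Double) {
    // Subtracting two masses yields another instance of the same type.
    operator fun minus(other: Mass) = Mass(kilograms - other.kilograms)
}

val Number.kg: Mass
    get() = Mass(toDouble())

val Number.g: Mass
    get() = Mass(toDouble() / 1000.0)

fun main() {
    val rest = 1.5.kg - 200.g
    println(rest)

    // The wine example: 500 g bought, 200 g drunk.
    val wineLeft = 500.g - 200.g
    println(wineLeft)
}
```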
The designers of Kotlin wisely allowed most Unicode characters in identifiers. This let us easily map the SI notation for some units and prefixes into the code.
Another strong feature of the language design is that arbitrary characters enclosed in backticks can be used as identifiers:
val price = 52.`€`/m2
This elegantly solved a number of problems, including identifier collisions.
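A sketch of how a backtick-quoted identifier can serve as a unit is shown below. The types `Money`, `Area` and `PricePerArea` and the constant `m2` are hypothetical names for this illustration, not KotUniL's definitions.

```kotlin
// Hypothetical types for a price-per-area example.
data class Money(val euros: Double)
data class Area(val squareMetres: Double)
data class PricePerArea(val eurosPerSquareMetre: Double)

// A backtick-quoted identifier lets the euro sign act as a unit name.
val Number.`€`: Money
    get() = Money(toDouble())

// One square metre, used as the unit of area.
val m2 = Area(1.0)

operator fun Money.div(area: Area) = PricePerArea(euros / area.squareMetres)

fun main() {
    val price = 52.`€` / m2
    println(price)
}
```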
Some personal impressions
Finishing the topic of the magic of physical dimensions, I would like to share my understanding of the process of their formation and even fascination with this process.
Already in ancient times, trade, tax collection and construction required measures of weight and length. Local reference artifacts appeared as standards. In the Middle Ages in Europe, each fair and each city had its own measures of length and weight (sometimes volume), which created inconveniences in inter-city trade.
Science, too, needed accurate standards. These, along with many other useful things, were brought by the French Revolution. The French created and began to carefully store very precise standards in the form of physical artifacts.
But it took a century and a half before the rest of the world more or less joined the standard.
The SI standard has improved and evolved. It’s well described on Wikipedia.
I was personally struck by the last two changes to the standard.
Especially the penultimate one, adopted by the General Conference on Weights and Measures in 2018 and effective in 2019.
Over the last two centuries, science has found many physical processes that always lead to the same results. For example, the speed of light in vacuum is constant and equal to 299,792,458 m/s.
There are many such physical constants. They used to be measured with instruments built on the basis of physical standards.
In 2018 it was decided to reverse this: not to define constants using standards, but to abandon physical artifacts and define physical units on the basis of natural constants. The picture from Wikipedia at the beginning of the article explains the new structure of the process of defining SI units.
Personally, it took me some time to comprehend the philosophical significance of this decision.
The last decision of the General Conference on Weights and Measures is not as fundamental, but it is noteworthy. And it happened after I had started working on my library.
More recently, driven by practical needs, the range of SI prefixes (milli-, micro-, kilo-, hecto-, …) was extended from powers of ten between -24 and +24 to powers between -30 and +30. Recall that the prefix “nano” is “only” 10^-9. It is fantastic how far into the cosmos and how deep into the elements of matter modern science has penetrated with its measurements.
One last thing. I have made a proposal to extend the functionality of Kotlin with a library like the one described in this series of articles. If you find it useful, you can participate in the discussion.
Illustration: Reverse dependencies of the SI base units on seven physical constants, which are assigned exact numerical values in the 2019 redefinition. Unlike in the previous definitions, the base units are all derived exclusively from constants of nature. Source: Wikipedia