It’s right there in the (original) name – FORmula TRANslation. Formulas, or expressions, are an integral part of pretty much every Fortran program. The rules regarding evaluation of expressions in Fortran can sometimes be tricky, with some that will be surprising to those used to other languages.
The fundamental unit of a Fortran expression is the “Primary”. This is a literal constant, variable name, array constructor, function reference, and so on. Expressions consist of primaries, possibly combined with one or more operators. (An expression enclosed in parentheses is also a primary.) It’s here where we encounter our first surprise, with the definition of “literal constant”. For example, let’s look at the Fortran 2018 rule for an integer literal constant:
R708 int-literal-constant is digit-string [ _ kind-param ]
The syntax convention is that something in square brackets is optional. Notice anything missing? That’s right – there’s no spot for a sign! In Fortran, -42 is not a literal constant, it’s the literal constant 42 with the unary negation operator applied to it! This rule has some subtle effects we’ll explore a bit later.
In Fortran, when an expression has more than one operator, there are two kinds of rules that determine the order in which the operators are evaluated. The first of these is “precedence” – given the choice between two operators, which gets done first? (Not all languages have operator precedence – for example, APL doesn’t.) If you read the language standard, precedence is a side-effect of the way expressions are described as having nested “levels”, 1 through 5.
A Level-1 expression is simple:
R1002 level-1-expr is [ defined-unary-op ] primary
User-defined unary operators have the highest precedence, and will get performed before any others (when there is a choice to be made). Now let’s look at a Level-2 expression:
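R1004 mult-operand is level-1-expr [ power-op mult-operand ]
R1005 add-operand is [ add-operand mult-op ] mult-operand
R1006 level-2-expr is [ [ level-2-expr ] add-op ] add-operand
R1007 power-op is **
R1008 mult-op is * or /
R1009 add-op is + or -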
Here, we see that * and / (multiplication and division) are treated the same, as are + and - (addition and subtraction). Let’s apply these rules to:
2 + 3 * 4
and see what we get.
Each of 2, 3 and 4 is a level-1-expr. Can we do the addition first? R1006 is where the addition operator can go, but it requires an add-operand. That’s defined in R1005, which is where the multiplication operator can go. From this, we see that we have to do the multiplication first (3 is a level-1-expr, and that is enough to form a mult-operand by R1004). Once we multiply 3 by 4, yielding 12, that makes an add-operand. Now we’re left with the 2, a primary and also a level-1-expr. As such it’s also a mult-operand and an add-operand. We can then follow R1006 to add 2 to 12, giving 14. (No, I’m not Doctor COBOL. – next room!)
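If you’d rather check the walk-through above than take my word for it, here is a minimal sketch. The program and variable names are mine, purely for illustration.

```fortran
program precedence_demo
  implicit none
  integer :: unparenthesized, mult_first, add_first

  unparenthesized = 2 + 3 * 4      ! parsed as 2 + (3 * 4)
  mult_first      = 2 + (3 * 4)    ! the grouping the syntax rules imply
  add_first       = (2 + 3) * 4    ! parentheses force the addition first

  print '(a, i0)', '2 + 3 * 4   = ', unparenthesized   ! 14
  print '(a, i0)', '2 + (3 * 4) = ', mult_first        ! 14
  print '(a, i0)', '(2 + 3) * 4 = ', add_first         ! 20

  ! Stop with an error if the unparenthesized form ever disagrees.
  if (unparenthesized /= mult_first) error stop 'unexpected precedence'
end program precedence_demo
```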
Level-3 expressions are Level-2 expressions with an optional concatenation operator. (No, you can’t concatenate numbers, but a character literal or variable is a level-1-expr and thus also a level-2-expr.)
As we descend deeper, we encounter Level-4 expressions, which are Level-3 expressions with an optional relational operator. Level-5 expressions are, you guessed it, Level-4 expressions, optionally combined with one of the logical operators. As we saw in Level-2 expressions, there are sets of same-precedence operators. Here, .NOT. is done first, then .AND., next, .OR., and finally .EQV. and .NEQV. together.
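Here is a small sketch of the logical-operator ordering just described. The variable names and values are arbitrary choices of mine, picked so that a different grouping would give a different answer.

```fortran
program logical_precedence_demo
  implicit none
  logical :: a, b, c, r1, r2, r3

  a = .true.
  b = .false.
  c = .true.

  ! .NOT. first, then .AND., then .OR.:  a .or. (b .and. (.not. c))
  r1 = a .or. b .and. .not. c
  ! Forcing the .OR. first changes the answer.
  r2 = (a .or. b) .and. (.not. c)
  ! .EQV. comes last:  (a .or. b) .eqv. (c .and. b)
  r3 = a .or. b .eqv. c .and. b

  print *, 'a .or. b .and. .not. c      =', r1   ! T
  print *, '(a .or. b) .and. (.not. c)  =', r2   ! F
  print *, 'a .or. b .eqv. c .and. b    =', r3   ! F
end program logical_precedence_demo
```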
Finally, we get to the rule for expressions as a whole:
R1022 expr is [ expr defined-binary-op ] level-5-expr
Just as user-defined unary operators were the highest precedence, user-defined binary operators are the lowest precedence.
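To see both ends of the precedence scale at once, here is a sketch using two made-up operators, .twice. (unary) and .times. (binary). Neither is part of any library; I define them only for this illustration.

```fortran
module demo_ops
  implicit none

  ! Hypothetical operators, defined only for this demonstration.
  interface operator(.twice.)
    module procedure twice
  end interface

  interface operator(.times.)
    module procedure times
  end interface

contains

  pure integer function twice(x)
    integer, intent(in) :: x
    twice = 2 * x
  end function twice

  pure integer function times(x, y)
    integer, intent(in) :: x, y
    times = x * y
  end function times

end module demo_ops

program defined_op_precedence
  use demo_ops
  implicit none
  integer :: r_unary, r_binary, r_intrinsic

  r_unary     = .twice. 3 ** 2     ! (.twice. 3) ** 2 = 6 ** 2 = 36
  r_binary    = 2 .times. 3 + 4    ! 2 .times. (3 + 4) = 14
  r_intrinsic = 2 * 3 + 4          ! (2 * 3) + 4 = 10, for comparison

  print '(a, i0)', '.twice. 3 ** 2  = ', r_unary
  print '(a, i0)', '2 .times. 3 + 4 = ', r_binary
  print '(a, i0)', '2 * 3 + 4       = ', r_intrinsic

  if (r_unary /= 36 .or. r_binary /= 14) error stop 'unexpected precedence'
end program defined_op_precedence
```

Even though .times. does exactly what * does, it groups differently, because a defined binary operator is applied only after everything inside the level-5 expressions on either side has been combined.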
You might be wondering what happened to unary + and -. They’re hiding in Level-2 expression rule R1006! See for yourself!
The standard helpfully gives a table of all the operators and their precedence, even though these are defined by the syntax rules:
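|Category of operation||Operator||Precedence|
|Extension||defined-unary-op||Highest|
|Numeric||**||.|
|Numeric||*, /||.|
|Numeric||unary +, -||.|
|Numeric||binary +, -||.|
|Character||//||.|
|Relational||.EQ., .NE., .LT., .LE., .GT., .GE., ==, /=, <, <=, >, >=||.|
|Logical||.NOT.||.|
|Logical||.AND.||.|
|Logical||.OR.||.|
|Logical||.EQV., .NEQV.||.|
|Extension||defined-binary-op||Lowest|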
Remember what I wrote above about things such as -42 being a combination of unary minus and an integer constant? Here’s where it can trip you up. What is the value of:
-3 ** 2
If you said positive 9, no prize for you. The exponentiation operator has higher precedence than unary minus, so this is evaluated as if it were:
-(3 ** 2)
or -9! The standard even gives an example of this in a Note, with -A**2 being interpreted as -(A**2).
Ok, now what about:
A ** -2
? Go back to the expression syntax rules and see if you can work it out. I’ll wait…
Let me guess – you couldn’t find rules allowing this, right? Right! Fortran doesn’t allow consecutive operators! (Many compilers, Intel Fortran for example, will let you do this as an extension, but it’s non-standard.) To conform to the standard you would have to write this as:
A ** (-2)
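A short sketch that pulls the unary-minus points together. The value given to a is arbitrary, and I’ve used only standard-conforming forms, so the nonconforming A ** -2 spelling appears only in a comment.

```fortran
program unary_minus_demo
  implicit none
  real :: a

  a = 4.0

  ! ** binds tighter than unary minus, so this is -(3 ** 2).
  print '(a, i0)',   '-3 ** 2   = ', -3 ** 2       ! -9
  ! Parentheses make the negated value the base.
  print '(a, i0)',   '(-3) ** 2 = ', (-3) ** 2     ! 9
  ! A negative exponent needs parentheses; a ** -2 is an extension at best.
  print '(a, f6.4)', 'a ** (-2) = ', a ** (-2)     ! 0.0625
end program unary_minus_demo
```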
Now we get to the second aspect of expression interpretation, associativity. This governs what happens when you have two operators of equal precedence. As with precedence, the associativity rules fall out of the syntax rules, but it’s even less obvious. The general rule is that, with the exception of exponentiation, operators are left-associative, meaning that subexpressions are combined from left to right. Exponentiation, however, is right-associative. The standard offers examples, such as:
2.1 + 3.4 + 4.9
which is evaluated as if it were (2.1 + 3.4) + 4.9, and:
2 ** 3 ** 4
which is evaluated as 2 ** (3 ** 4).
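Here is a sketch of both associativity rules using integers, where the grouping visibly changes the answer. I’ve swapped the standard’s 2 ** 3 ** 4 for 2 ** 3 ** 2, since 2 ** 81 won’t fit in a default integer, and used subtraction rather than addition so that left-to-right grouping shows up in the result.

```fortran
program associativity_demo
  implicit none

  ! Exponentiation groups right to left: 2 ** (3 ** 2) = 2 ** 9.
  print '(a, i0)', '2 ** 3 ** 2   = ', 2 ** 3 ** 2      ! 512
  print '(a, i0)', '(2 ** 3) ** 2 = ', (2 ** 3) ** 2    ! 64

  ! Everything else groups left to right: (10 - 4) - 3.
  print '(a, i0)', '10 - 4 - 3    = ', 10 - 4 - 3       ! 3
  print '(a, i0)', '10 - (4 - 3)  = ', 10 - (4 - 3)     ! 9
end program associativity_demo
```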
This is not the end of the story, though. The standard gives compilers the freedom to evaluate “any mathematically equivalent expression, provided that the integrity of parentheses is not violated.” This means that the compiler may reassociate operations as long as the new expression is, mathematically, the same, and that parentheses are honored. (Intel Fortran by default doesn’t honor parentheses and may reassociate across them! You can disable this behavior with a compiler option.)
Most of the time, associativity doesn’t matter, but consider this example from a presentation on numerical reproducibility I gave at Supercomputing ’13:
CAM (Community Atmospheric Model) example, from “Improving Numerical Reproducibility in C/C++/Fortran”:
A(I) + B + TOL
• TOL was very small and positive
• A(I) and B could be large
• Compiler evaluated this as A(I)+(B+TOL)
• Hoisted constant B+TOL out of the loop
• TOL got rounded away…
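The effect on the slide is easy to reproduce in miniature. This is only a sketch: the values are ones I made up (they are not from CAM), chosen so that a and b cancel and the lost tol becomes visible. I compute each grouping in separate steps and mark the variables volatile, on the assumption that this keeps the compiler from folding or reordering the sums itself.

```fortran
program reassociation_demo
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  ! volatile forces each intermediate to be stored as a real32 value.
  real(real32), volatile :: a, b, tol, partial, left_to_right, hoisted

  a   = -1.0e8_real32     ! illustrative values, not taken from CAM
  b   =  1.0e8_real32
  tol =  1.0_real32       ! small compared with the spacing of reals near 1e8

  partial       = a + b
  left_to_right = partial + tol    ! (a + b) + tol: tol survives

  partial = b + tol                ! tol is rounded away here
  hoisted = a + partial            ! a + (b + tol)

  print *, '(a + b) + tol =', left_to_right   ! 1.0
  print *, 'a + (b + tol) =', hoisted         ! 0.0
end program reassociation_demo
```

Both groupings are mathematically identical, which is exactly why the standard lets a compiler pick either one when you leave the parentheses out.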
Similarly, for logical and relational expressions, the standard allows compilers to evaluate any “equivalent” expression. This most often bites programmers when they write something like this:
if ((i > 0) .and. (a(i) /= 4)) then ...
and complain when the compiler evaluates a(i) first and gets a subscript error. Unlike C, Fortran does not have strict left-to-right ordering, nor does it have “short-circuit” evaluation, where if (in this example) the condition (i > 0) was false, then the second expression would not be evaluated. The standards committee has discussed various additions to the standard to provide for short-circuiting, such as an .AND_THEN. operator, but there was insufficient support for it from the members. You’ll have to continue using nested IF-THEN instead.
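For the record, here is what the nested IF-THEN version of that test looks like. It’s a sketch with an array and loop bounds of my own choosing, where the loop deliberately starts at 0 to exercise the guard.

```fortran
program guarded_subscript_demo
  implicit none
  integer :: a(5), i

  a = [1, 2, 3, 4, 5]

  do i = 0, 5
    ! The outer test guarantees a valid subscript before a(i) is referenced.
    if (i > 0) then
      if (a(i) /= 4) then
        print '(a, i0, a, i0)', 'a(', i, ') = ', a(i)
      end if
    end if
  end do
end program guarded_subscript_demo
```

This prints the elements at subscripts 1, 2, 3 and 5. The reference to a(i) sits inside the block controlled by i > 0, so it is never evaluated for i = 0, no matter how the compiler orders things within a single expression.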
Lastly, I should mention that the standard allows a compiler to evaluate an expression to any degree of completeness that will deliver the same mathematical or logical result. This means that in the case of something like:
a = f(x) * 0
the compiler may choose to not call function f at all, since it can determine that the value of the expression is always zero.
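This matters most when the function has side effects. In the sketch below, f bumps a module-level counter (the module, the counter and the body of f are all invented for illustration). Whether the counter ends up at 0 or 1 is up to your compiler and its optimization settings, so I’m deliberately not promising either.

```fortran
module call_counter
  implicit none
  integer :: calls = 0      ! counts how many times f actually runs
contains
  function f(x) result(y)
    real, intent(in) :: x
    real :: y
    calls = calls + 1       ! side effect the caller might be relying on
    y = 2.0 * x
  end function f
end module call_counter

program skipped_call_demo
  use call_counter
  implicit none
  real :: a, x

  x = 3.0
  a = f(x) * 0              ! the compiler may skip the call entirely

  print *, 'a          =', a
  print *, 'calls to f =', calls   ! 0 or 1, depending on the compiler
end program skipped_call_demo
```

If you need the side effect, call the function in its own statement and use the saved result in the expression.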
As always, if you have comments or questions about this post, enter them below. I’m also open to suggestions for future Doctor Fortran topics!