Document: n926 Title: Sequence Point Analysis Author: Raymond Mak, IBM Corp. E-mail: rmak@ca.ibm.com Date: 17-Nov-2000 (Please use fixed width font to print; otherwise the trees will become all chopped up.) Abstract An idea was presented at the WG14 Toronto meeting to use abstract syntax tree to tackle the sequence point problem. This paper documents that presentation and expands the idea into an analysis tool. Abstract syntax tree is a tool in the description of expression semantics. The properties of trees are well understood; and trees have a natural graphical representation. The notation described in this paper can be used to facilitate discussions about the sequence point issues. The essential ideas presented below were inspired by Feather's work in n925. 0. Introduction There are read and write events occurring during the evaluation of an expression. The Standard imposes an ordering on some of these events, but leaves others unspecified. The result is a partial ordering. Under certain conditions, an implementation has more than one way to evaluate an expression. An analysis of an expression involves two steps: a) Determine the partial ordering of the events. b) Basing on (a), determine the status of the expression. We use an abstract syntax tree (or syntax tree for short) to determine the partial ordering, and use a set of rules to determine the status. 1. Abstract syntax tree Write the expression as an abstract syntax tree. For example: x = x * y becomes = / \ x * / \ x y The tree is a directed acyclic graph; each edge joining two nodes has an arrow pointing towards the leaf side. (The arrow is not drawn.) 2. Partial ordering The directed edges define a partial ordering. This ordering imposes a restriction on the order of evaluation. In the above example, the =-node must be evaluated after the *-node. If two nodes cannot be compared, there is no ordering restriction. (We will add rules later on to handle special ordering semantics; e.g. the one imposed by &&.) 3. Read and Write Events If the operation writes values to address locations, annotate the node with the addresses being written to. For conciseness, we denote the address location by an lvalue expression written inside a pair of square brackets. = [x] / \ x * / \ x y During the evaluation of an expression, whenever an lvalue is converted to an rvalue, mark it with parentheses. = [x] / \ x * / \ (x) (y) If y itself is a subtree, for example if the expression is: x = x * a[i]; annotate the rvalue at the operator: = [x] / \ x * / \ (x) \ \ [] (*(a+i)) / \ (a) (i) The notation will become obvious as we go through the examples later. 4. Relationship between nodes We are interested in nodes representing read and write accesses of the same or overlapping address location. When there are two accesses to the same or overlapping address location, we need to find out their relationship; i.e. whether one must come before the other, and whether there is an intervening sequence point. The annotated syntax tree described above is the starting point to analyze this relationship. 4.1 Any two nodes in a tree are connected by a unique path, consisting of the directed edges. In our earlier example, x=x+y, the path joining the x-locations is: = [x] \ * / (x) For ease of presentation, we can also straighten out the path into a horizontal line like the following: =[x] -->-- * -->-- (x) The arrows are put onto the edge to show the ordering. 4.2 Edges in a path need not be in the same direction. For example, the path connecting (x) and (y) is: * / \ (x) (y) If we straighten it out and show the arrows: (x) --<-- * -->-- (y) The edges points to different directions. 4.3 If the edges point to the same direction, the two nodes can be compared; the direction of the edges gives the ordering. In the example in 4.1, [x] > (x). That is, [x] must happen after (x). If the edges point to different directions, the two nodes cannot be compared. In the example in 4.2, (x) and (y) has no ordering; there is no guarantee on the relative timing of their evaluations. Note that there can be at most one node in a path with edges pointing to different directions. 5. The status of an expression 5.1 The status of an expression is determined as follows. Examine all paths joining nodes with the same or overlapping address location. For each path, determine the status of the path (i.e. determine the relationship between the two nodes): [Rule] R.1 If the two nodes are ordered: R.1.1 If the smaller one (the one happens earlier) is a read, or if there is a sequence point in between, it is well-defined. R.1.2. Otherwise it is undefined. R.2 If the two nodes are not ordered: R.2.1 If both are reads, it is well-defined. R.2.2 If there is a write guarded by a function call operator (see 5.2 below), it is unspecified. R.2.3 Otherwise it is undefined. [End Rule] 5.2 Events happening inside a function body are considered 'guarded' by the function call operator. These include both the reads and writes events. There is a sequence point before the function returns; and function calls do not overlap (DR087). These two together justify R.2.2. We will say more about this in 12.4. Note: Another way of looking at this is that function calls are evaluated as though they were 'atomic' to the rest of the expression. 5.3 After all paths are examined, the status of the expression is: - undefined if there is an undefined path, otherwise - unspecified if there is an unspecified path, - otherwise it is well-defined. Note that the true status of an expression has to be determined during runtime, as we do not know in all cases whether two lvalues designates the same or overlapping address location. 6. Sequence point Use 'S' to denote a sequence point, either put it beside an edge or inside a node. 6.1 For operators &&, ||, ?: and comma, put it beside the left edge. For example, (x && y) is: && / \ /S \ / \ x y 6.2 For the call operator, put it inside the node. For example, the tree for f(x) is: ( ) /\ / \ f x Putting in the sequence point, it becomes: (S) /\ / \ f x There is a sequence point right before the function is actually called, but none immediately after each of the evaluations of f or x. Therefore the sequence point cannot be put on an edge; also the order of evaluation of f and x (and other arguments if present) is unspecified. A sequence point is an event occurring during the evaluation of an expression. A sequence point on an edge leading to a subtree forces all side effects occurring inside it to take effect before the evaluation leaves the subtree. In a sense, the sequence point is a 'fence' guarding the subtree below it. There are two sequence points involved in a function call - one right after the evaluation of the function designator and all the arguments, another one before the function returns. To make the notation compact, the S represents both of them. Also, we hang the function designator and the arguments under the left parentheses on one side of the S to illustrate the fact that the S is not involved when these nodes are compared among themselves. 7. Order of evaluation 7.1 For operators other than &&, ||, ?: and comma, the order of evaluation of the subexpression is unspecified. 7.2 For &&, ||, ?: and comma, the order of evaluation is from left to right. This means that every nodes in the left subtree is ordered before all the nodes in the right one. We should keep two closely related concepts separate. The first is the concept of sequence point, which signifies the latest point in time when all side effects occurred during the evaluation of a subtree must be applied. The second is the concept of 'time-sequencing', which imposes an ordering among the events happening during the evaluation of a subtree. Sequence point by itself does not impose an ordering. It is just a flushing point for the side effects. The evaluation order of &&, ||, ?: and comma is an additional requirement imposed by the operators. (e.g. C99 6.5.13p4 for &&.) 8. An algorithm After we have identified all the reads and writes, and annotated the synatx tree, the operator of a node is not important anymore. We can replace them with a generic 'o' without lost of information. For operators that impose an evaluation order, we can replace them with a '^'. So an algorithm can be devised as follows. A.1 Draw the abstract syntax tree of the full expression. Partial expressions cannot be analyzed. A.2 Identify all the reads and writes and annotate the tree. A.3 Put sequence points on the left edge of &&, ||, ?: and comma, and inside the ()-node. A.4 Replace &&, ||, ? and comma with the generic '^'. Replace other operators with the generic 'o'. The colon in ?: is left unchanged. (See A.6.) A.5 Determine the ordering relationship between two nodes as follows: A.5.1 Connect the two nodes with a path. A.5.2 If there is a o-node on the path with edges pointing to different directions: n1 ... ----<---- o ---->---- ... n2 the two nodes (at the ends of the path) are not ordered. A.5.3 If there is a ^-node on the path with edges pointing to different directions: n1 ... ----<---- ^ ---->---- ... n2 the node on the left is less than the node on the right. (i.e. the event on the left must happen before the one on the right.) A.5.4 Otherwise the two nodes are ordered. The ordering is defined by the directed edge. (Note that A.5.2 and A.5.3 are mutually exclusive. There can be at most one node, 'o' or '^', with edges pointing to different directions.) A.6 The ?: consists of two subtrees merged together. We use the following notation: x ? a : b; ? / \ /S \ / \ x : / \ / \ a b which becomes, after A.4: ^ / \ /S \ / \ x : / \ / \ a b The colon indicates the merge. The colon has a special property; unlike any other operators we don't have to compare the nodes between the two sub-trees under the colon because only one of them will be evaluated. Note: As we will see later in the examples, we don't have to actually replace the operators with '^' and 'o'. These are just convenient devices to describe the algorithm. The notation is clear and flexible enough to allow different variations of presentation. Also, it is not necessary to expand all the subexpressions into subtrees; sometimes it is more convenient to just write out an subexpression as-is as a leaf node. [EXAMPLES] Before we proceed further, let us use some specific examples to illustrate how the notation is being used. Following are selected examples from n925. The numbering of the examples is the same as n925. EXAMPLE 2 int x; x = ++x; = [x] / \ / \ x ++ [x] / / (x) There is a path with two writes without an intervening sequence point: =[x] --->--- ++[x] The expression is undefined. EXAMPLE 3 x += x * x; += [x] / \ / \ (x) * / \ (x) (x) All paths have read before write. (The write is at the highest point in the tree.) This is well-defined. EXAMPLE 4 int x; extern int f(int); x = f(x++); = [x] / \ / \ x (S) /\ / \ f ++ [x] / (x) There is path with two ordered writes: =[x] --->--- (S) --->--- ++ [x] There is an intervening sequence point. The expression is well-defined. EXAMPLE 5 int x,y; (x=y) + x; + / \ / \ = [x] (x) / \ x (y) There is an undefined path: =[x] ---<--- + --->--- (x) EXAMPLE 6 int x, y, z; (x=y) + (x=z); + / \ / \ / \ = [x] = [x] / \ / \ x (y) x (z) There is an undefined path - two writes without a sequence point: =[x] ---<--- + --->--- =[x] EXAMPLE 9 struct s { double p; int q; int r; } *x, y; x = &y; x->q = x->r; = [(*x).q] / \ / \ / \ -> -> ((*x).r) / \ / \ x q x r Well-defined. EXAMPLE 11 int x; x++ && x--; && / \ /S \ / \ ++ [x] -- [x] / \ (x) (x) The path in question is: ^ / \ /S \ / \ ++ [x] -- [x] or to straighten it out horizontally: ++[x] ---<--- ^ --->--- --[x] S There is a sequence point between the two ordered writes. The expression is well-defined. EXAMPLE 12 int x, y; x++ * y++ ? x-- : y --; ? / \ / \ /S :------- / \ \ / \ \ * --[x] --[y] / \ \ \ / \ \ \ / \ (x) (y) ++[x] ++[y] \ \ (x) (y) There are two paths involving two writes. But both have a sequence point in between and the notes are ordered: ++[x] --<-- * --<-- ^ -->-- : --[x] S (The [y] path is similar.) EXAMPLE 13 int x[2], *y; y=x; *y = f(y++); = [*y] / \ / \ * (S) / /\ / / \ (y) f ++ [y] \ (y) There is a path with both a read and a write: (y) --<-- * --<-- = -->--(S)-->-- ++[y] Even though there is a sequence point in between, the two nodes are not ordered. The expression is undefined. Example T1 (This example was provided by O'Riordan at the Toronto meeting.) *( *p=2, p) = 7; = [*p] / \ / \ * 7 / , / \ / \ /S (p) / = [*p] / \ * 2 / (p) The two writes are related as follows: [*p] --<-- , --<-- * --<-- = [*p] S They are ordered with a sequence point in between. This path is well-defined. Other paths involving reads are also well-defined. [END OF EXAMPLES] 8.1 Cast, sizeof and compound literal Use 'cast' and 'sizeof' to represent the cast and sizeof operators. We also use a 'type' node to group the subexpressions of a variably modified type. For example: x = sizeof( char[x] ); = [x] / \ x \ sizeof / type / (x) The type name can be used to label the node instead of using the generic label 'type'. The left operand of a cast operator is the 'type' node and the right operand the cast-expression; the evaluation order of the two is unspecified. Similarly, use a 'lit' node to represent a compound literal. 9. Function call 9.1 If a function call has side effects, including writes to floating-point environment and flags, or reads that are relevant to the analysis of the expression, annotate them at the call operator node and treat them like other reads and writes. For example: int x, y=0; int foo(void) { x = y; return x;} x = foo(); = [x] / \ x \ (S) [x](y) / f The two writes are related as follows: =[x] ----->----- (S) [x] They are ordered with an intervening sequence point. 9.2 Two function calls cannot overlap (DR087). This imposes a restriction on the ordering of the events between two function calls. In our notation, a side effect annotated beside a call operator is considered 'guarded' by that call operator. Consider the following two examples. int x=1; x++ * x--; This is undefined as there are two writes without an intervening sequence point. On the other hand: int x=1; void f() {x++;} void g() {x--;} f() * g(); * / \ / \ / \ (S)[x] (S)[x] / / f g This is unspecified as both writes are guarded by function call operators. (R.2.2) 10. Floating-point flags 10.1 If an operator accesses the floating-point environment or flags, annotate them at the operator. Use FG to denote the floating-point flags. FG=1 means set, FG=0 means clear, and (FG) means read. There are more than one flags; so FG=1 doesn't mean setting all flags, just those that are affected. If there is ambiguity, we will use FG1, FG2, etc. to identify individual flags. Likewise for FG=0. For example, if x++ sets a floating-point flag: ++ [x] FG=1 \ \ (x) FG=x is a write event. 10.2 Stickiness Floating-point flags are sticky in the sense that they can be set more than once without an intervening sequence point and the result is still well-defined. Likewise for clears. But we need to distinguish between set and clear. That is, we treat floating-point flags updates like any other write events, except that two writes producing the same result are allowed without an intervening sequence point; the result in this case is still well-defined. Add the following rule for floating-point flags. [Rule] R.3 Read and write events to floating-point flags follow the same rule as other reads and writes, except for the following. If two FG writes are both sets (FG=1), or are both clears (FG=0), they are well-defined regardless of whether they are ordered, and regardless of whether there is an intervening sequence point. [End Rule] Note: More work is needed to make sure the above rule reflects the Standard's intention. Also note that floating-point flags can only be cleared by library functions; i.e. clears are always guarded by call operators. 10.3 Atomic It might be useful to generalize the rule for floating-point flags to cover sig_atomic_t. Two writes to a sig_atomic_t object is well-defined if they are writing the same value, even if they are unordered or have no intervening sequence point. Individual floating-point flags behave as if they are atomic. 11. Volatile 11.1 The black-box The collection of all volatile address locations accessible to the program forms a black-box to the program. Accessing any such locations implies writes to *all* volatile locations. An example would illustrate. volatile int x,y; int i; i = f(y)+x; = / \ i \ + / \ / \ / (x) [x][y] / ( ) /\ / \ f (y) [y][x] The read access on (x) implies writes to [x] and [y]. The same applies for the read on (y). After identifying all the volatile accesses, and annotating the write events accordingly, the analysis of the expression simply proceeds as before. 11.2 Volatile access is a difficult problem. The above attempts to handle it in the same way as other reads and writes. We have said more than the Standard; the latter simply says volatile accesses are side effects (C99 5.1.2.3p2, 6.7.3p6 and footnote 114). But our interpretation seems be consistent with the sequence point semantics discussed so far. The analysis in 11.1 is pessimistic, but is the best we can do in the absence of information about the black-box. If the programmer knows about what goes behind the volatile objects, a tighter analysis could be done by partitioning the set of volatile objects into disjoin sets. Accessing an object in one set would not affect objects in the other sets. Such partition is implementation defined. We can use V to collectively represent all volatile objects; or use V1, V2, etc. to represent a partition. This way, we can use [V] and (V) to annotate volatile accesses in the syntax tree instead of listing out all the individual objects. 11.3 Signal This paper does not address signal handling. This is better done after the bugs in the tool are ironed out. But we would like to make a quick comment. If an operation raises an interrupt, and if the interrupt handler returns to the point of interruption, we can treat the interrupt handler as a function call and annotate the read and write events occurring inside the handler beside the operator. Analysis would then proceed as before. There are two questions : a) Where is the sequence point for the handler's side effects? b) Does the "no overlap" rule for function calls apply? 12 The order of evaluation revisited 12.1 In general, the order of evaluation of the subexpressions is unspecified (C99 6.5p3). For example: int x = 1; (x++ , x) + (x-- , x) + / \ / \ / \ , , / \ / \ /S \ /S \ x++ x x-- x The + operator does not impose an order of evaluation. So the 'order of evaluation is unspecified'. But what exactly does it mean, though ? There are two ways to look at this. Ignore what the Standard says for the moment; just consider what are the possible definitions here. 12.2 Tree traversal view One way of looking at it is the tree-traversal view. That is, an evaluation of the expression is a traversal of the tree. We start at the +-node and then traverse and evaluate as we go along. Since the +-node does not impose an evaluation order, either the left or the right subtree can be traversed first. But if we go down one subtree, we will finish that one before we proceed onto the next one. In the above example, if we list out the events (reads and writes) we encounter on the route, we will find that we always have a sequence point after a write event before we encounter another access. The two traversal order thus give two results. This, of course, is not what the standard says. 12.3 Events interleaving view Another way of looking at it is the events interleaving view. There are events happening inside the left and right subtree of the +-node. Within the subtrees, the events have a certain ordering imposed on them due to the operations. At the +-node, however, the order of evaluation is unspecified; the events on the left can occur at an unspecified time relative to the events on the right. To put it in another way, the events on the left can interleave with those on the right, subjected to the ordering restrictions imposed by their own subtree. Let us take a more detailed look at the example in 12.1. Following the notations in n925, let us denote the write event on the left of the + by {3: W x} and the write event on the right by (6: W x}, and the two sequence points by {7: S} and {8: S} respectively. (The number before the colon identifies the event. W means write, S means sequence point; the x refer to the address locations affected. Event numbers that are skipped correspond to events that do not concern us in our discussion here.) There is an ordering imposed on the events on the left (e.g. 3<7) and the events on the right (e.g. 6<8); but there is no ordering restriction between 3,7 and 6,8. That is, we can interleave 3,7 with 6,8. Obviously there is a permitted ordering that put the two writes together without an intervening sequence point. For example, at the +-node, the following sequence of events is permitted: 3 6 7 8 The expression is therefore undefined. The sequence of events happening on one side can be seen as the start and end points of time intervals. The interval between events 3 and 7 ([3,7]) marks the period of time the side effect of writing to x has to take place. Likewise for the interval [6,8]. Since 3 and 6 can each start at an unspecified time relative to each other, the two time interval can overlap. Overlapping access intervals involving at least one write event causes undefined behavior. This gives rise to rule R.2.3. 12.4 Function call 12.4.1 There is a sequence point before the actual call, and another one before the function returns. Also two function calls cannot overlap. There is a lot of semantics jammed into our compact notation. We will go through the call operation in detail here. Let us use an example: int v=0; void f (int x) { v += x }; v += f(v); += [v] / \ (v) \ (S) [v] /\ / \ / \ (f) (v) The sequence of events is as follows: - Evaluate f and v (in an unspecified order), and then - a sequence point, and then - branch to the function, and then - a sequence point, and then return. The different nodes are related as follows. (f) and (v) (function designator and the arguments): (f) --<-- ( -->-- (v) They are unordered with no intervening sequence point. (The S inside (S) does not apply. Please refer to the appendix for further comments about the notation.) (v) and [v] (the argument and the function side effect): (v) --<-- (S) [v] They are ordered with an intervening sequence point (this is the first sequence point). (Well-defined) [v] and [v] (the function side effect and the assignment): +=[v] -->-- (S) [v] They are ordered with an intervening sequence point (this is the second sequence point). (Well-defined) (v) and [v] (the left of += and the function side effect): += / \ (v) \ (S) [v] (v) --<-- += -->-- (S) [v] They are unordered. The S is the second sequence point. [v] is guarded by a call operator. (R.2.2, unspecified.) 12.4.2 Two function calls cannot overlap. This is the interpretation of DR087. This makes function calls somewhat like the tree traversal described in 12.2. Consider the expression: f(1) + f(2) where f is defined as in the previous example. As discussed in 12.3, the time interval between a write event and the corresponding sequence point event is what we are interested in. In essence, if two such time intervals for the same address location overlap, we have undefined behavior. In the above expression, if we draw the time line from left to right, and use n925's notation for events, then the two write events for v can be represented as follows: (Events 1 & 2 belong to the left f(); 3 & 4 to the right.) -------------- time ------------------------> {1:W v} ....... {2:S} {3:W v} ....... {4:S} or {1:W v} ....... {2:S} {3:W v} ....... {4:S} But they cannot overlap like this: {1:W v} ....... {2:S} {3:W v} ....... {4:S} This is the rationale behind R.2.2. The tree diagram for the expression is: + / \ / \ / \ / \ (S)[v] (S)[v] /\ /\ / \ / \ f 1 f 2 The two write nodes are related as follows: [v](S) --<--- + --->--- (S)[v] They are unordered, but both are guarded by function calls. The path is unspecified. 13. Conclusion and future work We described a tool for analyzing sequence point semantics. The basic device in this tool was the annotated syntax tree notation, which was used with a set of rules to provide an interpretation of the Standard. The tree notation can be used independent of the rules. This enables us to ask what-if questions, and to examine the effect of different variations of the rules on an expression. Further work is needed to study the rules for volatile and for floating-point flags. This paper did not address signal handling; that area is left open for future work. ACKNOWLEDGMENT Special thanks to Hugh Redelmeier and Fred Tydeman for their time in reviewing this paper. Their comments and suggestions had been invaluable. I did not address all their concerns, and they had not seen the final version of this paper. Any errors and inadequacies were mine. APPENDIX X.1 The sequence point notation A comment about the notation for sequence point. Let us use an example: x && y && / \ /S \ / \ x y If we treat the evaluation of the expression as a traversal of the syntax tree, then we start from the &&-node and walk down the left-side of the left-edge, around the x-node, then up again along the right-side of the left-edge, and then proceed onto the right subtree. During this walking tour, we encounter the sequence point on our way up from the x-node, not on our way down. This is why the S is put on the right-side beside the edge. For a function call: f(x) (S) /\ / \ f x If we walk this tree, similar situation arises. We walk the f, x (and other arguments if present) in an unspecified order; but there is no sequence point along the way until we come back to the ()-node, where we encounter a sequence point right before we call the function, and then another one later when it returns. So we put the S inside the pair of parentheses. Also we hang the function designator and the arguments off the left parentheses of the call operator. This is to represent the fact that the sequence point does not play a role in their evaluations: ( S ) /\ / \ f x The call operator is a complex operation; it can be broken down into three parts: the evaluation of the function designator and the arguments, the actual calling, and the return. To represent these operations explicitly, we need to 'slice' the call operator into two halves. The left parentheses represents the first part of the operation (evaluation of the function designator and arguments). The right parentheses represents the third (the function return). f and x are thus related as follows: f ---<--- ( --->--- (x) Only the left half of the call operator is involved, and there is no sequence point. And, x = f(x) = [x] \ \ (S) /\ / \ f (x) (x) and [x] are related by: (x) ---<--- (S) ---<--- =[x] As we have seen in the various examples before, it is not necessary to really distinguish the left/right half of the call operator in an actual analysis. There is no ambiguity as to which part of the operation is involved, and which sequence point is relevant. X.2 The partial ordering The key to sequence point analysis is to determine the order of evaluation; this order is a partial order. The syntax tree notation presented in this paper is a tool to find out this partial ordering. However, for the sake of simplicity, certain details are hidden and not explicitly shown. This appendix provides a more complete description. X.2.1 Suppose e is an expression. Consider the set of all subexpressions of e; denote this set by P(e). We take it for granted that subexpressions are defined and exist. Use e1, e2, etc., to stand for subexpressions; and 0 to stand for the empty subexpression. Both e and 0 are in P(e). A partial ordering can be defined for elements of P(e) according to the specification of the Standard. If this partial ordering is inconsistent, the Standard has an error. The Standard does not talk about the empty subexpression; we add the element 0 and define it to compare smaller than all other elements. We use < to represent the partial order and write e1 < e2 if e1 is evaluated earlier than e2. (e1 is smaller than e2.) For two elements e1 and e2 with e1 < e2, if there is no distinct third element lying in between, we say e2 immediately succeeds e1. We can then draw a graphical representation of the partial ordering as follows (using transitive reduction). Draw all subexpressions as nodes of a graph. Two nodes are connected by a directed edge if one immediately succeeds the other; the direction of the edge points to the smaller node. In general this graph is not a tree. If we ignore the comma, &&, || and ?: expressions, and if we ignore 0, the graph of the partial ordering is a tree; and it has a 1-1 correspondence with the syntax tree. Furthermore, the partial ordering is a lattice, as any two elements has a unique least upper bound and a unique greatest lower bound. The upper bound of the whole lattice is the root of the tree, and the lower bound the 0. If we include the comma, &&, || and ?: expressions, the situation becomes more complicated and the graph is no longer a tree (even if we ignore the 0). But the partial ordering remains to be a lattice. The upper bound of the whole lattice is still the whole expression e. Note: We don't have to worry about the conditional expressions where part of the expression may not be evaluated. We are concerned only with the ordering here, not with the actual evaluation of the subexpressions. All nodes involved are represented in a single partial ordering. X.2.2 Correspondence with n925 Let us use an example to illustrate the partial ordering. u * v + w * x The subexpressions are: e1 = u * v e2 = w * x e = e1 + e2 0 = empty u, v, w, x are also subexpressions. They represent themselves. The partial ordering is shown below: e / \ / \ e1 e2 / \ / \ u v x y \ | | / \ | | / \ \ / / \ | / 0 If we start from 0 and follow all the edges leaving away from it and trace through the paths towards e, we get four sequences of nodes. Each sequence is a total ordering. Note that some nodes are duplicated among the sequences. The evaluation order of the whole expression is derived by interleaving the (distinct) nodes and merging the sequences into a single one, subjected to the ordering within each sequence. There could be more than one way to do the interleaving. If we follow n925 and annotate the events identified by the model beside the nodes in the lattice, we obtain a correspondence with the model. For conciseness, can we annotate only the new events as they come into existence at a particular node. The node then inherits all the events from nodes smaller than it. The rules derived during the analysis define the same partial ordering as the lattice. Certain details are ignored in the above. n925 makes a distinction between central and incidental events, and not all events are inherited under all circumstances. But we should be able to add these details without affecting the general theme. The last stage of the analysis in n925 is to determine all the permitted ordering of the events. This is equivalent to all the possible interleaving of the sequences as described above. X.2.3 Correspondence with the abstract syntax tree notation For expressions without comma, &&, || and ?: subexpressions, the correspondence between the lattice and the abstract syntax tree is obvious. For expressions with comma, &&, || or ?:, the syntax tree notation avoids the complexities of drawing the full partial ordering by introducing step A.5.2 in the algorithm in section 8. The syntax tree together with A.5 gives the same partial ordering as the lattice. X.2.4 Interpretation of the Standard The expression syntax and semantics defines the evaluation order of the subexpressons. This partial ordering should be well-defined (i.e. consistent); otherwise the Standard is defective. As long as the partial ordering is well-defined, the Standard cannot be 'wrong' or ambiguous in the following sense. Whatever the Standard does not specify, or whatever the Standard specifies as unspecified, the corresponding relationship among the subexpressions is simply not there in the partial ordering. The only effect this has in the analysis is that there are more permitted ways of interleaving the events. More expressions become undefined. The rules listed in section 5 is an interpretation of the Standard. We want an interpretation that gives a set of well-defined expressions matching our expectations (as programmers). In this respect, the Standard is unclear in the areas of floating-point flags, volatile and signal handling. The Standard is also not adequate in describing function calls. Even though DR087 suggested function calls do not overlap, the Standard does not explicitly talk about this. ===== End =====