Accredited Standards Committee X3       Doc No: X3J16/94-0181   WG21/N0568
  Information Processing Systems          Date:   Sept 27, 1994   Page 1 of 13
  Operating under the procedures of       Project: Programming Language C++
  American National Standards Institute   Ref Doc:
                                          Reply to: Josee Lajoie
                                                    (josee@vnet.ibm.com)
+------------------+
| C++ Memory model |
+------------------+


1) C's Memory Model
===================

This section uses the information provided in the ISO C standard as
well as the information provided by Tom MacDonald in core message 4156
describing the content of C's Defect Report 69 and the proposed
resolution for this defect presented by Tom Plum in core message 4229.

1.1 unsigned char is a "byte"
-----------------------------

The proposed resolution for defect report #69 presented in core message
4229 indicates that the type 'unsigned char' is the C type that
represents a 'byte' of memory:

    For any object type T, the underlying bytes of the object can
    be copied into an array of  unsigned char :

    #define N sizeof(T)
    union aligned_buf { T t; unsigned char s[N]; } buf;
    T object;

    memcpy(buf.s, (const void *)&object, N);

Even though core message 4229 doesn't explicitly say so, I will also
assume that:

    After this memcpy operation, 't' has the same value as 'object'.
    The memcpy operation is guaranteed to be well-defined, even if
    'object' does not hold a valid value of type T.
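
As an illustration only, the copy described above can be sketched in
C++ as follows.  The helper name round_trips is hypothetical and is not
part of core message 4229; the sketch assumes T can be copied with
memcpy and default-constructed.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies the bytes of 'object' into an array of
// unsigned char and back again, then compares the two object
// representations byte for byte.
template <class T>
bool round_trips(const T& object)
{
    unsigned char s[sizeof(T)];            // the "array of unsigned char"
    std::memcpy(s, &object, sizeof(T));    // copy the underlying bytes out

    T copy;
    std::memcpy(&copy, s, sizeof(T));      // copy the same bytes back in

    return std::memcmp(&copy, &object, sizeof(T)) == 0;
}
```

A call such as round_trips(42) or round_trips(3.5) is expected to
return true, since only bytes are moved and no value of type T is ever
interpreted.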


1.2 terminology
---------------

Core message 4229 defines some terminology:

    #define N sizeof(T)
    union aligned_buf { T t; unsigned char s[N]; } buf;

  The _object representation_ of an object consists of the resulting
  sequence of  N  unsigned char  objects in the buffer.  The object
  representation is the storage occupied by the object of type T,
  described as an array of unsigned char .

-------- X3J16/94-00181 - WG21/N0568 ----- Lajoie:Memory Model ----- Page  2

  The _value representation_ of an object is the sequence of bits
  in the array of unsigned char that holds the value of type T.

  The bits of the value representation determine a _value_, which is
  one discrete element of an implementation-defined set of values.

  Example:
    Here is an example.  Consider a (possibly hypothetical)
    implementation whose int value representation provides one sign
    bit and 40 integer bits.

        +-+---------------------+
        | |                     |
        +-+---------------------+
         1         40

    Its object representation provides one sign bit, a hole
    containing seven non-participating bits, and 40 integer bits:

        +-+------+---------------------+
        | |      |                     |
        +-+------+---------------------+
         1   7        40
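
The difference between the two representations can be measured on a
given implementation.  The following is a rough sketch, assuming the
hypothetical helper names object_bits, value_bits and hole_bits and
assuming that std::numeric_limits reports the value bits of the type:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Bits in the object representation of T: all the storage it occupies.
template <class T>
int object_bits() { return static_cast<int>(sizeof(T) * CHAR_BIT); }

// Bits in the value representation of an integer type T: the value bits
// reported by numeric_limits, plus one sign bit for signed types.
template <class T>
int value_bits()
{
    return std::numeric_limits<T>::digits
         + (std::numeric_limits<T>::is_signed ? 1 : 0);
}

// Non-participating ("hole") bits.  On the hypothetical implementation
// in the example above this would be 48 - 41 = 7 for int, assuming
// 8-bit bytes.
template <class T>
int hole_bits() { return object_bits<T>() - value_bits<T>(); }
```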


1.3 representation of signed and unsigned integer types
-------------------------------------------------------

The ISO C standard already specifies many requirements regarding the
representation of signed and unsigned integer types.  From ISO C
Standard, sub-clause 6.1.2.5:
  For each of the signed integer types, there is a corresponding (but
  different) "unsigned integer type" (designated with the keyword
  unsigned) that uses the same amount of storage (including sign
  information) and has the same alignment requirements.

[Note: using the terminology defined in section 1.2 above, this
       paragraph can be interpreted to say that the object
       representation of a signed integer type must be the same as the
       object representation of its corresponding unsigned integer
       type.]

Sub-clause 6.1.2.5 continues:
  The range of nonnegative values of a signed integer type is a
  subrange of the corresponding unsigned integer type, and the
  representation of the same value in each type is the same.(16)
  (16) The same representation and alignment requirements are meant to
       imply interchangeability as arguments to functions, return
       values from functions, and members of unions.

[Note: using the terminology defined in section 1.2 above, this
       paragraph can be interpreted to say that the value
       representation of a nonnegative value of a signed integer type
       must be the same as the value representation of the same value of
       the corresponding unsigned integer type.]

Sub-clause 6.1.2.5 continues:
  The [value] representations of integral types shall define values
  by use of a pure binary numeration system (18).
  (18) A positional representation for integers that uses the binary
       digits 0 and 1, in which the values represented by successive
       bits are additive, begin with 1, and are multiplied by
       successive integral powers of 2, except perhaps the bit with the
       highest position.

With regard to the value representation of integral types, core
message 4156 also indicates that:
  The C standard Committee intended to permit 1's complement, 2's
  complement and signed magnitude implementations.
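
To make the role of the highest bit concrete, here is a small sketch
that reads the same bit pattern under the three permitted schemes.  The
enum Scheme and the function decode are hypothetical names introduced
for this illustration; the sketch assumes a width between 2 and 32
bits.

```cpp
#include <cassert>
#include <cstdint>

enum class Scheme { TwosComplement, OnesComplement, SignMagnitude };

// Hypothetical decoder: interprets the low 'width' bits of 'bits' as a
// signed integer under the given scheme.  Assumes 2 <= width <= 32.
std::int64_t decode(std::uint32_t bits, int width, Scheme scheme)
{
    const std::uint64_t sign_mask = std::uint64_t(1) << (width - 1);
    const std::uint64_t magnitude_mask = sign_mask - 1;
    const std::int64_t low =
        static_cast<std::int64_t>(bits & magnitude_mask);
    const bool negative = (bits & sign_mask) != 0;

    if (!negative) return low;      // pure binary in all three schemes
    switch (scheme) {
    case Scheme::TwosComplement:    // top bit weighs -2^(width-1)
        return low - static_cast<std::int64_t>(sign_mask);
    case Scheme::OnesComplement:    // top bit weighs -(2^(width-1) - 1)
        return low - static_cast<std::int64_t>(magnitude_mask);
    case Scheme::SignMagnitude:     // top bit only flips the sign
        return -low;
    }
    return 0;                       // not reached
}
```

For the 8-bit pattern 0xFF this gives -1, 0 (the second representation
of zero) and -127 respectively, while any pattern with the top bit
clear decodes identically under all three schemes.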


1.4 object representation vs value representation of integral types
-------------------------------------------------------------------

Core message 4229 therefore concludes:

  For character types, all bits of the object representation
  participate in the value representation.  This requirement does not
  hold for other types.

  For the type unsigned char , all possible bit patterns of the value
  representation represent numbers.  If all values of type char are
  nonnegative, then this is also true of type char.  This requirement
  does not hold for other types.


1.5 value representation of scalar types
----------------------------------------

Core message 4229 indicates that:

  The value representation of floating-point and pointer types is
  implementation-defined.


1.6 Examples of implementations
-------------------------------

I took these examples from core-4156.  However, I believe some of the
answers provided in core-4156 need to be changed in the light of the
proposed resolution for defect 69 provided in core-4229 by Tom Plum.
After further discussions on this topic with Tom Plum and Bill Plauger,
here are the answers I believe are accurate in the light of the
definitions above.  The changes from core-4156 are marked with '|' in
the left margin.

  Q: In particular, are the following five implementations allowed?

  h) Unsigned values are pure binary.  Signed values are represented
     using ones complement (in other words, positive and negative
     values with the same absolute value differ in all bits, and zero
     has two representations).  Positive numbers have a sign of 0, and
     negative numbers a sign of 1.  In both cases, all bits are
     significant.
  h) Yes, provided there is no other violation of the Standard.

  i) Unsigned values are pure binary.  Signed values are represented
     using sign-and-magnitude with a pure binary magnitude (note that
     the top bit is not "additive").  Positive numbers have a sign bit
     of 0, and negative numbers a sign bit of 1.  In both cases, all
     bits are significant.
  i) Yes, provided there is no other violation of the Standard.

  j) Unsigned values are pure binary, with all bits significant.
     Signed values with an MSB (sign bit) of 0 are positive, and the
     remainder of the bits are evaluated in pure binary.  Signed values
     with an MSB of 1 are negative, and the remainder of the bits are
     evaluated in BCD.  If ints are 20 bits, then INT_MAX is 524,287 and
     INT_MIN is -79,999.
  j) No, it is not a pure binary system.

  k) Signed values are twos-complement using all bits.  Unsigned values
     are pure binary, but ignoring the MSB (so each number has two
     representations).  In this implementation, SCHAR_MAX==UCHAR_MAX,
     SHRT_MAX==USHRT_MAX, INT_MAX==UINT_MAX, and LONG_MAX==ULONG_MAX.
| k) No,
|    contradicts the resolution listed in section 1.4 above:
|    that is, for character types, _all bits_ of the object
|    representation must contribute to the value representation.  In
|    particular, the condition SCHAR_MAX==UCHAR_MAX doesn't respect
|    this resolution.

  l) Signed values are twos-complement.  Unsigned values are pure
     binary.  In both cases, the top three bits of the value are ignored
     (and each number has eight representations).  For signed values,
     the sign bit is the fourth from the top.
| l) No,
|    contradicts the resolution listed in section 1.4 above:
|    that is, for character types, _all bits_ of the object
|    representation must contribute to the value representation.
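
The limits quoted in case (j) can be checked with a little arithmetic.
This sketch assumes a 20-bit int with one sign bit, leaving 19 bits for
the magnitude; the function names max_binary and max_bcd are
hypothetical.

```cpp
#include <cassert>

// Largest value of 'bits' pure binary bits: 2^bits - 1.
long max_binary(int bits) { return (1L << bits) - 1; }

// Largest value of 'bits' bits read as BCD: each full group of four
// bits holds a decimal digit 0..9, and a leftover group of k bits can
// only hold a leading digit up to 2^k - 1.
long max_bcd(int bits)
{
    long value = (bits % 4) ? (1L << (bits % 4)) - 1 : 0;
    for (int d = 0; d < bits / 4; ++d)
        value = value * 10 + 9;
    return value;
}
```

With 19 magnitude bits, max_binary(19) is 524287 (INT_MAX) and
max_bcd(19) is 79999 (so INT_MIN is -79,999), matching the figures
given in (j).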


1.7 Aliasing
------------

1.7.1 Reinterpret cast

What should the behavior of the following example be?

    extern int *pi;
    extern unsigned int *pui;

    *pi = 1;
    pui = (unsigned int *) pi;
    *pui == 1; //1

Does ISO C guarantee that line //1 always yields true?

I believe it does.

C indicates that:
  Sub-clause 6.1.2.5:
  o ints and unsigned ints have the same object representation.
  o for the range of nonnegative values that can be represented by both
    a signed int and an unsigned int, the value representation of the
    value as a signed int must be the same as the value representation
    of the value as an unsigned int.
  Sub-clause 6.3 (expressions):
  o a stored value of type signed int can be accessed by an lvalue of
    type unsigned int.
  Sub-clause 6.3.4 (cast operator):
  o the resulting pointer may not be valid if it is improperly aligned
    for the type pointed to [which is not the case we have here].
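
A self-contained C++ rendering of the example, as a sketch only: the
function name read_through_unsigned is hypothetical, and the extern
pointers are replaced by locals so that the fragment can be compiled
on its own.

```cpp
#include <cassert>

// Stores 1 through an int lvalue, then reads the same object through
// an lvalue of the corresponding unsigned type.
bool read_through_unsigned()
{
    int i = 0;
    int* pi = &i;
    *pi = 1;

    unsigned int* pui = reinterpret_cast<unsigned int*>(pi);
    return *pui == 1;   // corresponds to line //1 above
}
```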


1.7.2 Unions

What should the behavior of the following program be?

    union X {
        int x;
        unsigned int y;
        unsigned char buffer[sizeof(int)];
    } u;

    u.x = 1;
    u.y == 1; //1

Does ISO C guarantee that line //1 always yields true?

No, it doesn't.

The ISO C Standard, Section 6.1.2.5, indicates that:
   "The value of at most one of the members can be stored in a union
    object at one time."

Also, the proposed resolution for defect #69 [ listed in section 1.1
above ] clearly indicates that:

    For any object of type T, the underlying bytes of the object can
    _be copied_ into an array of unsigned char.

As Tom Plum emphasizes in core-4642:

    It was for very deliberate reasons that WG14 defined "object
    representation" in terms of an array of unsigned char which was
    _copied_ via memcpy, not aliased over the same storage.

    The ISO C rule quoted above grants license for super-checking
    environments to diagnose programs which fetch out of a different
    union member; it's an undefined behavior, strictly speaking.  The
    "same representation" rules do imply that, if your implementation
    isn't so pedantic as to diagnose this union-overlaying, then certain
    unsurprising behaviors must result.
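
The distinction between copying and aliasing can be sketched as
follows.  Instead of fetching out of a different union member, the
bytes are copied with memcpy; the function name copy_as_unsigned is
hypothetical, and the result is only meaningful here for nonnegative
arguments, for which the "same representation" rule applies.

```cpp
#include <cassert>
#include <cstring>

// Reads the bytes of an int as an unsigned int by copying them, rather
// than by overlaying the two objects in a union.
unsigned int copy_as_unsigned(int x)
{
    unsigned int y;
    static_assert(sizeof y == sizeof x, "same amount of storage");
    std::memcpy(&y, &x, sizeof y);
    return y;
}
```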


2) What should C++ say?
=======================

2.1 As close to C as possible...
--------------------------------

I believe C++ has to allow what ISO C currently allows.
And I believe the WP is fairly close.

Sub-clause 3.7.1 [_basic.fundamental_] indicates that
  o unsigned types occupy the same storage and have the same alignment
    requirements as their corresponding signed types.

Sub-clause 9.2 [ _class.mem_ ] indicates that:
  o The range of nonnegative values of a signed integral type is a
    subrange of the corresponding unsigned integral type and the
    representation of the same value in each type is the same.
  o A program can access the stored value of an object only through an
    lvalue of one of the following types:
    [ ... ]
    . a type that is the signed or unsigned type corresponding to the
      declared type of the object, ...

Sub-clause 9.6 [ _class.union_ ] indicates that:
  At most one of the member objects can be stored in a union at any
  time.

Proposal
--------

  Incorporate in section 3.7 [_basic.types_] and its sub-clauses the
  resolutions described in sections 1.1 and 1.4:

  o  For any object type T, the underlying bytes of the object can
     be copied into an array of unsigned char .  The memcpy operation
     is guaranteed to be well-defined, even if the object does not hold
     a valid value of type T.
  o  For character types, all bits of the object representation
     participate in the value representation.  This requirement does not
     hold for other types.
  o  For the type unsigned char , all possible bit patterns of the
     value representation represent numbers.  If all values of type char
     are nonnegative, then this is also true of type char.  This
     requirement does not hold for other types.

  See Appendix A for a complete description of the proposed WP changes.


2.2 Can any character type represent raw storage?
-------------------------------------------------

This is the thorniest issue in this paper.  Should C++ allow more than
what C allows and say that any character type can be used to manipulate
raw storage?  For example, should C++ allow the following:

    For any object type T, the underlying bytes of the object can
    be copied into an array of  char .

Many C and C++ programmers assume that this is true.
Many C++ libraries assume that this is true.
However, the C standard does not require implementations to support it.

From Tom Plum in a private email:
    The problem with the use of char in C libraries is that "value
    collapse" can (theoretically) happen during assignment.  E.g. if a
    ones-complement system distinguishes +0 and -0, where 0xFF is -0 and
    0x00 is +0, AND if -0 is converted to +0 before the assignment, and
    a char receives 0xFF
       char c = 0xFF;
    you could find 0x00 in c afterwards.  Most of WG14 thought that we
    never ruled out "value collapse" so it remains a problem in C.

So what should C++ do?
From Tom Plum in a private email:
    In my opinion, it would be a good idea for C++ to specify that
    value collapse cannot happen during assignment; the bit patterns
    must be copied.  But that is not a total solution, because how can a
    null termination byte (0x00) be distinguished from a 0xFF byte, in a
    ones-complement system?  When the program contains
       while ( c == 0)
    and c has the representation 0xFF, how could c==0 produce anything
    other than "true"?  So I'm still not sure what we can do, unless we
    prohibit ones-complement for char type.  There is still a problem
    here.
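
"Value collapse" can be simulated on an ordinary machine.  The sketch
below is purely illustrative: it models a hypothetical ones-complement
implementation that converts -0 (0xFF) to +0 (0x00) on assignment, and
the names collapsing_assign and survives_copy are invented for this
purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simulated char assignment on a hypothetical ones-complement machine
// that converts -0 (0xFF) to +0 (0x00) before storing.
std::uint8_t collapsing_assign(std::uint8_t bits)
{
    return bits == 0xFF ? 0x00 : bits;
}

// Copies n bytes one simulated char assignment at a time and reports
// whether every destination byte would still match its source byte.
bool survives_copy(const std::uint8_t* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        if (collapsing_assign(src[i]) != src[i])
            return false;
    return true;
}
```

Under this model a buffer containing a 0xFF byte does not survive a
char-by-char copy, which is why such an implementation could not use
char to manipulate raw storage.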

Proposal
--------
    Ones-complement arithmetic for character types is prohibited in C++.
    Any character type can be used to manipulate raw storage; that is,
    for any character type, all possible bit patterns of the value
    representation represent numbers.

    For any object of type T, the underlying bytes of the object can be
    copied into an array of C where C is any one of the character types:

        #define N sizeof(T)
        union aligned_buf { T t; C s[N]; } buf;
        T object;

        memcpy(buf.s, (const void *)&object, N);

        After this memcpy operation, 't' has the same value as
        'object'.  The memcpy operation is guaranteed to be
        well-defined, even if 'object' does not hold a valid value of
        type T.


2.3 Can any integral type manipulate raw storage?
-------------------------------------------------

The C standard does not require implementations to support this.  I do
not believe this is necessary and believe this would restrict
implementations too much.  I therefore do not propose that C++ impose
the character type restrictions on all integral types.


3. Uninitialized Objects
========================

With the exception of objects of type unsigned char (proposal 2.1
above), and possibly of objects of type char and signed char (proposal
2.2 above), objects that are not initialized may contain invalid values
for their types.

Proposal
--------
Add to section 8.5 Initializers [ _dcl.init_ ]:
    An uninitialized object has an unspecified value and referring to an
    object with an unspecified value results in undefined behavior.

What does this mean for the copy constructor and assignment operator?
The C++ WP specifies that [12.8 _class.copy_ ]:
    "If not declared by the programmer, they [ the copy constructor
     and assignment operator ] will be automatically defined
     (synthesized) as memberwise initialization and memberwise
     assignment of the base classes and non-static data members of the
     class, respectively."

Since the synthesized copy constructor and assignment operator are
defined to be memberwise initialization and memberwise assignment, if
some of the members are uninitialized, the synthesized copy constructor
and assignment operator will refer to uninitialized members and
therefore will have undefined behavior.

Example 1:
      struct S {
          int i;
          float f;
      } s1, s2;
      ...
      s1 = s2;     //1

      In this example, s2.i and s2.f are uninitialized.
      The assignment on line //1 therefore has undefined behavior.
      This behavior is the same as the behavior to be expected from a C
      program.

Example 2:
      class complex {
          float f, g;
      public:
          complex() { }
      } s1, s2;
      ...
      s1 = s2;     //1

      Since complex's constructor leaves f and g uninitialized, the
      assignment on line //1 has undefined behavior.
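
For contrast, here is a sketch of a variant in which the memberwise
assignment is well-defined because the constructors initialize every
member.  The class name complex_init, its accessors and the function
assign_from are hypothetical and serve only this illustration.

```cpp
#include <cassert>

// Hypothetical variant of the class in Example 2: both constructors
// initialize f and g, so copying never reads an uninitialized float.
class complex_init {
    float f, g;
public:
    complex_init() : f(0.0f), g(0.0f) { }
    complex_init(float re, float im) : f(re), g(im) { }
    float first() const { return f; }
    float second() const { return g; }
};

// Uses the implicitly defined (memberwise) copy assignment operator.
complex_init assign_from(const complex_init& src)
{
    complex_init dst;
    dst = src;
    return dst;
}
```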


Appendix A - Suggested WP changes
=================================


3.7  Types                                               [basic.types]

Add the following text after paragraph 1:

+ 2 For any object type T, the underlying bytes of the object can be
+   copied (using the memcpy library function [ref]) into an array of
+   character type.  The copy operation is guaranteed to be
+   well-defined, even if the object does not hold a valid value of type
+   T.

+ 3 The object representation of an object of type T is the storage
+   occupied by that object, described as an array of character type.

+ 4 The value representation of an object is the sequence of bits in the
+   array of character type that holds the value of type T.

+ 5 The bits of the value representation determine a value, which is
+   one discrete element of an implementation-defined set of values.


3.7.1  Fundamental types                           [basic.fundamental]

  1 There are several fundamental types.  The standard header <climits>
    specifies the largest and smallest values of each for an
    implementation.

  2 Objects declared as characters (char) are large enough to store
    any member of the implementation's basic character set.  If a
    character from this set is stored in a character variable, its value
    is equivalent to the integer code of that character.
-   It is implementation-specified whether a char object can take on
-   negative values.
    Characters may be explicitly declared unsigned or signed.  Plain
    char, signed char, and unsigned char are three distinct types.  A
    char, a signed char, and an unsigned char occupy the same amount of
    storage
+   (including sign information) and have the same alignment
+   requirements; that is, they have the same object representation.
    In any particular implementation, a plain char object can take on
    either the same values as a signed char or an unsigned char; which
    one is implementation-defined.
+   It is implementation-specified whether a char object can take on
+   negative values.  For character types, all bits of the object
+   representation participate in the value representation and all
+   possible bit patterns of the value representation represent numbers
+   (these requirements do not hold for other types).

  3 An enumeration comprises a set of named integer constant values.
    Each distinct enumeration constitutes a different enumerated type.
    Each constant has the type of its enumeration.

  4 There are four signed integer types: signed char, short int, int,
    and long int. In this list, each type provides at least as much
    storage as those preceding it in the list, but the implementation
    may otherwise make any of them equal in storage size.  Plain ints
    have the natural size suggested by the machine architecture; the
    other signed integer types are provided to meet special needs.

  5 For each of the signed integer types, there exists a corresponding
    (but different) unsigned integer type: unsigned char, unsigned
    short int, unsigned int, and unsigned long int, each of which
    occupies the same amount of storage and has the same alignment
    requirements (1.5) as the corresponding signed integer type;
+   that is, each signed integer type has the same object
+   representation as its corresponding unsigned integer type.
    (7) An alignment requirement is an implementation-dependent
    restriction on the value of a pointer to an object of a given type
    (5.4, 1.5).
    _________FootNote_________
    7) See 7.1.5.2 regarding the correspondence between types and the
    sequences of type-specifiers that designate them.
+   The range of nonnegative values of a signed integral type is a
+   subrange of the corresponding unsigned integral type, and the
+   value representation of the same value in each type is the same.

  6 Unsigned integers, declared unsigned, obey the laws of arithmetic
    modulo 2^n where n is the number of bits in the representation of
    that particular size of integer.  This implies that unsigned
    arithmetic does not overflow.

  7 Type wchar_t is a distinct type whose values can represent distinct
    codes for all members of the largest extended character set
    specified among the supported locales (17.5.9.1).  Type wchar_t
    has the same size, signedness, and alignment requirements (1.5) as
    one of the other integral types, called its underlying type.

  8 Values of type bool can be either true or false.  (8) There are no
    signed, unsigned, short, or long bool types or values.  As
    described below, bool values behave as integral types.  Thus, for
    example, they participate in integral promotions (4.1, 5.2.3).
    Although values of type bool generally behave as signed integers,
    for example by promoting (4.1) to int instead of unsigned int, a
    bool value can successfully be stored in a bit-field of any
    (nonzero) size.

    _________FootNote_________
    8) Using a bool value in ways described by this International
    Standard as ``undefined,'' such as by examining the value of an
    uninitialized automatic variable, might cause it to behave as if it is
    neither true nor false.

  9 Types bool, char, and the signed and unsigned integer types are
    collectively called integral types.  A synonym for integral type is
    integer type.  Enumerations (7.2) are not integral, but they can be
    promoted (4.1) to signed or unsigned int.
+   The representations of integral types shall define values by use of
+   a pure binary numeration system (FootNote).
+   _________FootNote_________
+   A positional representation for integers that uses the binary
+   digits 0 and 1, in which the values represented by successive bits
+   are additive, begin with 1, and are multiplied by successive
+   integral powers of 2, except perhaps the bit with the highest
+   position.

+   For any integral type, 2's complement and signed magnitude
+   implementations are permitted, and for integral types other than
+   character types, 1's complement implementations are also permitted.

  10 There are three floating point types: float, double, and long
    double.  The type double provides at least as much precision as
    float, and the type long double provides at least as much precision
    as double.  Each implementation defines the characteristics of the
    fundamental floating point types in the standard header <cfloat>.
+   The value representation of floating-point types is
+   implementation-defined.
    Integral and floating types are collectively called arithmetic
    types.

  11 The void type specifies an empty set of values.  It is used as the
    return type for functions that do not return a value.  No object of
    type void may be declared.  Any expression may be explicitly
    converted to type void (5.4); the resulting expression may be used
    only as an expression statement (6.2), as the left operand of a
    comma expression (5.18), or as a second or third operand of ?: (5.16).

+ 12 Even if the implementation defines two or more basic types to have
+   the same value representation, they are nevertheless different
+   types.


3.7.2  Compound types                                 [basic.compound]

Add the following text to paragraphs 4 and 5:

  4 A pointer to objects of a type T is referred to as a pointer to T.
    For example, a pointer to an object of type int is referred to as
    pointer to int and a pointer to an object of class X is called a
    pointer to X.  Pointers to incomplete types are allowed although
    there are restrictions on what can be done with them (3.7).
+   The value representation of pointer types is
+   implementation-defined.

  5 Objects of cv-qualified (3.7.3) or unqualified type void* (pointer
    to void), can be used to point to objects of unknown type.  A void*
    must have enough bits to hold any object pointer.
+   A qualified or unqualified void* shall occupy the same amount of
+   storage and have the same alignment requirements, that is, have the
+   same object representation, as a qualified or unqualified char*.


5 Expressions                                                     [expr]

Add the following text after paragraph 11:

+ 12 If the program attempts to access the stored value of an object
+   other than through an lvalue of one of the following types:
+
+   o the dynamic type of the object,
+
+   o a qualified version of the declared type of the object,
+
+   o a type that is the signed or unsigned type corresponding to the
+   declared type of the object,
+
+   o a type that is the signed or unsigned type corresponding to a
+   qualified version of the declared type of the object,
+
+   o an aggregate or union type that includes one of the aforementioned
+   types among its members (including, recursively, a member of a
+   subaggregate or contained union), or
+
+   o a character type. (40)

+   the result is undefined.
+
+   _________FootNote_________
+   40) The intent of this list is to specify those circumstances in
+   which an object may or may not be aliased.


8.5 Initializers                                              [dcl.init]

Add the following text after paragraph 9:
+ 10 An uninitialized object has an unspecified value and referring to an
+   object with an unspecified value results in undefined behavior.


9.2 Class Members                                            [class.mem]

Delete paragraphs 16 to 22.