      - Tom introduced the topic for discussion:

          - SG16 approved P1885R0 to forward to LEWG in Belfast.

              - Corentin has now provided an R1 with minor updates.

          - Since then, concerns were raised on the SG16 mailing list:

              - Questions of use cases have been raised.
 
        
       - Corentin stated that use cases haven't changed from his perspective
          and that the discussion on the mailing list went off on a
          tangent.
 
      - Tom replied that the discussion suggested a lack of consensus on the
          importance of a name vs a MIB ID.
 
      - Corentin stated that what is proposed is just a name intended to
          resolve issues with names not being portable across platforms.  The
          proposal relies on MIB IDs to correlate names for use with third
          party products.  The proposal does not allow dynamically adding
          names so as to avoid the possibility of inconsistent results.
 
      - Tom asked what the motivation was for not including enumerators for
          all MIB IDs in text_encoding::id while still requiring the
          implementation to support all names and aliases from the
          IANA Character Set Registry.
 
      - Corentin replied that the requirements were changed in R1.  Hosted
          implementations are now required to support all of the names, but
          freestanding implementations need not.
 
      - Tom asked for clarification regarding omission of enumerator IDs.
 
      - Corentin replied that, if we specify enumerator names for all
          registered character sets, then we'll have to maintain that list.
          Additionally, if implementors can add names, that could lead to
          portability or compatibility issues.  Discussion with others prior to
          Belfast suggested more names were not needed.
 
      - Jens summarized the concern: the RFC has ~150 names and we would have
          to put all 150 names into the enumeration and deal with the
          maintenance.  If we select just a few names, then we don't have a
          maintenance burden.
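
To make the trade-off concrete, a selective enumeration might look like the sketch below. The namespace `sketch`, the type `encoding_id`, and the helper `to_mib` are hypothetical names chosen for illustration, not the wording of P1885; the numeric values are the MIBenum values assigned in the IANA Character Set Registry.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical enumeration: each enumerator's value is the IANA MIBenum of
// the corresponding registered character set.
enum class encoding_id : std::int_least32_t {
    other     = 1,     // character set not registered with IANA
    unknown   = 2,     // default value when the character set is not known
    ASCII     = 3,
    ISOLatin1 = 4,
    UTF8      = 106,
    UTF16BE   = 1013,
    UTF16LE   = 1014,
    UTF16     = 1015,
    UTF32     = 1017,
    UTF32BE   = 1018,
    UTF32LE   = 1019
};

// Converts an enumerator back to its numeric MIBenum value.
constexpr std::int_least32_t to_mib(encoding_id e) noexcept {
    return static_cast<std::int_least32_t>(e);
}

}  // namespace sketch
```

Covering the whole registry would mean repeating this pattern for every registered character set, which is the maintenance cost under discussion.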
 
      - Tom countered that use of the "cs" prefixed identifiers described in
          section 2.3 of RFC 2978 and maintained in the IANA Character Set
          Registry would avoid the portability and compatibility concerns and
          provide a specification we can defer to.
 
      - Corentin replied that it isn't quite that simple because of version
          skew and that exposing MIB IDs to programmers has limited value to
          begin with.
 
      - Tom countered that, in the example use case provided in Belfast, you
          don't necessarily know what the name is.
 
      - [ Editor's note: That example use case is:
    template<class traits, class Rep, class Period>
    void print_fancy_suffix(basic_ostream<char, traits>& os, const duration<Rep, Period>& d)
    {
      if constexpr (text_encoding::literal().mib() == text_encoding::id::UTF8) {
        os << d.count() << "\u00B5s";
      } else {
        os << d.count() << "us";
      }
    }
        ]
       
      - Corentin replied that the use case could still be covered by
          comparing the implementation provided text_encoding object
          with one constructed by the programmer with a name.
 
      - [ Editor's note: Presumably something like:
    template<class traits, class Rep, class Period>
    void print_fancy_suffix(basic_ostream<char, traits>& os, const duration<Rep, Period>& d)
    {
      if constexpr (text_encoding::literal() == text_encoding("UTF-8")) {
        os << d.count() << "\u00B5s";
      } else {
        os << d.count() << "us";
      }
    }
        ]
       
      - Tom opined that string names are good for interaction with current
          third party libraries, but IDs are preferred for the example
          provided.
 
      - Corentin replied that adding more enumerators is ok, but expressed
          discomfort with deferring to the IANA registry due to the possibility
          of incompatibilities arising from version skew.
 
      - Steve noted that the proposal only intends to provide portable names;
          there is no requirement for encoders and decoders to be provided.
 
      - Zach observed that no enumerator is provided for Windows-1252 and
          asked how an implementor that frequently traffics in that encoding
          would provide support.
 
      - Corentin responded that a text_encoding object can be
          constructed by name or that the fixed numeric value from the IANA
          registry can be used.
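
The two routes Corentin mentions can be sketched as follows. This is a minimal illustration, not the proposed interface: `registry_entry`, `registry`, `mib_unknown`, and `mib_for_name` are hypothetical names, and the table is a four-entry excerpt whose values are the MIBenum numbers from the IANA registry.

```cpp
#include <array>
#include <cassert>
#include <string_view>

namespace sketch {

struct registry_entry {
    std::string_view name;  // a name or alias from the IANA registry
    int mib;                // the MIBenum value assigned to that character set
};

// Tiny excerpt for illustration only; the discussion above concerns
// supporting every registered name and alias.
inline constexpr std::array<registry_entry, 4> registry{{
    {"US-ASCII", 3},
    {"ISO-8859-1", 4},
    {"UTF-8", 106},
    {"windows-1252", 2252},
}};

constexpr int mib_unknown = 2;

// Exact, case-sensitive lookup, kept deliberately simple for this sketch.
constexpr int mib_for_name(std::string_view name) noexcept {
    for (const registry_entry& e : registry) {
        if (e.name == name) return e.mib;
    }
    return mib_unknown;
}

}  // namespace sketch
```

Under these assumptions, looking up "windows-1252" by name and writing the fixed value 2252 directly identify the same encoding, even though no enumerator names it.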
 
      - JeanHeyd asked if we could reserve a range of MIB IDs for use by
          implementations similar to the Private Use Area in Unicode.
 
      - Corentin replied that he is strongly opposed to doing so.
 
      - Corentin asked if we really want all of these names to be available
          as identifiers when we can just use strings.
 
      - Zach responded that he thinks it makes sense for cases where we know
          compilers default to certain encodings.
 
      - Corentin repeated that he doesn't want implementors to add their
          own names.
 
      - Jens asked about the source for the names, whether as strings or
          identifiers.  RFC 3808 lists the MIB names with interesting
          spellings, and RFC 2978 defines a registration process, but neither
          provides the latest names.
 
      - Steve provided the URL to the IANA registry and explained that the
          RFCs don't change, but specify the URL for the registry, which
          doesn't change often.
 
      - Tom added that the IANA registry mostly changes for administrative
          reasons, not because of new character set registrations.
 
      - Jens asked how it is determined which names are good for
          enumerators.
 
      - Tom replied that RFC 2978 specifies that each registered character
          set have an associated name prefixed with "cs" that is appropriate
          for use as an identifier.
 
      - Jens asked why the names in the proposal do not match the "cs" names.
 
      - Corentin responded that he picked names that he preferred.
 
      - Jens asserted that, in that case, implementors cannot extend the
          list.
 
      - Zach stated that there isn't much cost in taking the list of "cs"
          prefixed names, removing dashes, and dumping that list in the wording
          and asked again for motivation for omitting them.
 
      - Corentin replied that he thought they were not needed.
 
      - Zach agreed that many would not be used much, but determining which
          ones are important would be difficult, whereas just including them
          all would be easy.
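
The mechanical step Zach describes, removing dashes from a "cs" prefixed name, is small enough to sketch. The function name `enumerator_from_cs_alias` is hypothetical, and the input "csExample-Name" used in the usage note is a made-up alias, not an entry in the registry.

```cpp
#include <cassert>
#include <string>
#include <string_view>

namespace sketch {

// Copies a "cs" alias while dropping every '-' so that the result can be
// spelled as a C++ identifier. No other validation is attempted.
inline std::string enumerator_from_cs_alias(std::string_view alias) {
    std::string out;
    out.reserve(alias.size());
    for (char c : alias) {
        if (c != '-') out.push_back(c);
    }
    return out;
}

}  // namespace sketch
```

For example, the made-up alias "csExample-Name" becomes "csExampleName", while an alias without dashes such as "csUTF8" passes through unchanged.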
 
      - Tom asked Corentin why he felt comfortable deferring to the IANA
          registry for string names, but not for enumerator names.
 
      - Corentin replied that he felt that the names and alias names were
          definitive, but that the enumerator names seemed more fuzzy.
 
      - Corentin asked Jens if there are concerns regarding the use of
          trademark names in the standard; many of the character set names
          include trademark names.
 
      - Jens replied that we already use trademarked names like Windows and
          POSIX in the filesystem specification.
 
      - Steve added that these names have already been vetted by their
          respective owners, if necessary, for inclusion in the registry.
 
      - Jens asked if the names in the IANA registry might already be
          reflected in an ISO standard that we could reference instead.
 
      - Corentin replied that he was unaware of such an ISO standard.
 
      - Tom asked Jens how a search for such an ISO standard could be
          conducted.
 
      - Jens suggested searching for "character set" in the ISO list.
 
      - Steve noted that the RFC describing the IANA registration process
          does mention ISO standards such as ISO 10646, ISO 8859, and
          ISO 2022.
 
      - Corentin stated that web browsers, iconv, ICU, etc. all use the
          IANA registry; it is the de facto standard.
 
      - Jens expressed some uncertainty with regard to how to refer to these
          RFCs from the standard, but mentioned that we did similarly for the
          time zone database which is even less regulated.
 
      - Jens raised a concern about impact to small/embedded implementations.
          As proposed, they would have to include an instance of the string
          name table with every instance of the program and that could be
          problematic even for some hosted implementations.
 
      - Tom suggested that, if the string table is not referenced, e.g., if
          none of the text_encoding factory functions is referenced
          or if the <text_encoding> header is not included, then
          the implementation might be able to omit it.
 
      - Jens suggested that it would be helpful if the paper addressed cost
          of implementation and anticipated impact to deployments.
 
      - JeanHeyd suggested that the guarantee we make should be that if only
          text_encoding::system() or text_encoding::literal()
          are called, then there should be no string table overhead.
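
An ID-only query of this kind needs no name table at all. The sketch below approximates a literal-encoding query with a compile-time probe; it is a heuristic stand-in under stated assumptions, not how text_encoding::literal() is specified. The names `literal_looks_like_utf8` and `literal_mib` are hypothetical, and the probe assumes U+00B5 is representable in the compiler's literal encoding.

```cpp
#include <cassert>

namespace sketch {

constexpr int mib_unknown = 2;    // IANA MIBenum for "unknown"
constexpr int mib_utf8    = 106;  // IANA MIBenum for UTF-8

// Heuristic: U+00B5 MICRO SIGN encodes as the two bytes 0xC2 0xB5 in UTF-8,
// so the literal (including its terminator) occupies three chars.
constexpr bool literal_looks_like_utf8() noexcept {
    constexpr const char probe[] = "\u00B5";
    return sizeof(probe) == 3 &&
           static_cast<unsigned char>(probe[0]) == 0xC2 &&
           static_cast<unsigned char>(probe[1]) == 0xB5;
}

// Returns an integer only; no string table is referenced anywhere.
constexpr int literal_mib() noexcept {
    return literal_looks_like_utf8() ? mib_utf8 : mib_unknown;
}

}  // namespace sketch
```

Because the result is a constant expression, it can feed `if constexpr` or `static_assert` without any run-time cost.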
 
      - Jens asked if an implementation could provide support for a reduced
          set of names.  If not, the discussion of how to reduce deployment
          cost is warranted since, as proposed, this is not a zero-cost or
          zero-overhead solution.
 
      - Jens also stated a preference for the system() and
          wide_system() functions to return a MIB ID rather than a
          text_encoding object.
 
      - Corentin responded that there may be cases where the system encoding
          is not registered with IANA.  In that case, the MIB ID would be
          "unknown", and a different interface would have to be used to
          retrieve the string name of the encoding anyway.
 
      - JeanHeyd provided WTF-8 and Modified UTF-8 as examples of encodings
          that are not registered with IANA but that are known to be in use on
          Android and elsewhere on the web.
 
      - Jens suggested that, in such cases, the implementor register the
          encoding.
 
      - Zach asked to clarify what the motivation is for supporting string
          names at all.
 
      - Tom responded that third party products like iconv and ICU have
          interfaces that require use of string names.
 
      - Corentin confirmed.
 
      - Tom added that the IANA registry is effectively a common subset of
          recognized names.
 
      - Zach stated a preference for omitting string names and just relying
          on MIB IDs.
 
      - Corentin responded that doing so would complicate use of iconv.
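
The dependency on names is visible in the iconv interface itself: `iconv_open` accepts only encoding names, so a MIB ID would first have to be mapped back to a string. A minimal sketch, assuming a POSIX iconv with the `char**` signature and a stateless conversion; the wrapper `convert` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <optional>
#include <string>

namespace sketch {

// Converts `input` from the encoding named `from` to the one named `to`.
// Returns std::nullopt if iconv rejects a name or the conversion fails.
inline std::optional<std::string> convert(const char* to, const char* from,
                                          std::string input) {
    iconv_t cd = iconv_open(to, from);  // note: destination name comes first
    if (cd == reinterpret_cast<iconv_t>(-1)) return std::nullopt;

    std::string output(input.size() * 4 + 4, '\0');  // generous upper bound
    char* in = input.data();
    std::size_t in_left = input.size();
    char* out = output.data();
    std::size_t out_left = output.size();

    // Stateful encodings would need an extra flushing call; omitted here.
    const std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) return std::nullopt;

    output.resize(output.size() - out_left);
    return output;
}

}  // namespace sketch
```

Which names a given iconv accepts varies by platform, which is the portability problem the registry names are meant to reduce.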
 
      - Hubert expressed a lack of motivation for an interface that relies
          on numeric values that no one knows; the string names make sense.
 
      - Jens pondered if string name to MIB ID lookup was an orthogonal
          feature.
 
      - Tom stated that this question was posed in the mailing list
          discussion as well.
 
      - Corentin mentioned existing host system interfaces.  Windows provides
          a code page with an ID.  POSIX systems provide a name and no ID.
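
The difference between the two host interfaces can be sketched with the native calls: `GetACP()` on Windows and `nl_langinfo(CODESET)` on POSIX. The wrapper `host_encoding_description` is a hypothetical helper, and the sketch reports whatever the platform offers rather than attempting to unify the two.

```cpp
#include <cassert>
#include <string>

#if defined(_WIN32)
#  include <windows.h>
#else
#  include <langinfo.h>
#endif

namespace sketch {

// Describes the host's current encoding using the platform's own interface.
inline std::string host_encoding_description() {
#if defined(_WIN32)
    // Windows: the active ANSI code page is identified by a number.
    return "code page " + std::to_string(GetACP());
#else
    // POSIX: the codeset of the current locale is identified by a name.
    // Without a prior setlocale() call this is the "C" locale's codeset.
    return nl_langinfo(CODESET);
#endif
}

}  // namespace sketch
```

A portable facility has to reconcile these: a number with no name on one side, and a name with no number on the other.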
 
      - Jens suggested that an interface that provides a string name does not
          suit all use cases.  For example, a programmer might desire to assert
          a specific system encoding; that shouldn't require a full string
          table.
 
      - Zach expressed a desire for the interface to provide more safety and
          that he would prefer a list of identifiers over a list of string
          names.
 
      - Hubert suggested other benefits of the string names: 1) they are
          useful for interaction with the system and third party libraries,
          and 2) they are useful for interchange or serialization.
 
      - Hubert expressed concern about use of a string interface for looking
          up an encoding name and asked what name is provided in response to a
          lookup of a MIB ID.
 
      - Corentin replied that there is no proposed lookup interface that
          accepts a MIB ID.  The factory interfaces like
          text_encoding::system() return a preferred name, but
          otherwise, the name provided when constructing a
          text_encoding object is preserved.
 
      - Jens expressed a desire for a low-level interface that just returns
          an integer that could be used to assert the environment is UTF-8
          without having to compare with a bunch of strings; that could be a
          zero overhead facility.
 
      - Hubert asked if there is overhead if neither
          text_encoding::system() nor
          text_encoding::wide_system() is called.
 
      - Corentin responded that yes, there is, but it is low.
 
      - Hubert cautioned that some standard library implementors are likely
          to oppose anything that increases startup cost or requires
          "static constructors".
 
      - Tom asked why the interface couldn't perform a lazy lookup.
 
      - Corentin responded that calls to setlocale() could interfere;
          text_encoding::system() is intended to return the locale
          dependent encoding known at program startup time.
 
      - [ Editor's note: Later discussion on the SG16 mailing list
          revealed that it is possible on POSIX systems to retrieve the locale
          dependent encoding known at program startup time regardless of
          intervening calls to setlocale() with code like:
    // Requires <locale.h> and <langinfo.h> (POSIX.1-2008).
    locale_t loc = newlocale(LC_CTYPE_MASK, "", (locale_t)0);
    const char* name = nl_langinfo_l(CODESET, loc);
    ...
    freelocale(loc);
          ]
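
A self-contained version of that fragment might look like the sketch below. The helper names `codeset_of` and `environment_codeset` are hypothetical, and the code assumes a POSIX.1-2008 system that provides `newlocale` and `nl_langinfo_l`.

```cpp
#include <cassert>
#include <string>

#include <langinfo.h>
#include <locale.h>
#if defined(__APPLE__)
#  include <xlocale.h>
#endif

namespace sketch {

// Returns the codeset of the locale named `locale_name` without touching the
// global locale. An empty string means the locale could not be created.
inline std::string codeset_of(const char* locale_name) {
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name,
                             static_cast<locale_t>(0));
    if (loc == static_cast<locale_t>(0)) return {};
    std::string name = nl_langinfo_l(CODESET, loc);
    freelocale(loc);
    return name;
}

// "" selects the locale described by the environment (LANG, LC_ALL, ...),
// so the answer does not depend on earlier setlocale() calls.
inline std::string environment_codeset() {
    return codeset_of("");
}

}  // namespace sketch
```

Because the locale object is created and destroyed locally, calling `setlocale()` before or after has no effect on the result, as long as the environment variables themselves are unchanged.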
       
      - Hubert suggested that programmers can collect this information on
          their own and that they should be aware if some library is calling
          setlocale() before main() is invoked.
 
      - Tom agreed, but stated that doing so is hard in practice,
          particularly for library authors.
 
      - JeanHeyd observed that the C library behavior depends on the
          currently set locale and asked what benefit is provided by
          text_encoding::system() if it's not in sync with the C and
          C++ libraries.
 
      - Tom responded that it indicates what encoding is expected for I/O
          outside of the process.