International Locale Support in the Standard C++ Library (Revision 1)

ANSI: X3J16/93-0167R1
ISO:  WG21/N0374R1

by Nathan Myers
myersn@roguewave.com
Rogue Wave Software, Inc.
P.O.Box 2328, Corvallis, OR 97339 USA
voice: (800) 487-3217  FAX: (503) 757-6650

Copyright 1993 by Rogue Wave Software, Inc.

1. Introduction

The need for standard language libraries to support international character sets and data formats was acknowledged by the original ANSI C committee. They invented and standardized the new library component described by <locale.h>, and specified that calls to this new component would change the behavior of many of the traditional C library functions. As one result, we all got a chance to experiment with internationalization.

2. Improvements Possible over Standard C Library Locales

The interface to the original Standard C Library implied a nest of hidden global variables that affect many standard global functions, which limits its use in thread-safe libraries and makes use of multiple locales difficult. In addition, a number of important features were omitted, such as parsing the new numeric, monetary, and date/time output formats it defined.

"Amendment 1" to the Standard C Library fixes some of these problems. However, encapsulating locale and character set semantics is still unsupported, as is parsing local data formats. The C library's restrictions and omissions imperil portability in two important (and rapidly growing) application domains: multithreaded programs, and network servers.

Therefore, this proposal describes a pure C++ approach. I have not tried to go far beyond the domain covered by the Standard C Library, to include every feature proposed by the POSIX and X/Open groups, because I feel timeliness is more important than an absolute (and perhaps illusory) completeness, at this stage. First, we need a firm foundation that can support extensions.

3. C++ Library Locale Facilities

A Standard C Library locale has five parts, or "categories": collation order, character classes (ctype), and monetary, numeric and date/time formats. (POSIX adds a sixth: messages.) The categories are all brought together under a locale because they are interdependent. This proposal adopts those same categories.

A C++ locale is, of course, an object. In implementation, it can be quite heavy-weight. To ease the memory management burden on users, it is implemented as a value which may be cheaply assigned, passed as an argument, and stored as member data. To avoid ambiguous semantics, locales are immutable; but a new locale can be constructed as a variation on an existing locale.

In addition to its constructors, class locale provides static member functions to obtain three standard locales. locale::classic() returns an instance that implements the classic "C" locale. locale::global() returns a stable snapshot of the current global locale. locale::transparent() returns an instance which dynamically tracks the state of the global locale. Any combination of the above behaviors may be obtained easily by derivation.

Besides its re-entrancy and convenient encapsulation, a notable difference between class locale and the Standard C Library facilities is that (as in Schwarz's iostreams) little provision is made for in-memory multibyte strings. If needed, they may be generated with the help of a strstream or stringstream; but since the wide form is more practical, the design is optimized in its favor.

4. Relationship With class ios

Iostream function semantics depend on local preferences. Because the functions (and the operators) involved take no locale argument, and because different streams may require different conversions, each iostream needs a locale member to use for reference during locale-dependent operations.
This proposal simplifies Jerry Schwarz's proposal (X3J16/93-0125, WG21/N0332) by eliminating the members ios::btowc(), ios::wcisb(), and ios::wctob() described in his section 7.2. Instead, ios is expected to delegate any such conversions to the locale. In place of these functions, ios gains functions to set and retrieve its locale member.

I call setting an iostream's locale member "imbuing the iostream". First, to set it:

    locale ios::imbue(locale const& loc);

ios::imbue() returns the previously imbued locale.

To retrieve the currently imbued locale:

    locale ios::rdloc() const { return this->xlocale; }

If none has been imbued, it returns the classic "C" locale.

To use an imbued locale, operators >> and << simply call its member functions via the public interface ios::rdloc(). For example:

    ostream& ostream::operator<<(long i)
      { rdloc().insert(*this, i); return *this; }

Extraction operators must use "loc.is(locale::SPACE, c)" in place of "isspace((unsigned char)c)" to delimit fields, if they are to support large character sets.

Implicit in this proposal is Schwarz's proposed support for large characters in the iostream library. In particular, a locale "knows" only one codeset, so codeset conversions must be done with the help of class iotransform.

Because the Standard C Library locale's magical effect on other global functions was its most troublesome quality, I have proposed that iostreams default to the "C" locale behavior regardless of the current global locale; this guarantees predictable, classical behavior until something else is asked for. In effect, each stream is initially imbued with locale::classic().

5. Relationship with Classes string and wstring

The string classes interact with locales in two areas: collation, and character classing. While class locale provides primitives to collate strings, an interface provided by the string classes would be more convenient.
For example:

    inline int collate(string const& a, string const& b,
                       locale const& accordingTo = locale::global())
    {
      return accordingTo.collate(a.c_str(), a.length(),
                                 b.c_str(), b.length());
    }

The default argument allows coders with no interest in locales to ignore them, without penalizing the more ambitious. I propose, here, that this integration with the string classes be done.

6. Relationship With Classes Date, Time, and Money

Although the committee is not considering proposals for date, time, or money classes, many commercial libraries provide them. Class locale provides low-level primitives to convert to and from preferred character representations for such values, using portable representations such as struct tm, double, and character sequences.

A convenient way to integrate locale support is to provide an optional locale argument in conversion functions (as above), and to have the operators << and >> use the locale imbued in the stream. For example, we might see:

    class Date {
      ...
     public:
      Date(unsigned day, unsigned month, unsigned year);
      string asString(locale const& loc = locale::global());
    };

    istream& operator>>(istream& s, Date& d)  // for example
    {
      struct tm t;
      s.rdloc().extractdate(s, &t);
      // struct tm stores the day of the month in tm_mday
      if (s) d = Date(t.tm_mday, t.tm_mon+1, t.tm_year+1900);
      return s;
    }

    string Date::asString(locale const& loc)
    {
      stringstream s;
      s.imbue(loc);
      s << *this;      // operator<< yields an ostream&, which has no str()
      return s.str();
    }

7. Sample Definition:

Here is the proposed header file. Detailed explanations follow in Section 8. I have omitted exception-handling declarations in this draft. Inline definitions provided for many functions are intended suggestively, as an aid to understanding. Vendors are allowed but not required to implement them inline.

The locale instance physically contains only a pointer to a separate representation object, which provides protected virtual functions to allow extensible semantics.
The locale class itself provides forwarding functions to give access to the virtuals, and to allow pre- and post-processing. The representation is reference-counted, and may in turn reference-count portions it shares with other representation instances.

Note the header file name, <locale>. As with <string>, there is already a standard C header named <locale.h>. I propose the same solution as for strings: <locale.h> provides only the C facilities, exposed to the global namespace; <locale> provides both the C and the C++ facilities, each wrapped safely in its appropriate namespace.

    // <locale>
    #ifndef __locale__
    #define __locale__ 1

    namespace stdc {
      extern "C" {
    #   include <locale.h>  // declarations of C facilities, not wrapped.
      }
    };

    #include <limits.h>  /* for UCHAR_MAX */

    namespace iso_standard {

    struct tm;       // [ should these names be namespace-qualified? ]
    class istream;   //
    class ostream;   //

    class locale {
     public:
      enum category {
        COLLATE = 1<<0, CTYPE = 1<<1, MONETARY = 1<<2,
        NUMERIC = 1<<3, TIME = 1<<4, MESSAGES = 1<<5,
        ALL = (1<<6)-1
      };

      ~locale() { imp_->remove_reference(); }
      locale(locale const& l) : imp_(l.imp_) { imp_->add_reference(); }
      locale(locale::virtuals* imp) : imp_(imp) { imp_->add_reference(); }
      locale(char const*);
      locale(locale const&, char const*, category);
      locale const& operator=(locale const& other);

      bool ok() const { return imp_ != 0; }  // construction succeeded?
      bool operator==(locale const& other) const
        { return imp_->equal(other.imp_, ALL); }
      bool operator!=(locale const& other) const
        { return !imp_->equal(other.imp_, ALL); }
      bool equal(locale const& other, category cat = ALL) const
        { return imp_->equal(other.imp_, cat); }

      // iostream support:
      void insert(ostream& s, bool v) const { imp_->insert(s,v); }
      void insert(ostream& s, long v) const { imp_->insert(s,v); }
      void insert(ostream& s, unsigned long v) const { imp_->insert(s,v); }
      void insert(ostream& s, double v) const { imp_->insert(s,v); }
      // void insert(ostream& s, long double v) const { imp_->insert(s,v); }

      // extract() takes its result by reference, matching virtuals::extract()
      void extract(istream& s, bool& v) const { imp_->extract(s,v); }
      void extract(istream& s, long& v) const { imp_->extract(s,v); }
      void extract(istream& s, unsigned long& v) const { imp_->extract(s,v); }
      void extract(istream& s, double& v) const { imp_->extract(s,v); }
      // void extract(istream& s, long double& v) const { imp_->extract(s,v); }

      int narrow(wchar_t w, char& c) const { return imp_->narrow(w,c); }
      int widen(char c, wchar_t& w) const { return imp_->widen(c,w); }

      // ctype functions
      enum ctype {
        NO_MATCH=0, SPACE=1<<0, PRINT=1<<1, CNTRL=1<<2, UPPER=1<<3,
        LOWER=1<<4, ALPHA=1<<5, DIGIT=1<<6, PUNCT=1<<7, XDIGIT=1<<8,
        ALNUM=(1<<5)|(1<<6), GRAPH=(1<<5)|(1<<6)|(1<<7)
      };
      bool is(ctype mask, char c) const
        { return ((int)imp_->ctypetable_[(unsigned char)c] & (int)mask) != 0; }
      bool is(ctype mask, unsigned char c) const { return is(mask,(char)c); }
      bool is(ctype mask, signed char c) const { return is(mask,(char)c); }
      // UCHAR_MAX is a macro from <limits.h>, so it cannot be qualified
      bool is(ctype mask, int c) const
        { return ((c & ~UCHAR_MAX) ? 0 : is(mask, (char)c)); }
      // notice that the above functions are wholly inline.
      bool is(ctype mask, wchar_t w) const { return imp_->is(mask, w); }
      size_t is(char const* s, size_t len, ctype* vec) const;
      size_t is(wchar_t const* s, size_t len, ctype* vec) const
        { return imp_->is(s, len, vec); }
      ctype namedctype(char const* s) const { return imp_->namedctype(s); }

      enum totype {
        NO_CHANGE, UP, DOWN   // plus others returned by namedto()
      };
      char to(totype t, char c) const { return imp_->to(t,c); }
      signed char to(totype t, signed char c) const { return to(t,char(c)); }
      unsigned char to(totype t, unsigned char c) const { return to(t,char(c)); }
      wchar_t to(totype t, wchar_t c) const { return imp_->to(t,c); }
      size_t to(totype t, char* s, size_t len) const
        { return imp_->to(t,s,len); }
      size_t to(totype t, wchar_t* s, size_t len) const
        { return imp_->to(t,s,len); }
      totype namedto(char const* s) const { return imp_->namedto(s); }

      // string functions
      int collate(char const* sa, size_t la, char const* sb, size_t lb) const
        { return imp_->collate(sa, la, sb, lb); }
      int collate(wchar_t const* sa, size_t la,
                  wchar_t const* sb, size_t lb) const
        { return imp_->collate(sa, la, sb, lb); }
      int transform(ostream& o, char const* s, size_t len) const
        { return imp_->transform(o, s, len); }
      int transform(ostream& o, wchar_t const* s, size_t len) const
        { return imp_->transform(o, s, len); }
      long hash(char const* s, size_t len) const
        { return imp_->hash(s, len); }
      long hash(wchar_t const* s, size_t len) const
        { return imp_->hash(s, len); }

      // time functions
      void insert(ostream& s, struct tm const* tmb,
                  char const* pattern, size_t len) const;
      void insert(ostream& s, struct tm const* tmb, char format) const
        { imp_->insert(s,tmb,format); }
      void extracttime(istream& s, struct tm* t) const
        { imp_->extracttime(s,t); }
      void extractdate(istream& s, struct tm* t) const
        { imp_->extractdate(s,t); }
      void extractweekday(istream& s, struct tm* t) const
        { imp_->extractweekday(s,t); }
      void extractmonthname(istream& s, struct tm* t) const
        { imp_->extractmonthname(s,t); }
      enum dateorder { NO_ORDER, DMY, MDY, YMD, YDM };
      dateorder date_order() const { return imp_->date_order(); }

      // money functions
      enum moneysymbol { NONE, LOCAL, INTL };
      void insert(ostream& s, double units, moneysymbol sym) const
        { imp_->insert(s, units, sym); }
      void insert(ostream& s, char* digits, moneysymbol sym) const
        { imp_->insert(s, digits, sym); }
      void extractmoney(istream& s, double& units, moneysymbol sym) const
        { imp_->extractmoney(s, units, sym); }
      void extractmoney(istream& s, ostream& digits, moneysymbol sym) const
        { imp_->extractmoney(s, digits, sym); }
      int moneyfracdigits(locale::moneysymbol sym) const
        { return imp_->moneyfracdigits(sym); }

      // static members:
      static locale global();                // the current global locale
      static locale global(locale const&);   // replaces ::setlocale(...)
      static locale const& transparent();    // the transparent global locale
      static locale const& classic();        // the "C" locale

      class virtuals {
       protected:
        // miscellaneous
        virtual void name(ostream&) const = 0;
        virtual bool equal(virtuals const*, locale::category) const;

        // iostream support
        virtual void insert(ostream& s, bool v) const;
        virtual void insert(ostream& s, long v) const;
        // virtual void insert(ostream& s, long long v) const;
        virtual void insert(ostream& s, unsigned long v) const;
        // virtual void insert(ostream& s, unsigned long long v) const;
        virtual void insert(ostream& s, double v) const;
        virtual void extract(istream& s, bool& v) const;
        virtual void extract(istream& s, long& v) const;
        virtual void extract(istream& s, unsigned long& v) const;
        virtual void extract(istream& s, double& v) const;
        // [ what about types like "long long" and "long double"? ]

        virtual int narrow(wchar_t, char&) const;
        virtual int widen(char, wchar_t&) const;

        // ctype functions
        locale::ctype const* ctypetable_;  // data member, for is(...);
        virtual bool is(locale::ctype mask, wchar_t) const;
        virtual size_t is(wchar_t const*, size_t, locale::ctype* vec) const;
        virtual locale::ctype namedctype(char const*) const;
        virtual char to(locale::totype, char c) const;
        virtual wchar_t to(locale::totype, wchar_t c) const;
        virtual size_t to(locale::totype, char*, size_t len) const;
        virtual size_t to(locale::totype, wchar_t*, size_t len) const;
        virtual locale::totype namedto(char const* s) const;

        // stdlib functions:
        virtual int collate(const char*, size_t len1,
                            const char*, size_t len2) const;
        virtual int collate(const wchar_t*, size_t len1,
                            const wchar_t*, size_t len2) const;
        virtual int transform(ostream&, char const*, size_t len) const;
        virtual int transform(ostream&, wchar_t const*, size_t len) const;
        virtual long hash(char const* s, size_t len) const;
        virtual long hash(wchar_t const* s, size_t len) const;

        // time functions
        virtual void insert(ostream& s, struct tm const* tmb,
                            char format) const;
        virtual void extracttime(istream& s, struct tm* t) const;
        virtual void extractdate(istream& s, struct tm* t) const;
        virtual void extractweekday(istream& s, struct tm* t) const;
        virtual void extractmonthname(istream& s, struct tm* t) const;
        virtual locale::dateorder date_order() const;

        // money functions
        virtual void insert(ostream& s, double units,
                            locale::moneysymbol sym) const;
        virtual void insert(ostream& s, char* digits,
                            locale::moneysymbol sym) const;
        virtual void extractmoney(istream& s, double& units,
                                  locale::moneysymbol sym) const;
        virtual void extractmoney(istream& s, ostream& digits,
                                  locale::moneysymbol sym) const;
        virtual int moneyfracdigits(locale::moneysymbol sym) const;

        virtual virtuals* copybut(char const*, locale::category) const;

        virtuals(size_t refs);
        virtual ~virtuals();

       private:
        size_t refcount_;  // data member, reference count, 0 => 1 ref.
        void add_reference() { if (this) ++refcount_; }
        void remove_reference()
          { if (this && refcount_-- == 0) delete this; }
        virtuals(virtuals const&);                   // not defined
        virtuals const& operator=(virtuals const&);  // not defined
        friend class locale;
      };

     private:
      virtuals* imp_;
      void name(ostream& s) const { imp_->name(s); }  // used by operator<<

      // these insert and extract the unique ASCII name of a locale
      friend ostream& operator<<(ostream& s, locale const& l)
        { l.name(s); return s; }
      friend istream& operator>>(istream& s, locale& l);
    };

    class localev_byname : public locale::virtuals {
      localev_byname(const char*, size_t refs);
      // ... (Defines appropriate vendor semantics for all virtuals.)
    };

    // locale::category bitwise operators:
    locale::category operator~(locale::category a);
    locale::category operator&(locale::category a, locale::category b);
    locale::category operator|(locale::category a, locale::category b);
    locale::category operator^(locale::category a, locale::category b);
    locale::category const& operator&=(locale::category& a, locale::category b);
    locale::category const& operator|=(locale::category& a, locale::category b);
    locale::category const& operator^=(locale::category& a, locale::category b);

    }  // namespace iso_standard
    #endif /* !defined(__locale__) */

8. Explanation of functions:

Members of class locale
-----------------------

    locale(char const*);

This is the generic constructor. It takes the same null-terminated string argument values as the C library function ::setlocale(...).

    enum category {
      COLLATE = 1<<0, CTYPE = 1<<1, MONETARY = 1<<2,
      NUMERIC = 1<<3, TIME = 1<<4, MESSAGES = 1<<5,
      ALL = (1<<6)-1
    };
    locale(locale const&, char const*, category cat);

This constructor generates a variation from an existing locale. The *cat* argument may be any bitwise combination of the categories listed.
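The bitwise combination of categories can be sketched as follows. This is only an illustration, not the proposed header: the enum is a free-standing stand-in for locale::category, includes() is a hypothetical helper, and operator|= here returns a plain reference rather than the const reference declared in Section 7.

```cpp
// Stand-in for locale::category; enumerator names and values follow the proposal.
enum category {
    COLLATE = 1 << 0, CTYPE = 1 << 1, MONETARY = 1 << 2,
    NUMERIC = 1 << 3, TIME = 1 << 4, MESSAGES = 1 << 5,
    ALL = (1 << 6) - 1
};

// An enum does not convert back from int implicitly, so each operator
// casts its result. operator~ masks with ALL to stay within the six bits.
inline category operator|(category a, category b) { return category(int(a) | int(b)); }
inline category operator&(category a, category b) { return category(int(a) & int(b)); }
inline category operator^(category a, category b) { return category(int(a) ^ int(b)); }
inline category operator~(category a) { return category(~int(a) & int(ALL)); }
inline category& operator|=(category& a, category b) { return a = a | b; }

// Hypothetical helper: true when every category in `wanted` is present in `set`.
inline bool includes(category set, category wanted) {
    return (set & wanted) == wanted;
}
```

With operators like these, a call such as locale(base, "de_DE", MONETARY | NUMERIC) passes a value that is still of type category, which is why the proposal declares them alongside the class.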
    locale(locale const& loc) : imp_(loc.imp_) { imp_->add_reference(); }
    locale const& operator=(locale const& other);

These are the generic copy operators. As usual, the assignment operator must check for identity.

    locale(locale::virtuals* imp) : imp_(imp) { imp_->add_reference(); }

This constructor supports user-defined derivations from locale::virtuals.

    ~locale() { imp_->remove_reference(); }

The destructor. Note that it is not virtual.

    bool ok() const { return imp_ != 0; }  // construction succeeded?

ok() must be used to determine if a locale was constructed successfully [if exceptions are disabled?].

    enum ctype {
      NO_MATCH=0, SPACE=1<<0, PRINT=1<<1, CNTRL=1<<2, UPPER=1<<3,
      LOWER=1<<4, ALPHA=1<<5, DIGIT=1<<6, PUNCT=1<<7, XDIGIT=1<<8,
      ALNUM=(1<<5)|(1<<6), GRAPH=(1<<5)|(1<<6)|(1<<7)
    };
    bool is(ctype mask, char c) const
      { return ((int)imp_->ctypetable_[(unsigned char)c] & (int)mask) != 0; }
    bool is(ctype mask, unsigned char c) const { return is(mask,(char)c); }
    bool is(ctype mask, signed char c) const { return is(mask,(char)c); }
    bool is(ctype mask, int c) const
      { return ((c & ~UCHAR_MAX) ? 0 : is(mask, (char)c)); }
    bool is(ctype mask, wchar_t wc) const { return imp_->is(mask, wc); }

is() implements semantics for the char types efficiently enough to be used per-character in stream operations, while remaining configurable. The wchar_t version is implemented virtually for greater flexibility.

    size_t is(char const* s, size_t len, ctype* vec) const;
    size_t is(wchar_t const* s, size_t len, ctype* vec) const
      { return imp_->is(s, len, vec); }

is(s,len,vec) writes into vec[i] a description of the character in s[i], for i in [0..len-1]. It returns the number of characters successfully classified. The wchar_t* version is virtual; the char* version is not.
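A minimal sketch of the table-driven char classification described above, assuming a table filled once from the C library's <cctype> functions under whatever C locale is current at construction. The names ctype_mask and ctype_table are hypothetical; only a subset of the proposal's ctype bits is shown, and the table member stands in for locale::virtuals::ctypetable_.

```cpp
#include <cctype>
#include <climits>
#include <cstddef>

// Subset of the proposal's ctype bits, with the same values.
enum ctype_mask {
    NO_MATCH = 0, SPACE = 1 << 0, UPPER = 1 << 3,
    LOWER = 1 << 4, ALPHA = 1 << 5, DIGIT = 1 << 6
};

// Hypothetical holder for a 256-entry classification table.
struct ctype_table {
    unsigned short bits[UCHAR_MAX + 1];

    ctype_table() {
        for (int c = 0; c <= UCHAR_MAX; ++c) {
            unsigned short m = NO_MATCH;
            if (std::isspace(c)) m |= SPACE;
            if (std::isupper(c)) m |= UPPER;
            if (std::islower(c)) m |= LOWER;
            if (std::isalpha(c)) m |= ALPHA;
            if (std::isdigit(c)) m |= DIGIT;
            bits[c] = m;
        }
    }

    // Scalar form: one array lookup and one AND, no function call into the C library.
    // The cast to unsigned char keeps negative char values from indexing out of range.
    bool is(int mask, char c) const {
        return (bits[static_cast<unsigned char>(c)] & mask) != 0;
    }

    // Vector form: writes the full bit set for each character into vec[i]
    // and returns the number of characters classified.
    std::size_t is(const char* s, std::size_t len, unsigned short* vec) const {
        for (std::size_t i = 0; i < len; ++i)
            vec[i] = bits[static_cast<unsigned char>(s[i])];
        return len;
    }
};
```

Because the lookup is inline and the table pointer is data rather than a virtual function, a derived representation can change classification simply by pointing at a different table.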
    enum totype {
      NO_CHANGE, UP, DOWN   // plus others returned by namedto()
    };
    char to(totype t, char c) const { return imp_->to(t,c); }
    signed char to(totype t, signed char c) const { return to(t,char(c)); }
    unsigned char to(totype t, unsigned char c) const { return to(t,char(c)); }
    wchar_t to(totype t, wchar_t c) const { return imp_->to(t,c); }

The functions to() take a character and a totype value, and return either that character or another, related character.

    size_t to(totype t, char* s, size_t len) const
      { return imp_->to(t,s,len); }
    size_t to(totype t, wchar_t* s, size_t len) const
      { return imp_->to(t,s,len); }

The functions to(t,s,len) change the characters s[0..len-1] in place. They return the number of characters successfully changed.

    void insert(ostream& s, struct tm const* tmb,
                char const* pattern, size_t len);

This interprets its *pattern* argument like the corresponding argument to the C library function ::strftime(), except that it is considered a simple byte string of given length, not an NTMBS. "%" directives in the *pattern* argument are implemented with calls to insert(ostream&, struct tm const*, char).

    void insert(ostream& s, struct tm const* tmb,
                wchar_t const* pattern, size_t len);

This interprets its *pattern* argument exactly as the standard C library function wcsftime() does, except that the pattern is not NUL-terminated. "%" directives in the *pattern* argument are implemented with calls to insert(ostream&, struct tm const*, char).

    static locale global();               // the current global locale
    static locale const& transparent();   // the transparent global locale
    static locale const& classic();       // the "C" locale

global() and transparent() differ in that the locale returned by global() is stable against subsequent calls to ::setlocale() or global(locale const&), whereas transparent() returns a locale that tracks changes resulting from such calls.
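The difference between a stable snapshot and a tracking locale can be sketched with a toy model. Everything here is hypothetical scaffolding: rep is reduced to a name, global_slot() stands in for the hidden global state, and std::shared_ptr replaces the proposal's own reference counting. The sketch is not thread-safe.

```cpp
#include <memory>
#include <string>

// Toy stand-in for a locale representation: just a name.
struct rep { std::string name; };

// Hypothetical process-wide slot holding the current global representation.
inline std::shared_ptr<const rep>& global_slot() {
    static std::shared_ptr<const rep> slot = std::make_shared<rep>(rep{"C"});
    return slot;
}

// Behaves like the result of global(): the pointer is copied once,
// so later changes to the slot are not observed.
class snapshot {
    std::shared_ptr<const rep> rep_;
public:
    snapshot() : rep_(global_slot()) {}
    std::string name() const { return rep_->name; }
};

// Behaves like transparent(): the slot is consulted on every call.
class tracker {
public:
    std::string name() const { return global_slot()->name; }
};

// Behaves like global(locale const&): installs a new representation
// and returns the name of the previous one.
inline std::string set_global(const std::string& name) {
    std::string old = global_slot()->name;
    global_slot() = std::make_shared<rep>(rep{name});
    return old;
}
```

A snapshot taken before set_global() keeps reporting the old name, while a tracker reports the new one immediately. The snapshot also keeps the old representation alive, which is the same effect the reference count provides in the proposed design.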
classic() returns an object that implements the traditional "C" locale of yore. It is particularly useful for reading and writing portable ASCII data files, and is the default locale imbued on iostreams.

    static locale global(locale const&);

global(locale const&) sets the global locale, like ::setlocale(), but it may be applied to a locale instance. It returns the previous global locale. The effect of applying this function to a locale which is implemented, in any part, by calling locale::transparent() or locale::global() is undefined (but will probably lead to infinite recursion and stack overflow).

    friend ostream& operator<<(ostream& s, locale const& l)
      { l.imp_->name(s); return s; }
    friend istream& operator>>(istream& s, locale& l);

These functions insert and extract a null-terminated ASCII string that uniquely identifies a locale. The extractor recreates the locale if it can. These functions may safely be used regardless of any locale currently imbued in the stream argument.

All other functions have semantics identical to their corresponding virtual implementation, described below.

Members of class locale::virtuals
---------------------------------

All virtual members of the base class locale::virtuals implement the classic "C" locale semantics, with one exception: locale::virtuals::name() is a pure virtual.

    virtual void name(ostream&) const = 0;

name() generates an ASCII string uniquely identifying the locale. This string may be passed as an argument to the locale constructor to create a copy of the locale.

    virtual bool equal(virtuals const*, locale::category) const;

Returns true iff the two locales are identical in the categories specified. Equivalent to comparing the locale names, for the vendor-supported locales. The expression (locale("C") == locale::classic()) is guaranteed to be true.
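One way to read "identical in the categories specified" is sketched below, under the assumption that a representation records one locale name per category. The rep_names type and its layout are hypothetical; the proposal leaves the representation to the vendor.

```cpp
#include <string>

enum category {
    COLLATE = 1 << 0, CTYPE = 1 << 1, MONETARY = 1 << 2,
    NUMERIC = 1 << 3, TIME = 1 << 4, MESSAGES = 1 << 5,
    ALL = (1 << 6) - 1
};
const int kCategories = 6;

// Hypothetical representation: one locale name per category,
// indexed by the category's bit position.
struct rep_names {
    std::string name[kCategories];

    explicit rep_names(const std::string& all) {
        for (int i = 0; i < kCategories; ++i) name[i] = all;
    }

    // In the spirit of copybut(): a copy in which the categories
    // selected by `cats` are taken from the locale named `other`.
    rep_names copybut(const std::string& other, int cats) const {
        rep_names r(*this);
        for (int i = 0; i < kCategories; ++i)
            if (cats & (1 << i)) r.name[i] = other;
        return r;
    }

    // Compares names only in the categories selected by `cats`.
    bool equal(const rep_names& o, int cats = ALL) const {
        for (int i = 0; i < kCategories; ++i)
            if ((cats & (1 << i)) && name[i] != o.name[i]) return false;
        return true;
    }
};
```

Under this model, two locales that differ only in their monetary and numeric conventions compare unequal with ALL but equal when the mask is restricted to the remaining categories.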
    virtual void insert(ostream& s, bool v) const;
    virtual void insert(ostream& s, long v) const;
    virtual void insert(ostream& s, unsigned long v) const;
    virtual void insert(ostream& s, double v) const;

These functions are used by ostream to insert numbers for output. Smaller numbers (short, float) may be promoted as appropriate. Digit group and decimal separators are inserted as specified. ios::flags are set accordingly.

    virtual void extract(istream& s, bool& v) const;
    virtual void extract(istream& s, long& v) const;
    virtual void extract(istream& s, unsigned long& v) const;
    virtual void extract(istream& s, double& v) const;

These functions are used by istream to parse numbers off the input stream. Values out of range for smaller types must be identified by the caller (typically the operator >>). Digit group separators are permitted, and checked, if specified. ios::flags are set accordingly.

    virtual int narrow(wchar_t, char&) const;
    virtual int widen(char, wchar_t&) const;

These functions are used by istream and ostream when converting between "skinny" and wide characters, such as when extracting into a char* from a wide stream, or vice versa. They return 0 on success, non-zero if there is no direct mapping.

    // ctype functions
    const locale::ctype* ctypetable_;  // used by is(ctype, char)
    virtual bool is(locale::ctype, wchar_t) const;

The ctypetable_ data member is used by the locale::is() functions for efficient inline char classification. The corresponding wide character operation is virtual, where function call overhead is less important and flexibility is essential.

    virtual size_t is(wchar_t const*, size_t len, locale::ctype*) const;

The vector form is() writes a description of each character into the space provided. It returns the number of characters successfully classified. (The char* form is not virtual.)
    virtual locale::ctype namedctype(char const*) const;

namedctype() returns a bitmask corresponding to the named character classification requested, or ctype(0) if the string matches none [or throws an exception?]. Names accepted are (at least) those defined in the Standard C Library "Amendment 1".

    virtual char to(locale::totype t, char c) const;
    virtual wchar_t to(locale::totype t, wchar_t c) const;

The functions to() take a character and a locale::totype value, and return either that character or another, related character, in the manner of toupper() or tolower().

    virtual size_t to(locale::totype, char*, size_t len) const;
    virtual size_t to(locale::totype, wchar_t*, size_t len) const;

The vector functions to() convert a bufferful of characters in place. They return the number of characters successfully converted.

    virtual locale::totype namedto(char const* s) const;

namedto() returns a value that may be passed to the to() functions. It provides a means to extend the built-in conversions beyond toupper() and tolower(). The base implementation accepts any NTBS as a name, including those defined in the Standard C Library "Amendment 1". For non-standard names it returns either the next available value, or the value last associated with that name.

    // stdlib functions:
    virtual int collate(char const*, size_t len1,
                        char const*, size_t len2) const;
    virtual int collate(wchar_t const*, size_t len1,
                        wchar_t const*, size_t len2) const;
    virtual int transform(ostream&, char const*, size_t len) const;
    virtual int transform(ostream&, wchar_t const*, size_t len) const;

The collate() members work like the standard C library functions strcoll() and wcscoll(), except the argument strings are simple char (or wchar_t) sequences of given length, not NTMBSs. The transform() functions correspond to strxfrm() and wcsxfrm() in the same way. The transformed string is sent to the stream argument.
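The relationship between collate() and transform() can be illustrated with the C library functions they wrap. The sketch below assumes the helper names collate_n, transform_n, and sign, none of which are part of the proposal; it copies each length-delimited sequence into a std::string to obtain a terminator, so sequences containing embedded NULs are not handled, and it returns the key as a string instead of writing it to a stream.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Length-delimited comparison built on strcoll(), which needs terminated strings.
inline int collate_n(const char* a, std::size_t la,
                     const char* b, std::size_t lb) {
    std::string sa(a, la), sb(b, lb);   // the copies supply the terminating NUL
    return std::strcoll(sa.c_str(), sb.c_str());
}

// Returns the strxfrm() key for a length-delimited sequence.
inline std::string transform_n(const char* s, std::size_t len) {
    std::string src(s, len);
    // With a zero-length destination, strxfrm() only reports the key length.
    std::size_t need = std::strxfrm(nullptr, src.c_str(), 0);
    std::vector<char> buf(need + 1);
    std::strxfrm(buf.data(), src.c_str(), buf.size());
    return std::string(buf.data(), need);
}

// Reduces a comparison result to -1, 0, or +1.
inline int sign(int v) { return (v > 0) - (v < 0); }
```

The property a conforming pair must satisfy is that comparing two keys with strcmp() gives the same sign as collating the original sequences, which is what makes a transformed key worth caching when the same string is compared many times.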
    virtual long hash(char const* s, size_t len) const;
    virtual long hash(wchar_t const* s, size_t len) const;

For any two strings which collate() reports are equal, hash() returns equal values. It may also return equal values for some (rare) pairs of non-equal strings. Beyond these requirements, the algorithm used is unspecified.

    // time functions
    virtual void insert(ostream& s, struct tm const* tmb, char format) const;

insert() generates a time string according to the *format* argument, with semantics identical to the argument to strftime(). ios::flags are set accordingly.

    virtual void extracttime(istream& s, struct tm* t) const;

extracttime() sets the tm_hour, tm_min, and tm_sec fields of the *t* argument. In general, extracttime() is only required to correctly parse times produced by calling insert(s, t, 'X') in the same locale. ios::flags are set accordingly.

    virtual void extractdate(istream& s, struct tm* t) const;

extractdate() sets the tm_year, tm_mon, tm_mday, tm_yday, and tm_wday fields of the *t* argument. Dates may be entirely numeric, or may contain the month name. If the latter, case is ignored and "standard" abbreviations are permitted; otherwise, the order of components expected is that returned by date_order(). Certain locales may parse dates assuming a different era and calendar; e.g. Chinese, Semitic, Lunar. In general, extractdate() is only required to correctly parse a date produced by calling insert(s, t, 'x') in the same locale. ios::flags are set accordingly.

    virtual void extractweekday(istream& s, struct tm* t) const;
    virtual void extractmonthname(istream& s, struct tm* t) const;

extractweekday() sets the tm_wday field of the *t* argument. extractmonthname() sets the tm_mon field of the *t* argument. Case is ignored, and "standard" abbreviations are permitted. ios::flags are set accordingly.
    enum locale::dateorder { NO_ORDER, DMY, MDY, YMD, YDM };
    virtual locale::dateorder date_order() const;

date_order() returns an enumeration indicating the conventional order of components in a numeric date; in the U.S., it would return MDY, in Europe, DMY, and in Japan, YMD. A NO_ORDER result implies that the date format is not Gregorian. [ Are any others, such as JY and YJ (julian dates), or YWD and DWY (year/week/weekday) needed? ]

    // money functions
    enum locale::moneysymbol { NONE, LOCAL, INTL };
    virtual void insert(ostream& s, double units,
                        locale::moneysymbol sym) const;
    virtual void insert(ostream& s, char* digits,
                        locale::moneysymbol sym) const;
    virtual void extractmoney(istream& s, double& units,
                              locale::moneysymbol sym) const;
    virtual void extractmoney(istream& s, ostream& digits,
                              locale::moneysymbol sym) const;

These functions extract and insert monetary values. The *units* argument is interpreted as an integer multiple of the smallest unit of currency. For example, in the U.S. a value of 1000.0 would indicate $10.00. The *digits* argument is an unbroken sequence of digits. The *sym* argument indicates whether to use the local (e.g. "$"), international (e.g. "USD "), or no symbol. extractmoney() takes a *sym* argument because the units sometimes differ; e.g. LOCAL currency might come in units of cents, while INTL may be dollars. ios::flags are set accordingly.

    virtual int moneyfracdigits(locale::moneysymbol sym) const;

moneyfracdigits() returns the number of digits after the radix separator in the format chosen by the *sym* argument.

    virtual locale::virtuals* copybut(char const*, locale::category) const;

copybut() is used by the locale(locale const&, char const*, category) locale constructor.

    locale::virtuals(size_t refs);

The regular constructor. Normally the *refs* argument is zero, incremented later by the locale object constructed from this.
The base class constructor sets refcount_ to (refs-1) and ctypetable_ appropriately for the "C" locale; constructors for derived classes must reset ctypetable_ for other character mappings.

    size_t refcount_;
    void add_reference() { if (this) ++refcount_; }
    void remove_reference() { if (this && refcount_-- == 0) delete this; }

These are the ordinary reference-counting functions. Note that a zero value of refcount_ indicates one reference. [ These functions are good candidates for exclusion locking in a multithread implementation. ]

    class localev_byname : public locale::virtuals {
      localev_byname(char const* locale_name, size_t refs);
      // ... (Defines appropriate vendor semantics for all virtuals.)
    };

Class localev_byname provides definitions for all virtual functions as necessary to implement locales chosen by name. It is the class used by the constructor locale(const char*), and the operator>>(istream&, locale&).

9. Conclusion

I believe that this proposal, in concert with Jerry Schwarz's iostream proposal, encapsulates the entire domain covered by the Standard C Library locale mechanism. An appropriate interface for a message facility is still wanted, and would be very welcome. [Is message support considered unimportant, now, because resource files are widely used instead?]

Note that this proposal covers little new ground -- mostly it just encapsulates the familiar Standard C Library features behind a safe, re-entrant, and extensible interface, and integrates them with iostreams. Only the date/time/money parsing is really new; it is practical to add because the C++ locales' extensibility allows us to specify only the minimum capability.

Any suggestions for unaddressed (or ill-addressed) areas would be most welcome. In particular, those tracking other standards efforts are invited to help keep this current.

This is a large proposal, and some would reject it merely on that basis.
Locale support is clearly within the group's charter, however, and the iostream library needs that support. I have found it hard to imagine a smaller proposal that meets C++ programmers' needs for portability and encapsulation. It has also been criticized as duplicating the work of other standards bodies; but it may also be seen as providing a C++ binding to those features.

Others argue persuasively that locale support is not yet well enough understood to standardize. Certainly this proposal is no one's idea of the last word on the subject. Besides defining an interface to specific features, however, class locale is an architectural fixture. While the interface will evolve, most uses of the class as a default argument or private member will not. In a sense, a standard class locale provides the vehicle for evolution.

Not to standardize an encapsulated, re-entrant interface to locale facilities would be to condemn to non-portability the entire family of multi-threaded and network-aware programs. The importance of this family of applications is growing rapidly and will continue to accelerate as worldwide network connectivity improves.