ISO: WG21/N0357
ANSI: 93-0150
Author: John Max Skaller
Date: 11/9/93
Reply to: maxtal@suphys.physics.us.oz.au

Subscripting operators for string classes
------------------------------------------

C++ needs a general purpose, value oriented utility string class which
has convenient C-like syntax, moderate efficiency, and either a rich
set of operations or a minimal set with provision for user extension.

The existing string class fails to meet these criteria. The proposals
below are an attempt to close the gap between the existing string class
and what I believe are the principal requirements.

PROPOSAL 1a (Alternative)
-------------------------

The classes string and wstring will provide the methods

    char string::operator[](size_t i);
    wchar_t wstring::operator[](size_t i);

with the following semantics:

a) if the index is in range, the operator will return the ith
   character of the string

b) if the index is equal to the number of characters in the string,
   the operator will return 0

c) in other cases the operator semantics are implementation defined,
   and may include throwing an exception, returning a 0, or undefined
   results

d) Non-normative comment: it is expected that implementors will
   provide compiler switches or the like to provide options for the
   user

PROPOSAL 1b (Alternative)
-------------------------

The classes string and wstring will provide the methods

    char string::operator[](size_t i)const;
    wchar_t wstring::operator[](size_t i)const;
    char& string::operator[](size_t i);
    wchar_t& wstring::operator[](size_t i);

The semantics for the const functions are as in proposal 1a.

The functions returning a reference will return a reference to the ith
character of the internal representation of the string if i is in
range. The results otherwise are implementation defined. Any operation
that moves the string or changes its length, or that is otherwise
specified to do so, will invalidate the reference. Execution of c_str()
also invalidates the reference.
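
As a rough illustration of the semantics listed in proposals 1a and 1b,
the following sketch may help. It is not part of the proposal: the
class name SketchString and its vector-based, unshared representation
are assumptions made purely for illustration, and throwing
std::out_of_range is only one of the implementation defined choices
that case (c) permits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical illustration class; not the string class under discussion.
class SketchString {
public:
    explicit SketchString(const char* s) : data_(s, s + std::strlen(s)) {}

    std::size_t length() const { return data_.size(); }

    // Const subscript, following the semantics of proposal 1a:
    //   (a) index in range: return the ith character
    //   (b) index equal to the length: return 0
    //   (c) otherwise implementation defined; this sketch chooses to throw
    char operator[](std::size_t i) const {
        if (i < data_.size()) return data_[i];
        if (i == data_.size()) return 0;
        throw std::out_of_range("SketchString: index out of range");
    }

    // Non-const subscript, following proposal 1b: a reference to the ith
    // character of the internal representation when i is in range.
    // Out of range results are implementation defined; this sketch throws.
    char& operator[](std::size_t i) {
        if (i < data_.size()) return data_[i];
        throw std::out_of_range("SketchString: index out of range");
    }

private:
    std::vector<char> data_;  // assumed unshared storage, no terminator kept
};
```

Note that on a non-const object the expression s[i] selects the
reference-returning overload even when the character is only read; the
value-returning overload is chosen only through a const object or a
const reference.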
If the string utilises a shared representation, the representation must
be adjusted when the reference is formed.

PROPOSAL 1c (Recommended Alternative)
-------------------------------------

The classes string and wstring will provide the methods

    char string::operator[](size_t i)const;
    wchar_t wstring::operator[](size_t i)const;
    char& string::operator()(size_t i);
    wchar_t& wstring::operator()(size_t i);

The semantics for the const functions are as in proposal 1a.
The semantics for the non-const functions are as in proposal 1b.

RATIONALE
---------

Dynarray already provides both these operators. The operators are
expected for strings. Users will not complain if they are not provided;
they simply won't use the string class if convenient, C-like syntax is
not available. For both these reasons I don't think providing nothing
or 1a is viable.

It has been argued that providing a non-const reference forces shared
representations to be split needlessly, thus reducing the efficiency of
the string class, and thus that proposal 1a is superior to proposal 1b.
That would make 1b non-viable.

Unfortunately, the syntax for storing a char in a string is somewhat
ungainly:

    s.put_at(n,ch)

whereas

    s[n]=ch;

is more readable (at least to me).

Since I use short strings for utility purposes, I don't believe
efficiency ought to be a principal design determinant for a string
class: usability is much more important.

If large strings are desired, or strings with efficient computation of
various attributes, a more specialised class will have to be provided
by the user, no matter what implementation of string is used. For
example, for lexical analysis a special class that optimises storage
use and provides fast comparison and assignment is important, and the
string class is not adequate (on its own) for that purpose.

Therefore I do not favour restricting the interface of the string class
on efficiency grounds.
Its principal purpose ought to be provision of general purpose
functionality with moderate efficiency and convenient syntax.

Nevertheless, proposal 1c appears to effect a pleasing compromise. The
principal problem with 1a is that it does not provide an alternative to
put_at(). 1b does, but it threatens efficiency because the use of the
operators is not properly reflected by overloading const.

Proposal 1c solves this problem by providing two distinct names that do
not overload, thus giving control back to the programmer. For example,
the copy operation

    for (int i=0; i