N1623=04-0063
                                                              Matt Austern
                                                              24 Mar 2004

Resolutions to regular expression issues

7.3 The Interface to regex_traits Should Use Iterators, Not Strings

1. lookup_classname
-------------------
In clause 7.2 [tr.re.req], paragraph 4, immediately after "*I1* and *I2* are
Input Iterators;" add "*F1* and *F2* are forward iterators;".

In Table 7.1, change the entry for the member function *lookup_classname* to:

   v.lookup_classname(F1, F2)   X::char_class_type   Converts the character sequence
                                       designated by the iterator range
                                       *[F1, F2)* into a bitmask type
                                       that can subsequently be passed
                                       to *is_class*. Values returned
                                       from *lookup_classname* can be
                                       safely bitwise or'ed together.
                                       Returns 0 if the character sequence
                                       is not the name of a character
                                       class recognized by X. At least
                                       the names "d", "w", "s", "alnum",
                                       "alpha", "blank", "cntrl", "digit",
                                       "graph", "lower", "print", "punct",
                                       "space", "upper" and "xdigit" shall
                                       be recognized. The value returned
                                       shall be independent of the case
                                       of the characters in the sequence.

In Table 7.1, change the entry for the member function *is_class* to:

   v.is_class(c,                  bool      Returns *true* if character *c*
      v.lookup_classname(F1, F2))               is a member of the character
                                       class named by the character
                                       sequence designated by the
                                       iterator range *[F1, F2)*, false
                                       otherwise.
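Because the argument is now an iterator range, an implementation can look a class name up in place, for example directly between the "[:" and ":]" of a bracket expression in the pattern, without first copying it into a string. The sketch below is illustrative only. It assumes the C++11 *std::regex_traits<char>*, which adopted this iterator interface. There, the member corresponding to *is_class* is spelled *isctype*, and *lookup_classname* takes an extra *bool icase* argument that defaults to false. The helper *class_in_bracket* is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <regex>

// Hypothetical helper: looks up the class name written between "[:" and
// ":]" (for example "[:alpha:]") by passing a pointer range into the
// original buffer, so no temporary string is built.
// Assumes `bracket` is a well-formed "[:name:]" string.
std::regex_traits<char>::char_class_type
class_in_bracket(const std::regex_traits<char>& traits, const char* bracket)
{
    const char* first = bracket + 2;                        // skip "[:"
    const char* last = bracket + std::strlen(bracket) - 2;  // drop ":]"
    return traits.lookup_classname(first, last);
}
```

Since values returned from *lookup_classname* may be or'ed together, the masks for "alpha" and "digit" can be combined and passed to *isctype* as a single class.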

In Clause 7.7 [tr.re.traits], in the definition of the template *regex_traits*, change
the signature of the member function *lookup_classname* to:

   template <class ForwardIterator>
   char_class_type lookup_classname(ForwardIterator first, ForwardIterator last) const;

Later in Clause 7.7, change the signature and Effects clause for *lookup_classname* to:

   template <class ForwardIterator>
   char_class_type lookup_classname(ForwardIterator first, ForwardIterator last) const;

   Effects: returns an unspecified value that represents the character classification
   named by the character sequence designated by the iterator range *[first, last)*.
   If the name is not recognized then returns a value that compares equal to 0. At least
   the names "d", "w", "s", "alnum", "alpha", "blank", "cntrl", "digit", "graph",
   "lower", "print", "punct", "space", "upper" and "xdigit" shall be recognized. The
   value returned shall be independent of the case of the characters in the character
   sequence.

Later in Clause 7.7, in the Returns clause for the member function *is_class*,
change "returns true if *f & lookup_classname("w") == lookup_classname("w")* and
*c == '_'*, otherwise returns false" to "returns true if *f* bitwise and'ed with the
result of calling *lookup_classname* with an iterator pair that designates the
character sequence "w" is not equal to 0 and *c == '_'*, or if *f* bitwise and'ed
with the result of calling *lookup_classname* with an iterator pair that designates
the character sequence "blank" is not equal to 0 and *c* is one of an
implementation-defined subset of the characters for which *isspace(c, getloc())*
returns true, otherwise returns false.".
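The special case exists because *isalnum* is false for '_', while the "w" class (as used by \w) is expected to match it. A minimal check follows. It assumes C++11 *std::regex_traits<char>*, with *isctype* in place of *is_class*, and uses a hypothetical helper *is_word_char*.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper: true if c belongs to the "w" (word) class.
// The class name is passed as an iterator range, as the resolution requires.
bool is_word_char(const std::regex_traits<char>& traits, char c)
{
    const std::string name = "w";
    return traits.isctype(c, traits.lookup_classname(name.begin(), name.end()));
}
```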

In clause 7.13 [tr.re.grammar], at the end of the last bullet item, change "a
character c is a member of a character class *some_name* if
*traits_inst.is_class(c, traits_inst.lookup_classname(some_name))*" to "a
character c is a member of a character class designated by an iterator range
*[F1, F2)* if *traits_inst.is_class(c, traits_inst.lookup_classname(F1, F2))*".
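In pattern syntax this is the [[:name:]] form. When the expression is compiled, the library looks the name up through the traits class, and C++11 implementations throw *regex_error* for a name that is not recognized. The short sketch below uses C++11 *std::regex* together with a hypothetical helper *all_in_class*.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if every character of text belongs to the named
// class. The name is spliced into a "[[:name:]]" bracket expression; the
// library resolves it through the traits class when the pattern is compiled.
bool all_in_class(const std::string& text, const std::string& class_name)
{
    const std::regex re("[[:" + class_name + ":]]*");
    return std::regex_match(text, re);
}
```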

2. lookup_collatename
---------------------
In Table 7.1, change the entry for the member function *lookup_collatename* to:

   v.lookup_collatename(F1, F2)   X::string_type   Returns a sequence of characters
                                       that represents the collating
                                       element consisting of the
                                       character sequence designated
                                       by the iterator range *[F1, F2)*.
                                       Returns an empty string if the
                                       character sequence is not a
                                       valid collating element.

In Clause 7.7 [tr.re.traits], in the definition of the template *regex_traits*,
change the declaration of the member function *lookup_collatename* to:

   template <class ForwardIterator>
   string_type lookup_collatename(ForwardIterator first, ForwardIterator last) const;

Later in Clause 7.7, change the signature and Effects clause for *lookup_collatename* to:

   template <class ForwardIterator>
   string_type lookup_collatename(ForwardIterator first, ForwardIterator last) const;

   Effects: returns a sequence of one or more characters that represents the
   collating element consisting of the character sequence designated by the
   iterator range *[first, last)*. Returns an empty string if the character
   sequence is not a valid collating element.

[Note: I removed the reference to IEEE Std 1003.1-2001, Base Definitions and
Headers, Section 6.1, Portable Character Set, because that section defines portable
names for characters, and those are not the same as collating elements within the
meaning of POSIX locales.]
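This is a sketch of calling *lookup_collatename* with an iterator range. It assumes C++11 *std::regex_traits<char>* and uses a hypothetical helper *collating_element*. The empty result for an invalid name is what the wording requires. The result for "hyphen" only reflects what libstdc++ and libc++ happen to do: they accept POSIX character names such as "hyphen", which the wording above does not require.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: asks the traits class for the collating element named
// by the characters of `name`, passed as an iterator range. An empty result
// means the name is not a valid collating element.
std::string collating_element(const std::regex_traits<char>& traits,
                              const std::string& name)
{
    return traits.lookup_collatename(name.begin(), name.end());
}
```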

3. transform
------------
In Table 7.1, change the entry for the member function *transform* to:

   v.transform(F1, F2)      X::string_type   Returns a sort key for the
                                 character sequence designated by
                                 the iterator range *[F1, F2)*
                                 such that if the character sequence
                                 *[G1, G2)* sorts before the character
                                 sequence *[H1, H2)* then
                                 *v.transform(G1, G2) < v.transform(H1, H2)*.

In Clause 7.7 [tr.re.traits], in the definition of the template *regex_traits*,
change the declaration of the member function *transform* to:

   template <class ForwardIterator>
   string_type transform(ForwardIterator first, ForwardIterator last) const;

Later in Clause 7.7, change the signature and Effects clause for *transform* to:

   template <class ForwardIterator>
   string_type transform(ForwardIterator first, ForwardIterator last) const;

   Effects:
      string_type str(first, last);
      return use_facet<collate<charT> >(getloc()).transform(
         str.data(), str.data() + str.size());
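The Effects clause can be read as an ordinary function template. The sketch below uses the hypothetical name *sort_key* and passes the locale explicitly in place of *getloc()*. It shows why the range is first copied into a string: *std::collate<charT>::transform* accepts only a [low, high) range of const charT pointers, while the caller may supply any forward iterators. Passing *str.data()* and *str.data() + str.size()* keeps the call valid even when the range is empty.

```cpp
#include <cassert>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical free-function form of regex_traits::transform.
// The character type is taken from the iterator's value_type.
template <class ForwardIterator>
std::basic_string<typename std::iterator_traits<ForwardIterator>::value_type>
sort_key(ForwardIterator first, ForwardIterator last, const std::locale& loc)
{
    typedef typename std::iterator_traits<ForwardIterator>::value_type charT;
    // Copy into contiguous storage, since collate::transform needs pointers.
    std::basic_string<charT> str(first, last);
    return std::use_facet<std::collate<charT> >(loc).transform(
        str.data(), str.data() + str.size());
}
```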

In Clause 7.13 [tr.re.grammar], change the second bullet item to read:

   During matching of a regular expression finite state machine against a sequence
   of characters, comparison of a collating element range c1-c2 against a
   character *c* is conducted as follows: if *flags() & regex_constants::collate*
   is false then the character *c* is matched if *c1 <= c && c <= c2*; otherwise
   *c* is matched in accordance with the following algorithm:

      string_type str1 = string_type(1,
         (flags() & regex_constants::icase) ? traits_inst.translate_nocase(c1)
            : traits_inst.translate(c1));
      string_type str2 = string_type(1,
         (flags() & regex_constants::icase) ? traits_inst.translate_nocase(c2)
            : traits_inst.translate(c2));
      string_type str = string_type(1,
         (flags() & regex_constants::icase) ? traits_inst.translate_nocase(c)
            : traits_inst.translate(c));
      return traits_inst.transform(str1.begin(), str1.end())
         <= traits_inst.transform(str.begin(), str.end())
         && traits_inst.transform(str.begin(), str.end())
         <= traits_inst.transform(str2.begin(), str2.end());
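The range test can be sketched as a function template over a traits class. This assumes C++11 *std::regex_traits*, which provides *translate*, *translate_nocase* and the iterator-based *transform*. The name *in_range* and the two bool parameters standing in for *flags() & icase* and *flags() & collate* are hypothetical. The sort key of *c* is computed once.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical sketch of matching c against the collating range c1-c2.
// `icase` and `collate` stand in for the corresponding regex flags.
template <class Traits>
bool in_range(const Traits& traits, typename Traits::char_type c1,
              typename Traits::char_type c2, typename Traits::char_type c,
              bool icase, bool collate)
{
    typedef typename Traits::string_type string_type;
    if (!collate)
        return c1 <= c && c <= c2;  // plain code-point comparison
    // Translate each character as the matcher would, then compare sort keys.
    const string_type str1(1, icase ? traits.translate_nocase(c1) : traits.translate(c1));
    const string_type str2(1, icase ? traits.translate_nocase(c2) : traits.translate(c2));
    const string_type str(1, icase ? traits.translate_nocase(c) : traits.translate(c));
    const string_type key = traits.transform(str.begin(), str.end());
    return traits.transform(str1.begin(), str1.end()) <= key
        && key <= traits.transform(str2.begin(), str2.end());
}
```

In the classic locale the sort keys follow character codes. With case-sensitive collation, 'M' therefore falls outside the range a-z. With icase, it is first translated to 'm' and falls inside.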


4. transform_primary
--------------------
In Table 7.1, change the entry for the member function *transform_primary* to:

   v.transform_primary(F1, F2)   X::string_type   Returns a sort key for the
                                    character sequence designated by
                                    the iterator range *[F1, F2)*
                                    such that if the character sequence
                                    *[G1, G2)* sorts before the character
                                    sequence *[H1, H2)* when character case
                                    is not considered, then
                                    *v.transform_primary(G1, G2) <
                                    v.transform_primary(H1, H2)*.

In Clause 7.7 [tr.re.traits], in the definition of the template *regex_traits*,
change the declaration of the member function *transform_primary* to:

   template <class ForwardIterator>
   string_type transform_primary(ForwardIterator first, ForwardIterator last) const;

Later in Clause 7.7, change the signature and Effects clause for *transform_primary* to:

   template <class ForwardIterator>
   string_type transform_primary(ForwardIterator first, ForwardIterator last) const;

   Effects: if *typeid(use_facet<collate<charT> >(getloc())) == typeid(collate_byname<charT>)*
   and the form of the sort key returned by
   *collate_byname<charT>::transform(first, last)* is known and can be converted
   into a primary sort key, then returns that key; otherwise returns an empty string.
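Whether the form of a sort key is "known" depends on the platform, so any implementation of this clause needs outside knowledge. The sketch below makes that explicit with a hypothetical *to_primary* callback, where an empty callback means "form unknown". The function name *primary_key* is also hypothetical, and the locale is passed explicitly in place of *getloc()*. The check compares the dynamic type of the facet. A locale whose collate facet is the base *collate<char>*, such as the classic locale, therefore always yields an empty string.

```cpp
#include <cassert>
#include <functional>
#include <locale>
#include <string>
#include <typeinfo>

// Hypothetical converter from a full sort key to a primary sort key. An empty
// std::function stands for "the form of the sort key is not known".
typedef std::function<std::string(const std::string&)> key_converter;

// A sketch of the Effects clause for transform_primary as a free function.
template <class ForwardIterator>
std::string primary_key(ForwardIterator first, ForwardIterator last,
                        const std::locale& loc, const key_converter& to_primary)
{
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    // typeid on a reference to a polymorphic facet yields its dynamic type,
    // so this is true only when the installed facet is exactly collate_byname.
    const bool is_byname = typeid(coll) == typeid(std::collate_byname<char>);
    if (!is_byname || !to_primary)
        return std::string();
    const std::string str(first, last);
    return to_primary(coll.transform(str.data(), str.data() + str.size()));
}
```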


7.4 Regular expressions and internationalization

1. reducing use of traits::translate():
---------------------------------------
In table 7.1 change the entry for the member function translate from:

   v.translate(c,b)   X::char_type   Returns a character d such that:
                              for any character d that is to
                              be considered equivalent to c
                              then v.translate(c,false) ==
                              v.translate(d,false). Likewise
                              for all characters C that are to
                              be considered equivalent to c when
                              comparisons are to be performed without
                              regard to case, then v.translate(c,true)
                              == v.translate(C,true).

to:

   v.translate(c)      X::char_type   Returns a character such that,
                              for any character d that is to
                              be considered equivalent to c,
                              v.translate(c) ==
                              v.translate(d).

and add the following entry:

   v.translate_nocase(c)   X::char_type   Returns a character such that,
                                 for all characters C that are to
                                 be considered equivalent to c when
                                 comparisons are to be performed without
                                 regard to case, v.translate_nocase(c)
                                 == v.translate_nocase(C).

In clause 7.7 [tr.re.traits] in the definition of regex_traits replace the
declaration of translate, which currently reads:

   charT translate(charT c, bool icase) const;

with:

   charT translate(charT c) const;
   charT translate_nocase(charT c) const;

In clause 7.7 [tr.re.traits] change the Effects clause for the member
function translate from:

   Effects: returns (icase ? use_facet<ctype<charT> >(getloc()).tolower(c) : c)

to:

   Effects: returns (c);

In clause 7.7 [tr.re.traits] add the following description of the member
function translate_nocase following the description of the member function
translate:

   charT translate_nocase(charT c) const;
   Effects: returns use_facet<ctype<charT> >(getloc()).tolower(c)

In clause 7.8 [tr.re.regex] change the sentence that reads:

   During matching of a regular expression finite state machine
   against a sequence of characters, two characters c and d are
   compared using traits_inst.translate(c,   getflags() & regex_constants::icase)
   == traits_inst.translate(d, getflags() & regex_constants::icase).

to:

   During matching of a regular expression finite state machine
   against a sequence of characters, two characters c and d are
   compared using the following rules:
      1. if *(flags() & regex_constants::icase)* the two characters
         are equal if *traits_inst.translate_nocase(c) ==
            traits_inst.translate_nocase(d)*
      2. otherwise, if *(flags() & regex_constants::collate)* the two
         characters are equal if *traits_inst.translate(c) ==
            traits_inst.translate(d)*
      3. otherwise, the two characters are equal if *c == d*.
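The three rules can be sketched directly. This assumes C++11 *std::regex_traits*, which provides *translate* and *translate_nocase*. The name *chars_equal* and the bool parameters standing in for the two flags are hypothetical.

```cpp
#include <cassert>
#include <regex>

// Hypothetical sketch of the character comparison rules.
// `icase` and `collate` stand in for the corresponding regex flags.
template <class Traits>
bool chars_equal(const Traits& traits, typename Traits::char_type c,
                 typename Traits::char_type d, bool icase, bool collate)
{
    if (icase)                                  // rule 1
        return traits.translate_nocase(c) == traits.translate_nocase(d);
    if (collate)                                // rule 2
        return traits.translate(c) == traits.translate(d);
    return c == d;                              // rule 3
}
```

With the default traits, *translate* returns its argument unchanged, so rules 2 and 3 give the same answer. Rule 2 only makes a difference for a user-supplied traits class whose *translate* maps several characters to one.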
   
2. removing syntax_type, escape_syntax_type:
--------------------------------------------
In table 7.1, remove the rows for *v.syntax_type(c)* and for *v.escape_syntax_type(c)*

In clause 7.4 [tr.re.syn] remove the typedefs for *syntax_type* and *escape_syntax_type*
from *namespace regex_constants*.

In the second paragraph of 7.5 [tr.re.const] remove the references to *syntax_type*
and *escape_syntax_type*.

Remove 7.5.3 [tr.re.syntype]
Remove 7.5.4 [tr.re.escsyn]

In clause 7.7 [tr.re.traits], remove the entries for the member functions
*syntax_type* and *escape_syntax_type* from the definition of the template
struct *regex_traits*, and remove the signatures and the Effects clauses for
the two functions *syntax_type* and *escape_syntax_type*.

In clause 7.13 [tr.re.grammar], remove the third paragraph (beginning with "The
transformation from a sequence of characters...").

7.18 Can anything other than basic_regex throw bad_expression objects?

Resolution:

p 137:
------

remove the last entry in the table about error_string().


p 138:
------

change

   class bad_expression;

to

   class regex_error;

p 153:
------

change
   7.6 Class bad_expression [tr.re.badexp]
         
   class bad_expression : public std::runtime_error
   {
   public:
      explicit bad_expression(const std::string& what_arg);
   };
   
   The class bad_expression defines the type of objects thrown as exceptions to report errors during the
   conversion from a string representing a regular expression to a finite state machine.
   
   bad_expression(const string& what_arg );
   Effects: Constructs an object of class bad_expression.
   Postcondition: strcmp(what(), what_arg.c_str()) == 0.
to
   
   7.6 Class regex_error [tr.re.regerr]   
      
   class regex_error : public std::runtime_error
   {
   public:
      explicit regex_error( regex_constants::error_type ecode );
      regex_constants::error_type code() const;
   };
   
   The class regex_error defines the type of objects thrown as exceptions to report errors
   from the regular expression library.

   regex_error( regex_constants::error_type ecode );
   Effects: Constructs an object of class regex_error.
   Postcondition: ecode == code().
   regex_constants::error_type code() const;
   Returns: The error condition that caused the exception.
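The sketch below shows how client code might use this interface. It compiles a pattern and, on failure, recovers the error condition from *code()* instead of parsing a message string. It assumes C++11 *std::regex_error*, which has this shape. The helper *compiles* is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true if pattern compiles; otherwise stores
// the error condition reported by regex_error::code() in *code.
bool compiles(const std::string& pattern, std::regex_constants::error_type* code)
{
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error& e) {
        *code = e.code();
        return false;
    }
}
```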
   
p 154:
------

remove

   std::string error_string(regex_constants::error_type) const;

p 156:
------

remove

   std::string error_string(regex_constants::error_type e) const;
   Returns: A human readable error string for error condition e.

p 157:
------

change 
   bad_expression
to
   regex_error
   
p 160:
------

change 
   bad_expression
to
   regex_error

twice

p 161:
------

change 
   bad_expression
to
   regex_error

twice

p 163:
------

change 
   bad_expression
to
   regex_error


p 174:
------
change
   7.11 Regular expression algorithms [tr.re.alg]
   
   7.11.1 regex_match
to
   7.11 Regular expression algorithms [tr.re.alg]
   
   7.11.1 Exceptions
   
   All algorithms described in this clause can throw a regex_error
   exception. If such an exception e is thrown, e.code() is guaranteed to
   return either regex_constants::error_complexity or regex_constants::error_stack.
   
   7.11.2 regex_match
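The only codes an algorithm may report are *error_complexity* and *error_stack*. A caller can therefore treat them as "could not decide" rather than as a failed match. The sketch below uses C++11 *std::regex_match*. The wrapper *try_match* and the enumeration *match_result* are hypothetical. Any other code is rethrown, because it falls outside the guarantee.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type distinguishing "no match" from "gave up".
enum class match_result { match, no_match, too_complex };

// Hypothetical wrapper: maps the two permitted error conditions to
// too_complex and rethrows anything else.
match_result try_match(const std::string& text, const std::regex& re)
{
    try {
        return std::regex_match(text, re) ? match_result::match
                                          : match_result::no_match;
    } catch (const std::regex_error& e) {
        if (e.code() == std::regex_constants::error_complexity ||
            e.code() == std::regex_constants::error_stack)
            return match_result::too_complex;
        throw;
    }
}
```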

p 188:
------

change 
   bad_expression
to
   regex_error


7.44 Too many syntax options

In section 7.5.1, eliminate the following syntax option types: normal, javascript, jscript, sed, perl.

7.45 Names recognized by regex_traits::lookup_classname

In the entry for *lookup_classname* in Table 7.1, remove the sentence "At least
the names ... shall be recognized."

In the Effects clause for *lookup_classname*, replace the sentence

   At least the names "d", "w", "s", "alnum", "alpha", "blank", "cntrl",
   "digit", "graph", "lower", "print", "punct", "space", "upper" and
   "xdigit" shall be recognized.

with:

   For *regex_traits<char>*, at least the names "d", "w", "s", "alnum",
   "alpha", "blank", "cntrl", "digit", "graph", "lower", "print", "punct",
   "space", "upper" and "xdigit" shall be recognized. For *regex_traits<wchar_t>*,
   at least the names L"d", L"w", L"s", L"alnum", L"alpha", L"blank", L"cntrl",
   L"digit", L"graph", L"lower", L"print", L"punct", L"space", L"upper" and
   L"xdigit" shall be recognized.
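The following sketch covers the wide-character case. It assumes C++11 *std::regex_traits<wchar_t>*, with *isctype* in place of *is_class*, and uses a hypothetical helper *in_wide_class*. The class name is a *std::wstring* and is passed as an iterator range.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: tests c against a character class whose name is
// given as a wide string, using the iterator-based lookup.
bool in_wide_class(const std::regex_traits<wchar_t>& traits, wchar_t c,
                   const std::wstring& name)
{
    return traits.isctype(c, traits.lookup_classname(name.begin(), name.end()));
}
```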