N2358: No internal state for mblen

Submitter: Philipp Klaus Krause
Submission Date: 2019-03-20

Summary:

Disallow the use of internal state in mblen.

This is a partial follow-up to N2281, and tries to fix CR 498 for one of the three functions affected.

Justification:

At London, the committee wanted to resolve CR 498 by stating that mblen, mbtowc, and wctomb are not thread-safe, even when the encoding is not state-dependent. It even wanted to allow data races between calls to different functions (i.e., the three functions would be allowed to use shared internal state).

However, there would be advantages to making them thread-safe for encodings that are not state-dependent. Such a change could be considered for the future C standard. N2281, rejected at Pittsburgh, was an attempt to do so: it made mblen never use internal state, made mbtowc and wctomb not use internal state for encodings that are not state-dependent, and kept the internal state of mbtowc separate from that of wctomb. This proposal is a similar but less ambitious attempt. It only disallows internal state for mblen, and disallows sharing of internal state between mbtowc and wctomb.

mblen can easily be implemented without internal state, even for state-dependent encodings. After all, according to its description, the very first thing this function does on invocation is reset its internal state. Thus the return value of mblen depends only on its arguments and the LC_CTYPE category of the current locale.
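As an illustration of the argument above, here is a minimal sketch of an mblen-like function that keeps no state between calls. It is not taken from any existing implementation. The name stateless_mblen is hypothetical, and the sketch delegates to mbrlen with a local, zero-initialized mbstate_t purely for brevity; a real library would more likely decode directly. The null-pointer case is simplified and assumes an encoding that is not state-dependent.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of any standard library.
 * Behaves like mblen but keeps no state between calls. */
int stateless_mblen(const char *s, size_t n)
{
    /* A zero-valued mbstate_t describes the initial conversion state,
     * so every call starts from the initial state, as mblen does. */
    mbstate_t st;
    memset(&st, 0, sizeof st);

    if (s == NULL) {
        /* Assumption: the encoding is not state-dependent.
         * A real implementation would report state dependency here. */
        return 0;
    }

    size_t r = mbrlen(s, n, &st);

    /* mblen has no "incomplete" result: both an invalid and an
     * incomplete sequence are reported as -1. */
    if (r == (size_t)-1 || r == (size_t)-2) {
        return -1;
    }
    return (int)r;
}
```

Because the only state is the local variable st, concurrent calls from different threads do not share any writable data of their own.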

Disallowing the use of internal state would allow multithreaded applications to use mblen without synchronization. It would also be an elegant solution to the question of whether the internal state of mblen should be thread-local (which has been discussed on the WG14 mailing list).

Currently, multithreaded applications have to use synchronization or use the restartable mbrlen() instead. Neither is a good option where speed or code size matters.

Synchronization adds considerable overhead, and unnecessary synchronization should be avoided in multithreaded programs.

The restartable mbrlen is slow and big (being restartable, it needs to be able to handle incomplete input). This can be seen in CR 498, and was stated by multiple attendees of the London meeting.
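For reference, the mbrlen-based workaround that multithreaded code currently has to use might look like the following sketch. The function name count_mb_chars is hypothetical and only serves as an example of a caller that keeps its own conversion state instead of relying on mblen.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the multibyte characters in the first
 * len bytes of s, stopping early at a null character.
 * Returns -1 on an invalid or incomplete sequence. */
long count_mb_chars(const char *s, size_t len)
{
    /* Caller-owned conversion state, starting in the initial state.
     * Nothing is shared with other threads. */
    mbstate_t st;
    memset(&st, 0, sizeof st);

    long count = 0;
    while (len > 0) {
        size_t r = mbrlen(s, len, &st);
        if (r == (size_t)-1 || r == (size_t)-2) {
            return -1;          /* invalid or incomplete input */
        }
        if (r == 0) {
            break;              /* reached the null character */
        }
        s += r;
        len -= r;
        count++;
    }
    return count;
}
```

With the proposed change, the same loop could call mblen directly and drop the explicit mbstate_t.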

Proposed changes:

§7.22.7: change from (text from the current proposed technical corrigendum for CR 498)

The behavior of the multibyte character functions is affected by the LC_CTYPE category of the current locale. For a state-dependent encoding, each function is placed into its initial conversion state at program startup and can be returned to that state by a call for which its character pointer argument, s, is a null pointer. Subsequent calls with s as other than a null pointer cause the internal conversion state of the function to be altered as necessary. A call with s as a null pointer causes these functions to return a nonzero value if encodings have state dependency, and zero otherwise.305) Changing the LC_CTYPE category causes the conversion state of these functions to be indeterminate. A call to any one of these functions may introduce a data race with a call to any other function in this subclause.

to

The behavior of the multibyte character functions is affected by the LC_CTYPE category of the current locale. For a state-dependent encoding, each of the mbtowc and wctomb functions is placed into its initial conversion state at program startup and can be returned to that state by a call for which its character pointer argument, s, is a null pointer. Subsequent calls with s as other than a null pointer cause the internal conversion state of the function to be altered as necessary. A call with s as a null pointer causes these functions to return a nonzero value if encodings have state dependency, and zero otherwise.305) Changing the LC_CTYPE category causes the conversion state of the mbtowc and wctomb functions to be indeterminate.

§7.22.7.1: remove paragraph 3 (text from the C17 standard)

The implementation shall behave as if no library function calls the mblen function.