Doc. no:  N0868 / 96-0050
Date:     25 January, 1996
Project:  Programming Language C++
Reply to: T. Kamimura
E-mail:   kamimura@trl.ibm.co.jp

Japanese Comments on "Extended Characters in C++ Programs"

Tom Plum and Dag Bruck have proposed support for extended characters in Document X3J16/95-0220 (WG21/N0820). This memo expresses a view on that proposal shared by the Japanese members of WG21.

The proposal calls for support of the ISO Latin-1 encoding for source files and for support of all characters representable in ISO 10646. In addition, it proposes introducing trigraphs for characters in ISO 10646 that are not representable in the source character set. It also supports the idea of extended identifier characters, as proposed under the name "international identifier characters" in ISO/IEC PDTR 10176. Finally, it requires that, for each character in a literal or a string, the corresponding implementation-defined character in the target character set be defined.

These proposed items have significant implications for the current character set and encoding environment. Some will be very difficult to support, and others will require detailed study before a practical migration can be recommended. If the proposal is adopted without such careful analysis, the standard will not be implementable, at least in Japan, and will therefore become useless.

We believe that character sets and encodings are very delicate and complex issues. The current situation reflects a balance among practical diversity, the political climate among various competitors, international competition and collaboration, and the basic ideas on which standards can practically be grounded. The notion of two character sets, a basic character set that uses the ISO 646 invariant set for maximum portability and an extended character set that provides portability across regional, local, and cultural needs, has created a reasonable and practical environment.
If a standard mandates support of ISO 10646 for identifier characters, this environment will likely change. We are aware that PDTR 10176 lists "international identifier characters" in its annex, but this is a controversial subject, and at least the Japanese representatives of SC22/WG20 do not favor full support at this stage.

Also, our understanding is that no programming language standard mandates one particular encoding scheme. This matters in practice because three major encoding schemes are currently used in Japan: JIS (Japanese Industrial Standard), Shift JIS, and EUC. Latin-1 is not compatible with any of these major encodings. For example, Shift JIS uses single bytes for Roman characters and half-width katakana, which conflicts directly with the Latin-1 encoding. If the standard mandates the Latin-1 encoding, it will require major changes to the current environment, which our community is very unlikely to accept.

To support a standard that requires changes to the current environment, we need to formulate a feasible and practical approach to migration. To analyze possible approaches to supporting the ISO 10646 character set, we need to understand in detail what is intended by "accepting source files using all characters representable in ISO 10646". What is the recommended action if a character is not uniquely representable in the target character set? If uniqueness must be preserved, some run-time representation using escape characters may have to be defined. Even though the proposal indicates that it does not specify an encoding for such characters, implementing such a run-time representation on many systems without major changes will be difficult, since those systems are based on the existing encoding schemes mentioned above. If the ability to assign literals to arrays of char or wchar_t is required, the difficulty will increase.
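
The Shift JIS and Latin-1 conflict noted above can be illustrated with a minimal sketch. It classifies a single byte under each encoding and reports whether the two readings disagree. The enumerations and the functions shift_jis_role, latin1_role, conflicts, and check_example are hypothetical names introduced only for this illustration, and the byte ranges are a simplified assumption: vendor variants of Shift JIS differ in detail, and the sketch treats 0x00-0x7F as agreeing even though JIS X 0201 Roman differs from ASCII at two code positions.

```cpp
#include <cassert>

// Hypothetical classification of one byte under Shift JIS (simplified ranges).
enum class SjisRole { Roman, HalfWidthKatakana, LeadByte, Unassigned };

// Hypothetical classification of one byte under ISO 8859-1 (Latin-1).
enum class Latin1Role { AsciiRange, C1ControlRange, Graphic };

SjisRole shift_jis_role(unsigned char b) {
    if (b <= 0x7F) return SjisRole::Roman;              // JIS X 0201 Roman set
    if (b >= 0xA1 && b <= 0xDF) return SjisRole::HalfWidthKatakana;
    if ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC))
        return SjisRole::LeadByte;                      // first byte of a two-byte character
    return SjisRole::Unassigned;                        // 0x80, 0xA0, 0xFD-0xFF in this sketch
}

Latin1Role latin1_role(unsigned char b) {
    if (b <= 0x7F) return Latin1Role::AsciiRange;
    if (b <= 0x9F) return Latin1Role::C1ControlRange;
    return Latin1Role::Graphic;                         // 0xA0-0xFF
}

// True when the byte lies outside the ASCII range and Shift JIS also
// assigns it a meaning, so the two encodings read the same byte differently.
bool conflicts(unsigned char b) {
    return latin1_role(b) != Latin1Role::AsciiRange &&
           shift_jis_role(b) != SjisRole::Unassigned;
}

// Byte 0xB1 is a half-width katakana in Shift JIS but a graphic
// character (the plus-minus sign) in Latin-1.
void check_example() {
    assert(shift_jis_role(0xB1) == SjisRole::HalfWidthKatakana);
    assert(latin1_role(0xB1) == Latin1Role::Graphic);
    assert(conflicts(0xB1));
}
```

Under these assumptions, a translator that reads source files as Latin-1 would misinterpret every half-width katakana byte and every lead byte of a two-byte Shift JIS character. This is the kind of incompatibility that a migration plan would have to address.
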
Unless we specify where and how use of the ISO 10646 character set is permitted, we cannot proceed to formulate a possible approach to supporting the standard.

The proposed issues are all important, and they will require careful analysis in each national environment. They are definitely not issues that can be decided in a meeting or two. Considering the current schedule of our standardization process, we wonder whether we should investigate these issues further at this stage.