Title: ISO/IEC WD4 14651 - International String Ordering - Method for comparing Character Strings and Description of a Default Tailorable Ordering, for Character Strings Using the repertoire (or subrepertoires) of ISO/IEC 10646

[ISO/CEI DT4 14651 - Classement international de chaînes de caractères - Méthode de comparaison de chaînes de caractères et description d'un ordre implicite adaptable pour les chaînes de caractères utilisant le répertoire (ou des sous-répertoires) de l'ISO/CEI 10646]

Status:  Working Draft 4 for comments by SC22/WG20 members before the April 1996 Kyoto meeting
Date:    1996-01-25
Project: 22.30.02.02
Editor:  Alain LaBonté
         Gouvernement du Québec
         Secrétariat du Conseil du trésor
         Service de la prospective et de la francisation
         875, Grande-Allée Est, 4C
         Québec, QC G1R 5R8
         Canada

         GUIDE SHARE Europe
         SCHINDLER Information AG
         CH-6030 Ebikon (Bern)
         Switzerland
Email:   alb@sct.gouv.qc.ca

FOREWORD

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialised bodies for worldwide standardisation. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees.
These technical committees are established by the respective organisation to deal with particular fields of mutual interest. In liaison with ISO and IEC, other international organisations, governmental and non-governmental, also take part in the work.

In the field of information technology, ISO and IEC have established a joint technical committee known as ISO/IEC JTC1. Draft International Standards adopted by the joint technical committee are circulated to the national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies that cast a vote.

The ISO/IEC 14651 International Standard has been prepared by the Joint Technical Committee ISO/IEC JTC1, Information Technology.

INTRODUCTION

A default international ordering mechanism does not provide a universal solution for all situations. The purpose of such a mechanism is to correct the errors of past approaches, which collated solely on binary coded character values. Those approaches have never respected cultures. English is one exception, although a poor one, and only when upper case alphabetic data was used alone, without other characters such as punctuation and spacing.

This is one of the major flaws that affect portability between countries and between applications. (Traditionally, different programs make different ordering corrections.) Therefore, it has been considered feasible to design a Default Tailorable Ordering Mechanism (a method and a unique table). This mechanism will constitute an acceptable tool that will make sense for most users of the different scripts. Also, most simple applications will be able to use the mechanism without modification. These applications have ordering requirements that do not depend on any context.

Naturally, a modification mechanism is embedded in the model.
The mechanism will accommodate particular languages with a minimum of changes. Let us look at the Latin script as an example. The Spanish and Scandinavian languages will have the order of a few letters changed compared to the order acceptable in most other European languages that use the Latin script. Also, a change in the order of a whole script relative to another one could be desired, for example, Thai before Latin, and so on.

Furthermore, there might be specific linguistic requirements that cannot be fulfilled without knowing the context. For example, the phonetic ordering of Japanese names expressed in Kanji cannot be deduced from the Kanji alone. Instead, Japanese names need hidden multiple fields. Generally, in Japanese databases, a given Kanji proper name is associated with a hidden phonetic representation in a different field. This association allows correct ordering; otherwise a replication of items might be necessary for human searching of Kanji proper names in a list in the absence of other fields.

More generally, specific requirements exist for complex telephone-book type classification or for phonetic classification. This is particularly true in multilingual countries or organisations. As an example, the item "4" could sometimes be phonetically classified (transformed) in such lists to accomplish ordering. This classification requires that the item be reproduced several times. Each replicated item is then transformed for phonetic ordering (for example, as "QUATRE", "FOUR" and "VIER" in French, English, and German respectively). In this way, a user can immediately retrieve the item "4" in a list under "Q", "F" and "V" depending on the individual user requirements.

To meet these requirements, the comparison and ordering mechanism on which this standard focuses is included in a more general model. The general model is also described in this international standard.
The general model allows multiple-field ordering and prehandling and posthandling classification phases. The ordering mechanism assumes this higher-level scheme. Specifically, the prehandling and posthandling phases could be null processes. Also in the simplest applications, only one field will be ordered typically. In such cases, a straightforward order could be achieved and would be reasonably valid for the majority of users who do not require further specialised classification. The typical lexical dictionary order in a given natural language is an example of this type. It is assumed that lexical order is the minimal culturally acceptable order for a list so that the general public, and even specialists, can use it without error.

To simplify matters, the Default Tailorable Ordering Mechanism will describe a method to order text data independently of context. The method will be culturally acceptable to a majority of worldwide environments (with provisions to accommodate more local contexts).

It is obvious that ordering is not limited to a sorting program. Ordering requires that string comparison be consistently redefined with a new comparison engine. This engine will be used by processes which compare, sort, search, mix, and merge graphic character data. This engine will be described in this international standard.

The design of this international standard keeps in mind that old systems could also integrate culturally valid ordering with minimal changes. Therefore, the basic engine will not work directly on a text string of graphic characters. Instead, the first phase of the process reduces the text string to a single bit string that is suitable for direct and mechanical numeric comparisons.

Numeric data has two general kinds of representation. One type of representation is external and uses human readable graphic characters. The other type of representation is internal and is directly suitable for high-speed processing.
For this reason, programming languages define data types for suitable processing of numbers (in general more than one type). In this way, programmers do not need to parse graphic characters before performing numeric processing. Such parsing would be very prone to errors, would add to programming complexity, and would not achieve general consistency among different applications.

Character comparisons are of a more complex nature. Therefore, having the programmers involved in parsing is no more desirable. Nevertheless, this was the prevailing situation before the present international standard was designed.

The consistent text data comparison engine described in this international standard works on an internal structure that is the result of parsing an original string for comparison. Parsing is done according to a formal description of cultural ordering conventions.

The definition of such an engine makes it highly desirable that future versions of programming language standards define new data types. In each language, it is desirable that at least one data type manage graphic character string comparisons that are not limited to absolute equality. The programming language can define these data types as formal containers. These containers represent strings of text that can be processed internally, in a way that is very straightforward and independent of coded graphic characters. In this way, the programmer is freed from parsing processes. Also, the probability of achieving application portability between different countries using different cultures would be increased because applications can be designed in a generic way.

Furthermore, the predigested structure materialising such a data type can be stored and reused in a given cultural environment, to increase performance and to preserve past applications with minimal changes.
Reusing the structure would require no further parsing by external, even ancient, hardwired engines that have the capability to do straightforward binary comparisons (such as a hardware disk search engine, or an access method designed decades ago that developers do not want to redesign because of its high efficiency). This feature is a non-negligible economic byproduct of this international standard: once a string has been parsed for an environment, its processing does not require reparsing.

In fact, as for numbers, the standard graphic character representation need not be used until data is presented again to the user. This calls for reversibility of the process. The present standard makes that reversibility a possibility, in addition to guaranteeing the full predictability of the comparison operation. If two equivalent strings are not absolutely equal, then the tie must be broken. Consequently, a sort program, the simplest application, can always sort data in the same way.

Tutorial on problems solved by this standard

Why are existing standard codes, character-by-character comparisons and commercial sort programs not appropriate for sorting, and what must be done to solve the problem? For clarity, this discussion will start with the Latin script.

i. Sorting, in any language using the Latin script, including English, using standard ISO 646 coding, does not follow traditional dictionary sequence, which is the minimum the average user needs.
   Ex.: Sorting the list "august", "August", "container", "coop", "co-op", "Vice-president", "Vice versa" gives the following order, if ISO 646 coding is used and a simple sort following binary order is done:

      August
      Vice versa
      Vice-president
      august
      co-op
      container
      coop

   which is obviously wrong.

ii. Translating lower case to upper case and removing special characters gives a sorted list acceptable to users, but the results are unpredictable.

   Ex.: Sorting the list "August", "august", "coop", "co-op" gives the following order:

      August
      august
      coop
      co-op

   Sorting the same list with a different initial order, say, "august", "August", "co-op", "coop" gives a different order with this method:

      august
      August
      co-op
      coop

iii. If accented characters are introduced, using for example the ISO 8859-1 code, the problems encountered in steps i and ii above are amplified, but they share the same causes.

iv. If tables are reorganized to make all related characters contiguous, one might think this would permit a simplified single-character sort, but this does not work either. Take upper and lower case unaccented letters as an example. If code point 01 is assigned to "a", code point 02 to "A", code point 03 to "b", code point 04 to "B" and so on, let us see what happens in a list sorted directly by these rearranged values:

      Sorted List    Internal Values
      aaaa           01010101
      abbb           01030303
      Aaaa           02010101
      Abbb           02030303

   This is predictable, but obviously wrong in any country from a cultural point of view.

v. The only solution is to decompose the initial data in a way that respects traditional lexical order and at the same time ensures absolute predictability. For the Latin script, this necessitates at least four levels:

   1. The first decomposition renders the information to be sorted case insensitive and diacritical mark insensitive, and removes all special characters, which have no pre-established order in any human culture:

      An example using English:
         "résumé" (an English word derived from French but with a very different meaning in French) becomes "resume", without any accent.

      An example using French:
         "Vice-légation" becomes "vicelegation", with no accent, no upper case and no dash.

      An example using German:
         "groß" becomes "gross", with the sharp-s being converted to double-s to render it case insensitive.

      In Spanish or the Scandinavian languages, some extra letters are added to the 26 fixed letters of the English, French and German alphabet, and these letters are not ordered according to the expectations of that group of languages. This calls for adaptability.

   2. The second decomposition breaks ties on quasi-homographs, strings that differ only because they have different diacritical marks. In the English example above, "resumé" and "résumé" are quasi-homographs. Traditional lexical order requires that "resume" always come before "résumé" (which sorting using only the first level would not guarantee). In this case, tradition does not say whether "resumé" (another spelling) should come before "résumé", which would seem logical: English and German dictionaries only state that unaccented words precede accented words.

      Here another characteristic is introduced. In French, because of the large number of quasi-homograph groups with more than two members, the main dictionaries follow this rule: accents are generally not taken into account for sorting, but in case of homographic ties, the last difference in the word determines the correct order between two given words, a priority order being assigned to each type of accent. For example, "coté" should be sorted after "côte" but before "côté".
      This is easy to implement: a number is assigned to each character of the original data to be sorted, representing either an accent or no accent at all, but these numbers are stacked instead of being appended to a linear list. In other words, the resulting string is built starting from the last character of the original data and proceeding backward.

      Example: to obtain the following order respecting this rule: "cote", "côte", "coté", "côté", numbers could be assigned indicating respectively "****", "**c*", "a***", "a*c*", where "*" means no accent, "a" means acute accent and "c" means circumflex accent. Here this scheme is sufficient to break the tie correctly at this second level.

   3. The third decomposition breaks ties for quasi-homographs that differ only because upper-case and lower-case characters are used. This time, the tradition is well established in English and German dictionaries, where lower case always precedes upper case in homographs, while the tradition is not well established in French dictionaries, which generally use only accented capital letters for common word entries. In known French dictionaries where upper and lower case letters are mixed, the capitals generally come first, but this is not an established and stated rule, because there are numerous exceptions. So for a default template it is advisable to use the English and German traditions, if one wants to group the largest possible number of languages together. Note that in Denmark upper case comes before lower case, a different but well established rule. This is a second fact calling for adaptability in the model used in this standard.

      Example: to have the following order: "august", "August", numbers could be assigned indicating respectively "llllll", "ulllll", where "l" means lower case and "u" upper case.

   4. The fourth decomposition breaks the final tie, which does not correspond to any tradition: the tie due to quasi-homographs that differ only because they contain special characters.
      Breaking this tie is essential to ensure the absolute predictability of sorts and also to be able to sort strings composed only of special characters. Since the traces of special characters were removed from the original data to form the first three orders of decomposition, simply putting them in a row in the fourth order of decomposition would mean that their position would be lost. These positions are quite important for resolving the remaining ties, and consequently the original positions of these special characters must be retained here: two quasi-homographs could each contain a common special character in different positions and thus be strictly different (ex.: "ab*cd" is still different from "a*bcd" even though they share one and only one common special character).

      Example: to have the following order: "coop", "co-op", "coop-", numbers could be assigned respectively according to the following pattern: "d", "d3-" and "d5-", where "d" is an always-present delimiter that separates this decomposition from the first three in case all four decompositions are to be concatenated to form a single sorting key based on numeric values (see discussion in the next paragraph). "3-" means a dash in position 3 of the original string, "5-" means a dash in position 5, and so on.

   These four decompositions can be structured using a four-level key, concatenating the subkeys from the highest significance to the lowest. If the coded assignment of numbers is done properly, then instead of necessitating a cumbersome exception process for dealing with homographs, all decompositions may be made at once, and the resulting strings concatenated and passed through a standard sort program sorting in numeric order.
   To attain this result, it is sufficient that the numbers chosen for the first decomposition code set be greater than the numbers chosen for the second one, that the second one's be greater than the third one's, and that the delimiter chosen for the fourth decomposition be less than the lowest possible number coded elsewhere for the sort (a delimiter called logical zero), in which case no restriction applies to the content of the fourth decomposition. An easier implementation might just choose to put the lowest possible value as a delimiter between each subkey, in which case no restriction ever applies.

   This method was fully described with tables for the first time in Règles du classement alphabétique en langue française et procédure informatisée pour le tri, Alain LaBonté, Ministère des Communications du Québec, 19 août 1988, ISBN 2-550-19046-7.

   Reduction techniques have been designed to considerably shorten space requirements. Since no implementation is required to use specific numbers for weights, nor to apply reduction or compression, this issue is outside the scope of this standard, but it is interesting to note that an implementation can be optimized. These techniques have been improved over time and are highly feasible.

   A public-domain reduction technique is described in detail (with ample examples) in Technique de réduction - Tris informatiques à quatre clés, Alain LaBonté, Ministère des Communications du Québec, June 1989 (ISBN 2-550-19965-0).

vi. For a certain number of languages, the default presented in this standard will need to be adapted, both in the table values for the four orders of keys and in the potential context analysis processing necessary to achieve culturally correct results for users of these languages. To illustrate this, examples of dictionary sequences are given here for two languages whose native order is not in the default table:

   Traditional Spanish (note "ch" greater than "cu" and "ña" greater than "no"):

      cuneo
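The four decompositions of item v can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the table-driven method of this standard: the function name sort_key, the accent weights, and the use of the Unicode decomposition data from the Python standard library are all choices made for this sketch. A tuple of four subkeys is compared level by level, which has the same effect as concatenating the subkeys with a lowest-value delimiter between them.

```python
import unicodedata

# Illustrative accent weights (an assumption of this sketch); 0 means "no accent".
ACCENT_WEIGHT = {"": 0, "\u0301": 1, "\u0300": 2, "\u0302": 3, "\u0308": 4, "\u0303": 5}
UNKNOWN_ACCENT = 9


def sort_key(text):
    """Return a four-level key: (letters, accents, case, special characters)."""
    text = unicodedata.normalize("NFC", text)
    letters, accents, case, specials = [], [], [], []
    for position, ch in enumerate(text, start=1):
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if not base.isalnum():
            # Level 4: keep the original position together with the character.
            specials.append((position, ord(ch)))
            continue
        for letter in base.casefold():  # sharp-s casefolds to "ss"
            letters.append(letter)                                     # level 1
            accents.append(ACCENT_WEIGHT.get(marks, UNKNOWN_ACCENT))   # level 2
            case.append(1 if base.isupper() else 0)                    # level 3
    accents.reverse()  # level 2 is compared from the last character backward
    return ("".join(letters), accents, case, specials)


words = ["Vice versa", "coop", "August", "co-op", "container", "august", "Vice-president"]
print(sorted(words, key=sort_key))
```

With these assumptions the list sorts as "august", "August", "container", "coop", "co-op", "Vice-president", "Vice versa", whatever the initial order of the input, and "cote", "côte", "coté", "côté" come out in that order because the accent subkey is reversed before comparison.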
The occurrences of xxxx which follow the letter "U" represent the hexadecimal value of a coded character as defined in ISO/IEC 10646. This is a means to be code-independent (the same value being possibly used even if the coded character set in use in a given implementation is not ISO/IEC 10646). At the same time, this is a means to keep a straightforward link with the Universal Multiple-Octet Coded Character Set, which is assumed to contain all the coded graphic characters ever defined by ISO/IEC. Whenever possible, in the ordering table, glyphs will be used in comments alongside character ordering definitions. This will give a more accurate understanding of the characters in question.

The letter U stands for UCS, which itself stands for Universal multiple-octet Coded Character Set.

The collating-symbol statements will include declarations of symbols used as intermediary values for: [...]

[...] whose ordering is done at the last level in the default, be normally processed separately. This will avoid collisions with any extra levels added by tailoring. It is highly recommended that only four levels be used in tailoring, the fourth one being the level reserved for special characters. This is the only way this standard can guarantee that nothing will be broken; otherwise thorough and skillful thinking by the implementer will be required, the minimum being that special characters have to be processed at the last level.

5.3.1.1 Table sections and processing properties

The table is separated into sections, one section for each script. Each section is assigned a sequential number corresponding to its order of appearance. The header of each section is named for clarity. The header describes transformation properties for each level of the script.
These properties are tailored for the peculiarities of the script relative to the ordering process.

One of the tailoring possibilities is to change the order of a whole script relative to other scripts. Separation of the table into named sections will simplify that requirement, as well as serving to describe script properties.

The scanning direction (forward or backward) used to process the string at each level is a property of each script. These properties can be changed according to the language. Clause 5.5 describes tailoring.

Another property is the possibility of comparing the numerical values representing the positions of the characters in the two strings before comparing the weights assigned to those characters.

Note: The scanning direction (forward or backward) is not normally related to the natural writing direction of a script. The scanning direction applies only to the order processing in relation to the logical sequence of the coded character string.

According to ISO/IEC 10646, for scripts written right to left, such as Arabic, the lowest positions in the logical sequence of characters correspond to the rightmost characters of a string (from the point of view of their natural sequence). Conversely, for the Latin script, written left to right, the lowest positions in the logical sequence of characters correspond to the leftmost characters of the string (from the point of view of their natural presentation sequence).

Therefore, scanning forward starts with the lowest positions in the logical sequence, while scanning backward starts from the highest positions.

To be more precise, in ISO/IEC 10646 Arabic is artificially separated into two scripts: the logical, intrinsic Arabic, coded independently of shapes, and the presentation forms.
Both allow Arabic to be coded completely, but intrinsic Arabic is normally preferred for better processing, while the presentation forms are preferred by some presentation-oriented applications.

Intrinsic Arabic is coded in the logical order, while presentation forms are coded in presentation order. The first of these two scripts is described in the default under the header [...], standing for the normal coding, called intrinsic Arabic. The second one is described under the header [...], standing for Arabic forms. The scanning properties of these two artificial sections differ, the first one being scanned forward and the second one being scanned backward, for the first three default levels.

5.3.2 Key composition

A series of m subkeys is formed out of a character string composing a comparison field; m is the maximum number of levels described in either the default ordering table or the tailored ordering table. The following paragraphs describe these formations. In the default table, m is equal to 4.

5.3.2.1 Formation of properties vector

For each character string, a corresponding vector is built (another bit string), which is not used in the comparison process and which describes to which script each character of the input character string belongs. This data will be used subsequently to determine how each token of each subkey is formed.

During forward scanning of each character of the input character string, a token is concatenated to the script identifier vector, which is initially empty. The token corresponds to the value assigned to the script to which the definition of the character being processed belongs. The value of the script is the logical number assigned implicitly to the script name header of the table section in which the character definition is located. If, due to tailoring, the character definition is moved before or after another character definition, it becomes part of the script whose name header comes before the new character definition.
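As an illustration of clause 5.3.2.1, the formation of the properties vector can be sketched as follows. The section numbers and the script_of lookup are hypothetical stand-ins for a real ordering table, where the script of a character is given by the table section containing its definition; only the forward scan and the concatenation of one token per character follow the text above.

```python
# Hypothetical section numbers; section 1 holds the special characters,
# which form the first script of the default table.
SPECIAL, LATIN, GREEK = 1, 2, 3


def script_of(ch):
    """Toy lookup standing in for 'the table section that defines ch'."""
    code = ord(ch)
    if ch.isalpha() and code <= 0x024F:
        return LATIN
    if ch.isalpha() and 0x0370 <= code <= 0x03FF:
        return GREEK
    return SPECIAL


def properties_vector(text):
    """Scan text forward and concatenate one script token per character."""
    vector = []  # initially empty
    for ch in text:
        vector.append(script_of(ch))
    return vector


print(properties_vector("co-op"))
```

For "co-op" this sketch yields one Latin token per letter and a special-characters token for the dash. The vector is not compared itself; it only tells the later steps which script properties apply to each character.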
5.3.2.2 Formation of subkey level 1 through m minus 1 (level i; m=4 in the default)

For i varying from 1 to m minus 1 (from 1 to 3 if the default is used), form subkey level i in the following way:

During forward scanning of each character of the input character string, a token is obtained. The token corresponds to the transformation value of that character at level i.

Note: In the default definition, characters of script [...] are ignored from level 1 through 3. The definition of these characters can be tailored to make any of them a part of another script. The script [...] is the first script to be defined in the default table. It contains special characters that are not, stricto sensu, a specific part of any natural language script, for example, "dingbats" of ISO/IEC 10646, or punctuation for most scripts.

The scanning properties for the level i being processed must be carefully monitored. When there is a change in scanning direction at level i and the new direction is backward, stacking of the token will be done at the position where the change of direction has occurred. Therefore, when such a condition occurs, the application shall retain the current position in the output subkey i as position p (push position).

According to the scanning direction assigned to level i of the script whose identification corresponds to the character being processed, the obtained token is either added (concatenated) at the end of subkey i (which behaves like a list), or pushed at position p of subkey i (which then behaves like a stack). Subkey i is initially empty.

This is the equivalent of backward or forward scanning of the input string for that level. This property of scanning direction is given for each level of each script and is a script property. Each script header gives, for each level, the scanning direction property of the script.
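The list and stack behaviour described above can be sketched as follows. This is a minimal illustration, assuming that each token already comes paired with the scanning direction of its script at the level being processed; the function name form_subkey and the single-character tokens are choices made for this sketch only.

```python
def form_subkey(tokens):
    """Build one subkey from (token, direction) pairs given in logical order.

    direction is "forward" (append at the end, list behaviour) or
    "backward" (push at position p, stack behaviour).
    """
    subkey = []  # initially empty
    push_position = 0
    previous_direction = "forward"
    for token, direction in tokens:
        if direction == "backward":
            if previous_direction != "backward":
                # The direction has just changed to backward: retain position p.
                push_position = len(subkey)
            subkey.insert(push_position, token)
        else:
            subkey.append(token)
        previous_direction = direction
    return subkey


# Accent tokens of "coté" in logical order: no accent on c, o, t; acute on e.
accents = [("*", "backward"), ("*", "backward"), ("*", "backward"), ("a", "backward")]
print("".join(form_subkey(accents)))
```

With every token scanned backward the subkey is simply the reversed token sequence, so "coté" gives "a***" as in the tutorial. When a backward run follows forward tokens, only the run is stacked, starting at the retained position p.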
Normally, in alphabetic scripts (and in the default), levels represent the following decomposition for each character:

   level 1: base level of each script. This level corresponds to the basic letters of the alphabet for that script, if the script is alphabetic, and to each character of the script if the script is ideographic or syllabic;

   level 2: the level corresponding to diacritical marks affecting each basic character of the script. For some scripts, diacritics are always considered an integral part of the basic letters of the alphabet, and are not considered at this second level, but rather at the first. For example, N TILDE in Spanish is considered a basic letter of the Latin script. Therefore, tailoring for Spanish will change the definition of N TILDE from "the weight of an N in the first level and a tilde weight in the second level" to "the weight of an N TILDE (placed after N and before O) in the first level, and indication of the absence of extra diacritics in the second level";

   level 3: the level corresponding to case or to variant character shape that affects each basic character of the script.

5.3.2.3 Formation of subkey level m (m=4 in the default table)

During forward scanning of each character of the input character string, a pair of tokens is concatenated to subkey level m. The first token of the pair corresponds to the logical position in the original character string of the character being processed. The second token in the pair corresponds to the value assigned to that character at level m of the table. When the character is not assigned at level m in the table, it is ignored for the formation of subkey level m and no pair is concatenated. The pair of tokens is concatenated at the end of subkey level m. Subkey level m is initially empty.

This level represents the level common to all scripts. In this standard, this level is considered as the first script (under the header [...]).
The property of this level is positional in an absolute way. This means that the numerical value of the position in the original string has precedence over the weight assigned to the special character which occupies this position. Subkey level m is therefore composed of a pair of values for each such character (the character string being always scanned forward in the logical string sequence). The first value of the pair corresponds to the sequential position of the character in the input string. The second value of the pair corresponds to the weight assigned to the character according to level m in script [...].

In the table, this behaviour is described using the couple of parameters "forward, position". To be conformant to this international standard, the parameter "backward, position" shall always be specified for level m. These two parameters shall be considered mutually exclusive.

In the default table, the first script (whose header is named [...]) exclusively includes characters that are not considered part of the set of basic characters of any script, for example, special characters such as SPACE, HYPHEN, and "dingbats" of ISO/IEC 10646.

In the default table, definitions of these characters for levels 1 to 3 are such that they are ignored at these levels and values are exclusively assigned to level m (m being equal to 4 in the default).

5.3.2.4 Formation of subkey level 5

This extra clause has been removed from the previous draft. It was intended for processing combining characters dynamically. More static solutions are possible, which will require tailoring if ever SC22 wants to go beyond level 1 conformance of ISO/IEC 10646.

5.3.2.5 Posthandling

The posthandling phase is part of the formation of a binary comparison key. Once the binary key has been formed out of the data specified in the table, the posthandling phase shall be invoked (see the discussion about the potential purposes of such a phase in annex B).
The result of the posthandling phase shall be returned as subkey level m+1.

5.4 Table formation

Tables 1 through 4 are formed out of the LC_COLLATE specification data described in the following paragraphs. Each text element definition of the default contains 4 explicit values. Each value corresponds to an internally-used token.

5.5 Default table

Normative Annex 1 gives the international default ordering table used as a template for tailoring localized applications working on the full repertoire of ISO/IEC 10646 (the Universal Multiple-Octet Coded Character Set).

6. Conformance

An application conforming to this international standard shall respect all the requirements of clause 5 of this document.

Reminder: Excerpt from the scope clause (to explain that 7.1, 7.2 and Annex H will be removed)

Note: [7.1, 7.2 and Annex H] will be removed: [...] it is no longer the intention of SC22/WG20 to have data specification remain in the present standard; as soon as CD 14652 is harmonized with the syntax used in this standard, this will go away and be replaced by a normative reference. So it is possible that some raw elements of data specifications are left, incomplete, at this stage, in the present working draft, which define:

- a data specification for describing ordering tables;
- a tailoring specification to complete the data specification. This tailoring provision will allow modification of the default order data for a specific set of languages in each script in a reasonably compact way, without the burden of having to modify other scripts' definitions. In this way, the default order can be used as a template to define culture-specific orders that are as similar to one another as possible.

7.1 Data specification

The following is a recapitulation of the POSIX syntax and its enhancements.
Lines preceded by * correspond exactly to present POSIX syntax; others are enhancements necessary for a more flexible, complete and tailorable default.

* LC_COLLATE
* collating-symbol [from ]
* order_start [forward|backward|position][;[...]]
  to be replaced at the beginning of each script by:
  order [forward|backward|forward,position][;[...]]
* [[collating-symbol | | IGNORE] [;[...]]
  This statement is a character definition
  redefine [[before|after] ]
  This latter statement shall precede a new character definition
* order_end
* END LC_COLLATE

  range [ [[collating-symbol] | * | |IGNORE][;[...]]]
  This statement defines a whole range of characters
  move [before|after] |
  create order [forward|backward|position][;[...]]
  [after|before] [|

7.2 Tailoring Mechanism

Essentially, this section describes how the previous new statements are handled to form an updated table. The tailoring described in this standard consists only of a table updating mechanism (which results in a new table replacing the default).

Note and questions from editor: input is expected here from the Keld specification standard (see also current annex H, which describes the standard syntax used in 5.4 below). Should we separate 5.4, which is the vanilla flavour of POSIX plus the script header addition, from 5.5, specialized in tailoring aspects (while pointing at informative annexes [like current annex H] for explanations included in other standards)?

Normative annexes

Note: In this draft, annexes identified with a digit are intended to be normative. Annexes identified with a letter are intended to be informative.

Annex 1 (normative)
International Default Table

LC_COLLATE
# Déclaration des symboles internes / Declaration of internal symbols
#
# SYMB N$ Expl.
#
collating-symbol #
#
# /
#
#
collating-symbol # 2 normal > voir/see
collating-symbol # 3 isol.
collatingsymbol # 4 final collatingsymbol # 5 initial collatingsymbol # 6 medial/mdian # collating-symbol # 7 minuscule/minuscule (bas de casse/lower case) collating-symbol # 8 inf)rieur min./subscript min. (indice/index) collating-symbol # 9 sup)r. min./superscript min. (exposant/exponent) collating-symbol # 10 capitale/capital (haut de casse/upper case) collating-symbol # 11 inf)rieur en capitale/subscript capital collating-symbol # 12 sup)rieur en capitale/superscript capital # # / # collatingsymbol # 13 accent madda+::1B collatingsymbol # 14 accent hamza+::1B GI collatingsymbol # 141 accent hamza/waw11B collatingsymbol # 142 accent hamza under / hamza souscrit collatingsymbol # 143 accent under yeh / accent souscrit du ya' collatingsymbol # 144 accent hamza/yeh barree # collating-symbol # 15 de base/basic (non accentu)/non-accented) # collating-symbol # 16 particulier/peculiar collating-symbol # 17 ligature/ligature collating-symbol # 18 accent aigu/acute accent collating-symbol # 20 accent grave/grave accent collating-symbol # 21 br/ve/breve collating-symbol # 22 accent circonflexe/circumflex accent collating-symbol # 23 caron/caron collating-symbol # 24 rond sup)rieur/ring above collating-symbol # 25 tr)ma/diaeresis (ou/or umlaut) collating-symbol # 26 double ac. aigu/double acute ac. 
'Ԍcollating-symbol # 27 tilde/tilde collating-symbol # 28 point/dot collating-symbol # 29 barre oblique/oblique collating-symbol # 30 c)dille/cedilla collating-symbol # 31 ogonek/ogonek collating-symbol # 32 macron/macron # collating-symbol <0> collating-symbol <1> collating-symbol <2> collating-symbol <3> collating-symbol <4> collating-symbol <5> collating-symbol <6> collating-symbol <7> collating-symbol <8> collating-symbol <9> # collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol

# 112 # 113 # 114 # 115 # 116 # 117 # 118 # 119 # 120 # 121 # 122 # 122b # # / #  '#Ԍ # # #  '$Ԍ  ` % order_start forward;backward;forward;forward,position # # SYMB. N$ GLY # IGNORE;IGNORE;IGNORE; # 32 IGNORE;IGNORE;IGNORE; # 33 _ IGNORE;IGNORE;IGNORE; # 34 <"_> IGNORE;IGNORE;IGNORE; # 35  (MACRON) IGNORE;IGNORE;IGNORE; # 36 IGNORE;IGNORE;IGNORE; # 37 - IGNORE;IGNORE;IGNORE; # 38 , IGNORE;IGNORE;IGNORE; # 39 ; IGNORE;IGNORE;IGNORE; # 40 : IGNORE;IGNORE;IGNORE; # 41 ! IGNORE;IGNORE;IGNORE; # 42  IGNORE;IGNORE;IGNORE; # 43 ? IGNORE;IGNORE;IGNORE; # 44  IGNORE;IGNORE;IGNORE; # 45 / IGNORE;IGNORE;IGNORE; # 46 <"/> IGNORE;IGNORE;IGNORE; # 47 . IGNORE;IGNORE;IGNORE; # 58  IGNORE;IGNORE;IGNORE; # 59  IGNORE;IGNORE;IGNORE; # 60 <";> IGNORE;IGNORE;IGNORE; # 61 ' IGNORE;IGNORE;IGNORE; # 62 <'6> IGNORE;IGNORE;IGNORE; # 63 <'9> IGNORE;IGNORE;IGNORE; # 64 " IGNORE;IGNORE;IGNORE; # 65 <"6> IGNORE;IGNORE;IGNORE; # 66 <"9> IGNORE;IGNORE;IGNORE; # 67  IGNORE;IGNORE;IGNORE; # 68  IGNORE;IGNORE;IGNORE; # 69 ( IGNORE;IGNORE;IGNORE; # 70 <(S> IGNORE;IGNORE;IGNORE; # 71 ) IGNORE;IGNORE;IGNORE; # 72 <)S> IGNORE;IGNORE;IGNORE; # 73 [ IGNORE;IGNORE;IGNORE; # 74 ] IGNORE;IGNORE;IGNORE; # 75 { IGNORE;IGNORE;IGNORE; # 76 } IGNORE;IGNORE;IGNORE; # 77  IGNORE;IGNORE;IGNORE; # 78  IGNORE;IGNORE;IGNORE; # 79  IGNORE;IGNORE;IGNORE; # 80  IGNORE;IGNORE;IGNORE; # 81 IGNORE;IGNORE;IGNORE; # 82 @ IGNORE;IGNORE;IGNORE; # 83  IGNORE;IGNORE;IGNORE; # 84  IGNORE;IGNORE;IGNORE; # 85 $ IGNORE;IGNORE;IGNORE; # 86  IGNORE;IGNORE;IGNORE; # 87  IGNORE;IGNORE;IGNORE; # 88 * '&Ԍ IGNORE;IGNORE;IGNORE; # 89 \ IGNORE;IGNORE;IGNORE; # 90 & IGNORE;IGNORE;IGNORE; # 91 # IGNORE;IGNORE;IGNORE; # 92 % IGNORE;IGNORE;IGNORE; # 93 <-S> IGNORE;IGNORE;IGNORE; # 94 + IGNORE;IGNORE;IGNORE; # 95 <+S> IGNORE;IGNORE;IGNORE; # 96  IGNORE;IGNORE;IGNORE;<0> # 123  IGNORE;IGNORE;IGNORE;<1> # 124 ` IGNORE;IGNORE;IGNORE;<2> # 125 <"(> IGNORE;IGNORE;IGNORE;<3> # 126 ^ IGNORE;IGNORE;IGNORE;<4> # 127 <"<> IGNORE;IGNORE;IGNORE;<5> # 128 <"0> 
IGNORE;IGNORE;IGNORE;<6> # 129  IGNORE;IGNORE;IGNORE;<7> # 130 <""> IGNORE;IGNORE;IGNORE;<8> # 131 ~ IGNORE;IGNORE;IGNORE;<9> # 132 <".> IGNORE;IGNORE;IGNORE; # 133  IGNORE;IGNORE;IGNORE; # 134 ' IGNORE;IGNORE;IGNORE; # 135 IGNORE;IGNORE;IGNORE; # 136 < IGNORE;IGNORE;IGNORE; # 137 <=<> IGNORE;IGNORE;IGNORE; # 138 = IGNORE;IGNORE;IGNORE; # 139 => IGNORE;IGNORE;IGNORE; # 140 > IGNORE;IGNORE;IGNORE; # 141  IGNORE;IGNORE;IGNORE; # 142 | IGNORE;IGNORE;IGNORE; # 143 | IGNORE;IGNORE;IGNORE; # 144 $ IGNORE;IGNORE;IGNORE; # 145  IGNORE;IGNORE;IGNORE; # 146 IGNORE;IGNORE;IGNORE; # 147 IGNORE;IGNORE;IGNORE;

# 148 <_V/>> IGNORE;IGNORE;IGNORE; # 149 <_V-> IGNORE;IGNORE;IGNORE; # 150 <_V IGNORE;IGNORE;IGNORE; # 151 <_!/>> IGNORE;IGNORE;IGNORE; # 152 <_!-> IGNORE;IGNORE;IGNORE; # 153 <_!<> IGNORE;IGNORE;IGNORE; # 154 <_A/>> IGNORE;IGNORE;IGNORE; # 155 <_-A> IGNORE;IGNORE;IGNORE; # 156 <_A<> IGNORE;IGNORE;IGNORE; # 157 <_!> IGNORE;IGNORE;IGNORE; # 158 <_-> # IGNORE;IGNORE;IGNORE; # 159 <_=> IGNORE;IGNORE;IGNORE; # 160 <<-> IGNORE;IGNORE;IGNORE; # 161 <-/>> IGNORE;IGNORE;IGNORE; # 162 <"7> IGNORE;IGNORE;IGNORE; # 163 <-!> IGNORE;IGNORE;IGNORE; # 164 <-v> ''Ԍ IGNORE;IGNORE;IGNORE; # 165 <_d!> IGNORE;IGNORE;IGNORE; # 166 <_/>//> IGNORE;IGNORE;IGNORE; # 167 <_<\> IGNORE;IGNORE;IGNORE; # 168 <_./>//> IGNORE;IGNORE;IGNORE; # 169 <_.<\> # # / # IGNORE;IGNORE;IGNORE; IGNORE;IGNORE;IGNORE; IGNORE;IGNORE;IGNORE; IGNORE;IGNORE;IGNORE; IGNORE;IGNORE;IGNORE; IGNORE;IGNORE;IGNORE; IGNORE;IGNORE;IGNORE; IGNORE;IGNORE;IGNORE; IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # IGNORE;IGNORE;IGNORE; # # # # IGNORE;IGNORE;IGNORE; #point_sheva IGNORE;IGNORE;IGNORE; #point_hataf_segol '(Ԍ IGNORE;IGNORE;IGNORE; #point_hataf_patah IGNORE;IGNORE;IGNORE; #point_hataf_qamats IGNORE;IGNORE;IGNORE; #point_hiriq IGNORE;IGNORE;IGNORE; #point_tsere IGNORE;IGNORE;IGNORE; #point_segol IGNORE;IGNORE;IGNORE; #point_patah IGNORE;IGNORE;IGNORE; #point_qamats IGNORE;IGNORE;IGNORE; #point_holam IGNORE;IGNORE;IGNORE; #point_qubuts IGNORE;IGNORE;IGNORE; #point_dagesh IGNORE;IGNORE;IGNORE; #point_meteg IGNORE;IGNORE;IGNORE; #point_rafe IGNORE;IGNORE;IGNORE; 
#point_shin_dot IGNORE;IGNORE;IGNORE; #point_sin_dot  ) order_start forward;backward;forward;forward,position # U0020;;;IGNORE # 170 # <0>;;;IGNORE # 171 0 <1>;;;IGNORE # 172 1 <2>;;;IGNORE # 173 2 <3>;;;IGNORE # 174 3 <4>;;;IGNORE # 175 4 <5>;;;IGNORE # 176 5 <6>;;;IGNORE # 177 6 <7>;;;IGNORE # 178 7 <8>;;;IGNORE # 179 8 <9>;;;IGNORE # 180 9 # <0>;;;IGNORE # 181 <18> <0>;;;IGNORE # 182  <0>;;;IGNORE # 183 <38> <0>;;;IGNORE # 184 <58> <0>;;;IGNORE # 185 <78> <0>;;;IGNORE # 186  <0>;;;IGNORE # 187  <0>;;;IGNORE # 188 <0S> <1>;;;IGNORE # 189 N <2>;;;IGNORE # 190  <3>;;;IGNORE # 191  <4>;;;IGNORE # 192 <4S> <5>;;;IGNORE # 193 <5S> <6>;;;IGNORE # 194 <6S> <7>;;;IGNORE # 195 <7S> <8>;;;IGNORE # 196 <8S> <9>;;;IGNORE # 197 <9S> # ;;;IGNORE # 198 a ;;;IGNORE # 199  ;;;IGNORE # 200  ;;;IGNORE # 201 ! ;;;IGNORE # 202  ;;;IGNORE # 203 M ;;;IGNORE # 204  ;;;IGNORE # 205 # ;;;IGNORE # 206 ;;;IGNORE # 207 ;;;IGNORE # 208 ;;;IGNORE # 209 % ;;;IGNORE # 210 b ;;;IGNORE # 211 c ;;;IGNORE # 212 ' ;;;IGNORE # 213 ;;;IGNORE # 214 > ;;;IGNORE # 215  '*Ԍ ;;;IGNORE # 216 ;;;IGNORE # 217 d ;;;IGNORE # 218 W ;;;IGNORE # 219 ;;;IGNORE # 220 ;;;IGNORE # 221 e ;;;IGNORE # 222 ) ;;;IGNORE # 223 / ;;;IGNORE # 224 + ;;;IGNORE # 225 - ;;;IGNORE # 226 ;;;IGNORE # 227 ;;;IGNORE # 228 ;;;IGNORE # 229 ;;;IGNORE # 230 f ;;;IGNORE # 231 g ;;;IGNORE # 232 ;;;IGNORE # 233 > ;;;IGNORE # 234 ;;;IGNORE # 235 ;;;IGNORE # 236 h ;;;IGNORE # 237 > ;;;IGNORE # 238 ;;;IGNORE # 239 i ;;;IGNORE # 240 1 ;;;IGNORE # 241 7 ;;;IGNORE # 242 3 ;;;IGNORE # 243 5 ;;;IGNORE # 244 ;;;IGNORE # 245 ;;;IGNORE # 246 ;;;IGNORE # 247 ;;;IGNORE # 248 ;;;IGNORE # 249 j ;;;IGNORE # 250 > ;;;IGNORE # 251 k ;;;IGNORE # 252 ;;;IGNORE # 253 ;;;IGNORE # 254 l ;;;IGNORE # 255 ;;;IGNORE # 256 ;;;IGNORE # 257 ;;;IGNORE # 258 ;;;IGNORE # 259 ;;;IGNORE # 260 m ;;;IGNORE # 261 n ;;;IGNORE # 262 9 ;;;IGNORE # 263 <'n> ;;;IGNORE # 264 ;;;IGNORE # 265 ;;;IGNORE # 266  '+Ԍ ;;;IGNORE # 267 ;;;IGNORE # 268 o ;;;IGNORE # 269  ;;;IGNORE # 270 ; 
;;;IGNORE # 271 A ;;;IGNORE # 272 = ;;;IGNORE # 273 S ;;;IGNORE # 274 ? ;;;IGNORE # 275 Q ;;;IGNORE # 276 ;;;IGNORE # 277 ;;;IGNORE # 278

;;;IGNORE # 279 p ;;;IGNORE # 280 q ;;;IGNORE # 281 r ;;;IGNORE # 282 ;;;IGNORE # 283 ;;;IGNORE # 284 ;;;IGNORE # 285 s ;;;IGNORE # 286 ;;;IGNORE # 287 > ;;;IGNORE # 288 ;;;IGNORE # 289 ;;;IGNORE # 290  ;;;IGNORE # 291 t ;;;IGNORE # 292 ;;;IGNORE # 293 ;;;IGNORE # 294 ;;;IGNORE # 296 u ;;;IGNORE # 297 C ;;;IGNORE # 298 I ;;;IGNORE # 299 E ;;;IGNORE # 300 G ;;;IGNORE # 301 ;;;IGNORE # 302 ;;;IGNORE # 303 ;;;IGNORE # 304 ;;;IGNORE # 305 ;;;IGNORE # 306 ;;;IGNORE # 307 v ;;;IGNORE # 308 w ;;;IGNORE # 309 > ;;;IGNORE # 310 x ;;;IGNORE # 311 y ;;;IGNORE # 312 U ;;;IGNORE # 313  ;;;IGNORE # 314 > ;;;IGNORE # 315 z ;;;IGNORE # 316 ;;;IGNORE # 317 ;;;IGNORE # 318  ',Ԍ ;;;IGNORE # 318b X # ;;;IGNORE # 319 A ;;;IGNORE # 320  ;;;IGNORE # 321  ;;;IGNORE # 322  ;;;IGNORE # 323 L ;;;IGNORE # 324  ;;;IGNORE # 325 " ;;;IGNORE # 326 ;;;IGNORE # 327 ;;;IGNORE # 328 ;;;IGNORE # 329 $ ;;;IGNORE # 330 B ;;;IGNORE # 331 C ;;;IGNORE # 332 & ;;;IGNORE # 333 ;;;IGNORE # 334 > ;;;IGNORE # 335 > ;;;IGNORE # 336 ;;;IGNORE # 337 D ;;;IGNORE # 338 V ;;;IGNORE # 339 ;;;IGNORE # 340 ;;;IGNORE # 341 E ;;;IGNORE # 342 ( ;;;IGNORE # 343 . 
;;;IGNORE # 344 * ;;;IGNORE # 345 , ;;;IGNORE # 346 ;;;IGNORE # 347 ;;;IGNORE # 348 ;;;IGNORE # 349 ;;;IGNORE # 350 F ;;;IGNORE # 351 G ;;;IGNORE # 352 ;;;IGNORE # 353 > ;;;IGNORE # 354 ;;;IGNORE # 355 ;;;IGNORE # 356 H ;;;IGNORE # 357 > ;;;IGNORE # 358 ;;;IGNORE # 359 I ;;;IGNORE # 360 0 ;;;IGNORE # 361 6 ;;;IGNORE # 362 2 ;;;IGNORE # 363 4 ;;;IGNORE # 364 ;;;IGNORE # 365 ;;;IGNORE # 366 ;;;IGNORE # 367  '-Ԍ ;;;IGNORE # 368 ;;;IGNORE # 369 J ;;;IGNORE # 370 > ;;;IGNORE # 371 K ;;;IGNORE # 372 ;;;IGNORE # 373 L ;;;IGNORE # 374 ;;;IGNORE # 375 ;;;IGNORE # 376 ;;;IGNORE # 377 ;;;IGNORE # 378 ;;;IGNORE # 379 M ;;;IGNORE # 380 N ;;;IGNORE # 381 8 ;;;IGNORE # 382 ;;;IGNORE # 383 ;;;IGNORE # 384 ;;;IGNORE # 385 ;;;IGNORE # 386 O ;;;IGNORE # 387 : ;;;IGNORE # 388 @ ;;;IGNORE # 389 < ;;;IGNORE # 390 R ;;;IGNORE # 391 > ;;;IGNORE # 392 P ;;;IGNORE # 393 ;;;IGNORE # 394 ;;;IGNORE # 395

;;;IGNORE # 396 P ;;;IGNORE # 397 Q ;;;IGNORE # 398 R ;;;IGNORE # 399 ;;;IGNORE # 400 ;;;IGNORE # 401 ;;;IGNORE # 402 S ;;;IGNORE # 403 ;;;IGNORE # 404 > ;;;IGNORE # 405 ;;;IGNORE # 406 ;;;IGNORE # 407 T ;;;IGNORE # 408 ;;;IGNORE # 409 ;;;IGNORE # 410 ;;;IGNORE # 412 U ;;;IGNORE # 413 B ;;;IGNORE # 414 H ;;;IGNORE # 415 D ;;;IGNORE # 416 F ;;;IGNORE # 417 ;;;IGNORE # 418 ;;;IGNORE # 419  '.Ԍ ;;;IGNORE # 420 ;;;IGNORE # 421 ;;;IGNORE # 422 ;;;IGNORE # 423 V ;;;IGNORE # 424 W ;;;IGNORE # 425 > ;;;IGNORE # 426 X ;;;IGNORE # 427 Y ;;;IGNORE # 428 T ;;;IGNORE # 429 > ;;;IGNORE # 430 ;;;IGNORE # 431 Z ;;;IGNORE # 432 ;;;IGNORE # 433 ;;;IGNORE # 434 ;;;IGNORE # 411 Y  / order_start forward;forward;forward;forward,position <0>;;;IGNORE <0>;;;IGNORE <1>;;;IGNORE <1>;;;IGNORE <2>;;;IGNORE <2>;;;IGNORE <3>;;;IGNORE <3>;;;IGNORE <4>;;;IGNORE <4>;;;IGNORE <5>;;;IGNORE <5>;;;IGNORE <6>;;;IGNORE <6>;;;IGNORE <7>;;;IGNORE <7>;;;IGNORE <8>;;;IGNORE <8>;;;IGNORE <9>;;;IGNORE <9>;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE '0Ԍ ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE >;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE order_start backward;backward;backward;forward,position ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE '1Ԍ ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE 
;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE '2Ԍ ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE  '3Ԍ ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE '4Ԍ ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;;IGNORE ;;;;IGNORE ;;;;IGNORE ;;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE  P5 order_start forward;forward;forward;forward,position ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE ;;;IGNORE UNDEFINED IGNORE;IGNORE;IGNORE;IGNORE # order_end # END LC_COLLATE _________________________________________ Missing characters in this working draft: Discontinuit)s actuelles / Present discontinuities range U0114-U0115 range U012C-U012D range U014E-U014F range U017F-U0305 range U0308-U0309 U030B range U030D-U0327 range U0329-U0331 range U0333-U0337 range U0339-U2017 (except. 
06xx)
range U201A-U201B
range U201E-U206F
range U2071-U2073
range U207A-U207C
range U207F-U20D0
range U20D2-U2121
range U2123-U2125
range U2127-U215A
range U215F-U218F
range U2194-U220D
range U220F-U225F
range U2261-U2263
range U2266-U24FF
range U2503-U250B
range U250D-U250F
range U2511-U2513
range U2515-U2517
range U2519-U251B
range U251D-U2523
range U2525-U252B
range U252D-U2533
range U2535-U253B
range U253D-U2570
range U2573-U25E1
range U25E4-U2569
range U256B-UFFFF (except Fxxx)

*end of the default table*

Annex 2 (normative)
Benchmark

1. List with required result of the default
2. List with required result after example of tailoring

Informative annexes

Note: In this draft, annexes identified with a digit are intended to be normative. Annexes identified with a letter are intended to be informative.

Annex A (informative)
Criteria used initially to prepare the standard

Note: these criteria have been subject to change. They represented an optimum. Compromises had to be made later on, according to diverse circumstances.

1. The mechanism must provide a deterministic way to collate graphic character strings. Thus, if two strings of graphic characters are different when directly compared in binary, the order assigned by the mechanism should always be the same, and the strings will be considered different even if they are externally considered equivalent by humans.

2. For each script, if this is possible, the order assigned will be culturally acceptable to a majority of users of that script.

3. The repertoire of characters supported should be at least the one defined by Level three implementation (the richest possible) of ISO/IEC 10646.

4. The ordering table will be defined keeping in mind the following points concerning internal string transformation number assignments:

- the assignments are processed as efficiently as possible if they are stored in a permanent way, and
- the assignments allow direct and correct one-pass binary comparisons between two resultant number sequences.

The table is defined this way because it is always possible to define an order between two strings, however complex the method used. However, real systems must have a minimum level of performance. Once assignment is made on original strings, the result must be storable without modification. Also, the result must be directly reusable for comparison purposes, without having to redo the conversion process each time. This will also enable existing systems to make comparisons with minimum changes and sometimes without having to change programs.

5. There must be a mechanism to use the table as a template, primarily to optimise the process for the user's language. In the template, the order of a series of characters may be modified by simple a posteriori declaration, without having to specify the whole table again.

6. Given the reusable comparison keys obtained (see 4), it must be possible to reconstitute the original as is without the need to preserve it. This means that the reversibility of the process must be available to applications if required.

For information, this list of requirements can already be satisfied by Canadian Standard CAN/CSA Z243.4.1 for West European languages, except that this standard is monoscript and does not support composite sequences as defined in ISO/IEC 10646. However, preliminary studies suggest that it is possible to extend the Canadian method to take into account both the multiscript requirement and the presence of composite sequences. ISO/IEC 9945-2 (POSIX.2) allows the Canadian standard CAN/CSA Z243.4.1-1992 to be described. However, it could require modifications of the model to handle both the multiscript requirements and the need for composite sequences if an infinite repertoire is necessary for a given environment.
The application of this standard will not require full POSIX.2 conformance, but will be as compatible as possible with the POSIX LOCALE LC_COLLATE specification model. Otherwise, this standard will build on this specification model, attempting to make as few modifications as possible (particularly structural modifications).

Annex B (informative)
Description of the prehandling phase

Prehandling is essentially for modification and/or duplication of original records to render their fields context-independent prior to the comparison phase. Examples are:

- duplicating a string such as "41" for phonetic ordering into 3 strings for trilingual phonetic ordering usage (French, English and German):
    QUARANTE-ET-UN
    FORTY-ONE
    EINUNDVIERZIG

- removing or rotating characters that are a nuisance for special requirements of ordering; for example, in France, removing "de" in "de Gaulle" and not removing "De" in "De Gaulle", according to whether or not the name is of noble origin, to give:
    Gaulle (de)
    De Gaulle

- transforming incomplete data into full form; for example, transforming "Mc Arthur" to give "Mac Arthur"

- transforming numbers so that the result will be ordered in numerical order and not positionally or according to phonetics. For example, given the strings "100" and "15":
    - either separate each of these numbers into different fields from the rest of the text and convert them entirely into standard numeric (binary) data to be ordered numerically and not textually, or
    - pad/align numbers to make sure the one-phase default ordering mechanism will process them correctly:
        "015"
        "100"

- transforming Roman numerals into Arabic numbers after having determined the context (perhaps with the help of human interactive intervention or an expert system), as in the following French example:
    CHAPITRE DIX might mean CHAPTER 010 or CHAPTER 509 ("dix" is the French word for 10; it is also the Roman numeral for 509).
    This generally requires context to be resolved with total certainty.

Description of the posthandling phase

Posthandling is essentially for modifying resulting keys, or appending the original string to keys, so that the results of comparisons can determine differences in the case of homography, particularly when the prehandling phase has been applied. For example, there could be equivalencies if numerical values (for example, "010" and "10") have been standardised in the prehandling phase. The default ordering mechanism has no knowledge that the original strings are different in such cases, but the predictability requirement still exists.

In particular, where different coding methods have been used in the original strings to be ordered in the same process, the posthandling phase can determine internal differences between strings which would appear exactly the same on paper to end users (for example, an ISO 2022 input stream intermixing ISO/IEC 6937 and ISO/IEC 8859).

The Default Tailorable Ordering Mechanism does not cover the prehandling and posthandling phases, although it describes them. The presence of the phases is mandatory even if empty processes must be defined. These empty processes can be replaced if the need occurs.

Annex C
Sources for methods and data gathering

CAN/CSA Z243.4.1 Canadian ordering standard
CAN/CSA Z243.230 Canadian minimum software localization parameters
IBM NLTC Volume 2 reference manual
IBM Egypt and Egypt Standards
Stefan Fuchs and Israel Standards
CEN TC304 Multilingual sorting standard project
LOCALES provisionally registered in X/Open or in SC22/WG15 (DKUUG.DK Internet site)
Règles du classement alphabétique en langue française et procédure informatisée pour le tri, Alain LaBonté, Ministère des Communications du Québec, 1988, ISBN 2550190467
Technique de réduction - Tris informatiques à
quatre clés, Alain LaBonté, Ministère des Communications du Québec, 1989, ISBN 2550199650
Fonctions de systèmes - Soutien des langues nationales, Alain LaBonté, Ministère des Communications du Québec, 1988
National Language Architecture, Klaus Daube, SHARE EUROPE White Paper, 1990

Annex D (informative)
Preliminary principles of table assignments

The principles of numeric table assignments are the following:

a) All characters are assigned a value corresponding to the identification of the script. Each script header is given a name, mainly for the purposes of tailoring. However, conceptually, a number corresponding to the identification of the script can be assigned to this name, which then serves as a variable. This script identification data is informative only and does not serve in the comparison process. However, the identification data may be necessary for determining the scanning direction of diacritics for that script. This data must sometimes be retained alongside the ordering strings to meet the reversibility requirements above (capacity to reconstitute the original strings given the different subkeys that are a result of the multilevel transformation).

b) Each letter is assigned a basic normalised letter value (or a pair or a triad for ligatures). The assignment is made as first level (ideographic characters are assigned their standardised CJK order, corresponding to the order they have in ISO/IEC 10646). The assignment is in the order of the alphabet to which the letters belong. For example, LATIN CAPITAL LETTER E WITH CIRCUMFLEX ACCENT is assigned the same numerical value as LATIN SMALL LETTER E. Such a definition is valid for most Latin-script-based languages. Vietnamese would require a different definition, E CIRCUMFLEX being a base letter in this language.
c) Each letter is assigned an n-plet of values (or 2 n-plets or 3 n-plets for ligatures) as 2nd level, where n corresponds to the maximum realistic number of combining characters encountered in all world scripts for a given basic letter to which it applies. For example, with triplets, when there is only one diacritic, the second and third elements of the triplet are place holders; when there is no diacritic, three place holders are provided in each triplet, and so on. For each diacritic of a triplet, a flag is put in the next-to-last level to indicate an integrated diacritic (as opposed to a combining character). Note that for level 1 conformance to ISO 10646 (or if composite sequences are all predefined by "collating symbol" statements), the n-plet of values for each character can be made equal to a single token, because no analysis of combining diacritics will ever be necessary (and the next-to-last level, reserved for future use, will be empty).

Ideographs are assigned no value for this level according to ISO/IEC 10646 level 1 of conformance. This is because ideographs will be compared against completely different values simultaneously at the first level, and thus there will be no collision in comparison operations at this level. (Ideographs are not assigned equivalencies at the first level.) Levels 2 or 3 of conformance could be processed with the same model as the one for letters, for theoretical combinations.

d) Each letter is assigned a value (or a pair or a triad for ligatures) as 3rd level, corresponding to the form of the letter (for example, upper or lower case for Latin, or free-standing, initial, medial, or ending form for Arabic). Ideographs are assigned no value for this level.

e) This paragraph was removed from the previous version.

f) Each special character (a character not specifically belonging to a specific script, such as COPYRIGHT SIGN, or COMMA) is assigned a value as 4th level value.
This is a worldwide common numerical value that is preceded by the position it occupies in the original string to be processed. Currently, no other level value is assigned in the default table.

g) This paragraph was removed from the previous draft.

Given such table assignments, a table of scanning directions will be provided for each script and for each of the levels. Note that scanning direction is not linked to the natural script direction, since the characters are already linearly coded according to their script direction (logical direction). It is linked to the direction in which each level is processed for ordering. For example, in French, diacritical marks are scanned backward in case of first level homography: accents are not considered for ordering in French except for specifying the order of quasi-homographs. In this case, the last difference in the words determines the order, thus explaining the retrograde scanning (an example of an ordered list is: "cote", "côte", "coté", "côté"). When string direction is retrograde for a character in a given level, the value assigned to this level is placed in front of the resulting key instead of at the end for this level.

Given that each subkey is established at all levels, and provided that a low-value delimiter is placed between each subkey, all subkeys can be concatenated at once and used for subsequent comparisons. (If values are carefully chosen for table-building, no low-value delimiter is necessary.)

Given that all the information is present, the original string provided can be reconstituted from the subkeys.

Reduction techniques exist to minimise the storage requirements of that method without affecting the comparison process, if keys are to be preserved for maximum performance reasons (see References).
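The key construction described in this annex can be sketched as follows. This is a minimal illustration, not the method of the default table: the names and the weight values are hypothetical, only the letters a to z with acute, grave or circumflex accents are handled, the fourth level is omitted, and a canonical decomposition from the Python standard library stands in for the table lookup.

```python
import unicodedata

DELIMITER = 0  # low value placed between subkeys; every weight below is >= 1

# Hypothetical second-level weights, keyed by combining character.
# "" means "no diacritic"; the values are NOT those of the default table.
ACCENT_WEIGHTS = {"": 1, "\u0301": 2, "\u0300": 3, "\u0302": 4}


def decompose(char):
    """Split a precomposed letter into (base letter, combining marks)."""
    decomposed = unicodedata.normalize("NFD", char)
    return decomposed[0], decomposed[1:]


def comparison_key(text, accents_backward=True):
    """Build one comparable key from three concatenated subkeys."""
    level1, level2, level3 = [], [], []
    for char in unicodedata.normalize("NFC", text):
        base, marks = decompose(char)
        level1.append(ord(base.lower()) - ord("a") + 1)  # base letter
        level2.append(ACCENT_WEIGHTS[marks])              # diacritic
        level3.append(2 if base.isupper() else 1)         # case
    if accents_backward:
        # Retrograde scanning: equivalent to placing each value in front
        # of the subkey instead of at its end.
        level2.reverse()
    return tuple(level1 + [DELIMITER] + level2 + [DELIMITER] + level3)


words = ["côté", "cote", "coté", "côte"]
print(sorted(words, key=comparison_key))  # ['cote', 'côte', 'coté', 'côté']
```

With backward scanning at the second level, the last accent difference in the word decides the order, which yields the list quoted in the text. Passing `accents_backward=False` scans accents forward and places "coté" before "côte" instead.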
Annex D (informative)
Principles of the comparison engine

The basic philosophy behind the culturally-correct character string comparison engine is the following:

1. No comparison mechanism is culturally correct when it assumes that the order is based on the numerical internal values of raw character strings, whatever the standard character set coding scheme.

2. If two strings are different, there must be a fully predictable order assigned to each one relative to the other.

3. Ordering rules are language-related in a given script.

4. Whatever the language, the ordering rules are based on lexical order at the lowest level. Higher level classification (done in a prehandling phase) produces character strings whose ordering is to be made as for any other lexical entry.

5. Each rule tentatively determines an order between two different character strings by operating a single binary comparison on binary strings that represent the result of a straightforward and context-independent transformation of the characters of each string. (Transformations typically involve ignoring a character, giving it a specific or generic weight, or retaining its position as a weight while assigning it a second weight depending on the character itself. Such transformations may be done by scanning the string forward or backward in the logical string sequence, except for the positional case, which only involves the logical positions of a string.)

6. Transformations can typically produce equivalencies, two different character strings being transformed into two identical binary strings. Thus, when such cases are encountered, further sequential series of transformations are necessary until, at a final level, all ties are resolved (at the last level, binary strings are necessarily different if two original character strings to be compared are different).
   If the only goal of a comparison is to determine equivalence up to a certain level of precision, then character transformation is required up to that level only.

7. The default table will define as many levels as necessary to produce a fully predictable order for two different character strings. This involves up to five comparison levels if characters of ISO/IEC 10646 level 1 are used, and up to six comparison levels if characters of ISO/IEC 10646 level 3 are used. An extra level (used for data management and not of particular significance for comparisons) is also defined (see 9 below).

8. A whole character string is transformed as many times as necessary into up to six different levels. It must be possible to deduce the original character string from all the different binary transformations concatenated into one binary string (reversibility property of the transformation process).

9. Different scripts may have different properties as to the way each level is processed. Thus, to ensure that the operation can be reversed, an extra level transformation table is necessary to identify the script to which each character belongs.

Annex E. Revised (if necessary) SC22/WG20 N 174 - From a requirement to its implementation - Compare, Sort, Search

Removed from the previous version.

Annex F. Discussion on the number of levels for each script and their harmonization

Text will be added if necessary.

Annex G. Example of national classification standards and how they can be harmonized to the international standard

AFNOR Z.44-001
ANSI/NISO Z39.75-199X (project at time of editing WD3)
DIN 5007

Annex H. Standard LOCALE parameters definitions unextended

Text obtained from: ISO/IEC 9945-2

Locale:
------
A locale is the definition of the subset of a user's environment that depends on language and cultural conventions. It is made up from one or more categories.
Each category is identified by its name and controls specific aspects of the behavior of components of the system. Category names correspond to the following environment variable names:

LC_CTYPE      Character classification and case conversion.
LC_COLLATE    Collation order.
LC_TIME       Date and time formats.
LC_NUMERIC    Numeric, nonmonetary formatting.
LC_MONETARY   Monetary formatting.
LC_MESSAGES   Formats of informative and diagnostic messages and interactive responses.

Category Specifications:
------------------------

LC_CTYPE
--------
The LC_CTYPE category shall define character classification, case conversion, and other character attributes. In addition, a series of characters can be represented by three adjacent periods representing an ellipsis symbol ("..."). The ellipsis specification shall be interpreted as meaning that all values between the values preceding and following it represent valid characters.

The following keywords shall be recognized:

copy      Specify the name of an existing locale to be used for the definition of the category. If this keyword is specified, no other keyword shall be specified.
upper     Define characters to be classified as uppercase letters. No character specified for the keywords cntrl, digit, punct, or space shall be specified.
lower     Define characters to be classified as lowercase letters. No character specified for the keywords cntrl, digit, punct, or space shall be specified.
alpha     Define characters to be classified as letters. No character specified for the keywords cntrl, digit, punct, or space shall be specified. Characters classified as either upper or lower are automatically included in this class.
digit     Define characters to be classified as numeric digits. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 shall be specified, and in contiguous ascending sequence by numerical value.
space     Define characters to be classified as white-space characters.
          No character specified for the keywords upper, lower, alpha, digit, graph, or xdigit shall be specified. The characters <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab>, and any characters included in the class blank, are automatically included in this class.
cntrl     Define characters to be classified as control characters. No character specified for the keywords upper, lower, alpha, digit, punct, graph, print, or xdigit shall be specified.
punct     Define characters to be classified as punctuation characters. No character specified for the keywords upper, lower, alpha, digit, cntrl, xdigit, or as the <space> character shall be specified.
graph     Define characters to be classified as printable characters, not including the <space> character. Characters specified for the keywords upper, lower, alpha, digit, xdigit, and punct are automatically included in this class. No character specified for the keyword cntrl shall be specified.
print     Define characters to be classified as printable characters, including the <space> character. Characters specified for the keywords upper, lower, alpha, digit, xdigit, punct, and the <space> character are automatically included in this class. No character specified for the keyword cntrl shall be specified.
xdigit    Define the characters to be classified as hexadecimal digits. Only the characters defined for the class digit shall be specified, in contiguous ascending sequence by numerical value, followed by one or more sets of six characters representing the hexadecimal digits 10 through 15, with each set in ascending order.
blank     Define characters to be classified as <blank> characters. The characters <space> and <tab> are automatically included in this class.
toupper   Define the mapping of lowercase letters to uppercase letters. The operand shall consist of character pairs, separated by semicolons. The characters in each character pair shall be separated by a comma and the pair enclosed by parentheses. The first character in each pair shall be the lowercase letter, the second the corresponding uppercase letter.
          Only characters specified for the keywords lower and upper shall be specified.
tolower   Define the mapping of uppercase letters to lowercase letters. The operand shall consist of character pairs, separated by semicolons. The characters in each character pair shall be separated by a comma and the pair enclosed by parentheses. The first character in each pair shall be the uppercase letter, the second the corresponding lowercase letter. Only characters specified for the keywords lower and upper shall be specified. If the tolower keyword is omitted from the locale definition, the mapping shall be the reverse mapping of the one specified for toupper.

LC_COLLATE
----------
A collation sequence definition shall define the relative order between collating elements (characters and multicharacter collating elements) in the locale. This order is expressed in terms of collation values; i.e., by assigning each element one or more collation values (also known as collation weights). This does not imply that implementations shall assign such values, but that ordering of strings using the resultant collation definition in the locale shall behave as if such assignment is done and used in the collation process. The collation sequence definition shall be used by regular expressions, pattern matching, and sorting.

The following capabilities are provided:

(1) Multicharacter collating elements. Specification of multicharacter collating elements (i.e., sequences of two or more characters to be collated as an entity).

(2) User-defined ordering of collating elements. Each collating element shall be assigned a collation value defining its order in the character (or basic) collating sequence. This ordering is used by regular expressions and pattern matching and, unless collation weights are explicitly specified, also as the collation weight to be used in sorting.

(3) Multiple weights and equivalence classes.
    Collating elements can be assigned one or more (up to the limit {COLL_WEIGHTS_MAX}) collating weights for use in sorting. The first weight is hereafter referred to as the primary weight.

(4) One-to-Many mapping. A single character is mapped into a string of collating elements.

(5) Equivalence class definition. Two or more collating elements have the same collation value (primary weight).

(6) Order by weights. When two strings are compared to determine their relative order, the two strings are first broken up into a series of collating elements, and each successive pair of elements is compared according to the relative primary weights for the elements. If equal, and more than one weight has been assigned, then the pairs of collating elements are recompared according to the relative subsequent weights, until either a pair of collating elements compares unequal or the weights are exhausted.

The following keywords shall be recognized in a collation sequence definition. They are described in detail in the following subclauses.

copy               Specify the name of an existing locale to be used for the definition of the category. If this keyword is specified, no other keyword shall be specified.
collating-element  Define a collating-element symbol representing a multicharacter collating element. This keyword is optional.
collating-symbol   Define a collating symbol for use in collation order statements. This keyword is optional.
order_start        Define collation rules. This statement is followed by one or more collation order statements, assigning character collation values and collation weights to collating elements.
order_end          Specify the end of the collation-order statements.

collating-element Keyword
-------------------------
In addition to the collating elements in the character set, the collating-element keyword shall be used to define multicharacter collating elements.
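As an illustration of capabilities (1) and (6) above, the following Python sketch breaks two strings into collating elements and compares them weight level by weight level. The table `WEIGHTS`, the functions `split_elements` and `compare`, and all weight values are assumptions made for this example; they do not come from ISO/IEC 9945-2 or from any real locale. The element "ch" is treated as a multicharacter collating element placed between "c" and "d", as in traditional Spanish ordering.

```python
# Hypothetical collating elements, each with two weights (primary, secondary).
# Values are illustrative only and are not taken from any real locale definition.
WEIGHTS = {
    "a": (1, 1), "A": (1, 2),
    "b": (2, 1),
    "c": (3, 1),
    "ch": (4, 1),  # multicharacter collating element
    "d": (5, 1),
}


def split_elements(text):
    """Break a string into collating elements, preferring a two-character match."""
    elements = []
    i = 0
    while i < len(text):
        if text[i:i + 2] in WEIGHTS:
            elements.append(text[i:i + 2])
            i += 2
        else:
            elements.append(text[i])
            i += 1
    return elements


def compare(s1, s2, levels=2):
    """Return -1, 0, or 1, comparing primary weights first, then later weights."""
    e1, e2 = split_elements(s1), split_elements(s2)
    for level in range(levels):
        w1 = [WEIGHTS[e][level] for e in e1]
        w2 = [WEIGHTS[e][level] for e in e2]
        if w1 != w2:
            return -1 if w1 < w2 else 1
    return 0


print(compare("cd", "ch"))  # prints -1: "c" sorts before the element "ch"
print(compare("ab", "Ab"))  # prints -1: primary weights tie, secondary decides
print(compare("ab", "ab"))  # prints 0
```

In this sketch "a" and "A" share a primary weight, so they form an equivalence class in the sense of capability (5) and are separated only at the second level.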
collating-symbol Keyword
------------------------
This keyword shall be used to define symbols for use in collation sequence statements; i.e., between the order_start and the order_end keywords. The collating-symbol keyword defines a symbolic name that can be associated with a relative position in the character order sequence. While such a symbolic name does not represent any collating element, it can be used as a weight.

order_start Keyword
-------------------
The order_start keyword shall precede collation order entries. It defines the number of weights for this collation sequence definition and other collation rules.

The operands to the order_start keyword are optional. If present, the operands define rules to be applied when strings are compared. The number of operands defines how many weights each element is assigned; if no operands are present, one forward operand is assumed. If present, the first operand defines rules to be applied when comparing strings using the first (primary) weight; the second when comparing strings using the second weight, and so on. Operands shall be separated by semicolons (;). Each operand shall consist of one or more collation directives, separated by commas (,). If the number of operands exceeds the {COLL_WEIGHTS_MAX} limit, the utility shall issue a warning message.

The following directives shall be supported:

forward   Specifies that comparison operations for the weight level shall proceed from start of string towards the end of the string.
backward  Specifies that comparison operations for the weight level shall proceed from end of string towards the beginning of string.
position  Specifies that comparison operations for the weight level will consider the relative position of non-IGNOREd elements in the strings. The string containing a non-IGNOREd element after the fewest IGNOREd collating elements from the start of the compare shall collate first.
          If both strings contain a non-IGNOREd character in the same relative position, the collating values assigned to the elements shall determine the ordering. In case of equality, subsequent non-IGNOREd characters shall be considered in the same manner.

The directives forward and backward are mutually exclusive.

The descriptions of the other sections are not relevant to this standard. Their titles are given here for information only.

LC_TIME
-------
LC_NUMERIC
----------
LC_MONETARY
-----------
LC_MESSAGES
-----------

Hebrew characters not yet published in ISO/IEC 10646-1

IGNORE;IGNORE;IGNORE; #accent_etnahta
IGNORE;IGNORE;IGNORE; #accent_segol
IGNORE;IGNORE;IGNORE; #accent_shalshelet
IGNORE;IGNORE;IGNORE; #accent_zaqef_qatan
IGNORE;IGNORE;IGNORE; #accent_zaqef_gadol
IGNORE;IGNORE;IGNORE; #accent_tipeha
IGNORE;IGNORE;IGNORE; #accent_revia
IGNORE;IGNORE;IGNORE; #accent_zarqa
IGNORE;IGNORE;IGNORE; #accent_pashta
IGNORE;IGNORE;IGNORE; #accent_yetiv
IGNORE;IGNORE;IGNORE; #accent_tevir
IGNORE;IGNORE;IGNORE; #accent_geresh
IGNORE;IGNORE;IGNORE; #accent_geresh_muqdam
IGNORE;IGNORE;IGNORE; #accent_gershayim
IGNORE;IGNORE;IGNORE; #accent_qarney_para
IGNORE;IGNORE;IGNORE; #accent_telisha_gedolaola
IGNORE;IGNORE;IGNORE; #accent_pazer
IGNORE;IGNORE;IGNORE; #accent_munah
IGNORE;IGNORE;IGNORE; #accent_mahapakh
IGNORE;IGNORE;IGNORE; #accent_merkha
IGNORE;IGNORE;IGNORE; #accent_merkha_kefula
IGNORE;IGNORE;IGNORE; #accent_darga
IGNORE;IGNORE;IGNORE; #accent_qadma
IGNORE;IGNORE;IGNORE; #accent_telisha_qetana
IGNORE;IGNORE;IGNORE; #accent_yerah_ben_yomo
IGNORE;IGNORE;IGNORE; #accent_ole
IGNORE;IGNORE;IGNORE; #accent_iluy
IGNORE;IGNORE;IGNORE; #accent_dehi
IGNORE;IGNORE;IGNORE; #accent_zinor
IGNORE;IGNORE;IGNORE; #mark_masora_circle
IGNORE;IGNORE;IGNORE; #mark_upper_dot

collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol collating-symbol # # / # collatingsymbol collatingsymbol collatingsymbol collatingsymbol  'Ԍcollatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol # # # collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol  ' Ԍcollatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol collatingsymbol  H ! # Ordre des symboles internes / Order of internal symbols # # SYMB. N$ # # forme de base (bas de casse, arabe intrins/que, # h)breu intrins/que, etc. # basic form (lower case, intrinsic Arabic # intrinsic Hebrew and so on) # 7 # # / # # # voir # 2 # isol. 
# 3
# final
# 4
# initial
# 5
# medial
# 6
#
# 8
# 9
# 10
# 11
# 12
#
# /
# accent madda+33: #13
# accent hamza+33: #14
# accent hamza/waw/: #14 1
# accent hamza under #14 2
# accent under yeh #14 3
# accent hamza/barred yeh #14 4
#
# 15
#
# 16
# 17
# 18
# 19
# 20
# 21
# 22
# 23
# 24
# 25
# 26
# 27
# 28
# 29
# 30
# 31
#
<0> # 48
<1> # 49
<2> # 50
<3> # 51
<4> # 52
<5> # 53
<6> # 54
<7> # 55
<8> # 56
<9> # 57
#
# 97
# 98
# 99
# 100
# 101
# 102
# 103
# 104
# 105
# 106
# 107
# 108
# 109
# 110
# 111