

European ordering rules: Difference between revisions

From Wikipedia, the free encyclopedia
Tags: Mobile edit Mobile web edit
delete Venn diagram: this is irrelevant to this article and misleading, because it seems to imply, for example, that Latin P, Greek Ρ, and Cyrillic Р are considered identical by the European ordering rules, which is not true
 
(6 intermediate revisions by 6 users not shown)
Line 1: Line 1:
The '''European ordering rules''' (EOR / EN 13710) define an ordering for strings written in languages that are written with the [[Latin alphabet|Latin]], [[Greek alphabet|Greek]] and [[Cyrillic script|Cyrillic]] [[alphabet]]s. The standard covers languages used by the [[European Union]], the [[European Free Trade Association]], and parts of the [[former Soviet Union]]. It is a tailoring of the ''Common Tailorable Template'' of [[ISO/IEC 14651]].<ref name="EOR">{{cite web|url=http://www.open-std.org/cen/tc304/EOR/eorhome.html |title=ENV 13710 – a "European Pre-Standard": European ordering rules |accessdate=2020-11-25}}
[[File:Venn diagram showing Maximum Greek, Latin and Cyrillic letters.svg|alt=Venn diagram showing Maximum Greek, Latin and Cyrillic letters|thumb|350x350px|Venn diagram showing Maximum Greek, Latin and Cyrillic letters]]
The '''European ordering rules''' (EOR / EN 13710), define an ordering for strings written in languages that are written with the [[Latin alphabet|Latin]], [[Greek alphabet|Greek]] and [[Cyrillic script|Cyrillic]] [[alphabet]]s. The standard covers languages used by the [[European Union]], the [[European Free Trade Association]], and parts of the [[former Soviet Union]]. It is a tailoring of the ''Common Tailorable Template'' of [[ISO/IEC 14651]].<ref name="EOR">{{cite web|url=http://www.open-std.org/cen/tc304/EOR/eorhome.html |title=ENV 13710 – a "European Pre-Standard": European ordering rules |accessdate=2020-11-25}}
</ref> EOR can in turn be tailored for different (European) languages. But in inter-European contexts, EOR can be used without further tailoring.
</ref> EOR can in turn be tailored for different (European) languages. But in inter-European contexts, EOR can be used without further tailoring.


Line 7: Line 6:
Just as for [[ISO/IEC 14651]], upon which EOR is based, EOR has 4 levels of weights.
Just as for [[ISO/IEC 14651]], upon which EOR is based, EOR has 4 levels of weights.


=== Level 1 ===
'''Level 1''' sorts the letters. The following [[Latin script|Latin]] letters are concerned by this level, in order:
The first level sorts the letters. The following [[Latin script|Latin]] letters are concerned by this level, in order:


:a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ
:a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ


The [[Greek alphabet]] has the following order:
The [[Greek alphabet]] has the following order:


:α β γ δ ε Ϝ Ϛ ζ η θ ι κ λ μ ν ξ ο π Ϟ ρ σ τ υ φ χ ψ ω Ϡ
:α β γ δ ε [[ϝ]] ϛ ζ η θ ι κ λ μ ν ξ ο π [[ϟ]] ρ σ τ υ φ χ ψ ω [[ϡ]]


[[Cyrillic script]] has the following order:
[[Cyrillic script]] has the following order:


ӑ ӓ ә ӛ ӕ б в г ғ ҕ ґ д ђ ҙ е ӗ ё є є̈ ж ӝ җ ӂ з ӟ з́ ѕ ӡ и ӥ і ї й ј к қ ӄ ҡ ҟ ҝ л љ м н ң ӊ ҥ њ ӈ о ӧ ŏ ө ӫ ө̆ ѡ ꙍ ҩ п ҧ р с ҫ с́ т ҭ ћ у ў ӱ ӳ ү ұ ф х ҳ ѯ һ ц ҵ ч ӵ ҷ ӌ ҹ ҽ ҿ џ ш щ ъ ы ӹ ь ѣ э ю ю̆ я я̆ Ӏ ѫ ѭ ѧ ѩ ѱ ѳ ѵ ѷ ҁ ꙟ
:а б в г ґ д ђ ѓ е ё є ж з з́ ѕ и і ї й ј к л љ м н њ о п р с с́ т ћ ќ у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я


The order for the three alphabets is:
The order for the three alphabets is:
Line 24: Line 24:
# Cyrillic alphabet
# Cyrillic alphabet


The [[Georgian alphabet|Georgian]] and [[Armenian alphabet]]s have not been included in ENV 13710. However, they are covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". All scripts encoded in ISO/IEC 10646 and Unicode are covered by [[ISO/IEC 14651]] (and its datafile CTT) as well as [[Unicode collation algorithm]] (UCA and the associated DUCET), both of which are available at no charge.
The [[Georgian alphabet|Georgian]] and [[Armenian alphabet]]s had not been included in ENV 13710:2000. However, they were covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". They have both been incorporated in and replaced by EN 13710:2011.<ref>[https://standards.cencenelec.eu/dyn/www/f?p=CEN:110:0::::FSP_PROJECT,FSP_ORG_ID:32015,6285&cs=11055CFAA447E3526543221522CAD9499 CEN/CENELEC: EN 13710:2011-09 ''European Ordering Rules - Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts'']</ref>


'''Level 2''' is where different additions, such as [[diacritic]]s and variations, to the letters are ordered. Letters with diacritical marks (like {{angbr|å}}, {{angbr|ä}}, {{angbr|ö}}, and {{angbr|ø}}) are ordered as variants of the base letter. {{angbr|æ}}, {{angbr|œ}}, {{angbr|ij}} and {{angbr|ŋ}} are ordered as modifications of {{angbr|ae}}, {{angbr|oe}}, {{angbr|ij}} and {{angbr|n}} respectively, similarly for similar cases.
All scripts encoded in ISO/IEC 10646 (Unicode) are covered by [[ISO/IEC 14651]] (and its datafile CTT) as well as [[Unicode collation algorithm]] (UCA and the associated DUCET), both of which are available at no charge.
=== Level 2 ===
The second level is where different additions, such as [[diacritic]]s and variations, to the letters are ordered. Letters with diacritical marks (like {{angbr|à}}, {{angbr|î}}, {{angbr|õ}}, and {{angbr|ü}}) are ordered as variants of the base letter. {{angbr|æ}}, {{angbr|œ}}, {{angbr|ij}} and {{angbr|ŋ}} are ordered as modifications of {{angbr|ae}}, {{angbr|oe}}, {{angbr|ij}} and {{angbr|n}} respectively, similarly for similar cases.


Level 2 defines the following order of diacritics and other modifications:
Level 2 defines the following order of diacritics and other modifications:
Line 35: Line 38:
# [[Caron]] (š)
# [[Caron]] (š)
# [[ring (diacritic)|Ring]] (å)
# [[ring (diacritic)|Ring]] (å)
# [[Diaeresis (diacritic)]] (ä)
# [[Diaeresis (diacritic)|Diaeresis]] (ä)
# [[Double acute accent]] (ő)
# [[Double acute accent]] (ő)
# [[Tilde]] (ã)
# [[Tilde]] (ã)
# [[dot (diacritic)|Dot]] (ż)
# [[dot (diacritic)|Dot]] (ż)
# [[Cedilla]] (ş)
# [[Cedilla]] (ç)
# [[Ogonek]] (ą)
# [[Ogonek]] (ą)
# [[Macron (diacritic)|Macron]] (ā)
# [[Macron (diacritic)|Macron]] (ā)
Line 45: Line 48:
# Modified letter(s) (æ)
# Modified letter(s) (æ)


=== Level 3 ===
'''Level 3''' makes the distinction between Capital and small letters, as in "Polish" and "polish".
The third level makes the distinction between Capital and small letters, as in "Polish" and "polish".


=== Level 4 ===
'''Level 4''' concerns [[punctuation]] and [[whitespace character]]s. This level makes the distinction between "MacDonald" and "Mac Donald", "its" and "it's".
The fourth level concerns [[punctuation]] and [[whitespace character]]s. This level makes the distinction between "MacDonald" and "Mac Donald", "its" and "it's".


=== Level 5 ===
An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is ''italic'', normal or '''bold'''.
An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is ''italic'', normal or '''bold'''.



Latest revision as of 22:50, 3 April 2024

The European ordering rules (EOR / EN 13710) define an ordering for strings in languages written with the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the Common Tailorable Template of ISO/IEC 14651.[1] EOR can in turn be tailored for individual European languages, but in inter-European contexts it can be used without further tailoring.

Method


Like ISO/IEC 14651, on which it is based, EOR has four levels of weights.

Level 1


The first level sorts the letters. It covers the following Latin letters, in order:

a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ

The Greek alphabet has the following order:

α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ

Cyrillic script has the following order:

а б в г ґ д ђ ѓ е ё є ж з з́ ѕ и і ї й ј к л љ м н њ о п р с с́ т ћ ќ у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я

The order for the three alphabets is:

  1. Latin alphabet
  2. Greek alphabet
  3. Cyrillic alphabet
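The level 1 ordering above can be sketched in Python as a sort key. This is only an illustration, not the algorithm specified by EN 13710: the names `LEVEL1` and `level1_key` are made up for this example, and characters that do not appear in the three lists (digits, punctuation, letters with other diacritics) are simply skipped, which a real implementation would not do.

```python
import unicodedata

# Letter lists copied from the level 1 orderings given above.
LATIN = "a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ".split()
GREEK = "α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ".split()
# Two Cyrillic letters are base + combining acute (U+0301); written as escapes.
CYRILLIC = ("а б в г ґ д ђ ѓ е ё є ж з з\u0301 ѕ и і ї й ј к л љ м н њ о п р "
            "с с\u0301 т ћ ќ у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я").split()

# Map each letter to (script rank, position within its script).
# Script rank follows the stated order: Latin, then Greek, then Cyrillic.
LEVEL1 = {
    letter: (script_rank, pos)
    for script_rank, alphabet in enumerate((LATIN, GREEK, CYRILLIC))
    for pos, letter in enumerate(alphabet)
}


def level1_key(text):
    """Return a list of (script rank, letter position) pairs for sorting."""
    s = unicodedata.normalize("NFC", text.lower())
    key, i = [], 0
    while i < len(s):
        # Try two-code-point letters (base + combining acute) before single ones.
        for width in (2, 1):
            chunk = s[i:i + width]
            if chunk in LEVEL1:
                key.append(LEVEL1[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # assumption of this sketch: unlisted characters are skipped
    return key


print(sorted(["e", "ð", "d", "þ", "z"], key=level1_key))
print(sorted(["бета", "beta", "βητα"], key=level1_key))
```

Because the key compares script rank first, every Latin string sorts before every Greek one, and every Greek string before every Cyrillic one. Look-alike letters such as Latin P, Greek Ρ and Cyrillic Р remain distinct entries.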

The Georgian and Armenian alphabets were not included in ENV 13710:2000, but were covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". Both documents have been incorporated into and replaced by EN 13710:2011.[2]

All scripts encoded in ISO/IEC 10646 (Unicode) are covered by ISO/IEC 14651 (and its datafile CTT) as well as by the Unicode collation algorithm (UCA and the associated DUCET), both of which are available at no charge.

Level 2


The second level orders additions to the letters, such as diacritics and other variations. Letters with diacritical marks (like ⟨à⟩, ⟨î⟩, ⟨õ⟩, and ⟨ü⟩) are ordered as variants of the base letter. ⟨æ⟩, ⟨œ⟩, ⟨ĳ⟩ and ⟨ŋ⟩ are ordered as modifications of ⟨ae⟩, ⟨oe⟩, ⟨ij⟩ and ⟨n⟩ respectively, and comparable cases are treated the same way.

Level 2 defines the following order of diacritics and other modifications:

  1. Acute accent (á)
  2. Grave accent (à)
  3. Breve (ă)
  4. Circumflex (â)
  5. Caron (š)
  6. Ring (å)
  7. Diaeresis (ä)
  8. Double acute accent (ő)
  9. Tilde (ã)
  10. Dot (ż)
  11. Cedilla (ç)
  12. Ogonek (ą)
  13. Macron (ā)
  14. With stroke through (ø)
  15. Modified letter(s) (æ)
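As a rough Latin-script illustration, the list above can be turned into a two-level sort key in Python. The names `MARK_RANK`, `SPECIAL` and `two_level_key` are hypothetical, and the sketch makes several simplifying assumptions: base letters are compared by code point (which matches the level 1 order only for a to z), a letter carrying several marks keeps only the last one, and ⟨ø⟩, ⟨æ⟩ and ⟨œ⟩ are handled by a small hand-written table because Unicode does not decompose them.

```python
import unicodedata

# Combining marks ranked in the level 2 order listed above (ranks 1-13).
MARK_RANK = {
    "\u0301": 1,   # acute
    "\u0300": 2,   # grave
    "\u0306": 3,   # breve
    "\u0302": 4,   # circumflex
    "\u030C": 5,   # caron
    "\u030A": 6,   # ring
    "\u0308": 7,   # diaeresis
    "\u030B": 8,   # double acute
    "\u0303": 9,   # tilde
    "\u0307": 10,  # dot
    "\u0327": 11,  # cedilla
    "\u0328": 12,  # ogonek
    "\u0304": 13,  # macron
}
# Assumption: stroke (rank 14) and modified letters (rank 15) via a lookup table.
SPECIAL = {"ø": ("o", 14), "æ": ("ae", 15), "œ": ("oe", 15)}


def two_level_key(word):
    """Return (base letters, per-letter diacritic ranks); 0 means unmarked."""
    bases, marks = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in MARK_RANK:
            if marks:  # attach the mark to the preceding base letter
                marks[-1] = MARK_RANK[ch]
        elif ch in SPECIAL:
            base, rank = SPECIAL[ch]
            bases.extend(base)
            marks.extend([rank] + [0] * (len(base) - 1))
        else:
            bases.append(ch)
            marks.append(0)
    return ("".join(bases), marks)


print(sorted(["rope", "résumé", "resume"], key=two_level_key))
```

Since the base letters are compared first, a diacritic only breaks ties: "résumé" sorts directly after "resume" and before "rope".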

Level 3


The third level distinguishes capital from small letters, as in "Polish" and "polish".
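A minimal Python sketch of case as a tie-breaker. The function name `case_key` is made up for this example, and the choice to place small letters before capitals is an assumption, since the text above does not say which comes first. The case-folded text stands in for levels 1 and 2.

```python
def case_key(word):
    """Compare case-insensitively first, then use case to break ties."""
    # Stand-in for levels 1-2: the case-folded text.
    folded = word.lower()
    # Level 3: one flag per character; False (small) sorts before True (capital).
    case_flags = [ch.isupper() for ch in word]
    return (folded, case_flags)


words = ["Polka", "polish", "Polish"]
print(sorted(words, key=case_key))  # ['polish', 'Polish', 'Polka']
```

A plain code-point sort would instead put both capitalised words first ("Polish", "Polka", "polish"), separating the two spellings of "polish".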

Level 4


The fourth level concerns punctuation and whitespace characters. It distinguishes "MacDonald" from "Mac Donald" and "its" from "it's".
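The same idea can be sketched for punctuation and whitespace: ignore them while comparing letters, and consult them only when the letters are equal. The name `level4_key` is hypothetical, and sorting the string without extra characters first is an assumption of this sketch, not something stated above.

```python
def level4_key(text):
    """Compare letters and digits first; punctuation and spaces break ties."""
    # Earlier levels (simplified): lowercase letters and digits only.
    letters = [ch for ch in text.lower() if ch.isalnum()]
    # Level 4: position and identity of every punctuation/whitespace character.
    ignorables = [(i, ch) for i, ch in enumerate(text) if not ch.isalnum()]
    return (letters, ignorables)


names = ["Mac Donald", "MacDonald", "Macbeth"]
print(sorted(names, key=level4_key))  # ['Macbeth', 'MacDonald', 'Mac Donald']
```

With this key, "MacDonald" and "Mac Donald" stay adjacent, whereas a plain code-point sort would place "Mac Donald" before "MacDonald" and "Macbeth" after both, because the space and the capital D compare lower than "b".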

Level 5


An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is italic, normal or bold.


References

  1. ^ "ENV 13710 – a "European Pre-Standard": European ordering rules". Retrieved 2020-11-25.
  2. ^ CEN/CENELEC: EN 13710:2011-09 European Ordering Rules - Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts
Notes
  • Hansson, Roger; Lindgren, Carl Göran; Ljung, Heléne; Lundén, Thomas. Språk och skrift i Europa. SNS Förlag. (2004) ISBN 91-7150-936-4
  • Küster, Marc Wilhelm: Geordnetes Weltbild. Die Tradition des alphabetischen Sortierens von der Keilschrift bis zur EDV. Eine Kulturgeschichte. Niemeyer (2006) ISBN 3-484-10899-1. Written by the editor of ENV 13710, it discusses in chapter 17.4 the genesis and the contents of the EOR. Cf. also [1], in particular also [2]