

Cork encoding

From Wikipedia, the free encyclopedia

The Cork (also known as T1 or EC) encoding is a character encoding used for encoding glyphs in fonts.[1] It is named after the city of Cork in Ireland, where the encoding was introduced for LaTeX at a TeX Users Group (TUG) conference in 1990.[1] Its 256 characters support most Western and Eastern European languages written in the Latin alphabet.[2]

Details


In 8-bit TeX engines the font encoding has to match the encoding of the hyphenation patterns, and Cork is the encoding most commonly used for those patterns.[3] In LaTeX one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.

Character set

Cork encoding
Each row lists the 16 characters at offsets 0 to F; the row label gives the high hexadecimal digit.

0x  ` (0060)  ´ (00B4)  ˆ (02C6)  ˜ (02DC)  ¨ (00A8)  ˝ (02DD)  ˚ (02DA)  ˇ (02C7)  ˘ (02D8)  ¯ (00AF)  ˙ (02D9)  ¸ (00B8)  ˛ (02DB)  ‚ (201A)  ‹ (2039)  › (203A)
1x  “ (201C)  ” (201D)  „ (201E)  « (00AB)  » (00BB)  – (2013)  — (2014)  ZWSP[a] (200B)  ₀[b] (2080)  ı[c] (0131)  ȷ[c] (0237)  ﬀ (FB00)  ﬁ (FB01)  ﬂ (FB02)  ﬃ (FB03)  ﬄ (FB04)
2x  SP  !  "  #  $  %  &  ’ (2019)  (  )  *  +  ,  -  .  /
3x  0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
4x  @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
5x  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _
6x  ‘ (2018)  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
7x  p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~  SHY[d]
8x  Ă (0102)  Ą (0104)  Ć (0106)  Č (010C)  Ď (010E)  Ě (011A)  Ę (0118)  Ğ (011E)  Ĺ (0139)  Ľ (013D)  Ł (0141)  Ń (0143)  Ň (0147)  Ŋ (014A)  Ő (0150)  Ŕ (0154)
9x  Ř (0158)  Ś (015A)  Š (0160)  Ş (015E)  Ť (0164)  Ţ (0162)  Ű (0170)  Ů (016E)  Ÿ (0178)  Ź (0179)  Ž (017D)  Ż (017B)  Ĳ (0132)  İ (0130)  đ (0111)  § (00A7)
Ax  ă (0103)  ą (0105)  ć (0107)  č (010D)  ď (010F)  ě (011B)  ę (0119)  ğ (011F)  ĺ (013A)  ľ (013E)  ł (0142)  ń (0144)  ň (0148)  ŋ (014B)  ő (0151)  ŕ (0155)
Bx  ř (0159)  ś (015B)  š (0161)  ş (015F)  ť (0165)  ţ (0163)  ű (0171)  ů (016F)  ÿ (00FF)  ź (017A)  ž (017E)  ż (017C)  ĳ (0133)  ¡ (00A1)  ¿ (00BF)  £ (00A3)
Cx  À  Á  Â  Ã  Ä  Å  Æ  Ç  È  É  Ê  Ë  Ì  Í  Î  Ï
Dx  Ð[e]  Ñ  Ò  Ó  Ô  Õ  Ö  Œ (0152)  Ø  Ù  Ú  Û  Ü  Ý  Þ  SS[f] (1E9E)
Ex  à  á  â  ã  ä  å  æ  ç  è  é  ê  ë  ì  í  î  ï
Fx  ð  ñ  ò  ó  ô  õ  ö  œ (0153)  ø  ù  ú  û  ü  ý  þ  ß (00DF)

Notes

  • Hexadecimal values in parentheses after the characters in the table are their Unicode code points.
  • The first 13 characters (0x00–0x0C) are accents and are often used as combining characters.
  1. ^ 0x17 is dubbed a “compound word mark” (CWM) in the Cork encoding, and is an innovation of this standard. It is an invisible character that separates compounds in a complex word, for instance in German, in order to disallow esthetic ligatures at compound boundaries.[2] It is mapped to the Unicode “zero-width space” (ZWSP, U+200B), defined at about the same time, whose purpose is similar, if not identical.
  2. ^ 0x18 is a “small o”, used to compose ‰ or ‱ (or arbitrarily smaller quantities) out of the percent sign (%).[2]
  3. ^ a b Dotless i and dotless j may be used to compose accented variants like i with macron (ī).
  4. ^ 0x7F is the hyphenation character, not really a soft hyphen (SHY) as defined by Unicode.
  5. ^ 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations (for example when copying text from a PDF, or in hyphenation).
  6. ^ 0xDF contains SS (two letters S). It allows TeX to automatically convert the German lowercase ß into the uppercase form.
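
The table and notes above can be turned into a small lookup, which may help when checking what a given Cork byte stands for. The following is only a sketch: the names CORK_ROWS, CORK_TO_UNICODE and decode_cork are hypothetical, the code points are copied from the table, and 0x7F and 0xDF are mapped to the code points the table lists even though notes 4 and 6 explain that they are not exact equivalents.

```python
# Sketch of a Cork (T1) -> Unicode lookup built from the table above.
# CORK_ROWS, CORK_TO_UNICODE and decode_cork are hypothetical names.

# Rows whose Unicode code points the table spells out explicitly.
CORK_ROWS = {
    0x00: [0x0060, 0x00B4, 0x02C6, 0x02DC, 0x00A8, 0x02DD, 0x02DA, 0x02C7,
           0x02D8, 0x00AF, 0x02D9, 0x00B8, 0x02DB, 0x201A, 0x2039, 0x203A],
    0x10: [0x201C, 0x201D, 0x201E, 0x00AB, 0x00BB, 0x2013, 0x2014, 0x200B,
           0x2080, 0x0131, 0x0237, 0xFB00, 0xFB01, 0xFB02, 0xFB03, 0xFB04],
    0x80: [0x0102, 0x0104, 0x0106, 0x010C, 0x010E, 0x011A, 0x0118, 0x011E,
           0x0139, 0x013D, 0x0141, 0x0143, 0x0147, 0x014A, 0x0150, 0x0154],
    0x90: [0x0158, 0x015A, 0x0160, 0x015E, 0x0164, 0x0162, 0x0170, 0x016E,
           0x0178, 0x0179, 0x017D, 0x017B, 0x0132, 0x0130, 0x0111, 0x00A7],
    0xA0: [0x0103, 0x0105, 0x0107, 0x010D, 0x010F, 0x011B, 0x0119, 0x011F,
           0x013A, 0x013E, 0x0142, 0x0144, 0x0148, 0x014B, 0x0151, 0x0155],
    0xB0: [0x0159, 0x015B, 0x0161, 0x015F, 0x0165, 0x0163, 0x0171, 0x016F,
           0x00FF, 0x017A, 0x017E, 0x017C, 0x0133, 0x00A1, 0x00BF, 0x00A3],
}

CORK_TO_UNICODE = {}
for base, row in CORK_ROWS.items():
    for offset, code_point in enumerate(row):
        CORK_TO_UNICODE[base + offset] = chr(code_point)

# Rows 2x-7x match ASCII apart from the two quote slots and 0x7F.
for code in range(0x20, 0x7F):
    CORK_TO_UNICODE[code] = chr(code)
CORK_TO_UNICODE[0x27] = "\u2019"  # right single quotation mark
CORK_TO_UNICODE[0x60] = "\u2018"  # left single quotation mark
CORK_TO_UNICODE[0x7F] = "\u00ad"  # labelled SHY in the table; see note 4

# Rows Cx-Fx match ISO 8859-1 apart from four slots.
for code in range(0xC0, 0x100):
    CORK_TO_UNICODE[code] = chr(code)
CORK_TO_UNICODE[0xD7] = "\u0152"  # OE ligature
CORK_TO_UNICODE[0xDF] = "\u1e9e"  # table lists U+1E9E; note 6 describes "SS"
CORK_TO_UNICODE[0xF7] = "\u0153"  # oe ligature
CORK_TO_UNICODE[0xFF] = "\u00df"  # sharp s


def decode_cork(data: bytes) -> str:
    """Map each Cork-encoded byte to the Unicode character from the table."""
    return "".join(CORK_TO_UNICODE[byte] for byte in data)


print(decode_cork(bytes([0x8A, 0xF3, 0x64, 0xB9])))  # Łódź
```

Because 0xD0 serves as both Ð and Đ (note 5), a lookup like this cannot recover which of the two letters the author intended; the sketch simply returns Ð.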

Supported languages


The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:

Languages with slightly suboptimal support include:

References

  1. ^ a b Petrlik, Lukas (1996-06-19). "The Czech and Slovak Character Encoding Mess Explained". cs-encodings-faq. 1.10. Archived from the original on 2016-06-21. Retrieved 2016-06-21.
  2. ^ a b c Ferguson, Michael (1990), "Report on Multilingual Activities" (PDF), TUGboat, 11 (4): 514–516
  3. ^ TeX hyphenation patterns