CJK characters: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
→‎See also: updated
Not part of traditional East Asia.
Tag: Reverted
Line 1: Line 1:
{{Short description|Logographs in shared East Asian written tradition}}
{{Short description|Logographs in shared East Asian written tradition}}
{{About||help with CJK character display|Help:Multilingual support (East Asian)|selfref=true}}
{{About||help with CJK character display|Help:Multilingual support (East Asian)|selfref=true}}
[[File:The old man is 72 years old final.png|thumb|342x342px|Translation of "That old man is 72 years old" in [[Vietnamese language|Vietnamese]], [[Cantonese]], [[Mandarin Chinese|Mandarin]] (in [[Simplified Chinese characters|simplified]] and [[Traditional Chinese characters|traditional characters]]), [[Japanese language|Japanese]], and [[Korean language|Korean]].]]
[[File:The old man is 72 years old final.png|thumb|342x342px|Translation of "That old man is 72 years old" in [[Cantonese]], [[Mandarin Chinese|Mandarin]] (in [[Simplified Chinese characters|simplified]] and [[Traditional Chinese characters|traditional characters]]), [[Japanese language|Japanese]], and [[Korean language|Korean]].]]
In [[internationalization and localization|internationalization]], '''CJK characters''' is a collective term for the [[Chinese language|Chinese]], [[Japanese language|Japanese]], and [[Korean language]]s, all of which include [[Chinese characters]] and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include {{transl|zh|Hànzì}} in Chinese, {{transl|ja|[[Kanji]]}} and {{transl|ja|[[Kana]]}} in Japanese, and {{transl|ko|[[Hanja]]}} and {{transl|ko|[[Hangul]]}} in Korean. [[Vietnamese language|Vietnamese]] can be included, making the abbreviation '''CJKV''', as Vietnamese historically used Chinese characters known as {{lang|vi|[[chữ Hán]]}} and {{lang|vi|[[chữ Nôm]]}} in Vietnamese ({{lang|vi|[[Hán-Nôm]]}} altogether).
In [[internationalization and localization|internationalization]], '''CJK characters''' is a collective term for the [[Chinese language|Chinese]], [[Japanese language|Japanese]], and [[Korean language]]s, all of which include [[Chinese characters]] and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include {{transl|zh|Hànzì}} in Chinese, {{transl|ja|[[Kanji]]}} and {{transl|ja|[[Kana]]}} in Japanese, and {{transl|ko|[[Hanja]]}} and {{transl|ko|[[Hangul]]}} in Korean.


== Character repertoire ==
== Character repertoire ==
Line 11: Line 11:
The [[Sinology|sinologist]] Carl Leban (1971) produced an early survey of CJK encoding systems.
The [[Sinology|sinologist]] Carl Leban (1971) produced an early survey of CJK encoding systems.


Until the early 20th century, [[Classical Chinese]] was the written language of government and scholarship in Vietnam. Popular literature in [[Vietnamese language|Vietnamese]] was written in the {{lang|vi|[[chữ Nôm]]}} script, consisting of Chinese characters with many characters created locally. From 1920s onwards, the script since then used for recording literature has been the Latin [[chữ Quốc ngữ]].{{sfnp|Coulmas|1991|pp=113–115}}{{sfnp|DeFrancis|1977}}
Until the early 20th century, [[Classical Chinese]] was formerly the written language of government and scholarship in Vietnam. Popular literature in [[Vietnamese language|Vietnamese]] was written in the {{lang|vi|[[chữ Nôm]]}} script, consisting of Chinese characters with many characters created locally. From 1920s onwards, the script since then used for recording literature has been the Latin [[chữ Quốc ngữ]].{{sfnp|Coulmas|1991|pp=113–115}}{{sfnp|DeFrancis|1977}}


== Encoding ==
== Encoding ==
Line 47: Line 47:
* [[Chinese character description languages]]
* [[Chinese character description languages]]
* [[Chinese character encoding]]
* [[Chinese character encoding]]
* [[Chinese character strokes]]
* [[Chinese input methods for computers]]
* [[Chinese input methods for computers]]
* [[CJK Compatibility Ideographs]]
* [[CJK Compatibility Ideographs]]
* [[Chinese character strokes]]
* [[CJK Unified Ideographs]]
* [[CJK Unified Ideographs]]
* [[Complex Text Layout languages]] (CTL)
* [[Complex Text Layout languages]] (CTL)
Line 58: Line 58:
* [[Sinoxenic]]
* [[Sinoxenic]]
* [[Variable-width encoding]]
* [[Variable-width encoding]]
* [[Vietnamese language and computers]]


== References ==
== References ==
Line 84: Line 83:
{{CJK ideographs in Unicode}}
{{CJK ideographs in Unicode}}


[[Category:Chinese-language computing]]
[[Category:Encodings of Asian languages]]
[[Category:Encodings of Asian languages]]
[[Category:Languages of East Asia]]
[[Category:Languages of East Asia]]
[[Category:Natural language and computing]]
[[Category:Chinese-language computing]]
[[Category:Japanese-language computing]]
[[Category:Japanese-language computing]]
[[Category:Korean-language computing]]
[[Category:Korean-language computing]]
[[Category:Natural language and computing]]
[[Category:Writing systems using Chinese characters]]
[[Category:Writing systems using Chinese characters]]
[[ja:CJKV]]
[[ja:CJK]]

Revision as of 19:23, 4 May 2024

Translation of "That old man is 72 years old" in Cantonese, Mandarin (in simplified and traditional characters), Japanese, and Korean.

In internationalization, CJK characters is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include Hànzì in Chinese, Kanji and Kana in Japanese, and Hanja and Hangul in Korean.

Character repertoire

Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters—general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, South Korean students are taught 1,800 characters.

Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages.
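The distinction drawn above between Han characters and the accompanying phonetic scripts can be illustrated with a short Python sketch. It is only a rough illustration: it relies on the standard-library `unicodedata` module and on the prefixes of Unicode character names, and the helper `classify_script` and its label strings are hypothetical names introduced here, not part of any standard.

```python
import unicodedata

# Unicode character-name prefixes mapped to coarse script labels.
# The labels are illustrative choices, not standard identifiers.
SCRIPT_PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "HIRAGANA": "hiragana",
    "KATAKANA": "katakana",
    "HANGUL": "hangul",
    "BOPOMOFO": "bopomofo",
    "LATIN": "Latin",
}


def classify_script(ch):
    """Return a coarse script label for a single character (hypothetical helper)."""
    name = unicodedata.name(ch, "")  # empty string if the character is unnamed
    for prefix, label in SCRIPT_PREFIXES.items():
        if name.startswith(prefix):
            return label
    return "other"


if __name__ == "__main__":
    # One Han character, then hiragana, katakana, hangul, bopomofo,
    # and a Latin letter with a tone mark as used in pinyin.
    for ch in "中あア한ㄅā":
        print(ch, classify_script(ch))
```

Matching on name prefixes is a simplification; a production tool would more likely consult Unicode script properties, which the standard library does not expose directly.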

The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems.

Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the chữ Nôm script, which consists of Chinese characters supplemented by many locally created characters. Since the 1920s, literature has been recorded in the Latin-based chữ Quốc ngữ.[1][2]

Encoding

The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings, requiring at least a 16-bit fixed-width encoding or multi-byte variable-length encodings. The 16-bit fixed-width encodings, such as those from Unicode up to and including version 2.0, are now deprecated for two reasons: more characters must be encoded than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters), and the Chinese government requires that software in China support the GB 18030 character set.
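As a minimal sketch of the code-space argument above, the following Python snippet measures how many bytes one common Han character and one Han character outside the 16-bit range occupy in several encodings. The helper names `encoded_lengths` and `fits_in_8_bit` are hypothetical; the codec names are those shipped with Python's standard library, and Latin-1 stands in here for 8-bit encodings in general.

```python
# Encodings compared: two Unicode encoding forms and GB 18030.
ENCODINGS = ("utf-8", "utf-16-be", "gb18030")


def encoded_lengths(ch):
    """Map each encoding name to the byte length of `ch` in that encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


def fits_in_8_bit(ch):
    """Return True if `ch` is encodable in Latin-1, a 256-character code space."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    common = "中"             # U+4E2D, inside the 16-bit range
    rare = "\U00020000"       # U+20000, beyond the 16-bit range
    for ch in (common, rare):
        print(f"U+{ord(ch):04X}", fits_in_8_bit(ch), encoded_lengths(ch))
```

The second character needs four bytes in UTF-16 (a surrogate pair), which shows why a strictly 16-bit fixed-width encoding cannot represent it.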

Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.
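The incompatibility described above can be seen in a short Python sketch, offered as an illustration rather than a survey. It encodes the same Han character with four regional legacy codecs from Python's standard library and then decodes one codec's bytes with another. The helpers `legacy_bytes` and `misdecode` are hypothetical names, and the choice of these four codecs is an assumption made for the example.

```python
# Regional legacy codecs available in Python's standard library.
LEGACY_ENCODINGS = ("gb2312", "big5", "shift_jis", "euc_kr")


def legacy_bytes(ch):
    """Map each legacy encoding name to the bytes it uses for `ch`."""
    return {enc: ch.encode(enc) for enc in LEGACY_ENCODINGS}


def misdecode(ch, written_as, read_as):
    """Encode `ch` with one codec and decode the bytes with another.

    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return ch.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    ch = "中"  # U+4E2D, present in all four character sets
    for enc, raw in legacy_bytes(ch).items():
        print(f"{enc:10s} {raw.hex()}")
    # Bytes written as GB 2312 but read as Big5 do not come back as the same character.
    print(misdecode(ch, "gb2312", "big5") == ch)
```

Each codec assigns the character a different byte sequence, so text written under one convention is garbled or unreadable under another unless the encoding is known.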

CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul.

CJK character encodings include:

The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts of Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters.[citation needed]

All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues.

Legal status

Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. According to Ken Lunde, the abbreviation "CJK" was a registered trademark of Research Libraries Group[3] (which merged with OCLC in 2006). The trademark owned by OCLC between 1987 and 2009 has now expired.[4]

See also

References

Works cited

  • Coulmas, Florian (1991). The writing systems of the world. Blackwell. ISBN 978-0-631-18028-9.
  • DeFrancis, John (1977). Colonialism and language policy in Viet Nam. The Hague: Mouton. ISBN 978-90-279-7643-7.
