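unicode(7)          Miscellaneous Information Manual          unicode(7)

NAME
       unicode - universal character set

DESCRIPTION
       The international standard ISO/IEC 10646 defines the Universal
       Character Set (UCS). UCS contains all characters of all other
       character set standards. It also guarantees "round-trip
       compatibility"; in other words, conversion tables can be built such
       that no information is lost when a string is converted from any
       other encoding to UCS and back.

       UCS contains the characters required to represent practically all
       known languages. This includes not only the Latin, Greek, Cyrillic,
       Hebrew, Arabic, Armenian, and Georgian scripts, but also Chinese,
       Japanese, and Korean Han ideographs as well as scripts such as
       Hiragana, Katakana, Hangul, Devanagari, Bengali, Gurmukhi, Gujarati,
       Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Khmer,
       Bopomofo, Tibetan, Runic, Ethiopic, Canadian Syllabics, Cherokee,
       Mongolian, Ogham, Myanmar, Sinhala, Thaana, Yi, and others. For
       scripts not yet covered, research on how best to encode them for
       computer usage is still going on, and they will be added eventually.
       This might eventually include not only hieroglyphs and various
       historic Indo-European languages, but even some selected artistic
       scripts such as Tengwar, Cirth, and Klingon. UCS also covers a large
       number of graphical, typographical, mathematical, and scientific
       symbols, including those provided by TeX, PostScript, APL, MS-DOS,
       MS-Windows, Macintosh, OCR fonts, as well as many word processing
       and publishing systems, and more are being added.

       The UCS standard (ISO/IEC 10646) describes a 31-bit character set
       architecture consisting of 128 24-bit groups, each divided into 256
       16-bit planes made up of 256 8-bit rows with 256 column positions,
       one for each character. Part 1 of the standard (ISO/IEC 10646-1)
       defines the first 65534 code positions (0x0000 to 0xfffd), which
       form the Basic Multilingual Plane (BMP), that is, plane 0 in
       group 0. Part 2 of the standard (ISO/IEC 10646-2) adds characters to
       group 0 outside the BMP in several supplementary planes in the range
       0x10000 to 0x10ffff. There are no plans to add characters beyond
       0x10ffff to the standard, so only a small fraction of group 0 of the
       entire code space will actually be used in the foreseeable future.

       The BMP contains all characters found in the other commonly used
       character sets. The supplemental planes added by ISO/IEC 10646-2
       cover only more exotic characters for special scientific, dictionary
       printing, publishing industry, higher-level protocol, and enthusiast
       needs.

       The representation of each UCS character as a 2-byte word is
       referred to as the UCS-2 form (only for BMP characters), whereas
       UCS-4 is the representation of each character by a 4-byte word. In
       addition, there exist two encoding forms: UTF-8, for backward
       compatibility with ASCII processing software, and UTF-16, for the
       backward-compatible handling of non-BMP characters up to 0x10ffff by
       UCS-2 software.

       The UCS characters 0x0000 to 0x007f are identical to those of the
       classic US-ASCII character set and the characters in the range
       0x0000 to 0x00ff are identical to those in ISO/IEC 8859-1 (Latin-1).

   Combining characters
       Some code points in UCS have been assigned to combining characters.
       These are similar to the nonspacing accent keys on a typewriter. A
       combining character just adds an accent to the previous character.
       The most important accented characters have codes of their own in
       UCS; however, the combining character mechanism allows accents and
       other diacritical marks to be added to any character. Combining
       characters always follow the character that they modify. For
       example, the German character Umlaut-A ("Latin capital letter A with
       diaeresis") can be represented either by the precomposed UCS code
       0x00c4, or as the combination of a normal "Latin capital letter A"
       followed by a "combining diaeresis": 0x0041 0x0308.

       Combining characters are essential, for instance, for encoding the
       Thai script, for mathematical typesetting, and for users of the
       International Phonetic Alphabet.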
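       The two representations of Umlaut-A mentioned above are different
       code sequences, and they remain different after conversion to UTF-8.
       The following is a minimal sketch of this, not part of any library:
       the helper names ucs_to_utf8() and ucs_string_to_utf8() are
       hypothetical, and the rejection of the surrogate range 0xd800 to
       0xdfff is an assumption taken from the usual definition of UTF-8
       rather than from this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS code position as UTF-8.
 * out must have room for 4 bytes. Returns the number of bytes written,
 * or 0 if c is above 0x10ffff or in the surrogate range (an assumption
 * based on the usual UTF-8 definition). */
size_t ucs_to_utf8(uint32_t c, unsigned char *out)
{
    assert(out != NULL);
    if (c < 0x80) {                       /* US-ASCII range: 1 byte */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 2 bytes */
        out[0] = (unsigned char)(0xc0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3f));
        return 2;
    }
    if (c >= 0xd800 && c <= 0xdfff)       /* surrogates are not characters */
        return 0;
    if (c < 0x10000) {                    /* rest of the BMP: 3 bytes */
        out[0] = (unsigned char)(0xe0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (c & 0x3f));
        return 3;
    }
    if (c <= 0x10ffff) {                  /* supplementary planes: 4 bytes */
        out[0] = (unsigned char)(0xf0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3f));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
        out[3] = (unsigned char)(0x80 | (c & 0x3f));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode n code positions from s into out, which
 * holds cap bytes. Returns the total number of bytes written, or 0 if
 * a code position is invalid or the buffer is too small. */
size_t ucs_string_to_utf8(const uint32_t *s, size_t n,
                          unsigned char *out, size_t cap)
{
    size_t used = 0;

    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t len = ucs_to_utf8(s[i], tmp);

        if (len == 0 || used + len > cap)
            return 0;
        for (size_t j = 0; j < len; j++)
            out[used++] = tmp[j];
    }
    return used;
}
```

       Under these assumptions, encoding the single code 0x00c4 yields the
       two bytes c3 84, while encoding the pair 0x0041 0x0308 yields the
       three bytes 41 cc 88. A plain byte comparison therefore treats the
       two forms as different strings, even though both denote the same
       accented letter.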
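   Implementation levels
       As not all systems are expected to support advanced mechanisms like
       combining characters, ISO/IEC 10646-1 specifies the following three
       implementation levels of UCS:

       Level 1
              Combining characters and Hangul Jamo (a variant encoding of
              the Korean script, where a Hangul syllable glyph is coded as
              a triplet or pair of vowel/consonant codes) are not
              supported.

       Level 2
              In addition to level 1, combining characters are allowed for
              some languages where they are essential (for example: Thai,
              Lao, Hebrew, Arabic, Devanagari, Malayalam).

       Level 3
              All UCS characters are supported.

       The Unicode 3.0 Standard published by the Unicode Consortium
       contains exactly the UCS Basic Multilingual Plane at implementation
       level 3, as described in ISO/IEC 10646-1:2000. Unicode 3.1 added the
       supplemental planes of ISO/IEC 10646-2. The Unicode standard and
       technical reports published by the Unicode Consortium provide much
       additional information on the semantics and recommended usages of
       various characters. They provide guidelines and algorithms for
       editing, sorting, comparing, normalizing, converting, and displaying
       Unicode strings.

   Unicode under Linux
       Under GNU/Linux, the C type wchar_t is a signed 32-bit integer type.
       Its values are always interpreted by the C library as UCS code
       values (in all locales); the GNU C library signals this convention
       to applications by defining the constant __STDC_ISO_10646__, as
       specified in the ISO C99 standard.

       UCS/Unicode can be used just like ASCII in input/output streams,
       terminal communication, plaintext files, filenames, and environment
       variables by means of the ASCII-compatible UTF-8 multibyte encoding.
       To signal the use of UTF-8 as the character encoding to all
       applications, a suitable locale has to be selected via environment
       variables (for example, by setting LANG to a UTF-8 locale).

       The nl_langinfo(CODESET) function returns the name of the selected
       encoding. Library functions such as wctomb(3) and mbsrtowcs(3) can
       be used to transform the internal wchar_t characters and strings
       into the system character encoding and back, and wcwidth(3) tells
       how many positions (0-2) the cursor is advanced by the output of a
       character.

   Private Use Areas (PUA)
       In the Basic Multilingual Plane, the range 0xe000 to 0xf8ff will
       never be assigned to any characters by the standard and is reserved
       for private usage. For the Linux community, this private area has
       been subdivided further into the range 0xe000 to 0xefff, which can
       be used individually by any end user, and the Linux zone in the
       range 0xf000 to 0xf8ff, where extensions are coordinated among all
       Linux users. The registry of the characters assigned to the Linux
       zone is maintained by LANANA, and the registry itself is
       Documentation/admin-guide/unicode.rst in the Linux kernel sources
       (or Documentation/unicode.txt before Linux 4.10).

       Two other planes are reserved for private usage: plane 15
       (Supplementary Private Use Area-A, range 0xf0000 to 0xffffd) and
       plane 16 (Supplementary Private Use Area-B, range 0x100000 to
       0x10fffd).

   Literature
       o  Information technology -- Universal Multiple-Octet Coded
          Character Set (UCS) -- Part 1: Architecture and Basic
          Multilingual Plane. International Standard ISO/IEC 10646-1,
          International Organization for Standardization, Geneva, 2000.

          This is the official specification of UCS. Available from ISO.

       o  The Unicode Standard, Version 3.0. The Unicode Consortium,
          Addison-Wesley, Reading, MA, 2000, ISBN 0-201-61633-5.

       o  S. Harbison, G. Steele. C: A Reference Manual. Fourth edition,
          Prentice Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3.

          A good reference book about the C programming language. The
          fourth edition covers the 1994 Amendment 1 to the ISO C90
          standard, which adds a large number of new C library functions
          for handling wide and multibyte character encodings, but it does
          not yet cover ISO C99, which improved wide and multibyte
          character support even further.

       o  Unicode Technical Reports.

       o  Markus Kuhn: UTF-8 and Unicode FAQ for UNIX/Linux.

       o  Bruno Haible: Unicode HOWTO.

SEE ALSO
       locale(1), setlocale(3), charsets(7), utf-8(7)

TRANSLATION
       The Russian translation of this manual page was prepared by Azamat
       Hackimov, Dmitriy Ovchinnikov, Dmitry Bolkhovskikh, Katrin Kutepova,
       and Yuri Kozlov.

       This translation is free documentation; for the copyright
       conditions, see the GNU General Public License version 3 or later.
       No liability is assumed.

       If you find errors in the translation of this manual page, please
       report them to the translators.

Linux man-pages 6.06                28 2024                unicode(7)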