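UTF-8(7)                  Linux Programmer's Manual                  UTF-8(7)

NAME
       UTF-8 - an ASCII-compatible multibyte Unicode encoding

DESCRIPTION
       The Unicode character set occupies a 16-bit code space. The most
       obvious Unicode encoding (known as UCS-2) consists of a sequence of
       16-bit words. Such strings can contain, as parts of many 16-bit
       characters, bytes like '\0' or '/' which have a special meaning in
       filenames and other C library function parameters. In addition, the
       majority of UNIX tools expect ASCII files and cannot read 16-bit
       words as characters without major modifications. For these reasons,
       UCS-2 is not a suitable external encoding of Unicode in filenames,
       text files, environment variables, and so on. The ISO 10646
       Universal Character Set (UCS), a superset of Unicode, occupies an
       even larger 31-bit code space, and the obvious UCS-4 encoding for it
       (a sequence of 32-bit words) has the same problems.

       The UTF-8 encoding of Unicode and UCS does not have these problems
       and is the common way in which Unicode is used on UNIX-style
       operating systems.

PROPERTIES
       The UTF-8 encoding has the following properties:

       * UCS characters 0x00000000 to 0x0000007f (the classic US-ASCII
         characters) are encoded simply as bytes 0x00 to 0x7f (ASCII
         compatibility). Files and strings that contain only 7-bit ASCII
         characters therefore have the same encoding under both ASCII and
         UTF-8.

       * All UCS characters greater than 0x7f are encoded as a multibyte
         sequence consisting only of bytes in the range 0x80 to 0xfd, so no
         ASCII byte can appear as part of another character and there are
         no problems with, for example, '\0' or '/'.

       * The lexicographic sorting order of UCS-4 strings is preserved.

       * All possible 2^31 UCS codes can be encoded using UTF-8.

       * The bytes 0xfe and 0xff are never used in the UTF-8 encoding.

       * The first byte of a multibyte sequence that represents a single
         non-ASCII UCS character is always in the range 0xc0 to 0xfd and
         indicates how long the sequence is. All further bytes in the
         sequence are in the range 0x80 to 0xbf. This allows easy
         resynchronization and makes the encoding stateless and robust
         against missing bytes.

       * UTF-8 encoded UCS characters may be up to six bytes long, but
         Unicode characters can only be up to three bytes long. Because
         Linux uses only the 16-bit Unicode subset of UCS, UTF-8 multibyte
         sequences under Linux can only be one, two, or three bytes long.

ENCODING
       The following byte sequences are used to represent a character. The
       sequence to be used depends on the UCS code number of the character:

       0x00000000 - 0x0000007F:
           0xxxxxxx

       0x00000080 - 0x000007FF:
           110xxxxx 10xxxxxx

       0x00000800 - 0x0000FFFF:
           1110xxxx 10xxxxxx 10xxxxxx

       0x00010000 - 0x001FFFFF:
           11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

       0x00200000 - 0x03FFFFFF:
           111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

       0x04000000 - 0x7FFFFFFF:
           1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

       The xxx bit positions are filled with the bits of the character code
       number in binary representation. Only the shortest possible
       multibyte sequence that can represent the code number of the
       character may be used.

EXAMPLES
       The Unicode character 0xa9 = 1010 1001 (the copyright sign) is
       encoded in UTF-8 as

              11000010 10101001 = 0xc2 0xa9

       and the character 0x2260 = 0010 0010 0110 0000 (the "not equal"
       symbol) is encoded as

              11100010 10001001 10100000 = 0xe2 0x89 0xa0

STANDARDS
       ISO 10646, Unicode 1.1, XPG4, Plan 9.

AUTHOR
       Markus Kuhn

SEE ALSO
       unicode(7)

       [Chinese version maintainer] billpan
       [Chinese version last updated] 2000/11/09
       China Linux Forum man page translation project (linuxman):
       http://cmpp.linuxforum.net
       The Chinese version of this page is provided by the Chinese man
       pages project: https://github.com/man-pages-zh/manpages-zh

Linux                            1995-11-26                          UTF-8(7)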
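The byte-sequence table in this page can be sketched in a few lines of code. The following is a minimal illustration in Python, not part of the original manual page; the function name utf8_encode is hypothetical. It follows the six-row table as given here (code numbers up to 0x7FFFFFFF) and picks the first row whose range contains the code number, which yields the shortest possible sequence.

```python
def utf8_encode(code: int) -> bytes:
    """Encode one UCS code number following the table in this page.

    Hypothetical helper for illustration only; it accepts the full
    31-bit range described here, including 5- and 6-byte sequences.
    """
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("code number outside the 31-bit UCS range")
    if code <= 0x7F:
        return bytes([code])  # 0xxxxxxx: identical to ASCII

    # Each row: (upper limit, lead-byte marker, continuation byte count).
    rows = (
        (0x7FF, 0xC0, 1),       # 110xxxxx
        (0xFFFF, 0xE0, 2),      # 1110xxxx
        (0x1FFFFF, 0xF0, 3),    # 11110xxx
        (0x3FFFFFF, 0xF8, 4),   # 111110xx
        (0x7FFFFFFF, 0xFC, 5),  # 1111110x
    )
    for limit, marker, tail in rows:
        if code <= limit:
            break  # first matching row gives the shortest sequence

    # Lead byte carries the most significant bits of the code number.
    out = [marker | (code >> (6 * tail))]
    # Each continuation byte is 10xxxxxx with the next six bits.
    for i in range(tail - 1, -1, -1):
        out.append(0x80 | ((code >> (6 * i)) & 0x3F))
    return bytes(out)


if __name__ == "__main__":
    print(utf8_encode(0xA9).hex())    # c2a9
    print(utf8_encode(0x2260).hex())  # e289a0
```

The two printed values match the worked examples in EXAMPLES. Note that Python's built-in str.encode("utf-8") implements the later, restricted definition of UTF-8 (at most four bytes, code points up to 0x10FFFF), so it agrees with this sketch only within that range.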