unicode(7)             Miscellaneous Information Manual             unicode(7)

NAME
       unicode - universal character set

DESCRIPTION
       The international standard ISO/IEC 10646 defines the Universal
       Character Set (UCS).  UCS contains all characters of all other
       character set standards.  It also guarantees "round-trip
       compatibility"; in other words, conversion tables can be built such
       that no information is lost when a string is converted from any other
       encoding to UCS and back.

       UCS contains the characters required to represent practically all
       known languages.  This includes many languages that use extensions of
       the Latin script as well as the following languages and scripts:
       Greek, Cyrillic, Hebrew, Arabic, Armenian, Georgian, the Chinese,
       Japanese, and Korean Han ideographs, and scripts such as Hiragana,
       Katakana, Hangul, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya,
       Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Khmer, Bopomofo,
       Tibetan, Runic, Ethiopic, Canadian Syllabics, Cherokee, Mongolian,
       Ogham, Myanmar, Sinhala, Thaana, Yi, and many others.  Work is under
       way to add further scripts such as hieroglyphs and various historic
       Indo-European languages, and eventually some artificial scripts such
       as Tengwar, Cirth, and Klingon might be included.  In addition to the
       characters for these languages, UCS covers graphical, typographical,
       mathematical, and scientific symbols used, for example, by TeX,
       PostScript, APL, MS-DOS, MS-Windows, Macintosh, and OCR, as well as
       by many word processing and publishing systems, and more are
       continually being added.

       The UCS standard (ISO/IEC 10646) describes a 31-bit character set
       architecture consisting of 128 24-bit groups, each divided into 256
       16-bit planes made up of 256 8-bit rows with 256 column positions, one
       for each character.  Part 1 of the standard (ISO/IEC 10646-1) defines
       the first 65534 code positions (0x0000 to 0xfffd), which form the Basic
       Multilingual Plane (BMP), that is plane 0 in group 0.  Part 2 of the
       standard (ISO/IEC 10646-2) adds characters to group 0 outside the BMP
       in several supplementary planes in the range 0x10000 to 0x10ffff.
       There are no plans to add characters beyond 0x10ffff to the standard,
       therefore of the entire code space, only a small fraction of group 0
       will ever be actually used in the foreseeable future.  The BMP contains
       all characters found in the commonly used other character sets.  The
       supplemental planes added by ISO/IEC 10646-2 cover only more exotic
       characters for special scientific, dictionary printing, publishing
       industry, higher-level protocol and enthusiast needs.
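
       As an illustration of the group/plane/row/column layout described
       above, the following C sketch splits a code position into its four
       components.  The names ucs_position, ucs_decompose(), and
       ucs_in_bmp() are hypothetical and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Position of a code point in the 31-bit UCS code space.
 * Illustrative type; not defined by any standard header. */
struct ucs_position {
    unsigned group;  /* bits 24..30: one of 128 groups */
    unsigned plane;  /* bits 16..23: one of 256 planes per group */
    unsigned row;    /* bits 8..15: one of 256 rows per plane */
    unsigned cell;   /* bits 0..7: one of 256 column positions per row */
};

/* Split a code point into group, plane, row, and column position. */
struct ucs_position ucs_decompose(uint32_t cp)
{
    struct ucs_position p;

    assert(cp <= 0x7fffffffu);  /* the code space is 31 bits wide */
    p.group = (cp >> 24) & 0x7f;
    p.plane = (cp >> 16) & 0xff;
    p.row = (cp >> 8) & 0xff;
    p.cell = cp & 0xff;
    return p;
}

/* Nonzero if cp lies in the BMP, that is plane 0 in group 0. */
int ucs_in_bmp(uint32_t cp)
{
    return cp <= 0xffffu;
}
```

       For example, ucs_decompose(0x1f600) yields group 0, plane 1, row
       0xf6, and column position 0x00, so that character lies in a
       supplementary plane rather than in the BMP.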

       The representation of each UCS character as a 2-byte word is referred
       to as the UCS-2 form (only for BMP characters), whereas UCS-4 is the
       representation of each character by a 4-byte word.  In addition, there
       exist two encoding forms: UTF-8 for backward compatibility with
       ASCII-processing software, and UTF-16 for the backward-compatible
       handling of non-BMP characters up to 0x10ffff by UCS-2 software.
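
       How UTF-16 reaches positions up to 0x10ffff while staying compatible
       with UCS-2 can be sketched as follows.  This is a minimal
       illustration, assuming the usual surrogate-pair scheme (high
       surrogates 0xd800 to 0xdbff, low surrogates 0xdc00 to 0xdfff); the
       function name utf16_encode() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16.
 * Returns the number of 16-bit units written to out (1 or 2),
 * or 0 if cp cannot be represented in UTF-16. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0;               /* surrogate values are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;  /* BMP: identical to UCS-2 */
        return 1;
    }
    if (cp > 0x10ffff)
        return 0;               /* beyond the reach of UTF-16 */
    cp -= 0x10000;              /* 20 bits remain */
    out[0] = (uint16_t)(0xd800 | (cp >> 10));    /* upper 10 bits */
    out[1] = (uint16_t)(0xdc00 | (cp & 0x3ff));  /* lower 10 bits */
    return 2;
}
```

       A BMP character such as 0x00c4 produces a single unit equal to its
       UCS-2 value, while 0x1f600 produces the pair 0xd83d 0xde00.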

       The UCS characters 0x0000 to 0x007f are identical to those of the
       classic US-ASCII character set and the characters in the range 0x0000
       to 0x00ff are identical to those in ISO/IEC 8859-1 (Latin-1).
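
       Because the first 256 UCS positions coincide with Latin-1, a Latin-1
       byte can be used directly as a UCS code value.  The sketch below,
       assuming the standard two-byte UTF-8 form for the values 0x80 to
       0xff, converts a Latin-1 buffer to UTF-8; latin1_to_utf8() is a
       hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Convert n Latin-1 bytes from src to UTF-8 in dst.
 * dst must have room for 2 * n bytes.
 * Returns the number of bytes written to dst. */
size_t latin1_to_utf8(const unsigned char *src, size_t n,
                      unsigned char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = src[i];  /* also the UCS code value */

        if (c < 0x80) {
            dst[out++] = c;        /* US-ASCII range: unchanged */
        } else {
            dst[out++] = (unsigned char)(0xc0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3f));
        }
    }
    return out;
}
```

       The Latin-1 byte 0xc4 (capital A with diaeresis), for instance,
       becomes the two UTF-8 bytes 0xc3 0x84, while ASCII bytes pass
       through unchanged.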

   Combining characters
       Some code points in UCS have been assigned to combining characters.
       These are similar to the nonspacing accent keys on a typewriter.  A
       combining character modifies the preceding character.  The most
       important accented characters have codes of their own in UCS, but
       the combining character mechanism makes it possible to add any
       diacritical mark to any character.  A combining character always
       follows the character that it modifies.  For example, the German
       character Umlaut-A ("Latin capital letter A with diaeresis") can be
       represented either by the precomposed UCS code 0x00c4 or,
       alternatively, as a normal capital A followed by a combining
       diaeresis: 0x0041 0x0308.

       Combining characters are essential, for example, for the Thai script,
       for mathematical typesetting, and for users of the International
       Phonetic Alphabet.
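
       The two representations of Umlaut-A differ in length but denote a
       single visible character.  The following sketch counts characters
       while skipping combining marks.  It is deliberately simplified: as
       an assumption for illustration it recognizes only the Combining
       Diacritical Marks block (0x0300 to 0x036f), whereas UCS contains
       further combining characters.  Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplifying assumption: only the Combining Diacritical Marks block
 * (0x0300 to 0x036f) is treated as combining. */
int is_combining_diacritic(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036f;
}

/* Count the characters in a 0-terminated UCS-4 string, not counting
 * combining diacritics, which attach to the preceding character. */
size_t count_base_chars(const uint32_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != 0; s++)
        if (!is_combining_diacritic(*s))
            n++;
    return n;
}
```

       Both the precomposed string { 0x00c4, 0 } and the decomposed string
       { 0x0041, 0x0308, 0 } count as one character here, although the
       second occupies two code positions.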

   Implementation levels
       As not all systems are expected to support advanced mechanisms like
       combining characters, ISO/IEC 10646-1 specifies the following three
       implementation levels of UCS:

       Level 1  Combining characters and Hangul Jamo (a special, more
                complicated encoding of the Korean script, in which each
                symbol is coded as a sequence of two or three characters)
                are not supported.

       Level 2  Like level 1, except that some combining characters are
                allowed (e.g., for Thai, Lao, Hebrew, Arabic, Devanagari,
                Malayalam).

       Level 3  All UCS characters are supported.

       The Unicode 3.0 Standard published by the Unicode Consortium contains
       exactly the UCS Basic Multilingual Plane at implementation level 3, as
       described in ISO/IEC 10646-1:2000.  Unicode 3.1 added the supplemental
       planes of ISO/IEC 10646-2.  The Unicode standard and technical reports
       published by the Unicode Consortium provide much additional information
       on the semantics and recommended usages of various characters.  They
       provide guidelines and algorithms for editing, sorting, comparing,
       normalizing, converting, and displaying Unicode strings.

   Unicode under Linux
       Under GNU/Linux, the C type wchar_t is defined as a 32-bit integer
       type.  The C library always interprets its values as UCS code values
       (in all locales), a convention that the GNU C library signals to
       applications by defining the constant __STDC_ISO_10646__, as
       specified by the ISO C99 standard.
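
       An application can check this convention at compile time.  The
       sketch below is only an illustration; wchar_is_ucs() and
       wchar_bits() are hypothetical helper names.

```c
#include <limits.h>
#include <wchar.h>

/* Returns 1 if the C library promises that wchar_t values are UCS
 * code values in all locales, 0 if it makes no such promise. */
int wchar_is_ucs(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Width of wchar_t in bits on the platform this is compiled for. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}
```

       Code that stores code positions above 0xffff in a wchar_t should
       verify both properties rather than assume them on every platform.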

       UCS/Unicode can be used just like ASCII in input/output streams,
       terminal communication, plain-text files, filenames, and environment
       variables by means of the ASCII-compatible multibyte encoding UTF-8.
       To use UTF-8 as the character encoding for all applications, a
       suitable locale has to be selected via environment variables (e.g.,
       "LANG=en_GB.UTF-8").

       The nl_langinfo(CODESET) function returns the name of the selected
       encoding.  Library functions such as wctomb(3) and mbsrtowcs(3) can
       be used to convert the internal wchar_t type to the encoding used by
       the system and back.  The wcwidth(3) function tells how many
       positions (0-2) the cursor advances when a character is output.
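
       A minimal sketch of how these calls fit together is shown below.
       The helper names select_env_locale() and encode_wchar() are
       hypothetical; the result depends on the locale selected through the
       environment.

```c
#include <assert.h>
#include <langinfo.h>
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Adopt the locale named by the environment (LANG, LC_ALL, ...).
 * Returns the name of the selected encoding, or NULL if the
 * requested locale is not available. */
const char *select_env_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}

/* Convert one wide character to the multibyte encoding of the current
 * locale.  buf must hold at least MB_LEN_MAX bytes.  Returns the
 * number of bytes written, or -1 if wc cannot be represented. */
int encode_wchar(wchar_t wc, char *buf)
{
    assert(buf != NULL);
    (void)wctomb(NULL, 0);  /* reset any shift state */
    return wctomb(buf, wc);
}
```

       In a UTF-8 locale, encode_wchar() is expected to write more than one
       byte for characters beyond US-ASCII; in the default "C" locale such
       characters may not be convertible at all, in which case -1 is
       returned.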

   Private Use Areas (PUA)
       In the Basic Multilingual Plane, the range 0xe000 to 0xf8ff will never
       be assigned to any characters by the standard and is reserved for
       private usage.  For the Linux community, this private area has been
       subdivided further into the range 0xe000 to 0xefff which can be used
       individually by any end-user and the Linux zone in the range 0xf000 to
       0xf8ff where extensions are coordinated among all Linux users.  The
       registry of the characters assigned to the Linux zone is maintained by
       LANANA and the registry itself is Documentation/admin-guide/unicode.rst
       in the Linux kernel sources (or Documentation/unicode.txt before Linux
       4.10).

       Two other planes are reserved for private usage, plane 15
       (Supplementary Private Use Area-A, range 0xf0000 to 0xffffd) and plane
       16 (Supplementary Private Use Area-B, range 0x100000 to 0x10fffd).
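
       The ranges above can be summarized in a small classification
       routine.  This is a sketch only; the enum pua_zone and the function
       pua_classify() are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Private use zones as described in this section. */
enum pua_zone {
    PUA_NONE,        /* not a private use code position */
    PUA_END_USER,    /* 0xe000 to 0xefff: individual end-user use */
    PUA_LINUX_ZONE,  /* 0xf000 to 0xf8ff: coordinated Linux zone */
    PUA_PLANE_15,    /* 0xf0000 to 0xffffd: Supplementary PUA-A */
    PUA_PLANE_16     /* 0x100000 to 0x10fffd: Supplementary PUA-B */
};

/* Report which private use zone, if any, contains cp. */
enum pua_zone pua_classify(uint32_t cp)
{
    if (cp >= 0xe000 && cp <= 0xefff)
        return PUA_END_USER;
    if (cp >= 0xf000 && cp <= 0xf8ff)
        return PUA_LINUX_ZONE;
    if (cp >= 0xf0000 && cp <= 0xffffd)
        return PUA_PLANE_15;
    if (cp >= 0x100000 && cp <= 0x10fffd)
        return PUA_PLANE_16;
    return PUA_NONE;
}
```

       Note that the last two positions of planes 15 and 16 (for example
       0xffffe and 0xfffff) fall outside the private use ranges given above
       and are therefore reported as PUA_NONE.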

   Bibliography
       o  Information technology -- Universal Multiple-Octet Coded Character
          Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane.
          International Standard ISO/IEC 10646-1, International Organization
          for Standardization, Geneva, 2000.

          This is the official specification of UCS.  Available from
          <http://www.iso.ch/>.

       o  The Unicode Standard, Version 3.0.  The Unicode Consortium,
          Addison-Wesley, Reading, MA, 2000, ISBN 0-201-61633-5.

       o  S. Harbison, G. Steele. C: A Reference Manual. Fourth edition,
          Prentice Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3.

          A good reference book about the C programming language.  The
          fourth edition also covers the 1994 Amendment 1 to the ISO C90
          standard, which adds many library functions for handling wide and
          multibyte character encodings, but it does not yet cover ISO C99,
          which further improved the support for these encodings.

       o  Unicode Technical Reports.
          <http://www.unicode.org/reports/>

       o  Markus Kuhn: UTF-8 and Unicode FAQ for UNIX/Linux.
          <http://www.cl.cam.ac.uk/~mgk25/unicode.html>

       o  Bruno Haible: Unicode HOWTO.
          <http://www.tldp.org/HOWTO/Unicode-HOWTO.html>

SEE ALSO
       locale(1), setlocale(3), charsets(7), utf-8(7)

TRANSLATION
       The Czech translation of this manual page was created by Jiri
       Pavlovsky <pavlovsk@ff.cuni.cz> and Pavel Heimlich
       <tropikhajma@gmail.com>

       This translation is free documentation; see the GNU General Public
       License Version 3 <https://www.gnu.org/licenses/gpl-3.0.html> or
       later for the copyright conditions.  There is NO LIABILITY.

       If you find any errors in the translation of this manual page, please
       send an e-mail to <translation-team-cs@lists.sourceforge.net>.

Linux man-pages 6.06            28 January 2024                     unicode(7)