plan9port

fork of plan9port with libvec, libstr and libsdb

rune.3 (3023B)


.TH RUNE 3
.SH NAME
runetochar, chartorune, runelen, runenlen, fullrune, utfecpy, utflen, utfnlen, utfrune, utfrrune, utfutf \- rune/UTF conversion
.SH SYNOPSIS
.ta \w'\fLchar*xx'u
.B #include <utf.h>
.PP
.B
int	runetochar(char *s, Rune *r)
.PP
.B
int	chartorune(Rune *r, char *s)
.PP
.B
int	runelen(long r)
.PP
.B
int	runenlen(Rune *r, int n)
.PP
.B
int	fullrune(char *s, int n)
.PP
.B
char*	utfecpy(char *s1, char *es1, char *s2)
.PP
.B
int	utflen(char *s)
.PP
.B
int	utfnlen(char *s, long n)
.PP
.B
char*	utfrune(char *s, long c)
.PP
.B
char*	utfrrune(char *s, long c)
.PP
.B
char*	utfutf(char *s1, char *s2)
.SH DESCRIPTION
These routines convert to and from a
.SM UTF
byte stream and runes.
.PP
.I Runetochar
copies one rune at
.I r
to at most
.B UTFmax
bytes starting at
.I s
and returns the number of bytes copied.
.BR UTFmax ,
defined as
.B 3
in
.BR <utf.h> ,
is the maximum number of bytes required to represent a rune.
.PP
.I Chartorune
copies at most
.B UTFmax
bytes starting at
.I s
to one rune at
.I r
and returns the number of bytes copied.
If the input is not exactly in
.SM UTF
format,
.I chartorune
will convert to 0x80 and return 1.
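.PP
As a rough illustration of the byte layout these two routines deal with, the following portable C sketch packs and unpacks sequences of one to three bytes.
The names sketch_runetochar, sketch_chartorune, SketchRune and SketchBad are hypothetical and are not part of <utf.h>; the three-byte limit and the 0x80 error value are assumptions taken from the text above, and the library's own code may differ.
.EX
```c
typedef unsigned int SketchRune;	/* hypothetical stand-in for Rune */

enum {
	SketchBad = 0x80	/* error value quoted in the text above */
};

/* Encode *r into s; return the number of bytes written (1 to 3). */
int
sketch_runetochar(char *s, const SketchRune *r)
{
	SketchRune c = *r;

	if(c < 0x80){
		s[0] = (char)c;
		return 1;
	}
	if(c < 0x800){
		s[0] = (char)(0xC0 | (c >> 6));
		s[1] = (char)(0x80 | (c & 0x3F));
		return 2;
	}
	/* assumption: values above 0xFFFF are out of scope here */
	s[0] = (char)(0xE0 | ((c >> 12) & 0x0F));
	s[1] = (char)(0x80 | ((c >> 6) & 0x3F));
	s[2] = (char)(0x80 | (c & 0x3F));
	return 3;
}

/* Decode one rune from the NUL-terminated s; return bytes consumed. */
int
sketch_chartorune(SketchRune *r, const char *s)
{
	const unsigned char *p = (const unsigned char*)s;
	SketchRune c;

	if(p[0] < 0x80){
		*r = p[0];
		return 1;
	}
	if((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80){
		c = ((SketchRune)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
		if(c >= 0x80){	/* reject overlong two-byte forms */
			*r = c;
			return 2;
		}
	}else if((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80
	      && (p[2] & 0xC0) == 0x80){
		c = ((SketchRune)(p[0] & 0x0F) << 12)
		  | ((SketchRune)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
		if(c >= 0x800){	/* reject overlong three-byte forms */
			*r = c;
			return 3;
		}
	}
	*r = SketchBad;	/* malformed input: consume a single byte */
	return 1;
}
```
.EE
.PP
Because the decoder reports one byte consumed on malformed input, a loop that advances by the return value always makes progress.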
.PP
.I Runelen
returns the number of bytes
required to convert
.I r
into
.SM UTF.
.PP
.I Runenlen
returns the number of bytes
required to convert the
.I n
runes pointed to by
.I r
into
.SM UTF.
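.PP
The length computation can be sketched as below, for instance to size a buffer before encoding.
The names sketch_runelen and sketch_runenlen are hypothetical, runes are represented as unsigned int for the sake of a self-contained example, and the three-byte ceiling is an assumption based on the UTFmax value quoted earlier.
.EX
```c
/* Bytes needed to encode one rune: 1, 2 or 3 in this sketch. */
int
sketch_runelen(long r)
{
	if(r < 0x80)
		return 1;
	if(r < 0x800)
		return 2;
	return 3;
}

/* Total bytes needed to encode the n runes starting at r. */
int
sketch_runenlen(const unsigned int *r, int n)
{
	int i, total = 0;

	for(i = 0; i < n; i++)
		total += sketch_runelen((long)r[i]);
	return total;
}
```
.EE
.PP
A caller would typically allocate one byte more than the result to hold the terminating NUL.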
.PP
.I Fullrune
returns 1 if the string
.I s
of length
.I n
is long enough to be decoded by
.I chartorune
and 0 otherwise.
This does not guarantee that the string
contains a legal
.SM UTF
encoding.
This routine is used by programs that
obtain input a byte at
a time and need to know when a full rune
has arrived.
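.PP
One way to picture the check is that only the first byte and the count matter.
The sketch below uses the hypothetical name sketch_fullrune and assumes sequences of at most three bytes; it looks at the lead byte alone, which is why a positive answer says nothing about whether the following bytes are legal.
.EX
```c
/* Return 1 if the n bytes at s are enough to decode one rune. */
int
sketch_fullrune(const char *s, int n)
{
	unsigned char c;

	if(n <= 0)
		return 0;
	c = (unsigned char)s[0];
	if(c < 0x80)
		return 1;	/* single-byte rune */
	if(c < 0xE0)
		return n >= 2;	/* two-byte lead, or a stray byte */
	return n >= 3;		/* three-byte lead */
}
```
.EE
.PP
A reader that receives one byte at a time can append each byte to a small buffer and attempt decoding only once this check returns 1.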
.PP
The following routines are analogous to the
corresponding string routines with
.B utf
substituted for
.B str
and
.B rune
substituted for
.BR chr .
.PP
.I Utfecpy
copies UTF sequences until a null sequence has been copied, but writes no
sequences beyond
.IR es1 .
If any sequences are copied,
.I s1
is terminated by a null sequence, and a pointer to that sequence is returned.
Otherwise, the original
.I s1
is returned.
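.PP
The bounded copy can be sketched as follows.
This is a minimal illustration of the contract described above, under the hypothetical name sketch_utfecpy; it assumes well-formed input and is not the library's implementation, which may truncate differently.
.EX
```c
/*
 * Copy UTF from src into the buffer [to, e), writing a terminating
 * NUL whenever the buffer holds at least one byte.
 * Returns a pointer to that NUL, or to if nothing could be written.
 */
char*
sketch_utfecpy(char *to, char *e, const char *src)
{
	char *p = to;

	if(to >= e)
		return to;	/* no room: nothing written */
	while(p < e-1 && *src != 0)
		*p++ = *src++;
	if((*src & 0xC0) == 0x80){
		/* stopped inside a sequence: drop the bytes copied so far */
		while(p > to && (p[-1] & 0xC0) == 0x80)
			p--;
		if(p > to)
			p--;	/* and its lead byte */
	}
	*p = 0;
	return p;
}
```
.EE
.PP
Returning the position of the terminator lets a caller chain several copies into one buffer by passing the result as the next destination.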
.PP
.I Utflen
returns the number of runes that
are represented by the
.SM UTF
string
.IR s .
.PP
.I Utfnlen
returns the number of complete runes that
are represented by the first
.I n
bytes of
.SM UTF
string
.IR s .
If the last few bytes of the string contain an incompletely coded rune,
.I utfnlen
will not count them; in this way, it differs from
.IR utflen ,
which includes every byte of the string.
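.PP
The difference between the two counts can be illustrated with the sketch below.
The names sketch_utflen and sketch_utfnlen are hypothetical, input is assumed to be well formed apart from a possibly truncated final sequence, and sequences are assumed to be at most three bytes long.
.EX
```c
/* Count runes in the NUL-terminated s: every lead byte counts. */
int
sketch_utflen(const char *s)
{
	int n = 0;

	for(; *s != 0; s++)
		if((*s & 0xC0) != 0x80)	/* skip continuation bytes */
			n++;
	return n;
}

/* Count only the runes that fit completely in the first n bytes. */
int
sketch_utfnlen(const char *s, long n)
{
	const unsigned char *p = (const unsigned char*)s;
	const unsigned char *e = p + n;
	int count = 0, len;

	while(p < e && *p != 0){
		if(*p < 0xC0)
			len = 1;	/* ASCII, or a stray byte counted alone */
		else if(*p < 0xE0)
			len = 2;
		else
			len = 3;
		if(e - p < len)
			break;		/* incomplete trailing rune: not counted */
		p += len;
		count++;
	}
	return count;
}
```
.EE
.PP
For a string that ends in a lone lead byte, the first function counts that byte as a rune while the second does not.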
.PP
.I Utfrune
.RI ( utfrrune )
returns a pointer to the first (last)
occurrence of rune
.I c
in the
.SM UTF
string
.IR s ,
or 0 if
.I c
does not occur in the string.
The NUL byte terminating a string is considered to
be part of the string
.IR s .
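.PP
A search of this kind can be sketched by encoding the rune and looking for its bytes with the standard strstr.
The names sketch_encode, sketch_utfrune and sketch_utfrrune are hypothetical; the approach assumes well-formed input, where a complete encoded rune can only match at a rune boundary, and runes of at most three bytes.
It is a swapped-in technique for illustration, not necessarily how the library searches.
.EX
```c
#include <string.h>

/* Hypothetical helper: write rune c (up to 0xFFFF) as UTF plus NUL. */
static void
sketch_encode(char buf[4], long c)
{
	int n = 0;

	if(c < 0x80)
		buf[n++] = (char)c;
	else if(c < 0x800){
		buf[n++] = (char)(0xC0 | (c >> 6));
		buf[n++] = (char)(0x80 | (c & 0x3F));
	}else{
		buf[n++] = (char)(0xE0 | ((c >> 12) & 0x0F));
		buf[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
		buf[n++] = (char)(0x80 | (c & 0x3F));
	}
	buf[n] = 0;
}

/* First occurrence of rune c in s, or NULL. */
char*
sketch_utfrune(const char *s, long c)
{
	char buf[4];

	if(c == 0)	/* the terminating NUL counts as part of s */
		return (char*)s + strlen(s);
	sketch_encode(buf, c);
	return (char*)strstr(s, buf);
}

/* Last occurrence of rune c in s, or NULL. */
char*
sketch_utfrrune(const char *s, long c)
{
	char buf[4];
	char *hit, *last = NULL;

	if(c == 0)
		return (char*)s + strlen(s);
	sketch_encode(buf, c);
	for(hit = (char*)strstr(s, buf); hit != NULL;
	    hit = (char*)strstr(hit + 1, buf))
		last = hit;
	return last;
}
```
.EE
.PP
Searching for rune 0 returns the position of the terminator, matching the rule that the NUL byte is part of the string.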
.PP
.I Utfutf
returns a pointer to the first occurrence of
the
.SM UTF
string
.I s2
as a
.SM UTF
substring of
.IR s1 ,
or 0 if there is none.
If
.I s2
is the null string,
.I utfutf
returns
.IR s1 .
.SH SOURCE
.B https://9fans.github.io/plan9port/unix
.SH SEE ALSO
.IR utf (7),
.IR tcs (1)