idna

Name

idna -- Internationalizing Domain Names in Applications.

Synopsis



#define     IDNA_ACE_PREFIX
int         idna_to_ascii                   (unsigned long *in,
                                             size_t inlen,
                                             char *out,
                                             int allowunassigned,
                                             int usestd3asciirules);
int         idna_to_unicode                 (unsigned long *in,
                                             size_t inlen,
                                             unsigned long *out,
                                             size_t *outlen,
                                             int allowunassigned,
                                             int usestd3asciirules);
int         idna_ucs4_to_ace                (unsigned long *input,
                                             char **output);
int         idna_utf8_to_ace                (const char *input,
                                             char **output);
int         idna_locale_to_ace              (const char *input,
                                             char **output);
int         idna_ucs4ace_to_ucs4            (unsigned long *input,
                                             unsigned long **output);
int         idna_utf8ace_to_ucs4            (const char *input,
                                             unsigned long **output);
int         idna_utf8ace_to_utf8            (const char *input,
                                             char **output);
int         idna_utf8ace_to_locale          (const char *input,
                                             char **output);
int         idna_localeace_to_locale        (const char *input,
                                             char **output);

Description

Until now, there has been no standard method for domain names to use characters outside the ASCII repertoire. The IDNA document defines internationalized domain names (IDNs) and a mechanism called IDNA for handling them in a standard fashion. IDNs use characters drawn from a large repertoire (Unicode), but IDNA allows the non-ASCII characters to be represented using only the ASCII characters already allowed in so-called host names today. This backward-compatible representation is required in existing protocols like DNS, so that IDNs can be introduced with no changes to the existing infrastructure. IDNA is only meant for processing domain names, not free text.

Details

IDNA_ACE_PREFIX

#define IDNA_ACE_PREFIX "iesg--"
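
As an illustration of how the prefix might be used, the sketch below tests whether an ASCII label begins with it. The helper name has_ace_prefix is hypothetical and not part of this library, and the case-insensitive comparison is an assumption based on the IDNA document's treatment of the ACE prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Prefix value as documented above; the guard avoids a clash if the
   library header has already defined it. */
#ifndef IDNA_ACE_PREFIX
#define IDNA_ACE_PREFIX "iesg--"
#endif

/* Hypothetical helper (not part of the library): returns 1 if the
   zero-terminated ASCII label starts with the ACE prefix, compared
   case-insensitively, and 0 otherwise. */
static int has_ace_prefix(const char *label)
{
    size_t i, n = strlen(IDNA_ACE_PREFIX);

    assert(label != NULL);
    for (i = 0; i < n; i++) {
        /* A label shorter than the prefix stops here, because its
           terminating NUL never matches a prefix character. */
        if (tolower((unsigned char)label[i]) !=
            tolower((unsigned char)IDNA_ACE_PREFIX[i]))
            return 0;
    }
    return 1;
}
```

A label for which this returns 0 cannot be in ACE form, which is the case where ToUnicode hands back its input unaltered.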


idna_to_ascii ()

int         idna_to_ascii                   (unsigned long *in,
                                             size_t inlen,
                                             char *out,
                                             int allowunassigned,
                                             int usestd3asciirules);

The ToASCII operation takes a sequence of Unicode code points that make up one label and transforms it into a sequence of code points in the ASCII range (0..7F). If ToASCII succeeds, the original sequence and the resulting sequence are equivalent labels.

It is important to note that the ToASCII operation can fail. ToASCII fails if any step of it fails. If any step of the ToASCII operation fails on any label in a domain name, that domain name MUST NOT be used as an internationalized domain name. The method for dealing with this failure is application-specific.

The inputs to ToASCII are a sequence of code points, the AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of ToASCII is either a sequence of ASCII code points or a failure condition.

ToASCII never alters a sequence of code points that are all in the ASCII range to begin with (although it could fail). Applying the ToASCII operation multiple times has exactly the same effect as applying it just once.
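
The "all in the ASCII range" condition can be checked directly on the code point array. The following is a minimal sketch of that check only, not of ToASCII itself. The helper name all_ascii is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the library): returns 1 if every
   code point in in[0..inlen-1] lies in the ASCII range 0..7F, and 0
   otherwise.  An empty sequence counts as all-ASCII. */
static int all_ascii(const unsigned long *in, size_t inlen)
{
    size_t i;

    assert(in != NULL || inlen == 0);
    for (i = 0; i < inlen; i++) {
        if (in[i] > 0x7F)
            return 0;
    }
    return 1;
}
```

When this returns 1, a successful ToASCII yields the same code points it was given. ToASCII may still fail on such input, so the check does not replace testing the return value.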


idna_to_unicode ()

int         idna_to_unicode                 (unsigned long *in,
                                             size_t inlen,
                                             unsigned long *out,
                                             size_t *outlen,
                                             int allowunassigned,
                                             int usestd3asciirules);

The ToUnicode operation takes a sequence of Unicode code points that make up one label and returns a sequence of Unicode code points. If the input sequence is a label in ACE form, then the result is an equivalent internationalized label that is not in ACE form, otherwise the original sequence is returned unaltered.

ToUnicode never fails. If any step fails, then the original input sequence is returned immediately in that step.

The ToUnicode output never contains more code points than its input. Note that the number of octets needed to represent a sequence of code points depends on the particular character encoding used.
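
To illustrate the difference between code points and octets, the sketch below counts the octets a sequence of code points would occupy in UTF-8. The helper name utf8_octets is hypothetical, and the code assumes every input value is a valid Unicode code point no greater than 10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the library): number of octets
   needed to encode cp[0..n-1] in UTF-8.  Assumes each value is a
   valid code point not above 0x10FFFF. */
static size_t utf8_octets(const unsigned long *cp, size_t n)
{
    size_t i, total = 0;

    assert(cp != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (cp[i] < 0x80)
            total += 1;         /* ASCII range */
        else if (cp[i] < 0x800)
            total += 2;
        else if (cp[i] < 0x10000)
            total += 3;
        else
            total += 4;
    }
    return total;
}
```

The same sequence stored as an array of unsigned long, as in the prototypes here, occupies n * sizeof (unsigned long) octets regardless of the values. Because the output never has more code points than the input, an output array of inlen elements is large enough for idna_to_unicode.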

The inputs to ToUnicode are a sequence of code points, the AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of ToUnicode is always a sequence of Unicode code points.


idna_ucs4_to_ace ()

int         idna_ucs4_to_ace                (unsigned long *input,
                                             char **output);

Convert a UCS-4 domain name to an ASCII string. The AllowUnassigned and UseSTD3ASCIIRules flags are both false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller.
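
The label structure of such a domain name can be sketched as follows. The helper names next_label and count_labels are hypothetical, and the code assumes the UCS-4 string is zero-terminated and that only U+002E (full stop) separates labels. It shows how a name splits into labels, not how the library processes them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the library): examines the label
   that begins at start in a zero-terminated UCS-4 string, stores its
   length in *len, and returns a pointer to the start of the following
   label, or NULL if this was the last label. */
static const unsigned long *next_label(const unsigned long *start,
                                       size_t *len)
{
    const unsigned long *p = start;

    assert(start != NULL && len != NULL);
    while (*p != 0 && *p != 0x2E)
        p++;
    *len = (size_t)(p - start);
    return *p == 0x2E ? p + 1 : NULL;
}

/* Counts the labels in a domain name.  An empty string counts as one
   empty label, and a trailing dot adds a final empty label. */
static size_t count_labels(const unsigned long *domain)
{
    const unsigned long *p = domain;
    size_t len, n = 0;

    while (p != NULL) {
        p = next_label(p, &len);
        n++;
    }
    return n;
}
```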


idna_utf8_to_ace ()

int         idna_utf8_to_ace                (const char *input,
                                             char **output);

Convert a UTF-8 domain name to an ASCII string. The AllowUnassigned and UseSTD3ASCIIRules flags are both false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller.
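
The caller's responsibility for the output buffer might be handled as in the sketch below. Several things here are assumptions not stated in this page: that a return value of 0 means success, that the buffer is released with free(), and that nothing needs freeing on failure. To keep the example runnable without the library, it takes the converter as a function pointer and uses a hypothetical stand-in, standin_to_ace, which only copies its input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as the (const char *input, char **output) prototypes. */
typedef int (*to_ace_fn)(const char *input, char **output);

/* Hypothetical stand-in for a converter: allocates a copy of the
   input and performs no IDNA processing.  Returns 0 on success
   (assumed convention) and 1 if allocation fails. */
static int standin_to_ace(const char *input, char **output)
{
    size_t n = strlen(input) + 1;

    *output = malloc(n);
    if (*output == NULL)
        return 1;
    memcpy(*output, input, n);
    return 0;
}

/* Hypothetical wrapper: converts name, copies the result into the
   caller's fixed-size buffer dst, and frees the allocated output.
   Returns 0 on success, -1 if dst is too small, or the converter's
   own non-zero code on failure. */
static int convert_into(to_ace_fn convert, const char *name,
                        char *dst, size_t dstlen)
{
    char *ace = NULL;
    int rc;

    assert(convert != NULL && name != NULL && dst != NULL);
    rc = convert(name, &ace);
    if (rc != 0)
        return rc;          /* assumed: no buffer to free on failure */
    if (strlen(ace) >= dstlen) {
        free(ace);
        return -1;
    }
    strcpy(dst, ace);
    free(ace);              /* assumed deallocator for the output */
    return 0;
}
```

With the library linked, idna_utf8_to_ace could be passed in place of the stand-in, since its prototype matches to_ace_fn.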


idna_locale_to_ace ()

int         idna_locale_to_ace              (const char *input,
                                             char **output);

Convert a domain name in the locale's encoding to an ASCII string. The AllowUnassigned and UseSTD3ASCIIRules flags are both false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller.


idna_ucs4ace_to_ucs4 ()

int         idna_ucs4ace_to_ucs4            (unsigned long *input,
                                             unsigned long **output);

Convert a possibly ACE-encoded domain name in UCS-4 format into a UCS-4 string. The AllowUnassigned and UseSTD3ASCIIRules flags are both false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller.


idna_utf8ace_to_ucs4 ()

int         idna_utf8ace_to_ucs4            (const char *input,
                                             unsigned long **output);

Convert a possibly ACE-encoded domain name in UTF-8 format into a UCS-4 string. The AllowUnassigned and UseSTD3ASCIIRules flags are both false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller.


idna_utf8ace_to_utf8 ()

int         idna_utf8ace_to_utf8            (const char *input,
                                             char **output);

Convert a possibly ACE-encoded domain name in UTF-8 format into a UTF-8 string. The AllowUnassigned and UseSTD3ASCIIRules flags are both false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller.


idna_utf8ace_to_locale ()

int         idna_utf8ace_to_locale          (const char *input,
                                             char **output);

Convert a possibly ACE-encoded domain name in UTF-8 format into a string encoded in the current locale's character set. The AllowUnassigned and UseSTD3ASCIIRules flags are both false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller.


idna_localeace_to_locale ()

int         idna_localeace_to_locale        (const char *input,
                                             char **output);

Convert a possibly ACE-encoded domain name in the current locale's character set into a string in that same character set. The AllowUnassigned and UseSTD3ASCIIRules flags are both false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller.