iconv library

The Recode library is able to use the capabilities of an
external, pre-installed iconv
library, usually as provided by GNU
libc
or the portable libiconv
written by Bruno Haible. In
fact, many capabilities of the Recode library are duplicated in
an external iconv
library, as they likely share many charsets.
This section discusses the issues related to this duplication, and other
peculiarities specific to the iconv library.
The RECODE_STRICT_MAPPING_FLAG
option, corresponding to the
‘--strict’ flag, is implemented by adding iconv
option
//IGNORE
to the ‘after’ encoding. This has the side effect
that untranslatable input is only signalled at the end of the
conversion, whereas with Recode’s built-in conversion routines the error
will be signalled immediately.
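
Because an error on the iconv path may only be reported once the whole conversion has finished, a caller may prefer not to trust the output file until the exit status is known. The following is a minimal sketch of that idea, not something Recode itself provides: `strict_recode` is a hypothetical wrapper, and it assumes that `recode --strict` exits with a non-zero status when it meets untranslatable input.

```shell
# Hypothetical helper (not part of Recode): convert strictly into a
# temporary file and keep the result only if recode reports success.
# Usage: strict_recode REQUEST INFILE OUTFILE
strict_recode() {
    request=$1
    infile=$2
    outfile=$3
    tmp=$(mktemp) || return 1
    if recode --strict "$request" < "$infile" > "$tmp"; then
        # Success: publish the converted text.
        mv "$tmp" "$outfile"
    else
        # Failure (assumed to be signalled by a non-zero status):
        # discard the partial or suspect output.
        status=$?
        rm -f "$tmp"
        return "$status"
    fi
}
```

For example, `strict_recode l1..1250 input output` would leave `output` untouched unless the whole conversion succeeded, whichever engine performed it.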
If the string -translit is appended to the after encoding,
characters being converted are transliterated when needed and possible.
This means that when a character cannot be represented in the target
character set, it can be approximated through one or several
similar-looking characters. Characters that are outside of the target
character set and cannot be transliterated are replaced with a question
mark (?) in the output. This corresponds to the iconv option //TRANSLIT.
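
As an illustration of the suffix described above, the sketch below builds a request whose after encoding carries -translit. The helper name `translit_request` is hypothetical, and the charset names are only examples; whether a given pair can be transliterated depends on the external iconv library in use.

```shell
# Hypothetical helper: print a recode request whose 'after' charset
# has the -translit suffix appended.
# Usage: translit_request BEFORE AFTER
translit_request() {
    printf '%s..%s-translit\n' "$1" "$2"
}

# Example use (assumes recode is installed and files named as in this
# manual's other examples):
#   recode "$(translit_request UTF-8 ASCII)" < input > output
```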
To check whether iconv is used for a particular conversion,
use the ‘-v’ or ‘--verbose’ option (see Controlling how files are recoded)
and check whether ‘:iconv:’ appears as an intermediate charset.
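
This check can be scripted. The sketch below is only one way to do it: the helper names are hypothetical, and because it is an assumption here which stream the verbose trace is written to, both standard output and standard error are captured. An empty input is used so that no converted text is mixed into the trace.

```shell
# Hypothetical helper: succeed if the text on standard input
# mentions the :iconv: pivot charset.
mentions_iconv_pivot() {
    grep -qF ':iconv:'
}

# Hypothetical helper: succeed if recode's verbose trace for REQUEST
# names :iconv: as an intermediate charset.
# Usage: uses_iconv REQUEST
uses_iconv() {
    recode -v "$1" < /dev/null 2>&1 | mentions_iconv_pivot
}
```

A call such as `uses_iconv l1..1250 && echo "iconv is involved"` then reports whether the external library takes part in that request on the current installation.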
The :iconv: charset represents a conceptual pivot charset within
the external iconv library (in fact, this pivot exists, but is
not directly reachable). This charset has : (a mere colon) and
:libiconv: as aliases. It is not allowed to recode from or to
this charset directly. But when this charset is selected as an
intermediate, usually by automatic means, the external iconv
library is called to handle the transformations. By using an
‘--ignore=:iconv:’ option on the recode call or,
equivalently but more simply, ‘-x:’, Recode is instructed to avoid
this charset as an intermediate, with the consequence that the external
iconv library is not used. Conversely, the --prefer-iconv option
makes Recode use iconv whenever possible. Consider these calls:

    recode l1..1250 < input > output
    recode -x: l1..1250 < input > output
    recode --prefer-iconv l1..1250 < input > output

All should transform input from ISO-8859-1 to CP1250 on output.
The first call might use the external iconv library, while the
second call definitely avoids it. The third call will use the
external iconv library if it supports the required conversion.
Whatever the path used, the results should normally be identical.
However, there might be observable differences. Most of them are
likely to result from reversibility issues, as the external iconv
engine is unlikely to address reversibility in the same way. Much
less likely, some differences might result from slight errors in
the tables used; such differences should be reported as bugs.
Discrepancies might be seen in the area of error detection and recovery.
The Recode library usually tries to detect canonicity errors in
input, and production of ambiguous output, but the external iconv
library does not necessarily do so in the same way. Moreover, the
Recode library may not always recover as nicely as possible when
the external iconv library has no translation for a given character.
The external iconv
libraries may offer different sets of charsets
and aliases from one library to another, and also between successive
versions of a single library. It is best to check the documentation
of the external iconv library, as of the time Recode was
installed, to know which charsets and aliases are being provided.
The ‘--ignore=:iconv:’ or ‘-x:’ options might be useful when
there is a need to make a recoding more exactly repeatable between
machines or installations, the idea being here to remove the variance
possibly introduced by the various implementations of an external
iconv
library. These options might also help in deciding whether some
recoding problem originates in Recode itself, or is induced by the
external iconv library.
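
One way to apply this diagnosis is to run the same request with and without the external library and compare the two results byte for byte. The following is a minimal sketch under stated assumptions: `compare_paths` is a hypothetical helper, and the status values 0, 1 and 2 are a convention chosen for this example only.

```shell
# Hypothetical helper: recode INFILE once with Recode's own routines
# only (-x:) and once with the default path, then compare the outputs.
# Returns 0 if they are identical, 1 if they differ, and 2 if either
# conversion (or the temporary file setup) failed.
# Usage: compare_paths REQUEST INFILE
compare_paths() {
    request=$1
    infile=$2
    builtin_out=$(mktemp) || return 2
    default_out=$(mktemp) || { rm -f "$builtin_out"; return 2; }
    if recode -x: "$request" < "$infile" > "$builtin_out" &&
       recode "$request" < "$infile" > "$default_out"; then
        cmp -s "$builtin_out" "$default_out"
        status=$?
    else
        status=2
    fi
    rm -f "$builtin_out" "$default_out"
    return "$status"
}
```

A status of 1 from `compare_paths l1..1250 input` would suggest that the external iconv library changes the result on this installation, which is the variance that ‘-x:’ is meant to remove.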