Hi!
Anyone who just wants to quickly convert UTF-8 to Latin-1, without having to install the latest Perl version or various modules, may find the following code helpful. It only works correctly if the input really is a UTF-8 string that can be represented completely in Latin-1.
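sub utf8_to_latin1($) {
    my ($i, @s);
    # split the input into a list of byte values
    @s = unpack('C*', $_[0]);
    for ($i=0; $i<@s; $i++) {
        # lead byte 110000xx: merge it with the following continuation byte
        splice(@s, $i, 2, (($s[$i] & 0x03) << 6) | ($s[$i+1] & 0x3F))
            if (($s[$i] & 0xFC) == 0xC0);
    }
    return pack('C*', @s);
}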
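To see what the bit masks in the function do, here is a small worked example for a single character. It is only a sketch: the variable names ($lead, $cont, $code) are made up for illustration, and it assumes "ä" (U+00E4), whose UTF-8 encoding is the two bytes 0xC3 0xA4. It also uses "use warnings", which needs Perl 5.6 or later; drop that line on older installations.

```perl
use strict;
use warnings;

# UTF-8 encoding of "ä" (U+00E4): lead byte 0xC3, continuation byte 0xA4
my ($lead, $cont) = (0xC3, 0xA4);

# The lead byte has the form 110000xx, so the code point is below 0x100
my $is_latin1_lead = (($lead & 0xFC) == 0xC0) ? 1 : 0;

# The low 2 bits of the lead byte become the top 2 bits of the result,
# the low 6 bits of the continuation byte fill in the rest
my $code = (($lead & 0x03) << 6) | ($cont & 0x3F);

printf "0x%02X\n", $code;   # prints 0xE4, the Latin-1 code for "ä"
```
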
Here is the function again in long form, to make it easier to follow:
sub utf8_to_latin1($) {
    my ($i, @s);
    @s = unpack('C*', $_[0]);
    $i = 0;
    while ($i < @s) {
        if ($s[$i] & 0x80) {
            # bit 7 set: this byte belongs to a multi-byte UTF-8 sequence
            if (($s[$i] & 0xFC) == 0xC0) {
                # lead byte 110000xx: two-byte sequence for a code point
                # below 0x100, which yields a valid Latin-1 character
                $s[$i] = (($s[$i] & 0x03) << 6) | ($s[$i+1] & 0x3F);
                splice(@s, $i+1, 1);
                $i++;
            } else {
                # Any other Unicode character.
                # We could determine the length of this sequence and skip it in one go,
                # but in a valid UTF-8 stream all continuation bytes have bit 7 set and
                # bit 6 clear, so they never match the test above. Stepping one byte at
                # a time therefore passes over them automatically. There are faster
                # approaches, but this case is not expected to happen at all, since the
                # string should be encodable in ISO-8859-1.
                $i++;
            }
        } else {
            # ASCII - leave unchanged
            $i++;
        }
    }
    return pack('C*', @s);
}
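Since the function relies on the input being UTF-8 that fits into Latin-1, it may be worth checking that before converting. The following is only a sketch and not part of the code above: the helper name looks_like_latin1_utf8 is made up, and it assumes the argument is a plain byte string. It accepts ASCII bytes and two-byte sequences starting with 0xC2 or 0xC3, which together cover U+0000 to U+00FF. It rejects the lead bytes 0xC0 and 0xC1, which the function above would still decode, because they can only occur in overlong (invalid) encodings.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the byte string consists only of
# ASCII bytes and two-byte UTF-8 sequences with lead byte 0xC2 or 0xC3
# (code points U+0080..U+00FF), otherwise 0.
sub looks_like_latin1_utf8 {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:[\x00-\x7F]|[\xC2\xC3][\x80-\xBF])*\z/ ? 1 : 0;
}

# "ä" as UTF-8 passes, the euro sign (U+20AC, three bytes) does not
print looks_like_latin1_utf8("\xC3\xA4"), "\n";
print looks_like_latin1_utf8("\xE2\x82\xAC"), "\n";
```

A regex is used here because it needs no modules and should also work on older Perl versions; as with the first example, "use warnings" can be dropped before Perl 5.6.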
HTH && So long