Dear Tom,
with the help of the SELFHTML forum archive, I built myself the following class. Maybe it's of some use to you?
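<?php
/**
 * UTF-8 Ensurer
 *
 * This class is for making sure some string data is
 * encoded in UTF-8.
 *
 * Felix Riesterer (http://felix-riesterer.de)
 */
class UTF8_Ensurer {
    // convert ISO-8859-1 input to UTF-8 unless it already is valid UTF-8
    // (utf8_encode() is deprecated as of PHP 8.2; there,
    // mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') does the same job)
    public function ensure ($s) {
        if (is_string($s) && !$this->is_utf8($s)) {
            $s = utf8_encode($s);
        }
        return $s;
    }
    // convert valid UTF-8 input back to ISO-8859-1
    // (utf8_decode() is deprecated as of PHP 8.2 as well)
    public function iso ($s) {
        if (is_string($s) && $this->is_utf8($s)) {
            $s = utf8_decode($s);
        }
        return $s;
    }
    /* check for UTF-8 encoding: the following set of functions
     * has been taken (and adapted) from the SELFHTML forum archive
     * http://forum.de.selfhtml.org/archiv/2005/10/t116805/#m747567
     */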
    public function is_utf8 ($s) {
        $t = $this;
        $len = strlen($s);
        $i = 0;
        while ($i < $len) {
            // $s[$i] instead of the old $s{$i}: curly-brace offsets are
            // deprecated since PHP 7.4 and a fatal error since PHP 8.0
            $c = ord($s[$i++]);
            if ($t->valid_1byte($c)) { // 0xxxxxxx: ASCII, continue
                continue;
            } elseif ($t->valid_2byte($c)) { // 110xxxxx: check 1 byte
                if ($i >= $len || !$t->valid_nextbyte(ord($s[$i++])))
                    return false;
            } elseif ($t->valid_3byte($c)) { // 1110xxxx: check 2 bytes
                if ($i >= $len || !$t->valid_nextbyte(ord($s[$i++])))
                    return false;
                if ($i >= $len || !$t->valid_nextbyte(ord($s[$i++])))
                    return false;
            } elseif ($t->valid_4byte($c)) { // 11110xxx: check 3 bytes
                if ($i >= $len || !$t->valid_nextbyte(ord($s[$i++])))
                    return false;
                if ($i >= $len || !$t->valid_nextbyte(ord($s[$i++])))
                    return false;
                if ($i >= $len || !$t->valid_nextbyte(ord($s[$i++])))
                    return false;
            } else {
                return false; // 10xxxxxx occurring alone, or 11111xxx
            } // go to next char
        }
        return true; // done
    }
    private function valid_1byte ($c) {
        if (!is_int($c))
            return false;
        return ($c & 0x80)==0x00;
    }
    private function valid_2byte ($c) {
        if (!is_int($c))
            return false;
        return ($c & 0xE0)==0xC0;
    }
    private function valid_3byte ($c) {
        if (!is_int($c))
            return false;
        return ($c & 0xF0)==0xE0;
    }
    private function valid_4byte ($c) {
        if (!is_int($c))
            return false;
        return ($c & 0xF8)==0xF0;
    }
    private function valid_nextbyte ($c) {
        if (!is_int($c))
            return false;
        return ($c & 0xC0)==0x80;
    }
}
?>
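To see what the bit masks in valid_1byte() through valid_nextbyte() actually test, here is a small standalone sketch. The helper names classify_byte() and dump_bytes() are made up for illustration; the masks are the same as in the class. It shows why a German "ä" saved as ISO-8859-1 (byte 0xE4) fails the check: 0xE4 looks like the lead byte of a three-byte sequence, but the following "p" (0x70) is not a 10xxxxxx continuation byte. The preg_match() cross-check with the /u modifier is an addition, not part of the class above: PHP returns false from it when the subject is not valid UTF-8.

```php
<?php
// Standalone sketch: hypothetical helper that applies the same bit masks
// as valid_1byte() ... valid_nextbyte() in the class above.
function classify_byte(int $c): string
{
    if (($c & 0x80) === 0x00) return 'ascii';        // 0xxxxxxx
    if (($c & 0xE0) === 0xC0) return 'lead2';        // 110xxxxx
    if (($c & 0xF0) === 0xE0) return 'lead3';        // 1110xxxx
    if (($c & 0xF8) === 0xF0) return 'lead4';        // 11110xxx
    if (($c & 0xC0) === 0x80) return 'continuation'; // 10xxxxxx
    return 'invalid';                                 // 11111xxx
}

// Hypothetical helper: print the class of every byte in a string.
function dump_bytes(string $s): void
{
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $c = ord($s[$i]);
        printf("0x%02X %s\n", $c, classify_byte($c));
    }
}

$utf8  = "\xC3\xA4pfel"; // "äpfel" encoded in UTF-8
$latin = "\xE4pfel";     // "äpfel" encoded in ISO-8859-1

dump_bytes($utf8);  // 0xC3 lead2, 0xA4 continuation, then 4 x ascii
dump_bytes($latin); // 0xE4 lead3, but 0x70 ('p') is ascii, no continuation

// Cross-check with PCRE: with the /u modifier, preg_match() returns
// false for a subject that is not valid UTF-8.
var_dump(preg_match('//u', $utf8) === 1);  // bool(true)
var_dump(preg_match('//u', $latin) === 1); // bool(false)
```

The masks in the class are looser than the PCRE check. For example, they accept overlong sequences such as 0xC0 0x80, which preg_match('//u', ...) rejects.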
Plus this little treat of my own making (it is a method, so it belongs inside a class such as the one above):
public function normalize_utf8_to_lower_case ($s) {
    // replacements for lower-case ASCII-characters
    $r = array(
        'a' => 'AaÀÁÂÃÅàáâãåĀāĂ㥹ǍǎǺǻẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặΆ',
        'ae' => 'ÄäæÆǼǽ',
        'b' => 'bB',
        'c' => 'CcÇçĆćĈĉĊċČč',
        'd' => 'DdÐĎďĐđ',
        'e' => 'EeÈÉÊËèéêëĒēĔĕĖėĘęĚěẸẹẺẻẼẽẾếỀềỂểỄễỆệΈ',
        'f' => 'Ffƒ',
        'g' => 'GgĜĝĞğĠġĢģ',
        'h' => 'HhĤĥĦħΉ',
        'i' => 'IiÌÍÎÏìíîïĨĩĪīĬĭĮįİıǏǐỈỉỊịΊΐ',
        'ij' => 'Ĳĳ',
        'j' => 'JjĴĵ',
        'k' => 'KkĶķĸ',
        'l' => 'LlĹĺĻļĽľĿŀŁł',
        'm' => 'Mm',
        'n' => 'NnÑñŃńŅņŇňŉ',
        'ng' => 'Ŋŋ',
        'o' => 'OoÒÓÔÕØòóôõøŌōŎŏŐőƠơǑǒǾǿΌỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞởỠỡỢợ',
        'oe' => 'ÖöŒœ',
        'p' => 'Pp',
        'q' => 'Qq',
        'r' => 'RrŔŕŖŗŘř',
        's' => 'SsŚśŜŝŞşŠš',
        'ss' => 'ßß',
        't' => 'TtŢţŤťŦŧ',
        'u' => 'UuÙÚÛŨũŪūŬŭŮůŰűŲųƯưǓǔǕǖǗǘǙǚǛǜùúûỤụỦủỨứỪừỬửỮữỰự',
        'ue' => 'Üü',
        'v' => 'Vv',
        'w' => 'WwŴŵẀẁẂẃẄẅ',
        'x' => 'Xx',
        'y' => 'YyÝýÿŶŷŸΎỲỳỴỵỶỷỸỹ',
        'z' => 'ZzŹźŻżŽž'
    );
    // replace every variant with its ASCII replacement
    // (the u modifier requires $s to be valid UTF-8; otherwise
    // preg_replace() returns null)
    foreach ($r as $c => $variants) {
        $s = preg_replace("~[$variants]~su", $c, $s);
    }
    return $s;
}
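The foreach loop above runs one preg_replace() per table entry over the whole string. A possible alternative, sketched here with a shortened table and the made-up function names build_map() and normalize_with_strtr(), turns the table into a character-to-replacement map once and lets PHP's strtr() do all replacements in a single pass. strtr() compares bytes, but every key is a complete UTF-8 character, so a key can only match at character boundaries of a valid UTF-8 string.

```php
<?php
// Sketch of an alternative to the foreach/preg_replace loop
// (hypothetical function names, shortened table for illustration).

// Turn 'replacement' => 'variants' into a 'variant char' => 'replacement' map.
function build_map(array $table): array
{
    $map = [];
    foreach ($table as $replacement => $variants) {
        // split the UTF-8 string into single characters
        foreach (preg_split('//u', $variants, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            $map[$char] = $replacement;
        }
    }
    return $map;
}

// strtr() with an array replaces all keys in one pass and never
// re-scans text it has already replaced.
function normalize_with_strtr(string $s, array $map): string
{
    return strtr($s, $map);
}

$map = build_map([
    'a'  => 'AaÀÁÂàáâ',
    'ae' => 'ÄäÆæ',
    'e'  => 'EeÈÉÊèéê',
    'f'  => 'Ff',
    'l'  => 'Ll',
    'p'  => 'Pp',
]);

echo normalize_with_strtr('Äpfel', $map), "\n"; // aepfel
```

The map only has to be built once and can be reused for every string. Characters missing from it pass through unchanged. With this shortened table, for example, 'Café' becomes 'Cafe' because "C" is not mapped.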
If it doesn't solve your problem, maybe you need a different way of looking at it?
Best regards,
Felix Riesterer.
--
ie:% br:> fl:| va:) ls:[ fo:) rl:| n4:? de:> ss:| ch:? js:) mo:} zu:)