Hi everyone!
I'm trying to translate this huge Perl regexp for checking the syntactic correctness of a URL into PHP. Unfortunately, I keep getting the message "Warning: Unknown modifier '*' in [...]\linkcheck.php on line 100". Line 100 is the line with the preg_match call.
This is what I have cobbled together so far:
<schnip>
// Basic definitions:
$lowalpha = '(?:[a-z])';
$hialpha = '(?:[A-Z])';
$alpha = "(?:$lowalpha$hialpha)";
$digit = '(?:\d)';
$safe = '(?:[$_.+-])';
$extra = '(?:[!*\'(),])'; // the quote inside a single-quoted string must be escaped
$national = '(?:[{}\\^~[]`])';
$punctuation = '(?:[<>#%"])';
$reserved = '(?:[;/?:@&=])';
$hex = '(?:[\dA-Fa-f])';
$escape = "(?:%$hex$hex)";
$unreserved = "(?:$alpha$digit$safe$extra)";
$uchar = "(?:$unreserved$escape)";
$xchar = "(?:$unreserved$escape$reserved)";
$digits = '(?:\d+)';
$alphadigit = "(?:$alpha\d)";
// URL schemeparts for ip based protocols:
$urlpath = "(?:$xchar*)";
$user = "(?:(?:".$uchar."[;?&=])*)"; //"(?:(?:$uchar[;?&=])*)";
$password = "(?:(?:".$uchar."[;?&=])*)";
$port = "(?:$digits)";
$hostnumber = "(?:$digits\.$digits\.$digits\.$digits)";
$toplabel = "(?:(?:$alpha(?:$alphadigit-)*$alphadigit)$alpha)";
$domainlabel = "(?:(?:$alphadigit(?:$alphadigit-)*$alphadigit)$alphadigit)";
$hostname = "(?:(?:$domainlabel\.)*$toplabel)";
$host = "(?:(?:$hostname)(?:$hostnumber))";
$hostport = "(?:(?:$host)(?::$port)?)";
$login = "(?:(?:$user(?::$password)?@)?$hostport)";
$ip_schemepart = "(?://$login(?:/$urlpath)?)";
$schemepart = "(?:$xchar*$ip_schemepart)";
$scheme = "(?:(?:".$lowalpha.$digit."[+.-])+)";
// The generic form of a URL is:
$genericurl = "(?:$scheme:$schemepart)";
// The predefined schemes:
// FTP (see also RFC959)
$fsegment = "(?:(?:".$uchar."[?:@&=])*)";
$ftptype = "(?:[AIDaid])";
$fpath = "(?:$fsegment(?:/$fsegment)*)";
$ftpurl = "(?:ftp://$login(?:/$fpath(?:;type=$ftptype)))";
// FILE
$fileurl = "(?:file://(?:(?:$host)localhost)?/$fpath)";
// HTTP
$httpuchar = "(?:(?:$alpha$digit$safe(?:[!*',]))$escape)";
$hsegment = "(?:(?:".$httpuchar."[;:@&=~])*)";
$search = "(?:(?:".$httpuchar."[;:@&=~])*)";
$hpath = "(?:$hsegment(?:/$hsegment)*)";
$httpurl = "(?:http://$hostport(?:/$hpath(?:\?$search)?)?)";
// GOPHER (see also RFC1436)
$gopher_plus = "(?:$xchar*)";
$selector = "(?:$xchar*)";
$gtype = "(?:$xchar)";
$gopherurl = "(?:gopher://$hostport(?:/$gtype(?:$selector(?:%09$search(?:%09$gopher_plus)?)?)?)?)";
// MAILTO (see also RFC822)
$encoded822addr = "(?:$xchar+)";
$mailtourl = "(?:mailto:$encoded822addr)";
// NEWS (see also RFC1036)
$article = "(?:(?:".$uchar."[;/?:&=])+@$host)";
$group = "(?:$alpha(?:".$alpha.$digit."[.+_-])*)";
$grouppart = "(?:$article$group\*)";
$newsurl = "(?:news:$grouppart)";
// NNTP (see also RFC977)
$nntpurl = "(?:nntp://$hostport/$group(?:/$digits)?)";
// TELNET
$telneturl = "(?:telnet://$login(?:/)?)";
// WAIS (see also RFC1625)
$wpath = "(?:$uchar*)";
$wtype = "(?:$uchar*)";
$database = "(?:$uchar*)";
$waisdoc = "(?:wais://$hostport/$database/$wtype/$wpath)";
$waisindex = "(?:wais://$hostport/$database\?$search)";
$waisdatabase = "(?:wais://$hostport/$database)";
$waisurl = "(?:$waisdatabase$waisindex$waisdoc)";
// PROSPERO
$fieldvalue = "(?:(?:".$uchar."[?:@&]))";
$fieldname = "(?:(?:".$uchar."[?:@&]))";
$fieldspec = "(?:;$fieldname=$fieldvalue)";
$psegment = "(?:(?:".$uchar."[?:@&=]))";
$ppath = "(?:$psegment(?:/$psegment)*)";
$prosperourl = "(?:prospero://$hostport/$ppath(?:$fieldspec)*)";
$url = "$httpurl$ftpurl$newsurl$nntpurl$telneturl$gopherurl$waisurl$mailtourl$fileurl$prosperourl";
$check = preg_match("!$url!", $HTTP_GET_VARS["test"]);
<schnap>
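For reference, here is a reduced test case that appears to trigger the same warning. It is only a sketch under one assumption that I have not confirmed: that the `!` used as the delimiter in the preg_match call collides with the literal `!` inside the pattern (it occurs in the `[!*',]` class of `$httpuchar`). The variable `$fragment` is a hypothetical stand-in and not part of the code above.

```php
<?php
// Hypothetical reduced pattern: a character class that contains '!'
// directly followed by '*', similar to the class used in $httpuchar.
$fragment = "(?:[!*',])*";

// With '!' as the delimiter, the pattern is taken to end at the first
// unescaped '!' (the one inside the class). The rest, starting with '*',
// is then read as modifiers. The warning is suppressed here with '@';
// preg_match() returns false on this kind of error.
$broken = @preg_match("!$fragment!", "a!b");

// Escaping every '!' in the pattern keeps it from being read as the
// closing delimiter; inside a character class '\!' is still a literal '!'.
$escaped = str_replace('!', '\!', $fragment);
$ok = preg_match("!^$escaped\$!", "!*,");

var_dump($broken, $ok);
```

If the assumption holds, the first call fails and the second one matches. Choosing a delimiter that does not occur anywhere in `$url` would presumably work just as well as escaping.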
Does anyone have an idea what might be causing this? Or does a PHP translation of this regexp perhaps already exist somewhere?
Regards,
Stefan