Stefan Kleeschulte: PHP - Regular expression - Checking URL syntax

Hi everyone!

I'm currently trying to translate this huge Perl regexp, which checks whether a URL is syntactically correct, into PHP. Unfortunately, I keep getting the message "Warning: Unknown modifier '*' in [...]\linkcheck.php on line 100". Line 100 is the line containing the preg_match call.

This is what I have cobbled together so far:

<schnip>

// Basic definitions:
$lowalpha       =  '(?:[a-z])';
$hialpha        =  '(?:[A-Z])';
$alpha          =  "(?:$lowalpha|$hialpha)";
$digit          =  '(?:\d)';
$safe           =  '(?:[$_.+-])';
$extra          =  '(?:[!*\'(),])';
$national       =  '(?:[{}|\\\\^~\[\]`])';
$punctuation    =  '(?:[<>#%"])';
$reserved       =  '(?:[;/?:@&=])';
$hex            =  '(?:[\dA-Fa-f])';
$escape         =  "(?:%$hex$hex)";
$unreserved     =  "(?:$alpha|$digit|$safe|$extra)";
$uchar          =  "(?:$unreserved|$escape)";
$xchar          =  "(?:$unreserved|$escape|$reserved)";
$digits         =  '(?:\d+)';
$alphadigit     =  "(?:$alpha|\d)";

// URL schemeparts for ip based protocols:
$urlpath        =  "(?:$xchar*)";
$user           =  "(?:(?:".$uchar."|[;?&=])*)"; //"(?:(?:$uchar|[;?&=])*)";
$password       =  "(?:(?:".$uchar."|[;?&=])*)";
$port           =  "(?:$digits)";
$hostnumber     =  "(?:$digits\.$digits\.$digits\.$digits)";
$toplabel       =  "(?:(?:$alpha(?:$alphadigit|-)*$alphadigit)|$alpha)";
$domainlabel    =  "(?:(?:$alphadigit(?:$alphadigit|-)*$alphadigit)|$alphadigit)";
$hostname       =  "(?:(?:$domainlabel\.)*$toplabel)";
$host           =  "(?:(?:$hostname)|(?:$hostnumber))";
$hostport       =  "(?:(?:$host)(?::$port)?)";
$login          =  "(?:(?:$user(?::$password)?@)?$hostport)";
$ip_schemepart  =  "(?://$login(?:/$urlpath)?)";

$schemepart     =  "(?:$xchar*|$ip_schemepart)";
$scheme         =  "(?:(?:".$lowalpha."|".$digit."|[+.-])+)";

// The generic form of a URL is:
$genericurl     =  "(?:$scheme:$schemepart)";

// The predefined schemes:

// FTP (see also RFC959)
$fsegment       =  "(?:(?:".$uchar."|[?:@&=])*)";
$ftptype        =  "(?:[AIDaid])";
$fpath          =  "(?:$fsegment(?:/$fsegment)*)";
$ftpurl         =  "(?:ftp://$login(?:/$fpath(?:;type=$ftptype)?)?)";

// FILE
$fileurl        =  "(?:file://(?:(?:$host)|localhost)?/$fpath)";

// HTTP
$httpuchar      =  "(?:(?:$alpha|$digit|$safe|(?:[!*',]))|$escape)";
$hsegment       =  "(?:(?:".$httpuchar."|[;:@&=~])*)";
$search         =  "(?:(?:".$httpuchar."|[;:@&=~])*)";
$hpath          =  "(?:$hsegment(?:/$hsegment)*)";
$httpurl        =  "(?:http://$hostport(?:/$hpath(?:\?$search)?)?)";

// GOPHER (see also RFC1436)
$gopher_plus    =  "(?:$xchar*)";
$selector       =  "(?:$xchar*)";
$gtype          =  "(?:$xchar)";
$gopherurl      =  "(?:gopher://$hostport(?:/$gtype(?:$selector(?:%09$search(?:%09$gopher_plus)?)?)?)?)";

// MAILTO (see also RFC822)
$encoded822addr =  "(?:$xchar+)";
$mailtourl      =  "(?:mailto:$encoded822addr)";

// NEWS (see also RFC1036)
$article        =  "(?:(?:".$uchar."|[;/?:&=])+@$host)";
$group          =  "(?:$alpha(?:".$alpha."|".$digit."|[.+_-])*)";
$grouppart      =  "(?:$article|$group|\*)";
$newsurl        =  "(?:news:$grouppart)";

// NNTP (see also RFC977)
$nntpurl        =  "(?:nntp://$hostport/$group(?:/$digits)?)";

// TELNET
$telneturl      =  "(?:telnet://$login(?:/)?)";

// WAIS (see also RFC1625)
$wpath          =  "(?:$uchar*)";
$wtype          =  "(?:$uchar*)";
$database       =  "(?:$uchar*)";
$waisdoc        =  "(?:wais://$hostport/$database/$wtype/$wpath)";
$waisindex      =  "(?:wais://$hostport/$database\?$search)";
$waisdatabase   =  "(?:wais://$hostport/$database)";
$waisurl        =  "(?:$waisdatabase|$waisindex|$waisdoc)";

// PROSPERO
$fieldvalue     =  "(?:(?:".$uchar."|[?:@&]))";
$fieldname      =  "(?:(?:".$uchar."|[?:@&]))";
$fieldspec      =  "(?:;$fieldname=$fieldvalue)";
$psegment       =  "(?:(?:".$uchar."|[?:@&=]))";
$ppath          =  "(?:$psegment(?:/$psegment)*)";
$prosperourl    =  "(?:prospero://$hostport/$ppath(?:$fieldspec)*)";

$url            =  "(?:$httpurl|$ftpurl|$newsurl|$nntpurl|$telneturl|$gopherurl|$waisurl|$mailtourl|$fileurl|$prosperourl)";

$check = preg_match("!$url!", $HTTP_GET_VARS["test"]);

<schnap>
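
The warning can be reproduced in isolation, which may help narrow it down. The following is only a minimal sketch, assuming the cause is that the `!` used as the delimiter in the preg_match call also occurs inside the pattern (the character classes in `$extra` and `$httpuchar` contain it). The names `$pattern` and `delimit()` are hypothetical and not part of the listing.

```php
<?php
// Hypothetical reduced pattern; like $httpuchar, its character class contains '!'.
$pattern = "(?:[!*',])";

// With '!' as delimiter, PHP ends the pattern at the first unescaped '!'.
// Everything after it is read as modifiers, so '*' is reported as unknown.
$broken = @preg_match("!$pattern!", "*");   // @ hides the warning; result is false

// Hypothetical helper: wrap a pattern in delimiters and escape every
// occurrence of the delimiter inside it. Assumes the pattern does not
// already contain an escaped delimiter.
function delimit($pattern, $delimiter = '!')
{
    $escaped = str_replace($delimiter, '\\' . $delimiter, $pattern);
    return $delimiter . $escaped . $delimiter;
}

$ok = preg_match(delimit($pattern), "*");   // 1: '*' is in the class
$no = preg_match(delimit($pattern), "a");   // 0: 'a' is not in the class

var_dump($broken, $ok, $no);                // bool(false), int(1), int(0)
```

If this assumption holds, calling `preg_match(delimit($url), ...)` instead of `preg_match("!$url!", ...)` should at least make the warning go away; whether the pattern then accepts the right URLs is a separate question.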

Does anyone have an idea what might be causing this? Or does this PHP translation perhaps already exist somewhere?

Regards,
Stefan