robots.txt by wahsaga, 06.04.2005 12:00

Hi,

Are you sure it is the Yahoo! or Inktomi robot?

It is:
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

So it is Yahoo!'s crawler.

According to the page given in that string, it should at least stick to this:

Yahoo! Slurp will obey the first entry in the robots.txt file with a User-Agent containing "Slurp". If there is no such record, it will obey the first entry with a User-Agent of "*".
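
To make the quoted rule concrete, here is a minimal sketch of the record selection it describes. The function name `select_record`, the `(user_agent, disallow_paths)` tuple layout, and the case-insensitive match are assumptions made for illustration; nothing here is Yahoo!'s actual implementation.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation of one robots.txt record:
# (value of the User-Agent line, list of Disallow paths)
Record = Tuple[str, Sequence[str]]


def select_record(records: Sequence[Record], token: str = "Slurp") -> Optional[Record]:
    """Return the record a robot would obey under the quoted rule, or None."""
    # First choice: the first record whose User-Agent contains the token.
    # Matching case-insensitively is an assumption, the quote does not say.
    for record in records:
        if token.lower() in record[0].lower():
            return record
    # Fallback: the first record with a User-Agent of "*".
    for record in records:
        if record[0].strip() == "*":
            return record
    return None


# Records in the order they would appear in a robots.txt file.
records = [
    ("*", ["/"]),
    ("Yahoo! Slurp", ["/private/"]),
    ("Slurp", ["/other/"]),
]
chosen = select_record(records)
```

In this example the wildcard record comes first in the file, but the record for "Yahoo! Slurp" is still the one selected, because a record containing "Slurp" takes precedence over "*".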

Oh, and the sentence right below that explains why it still _reads_ the page anyway:

Disallowed documents, including slash (the home page of the site), are not indexed, nor are links in those documents followed. Yahoo! Slurp does read the home page at each site and uses it internally, but if it is disallowed it is neither indexed nor followed.

So Slurp slurps up a document (going by the quote, at least the home page) even when robots.txt says it should not, but it neither follows the links in it nor indexes it.

Hm, odd behavior, and it is also unclear to me what kind of "internal use" this is supposed to be for.

So if you really want to keep Slurp away, the only option appears to be checking the User-Agent string, via mod_rewrite or something similar, and answering with a 403 Forbidden.
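
As a rough illustration of the "something similar" route, here is a minimal sketch of a User-Agent check done in a Python WSGI wrapper rather than in mod_rewrite. It answers with 403 Forbidden, the HTTP status for a refused request. The names `block_user_agent`, `hello_app` and `guarded_app` are made up for this example, and the case-insensitive substring match on "Slurp" is an assumption.

```python
from typing import Callable, Iterable


def block_user_agent(app: Callable, token: str = "Slurp") -> Callable:
    """Wrap a WSGI app and refuse requests whose User-Agent contains token."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header under this key.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if token.lower() in user_agent.lower():
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other client is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Placeholder application standing in for the real site."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


guarded_app = block_user_agent(hello_app)
```

Keep in mind that the User-Agent header is supplied by the client, so a check like this only stops robots that identify themselves honestly.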

Regards,
wahsaga

--
/voodoo.css:
#GeorgeWBush { position:absolute; bottom:-6ft; }

SELFHTML Forum - Ergänzung zur Dokumentation

wahsaga: robots.txt

Beitrag lesen

robots.txt

robots.txt