(Solved "in principle") Copying websites, by Def, 19.04.2008 22:37

Copying websites

Def 19.04.2008 22:09

software

Hi,

I am using HTTrack version 3.42 on Fedora Linux and am trying, without success, to copy the following electronic book to my hard disk: How to Think Like a Computer Scientist. Learning with Python. 2nd Edition. (The only download offered appears to be "lore source and XHTML", which I can't do anything with.) I am using HTTrack's default settings, unchanged. So far I have been able to copy other websites with it without any problems.
The problem:
Either HTTrack, instead of starting the download, ends up on the page http://localhost.localdomain:8082/server/refresh.html and shows:

Problem loading page
Unable to connect
Firefox can't establish a connection to the server at localhost.localdomain:8082.

    *   The site could be temporarily unavailable or too busy. Try again in a few
        moments.
    *   If you are unable to load any pages, check your computer's network
        connection.
    *   If your computer or network is protected by a firewall or proxy, make sure
        that Firefox is permitted to access the Web.

Or, when I then try to resume and continue the failed download, HTTrack proudly announces "Site mirroring finished!", but has in fact saved only the first page (apparently without style information) as "index.html". Unfortunately, the log file offers no deeper insight either:

HTTrack3.42-noV6+libhtsjava.so.2 launched on Sat, 19 Apr 2008 21:55:27 at http://openbookproject.net/thinkcs/python/english2e/ +*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/*
(webhttrack -q -%i -iC1 http://openbookproject.net/thinkcs/python/english2e/ -O "/home/def/websites/pythontutorial" -n -%P -N0 -s2 -p7 -D -a -K0 -c4 -%k -A25000 -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F "" +*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -%s -%u )

Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private

No files purged

HTTrack Website Copier/3.42 mirror complete in 0 seconds : 2 links scanned, 1 files written (20239 bytes overall), no files updated [0 bytes received at 0 bytes/sec]
(No errors, 0 warnings, 0 messages)
21:55:27 Info: Top index rebuilt (done)

---

The problem *could* be related to the fact that these web pages apparently use XML (XHTML and SVG, as far as I have seen), and that, according to a few vague hints I found in the HTTrack forum (unfortunately nothing recent), HTTrack may simply not be able to parse XML.
Does anyone know more? Is that the real cause? Is there an alternative program to HTTrack that can handle XML?

Thanks!
Def


(Solved "in principle") Copying websites
Def 19.04.2008 22:37

software
I don't know why I didn't notice this before, but I have just gone through the download offered on that website again and was astonished to find a subfolder named "xhtml" that contains all the files I wanted!
So the problem is solved in principle.
I would still be interested to know, in general, why HTTrack fails to copy the site.

Regards
Def
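
A note on the XML question raised above: since XHTML is meant to be well-formed XML, an XML parser can in principle read such a page and list the links a mirroring tool would have to follow. The following is only a minimal sketch of that idea using Python's standard library. The function name `extract_links`, the sample markup, and the file names in it (such as `ch01.xhtml`) are made up for illustration; none of this describes how HTTrack works internally or why it failed here. Real pages that use named entities such as `&nbsp;` may not parse with this approach.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def extract_links(xhtml_text, base_url):
    """Return absolute URLs for every href/src attribute in an XHTML page."""
    # Parse the page as XML; this raises ParseError if it is not well-formed.
    root = ET.fromstring(xhtml_text)
    links = []
    # Walk all elements in document order, whatever their namespace.
    for element in root.iter():
        for attribute in ("href", "src"):
            target = element.get(attribute)
            if target:
                # Resolve relative references against the page's URL.
                links.append(urljoin(base_url, target))
    return links


# Hypothetical sample page; the file names are illustrative only.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><link rel="stylesheet" href="style.css"/></head>
<body><a href="ch01.xhtml">Chapter 1</a><img src="img/fig.png" alt=""/></body>
</html>"""

if __name__ == "__main__":
    base = "http://openbookproject.net/thinkcs/python/english2e/"
    for link in extract_links(SAMPLE, base):
        print(link)
```

A mirroring tool would repeat this step for each page it downloads and fetch every link it has not yet seen, which is why a tool that cannot read the first page's links stops after saving a single "index.html".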

SELFHTML Forum - Supplement to the Documentation
