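Download a working local copy of a webpage
I would like to download a local copy of a web page and get all of its CSS, images, JavaScript, etc.
In previous discussions (e.g. here and here, both of which are more than two years old), two suggestions are generally put forward: wget -p and httrack. However, both of these suggestions fail. I would very much appreciate help with using either of these tools to accomplish the task; alternatives are also welcome.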
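Option 1: wget -p
wget -p successfully downloads all of the web page's prerequisites (CSS, images, JS). However, when I load the local copy in a web browser, the page is unable to load the prerequisites, because the paths to those prerequisites have not been modified from the version on the web.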
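For example:
- In the page's HTML, <link rel="stylesheet" href="/stylesheets/foo.css" /> will need to be corrected to point to the new relative path of foo.css
- In the CSS file, background-image: url(/images/bar.png) will similarly need to be adjusted.
Is there a way to modify wget -p so that the paths are correct?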
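Option 2: httrack
httrack seems like a great tool for mirroring entire websites, but it is unclear to me how to use it to create a local copy of a single page. There is a great deal of discussion in the httrack forums about this topic (e.g. here), but no one seems to have a bulletproof solution.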
Option 3: another tool?
Some people have suggested paid tools, but I find it hard to believe there is no free solution out there.
Thanks so much!
wget is capable of doing what you are asking. Just try the following:
wget -p -k http://www.example.com/
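-p will get you all the elements required to view the site correctly (CSS, images, etc.). -k will change all links (including those for CSS and images) so that you can view the page offline as it appeared online.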
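If you do this often, the command above can be wrapped in a small shell function. This is only a sketch: mirror_page, its DEST_DIR argument, and the DRY_RUN switch are names made up for illustration, and --adjust-extension and --directory-prefix are extra wget options that the answer itself does not use (the long names --page-requisites and --convert-links are just the spelled-out forms of -p and -k).

```shell
#!/bin/sh
# mirror_page is a hypothetical helper, not part of wget.
# Usage: mirror_page URL [DEST_DIR]
mirror_page() {
    url=$1
    dest=${2:-.}
    # --page-requisites  (-p): also fetch the CSS, images and scripts the page needs.
    # --convert-links    (-k): rewrite links after the download so they work locally.
    # --adjust-extension (-E): assumption, not in the answer; saves HTML/CSS
    #                          files with a matching file suffix.
    # --directory-prefix (-P): assumption, not in the answer; saves everything
    #                          under DEST_DIR.
    set -- wget --page-requisites --convert-links --adjust-extension \
        "--directory-prefix=$dest" "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of touching the network.
        echo "$*"
    else
        "$@"
    fi
}

# Dry run: shows the wget command that would be executed.
DRY_RUN=1 mirror_page "http://www.example.com/" ./page-copy
```

Dropping DRY_RUN=1 runs the real download; the converted page should then open from the host-named directory that wget creates under ./page-copy.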
From the Wget documentation:
'-k'
'--convert-links'
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-html content, etc.
Each link will be changed in one of the two ways:
- The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link. Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to '../bar/img.gif'. This kind of transformation works reliably for arbitrary combinations of directories.
- The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to. Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.
Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet address rather than presenting a broken link. The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.
Note that only at the end of the download can Wget know which links have been downloaded. Because of that, the work done by '-k' will be performed at the end of all the downloads.
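To make the first case from the documentation concrete, here is a rough sketch of the path arithmetic behind the /foo/doc.html and /bar/img.gif example. It only mimics the result and says nothing about how Wget is implemented; rel_link is a made-up helper name, and the realpath options used (-m, -s, --relative-to) assume GNU coreutils.

```shell
#!/bin/sh
# rel_link is a hypothetical helper that mimics the relative-link result
# described in the Wget documentation; it is not Wget's own code.
# Assumes GNU coreutils realpath (-m, -s and --relative-to are GNU options).
# Usage: rel_link DOCUMENT_PATH TARGET_PATH
rel_link() {
    doc=$1      # path of the downloaded document, e.g. /foo/doc.html
    target=$2   # path of the downloaded file it links to, e.g. /bar/img.gif
    # -m: the paths do not have to exist; -s: do not resolve symlinks.
    realpath -m -s --relative-to="$(dirname "$doc")" "$target"
}

rel_link /foo/doc.html /bar/img.gif   # prints ../bar/img.gif
```

Because the result is relative to the document's own directory, the downloaded tree can be moved elsewhere without breaking the link, which is the point the documentation makes.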