Cannot download with wget (wget retries "infinitely")

Stack Overflow user
Asked on 2013-02-20 17:01:14
Answers: 1, Views: 6K, Followers: 0, Votes: 0

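I have to crawl the website http://docbao.com.vn/ with wget, but wget keeps printing the message

HTTP request sent, awaiting response... No data received.

Retrying.

For example, when I crawl all the pages in the category Quoc_te.dec, the result is: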

congnh@congnh-pc:~/Source/datasection/congnh-crawler/sh$ wget "http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec" -O -
--2013-02-20 23:53:16--  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Resolving docbao.com.vn (docbao.com.vn)... 123.30.51.174
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2013-02-20 23:53:17--  (try: 2)  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2013-02-20 23:53:19--  (try: 3)  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2013-02-20 23:53:22--  (try: 4)  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2013-02-20 23:53:27--  (try: 5)  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2013-02-20 23:53:32--  (try: 6)  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2013-02-20 23:53:38--  (try: 7)  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2013-02-20 23:53:45--  (try: 8)  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.

--2013-02-20 23:53:53--  (try: 9)  http://docbao.com.vn/chuyenmuc/muc-1/Quoc_te.dec
Connecting to docbao.com.vn (docbao.com.vn)|123.30.51.174|:80... connected.
HTTP request sent, awaiting response... No data received.
Retrying.
...

Why does it retry "infinitely"? Or what is the problem?

Thanks



1 Answer

Stack Overflow user

Posted on 2013-03-01 14:13:54

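Sorry for stating the obvious, but wget retries because it doesn't receive any data. It sends the HTTP request headers, and the remote host then immediately closes the connection. I can only guess that this non-standard behavior is caused by a server-side misconfiguration, possibly a deliberate one.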
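A minimal Python 3 sketch for confirming this diagnosis outside wget, using only the standard `http.client` module. The `probe` helper and its return convention are illustrative assumptions, not part of wget: it sends one GET request and returns `None` when the server hangs up before sending a status line, which is the case wget logs as "No data received". Note that `http.client` adds `Accept-Encoding: identity` unless you pass your own `Accept-Encoding` header. As for the "infinite" part: according to the GNU Wget manual, `-t`/`--tries` defaults to 20 attempts (`0` or `inf` means unlimited), so the loop is finite, and something like `--tries=3` shortens it.

```python
import http.client
from typing import Dict, Optional, Tuple


def probe(host: str, path: str = "/", headers: Optional[Dict[str, str]] = None,
          port: int = 80, timeout: float = 10.0) -> Optional[Tuple[int, int]]:
    """Send one GET request and report what came back (hypothetical helper).

    Returns (status_code, body_length) if the server answered, or None if it
    closed the connection without sending any response bytes.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers or {})
        resp = conn.getresponse()
        return resp.status, len(resp.read())
    except (http.client.RemoteDisconnected, ConnectionResetError):
        # RemoteDisconnected: the server closed the socket before a status line.
        # ConnectionResetError: the server reset the connection instead.
        return None
    finally:
        conn.close()
```

For example, compare `probe("docbao.com.vn", "/chuyenmuc/muc-1/Quoc_te.dec")` with the same call plus `headers={"Accept-Encoding": "gzip"}`. The site may well behave differently today than it did in 2013.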

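After poking around a bit, I found that the content is served as soon as you signal that you can handle gzip-encoded responses. You can do that by adding --header="accept-encoding: gzip" to the wget command. This is a problem for crawling with wget, because wget cannot recurse into gzipped content. You will need to write a script that handles this case, or use another tool that can handle such content.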
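One way to write the script suggested above, sketched in Python 3 with only the standard library. Assumptions: the function names (`fetch_html`, `decode_body`, `extract_links`) are hypothetical, only the `gzip` content coding is handled, and the page charset falls back to UTF-8 when the server does not declare one. `urllib.request` does not decompress responses on its own, so the body is passed through `gzip.decompress` before the links are extracted for the next crawl step.

```python
import gzip
import urllib.request
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, i.e. what a recursive crawl follows."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo a gzip Content-Encoding; return any other body unchanged."""
    if content_encoding and content_encoding.strip().lower() in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """GET url while advertising gzip support, then decompress and decode it."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


def extract_links(html: str, base_url: str) -> List[str]:
    """Return absolute URLs for the links on a page, for the next crawl step."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]
```

A crawl loop would call `fetch_html(url)`, pass the result to `extract_links(html, url)`, and queue the returned URLs that stay on the same host. Deduplication, delays between requests, and error handling are deliberately left out of the sketch.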

Note that not all websites allow their content to be crawled. Check their TOS before you do this.
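robots.txt does not replace reading the TOS, but it is the machine-readable part of a site's crawling policy and is cheap to check before a crawl. A minimal sketch using Python's standard `urllib.robotparser`; `allowed_by_robots` is a hypothetical helper that checks rules you have already downloaded.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)
```

To check the live file instead, `RobotFileParser("http://docbao.com.vn/robots.txt")` followed by `.read()` downloads and parses it in one step.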

Votes: 0
Page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/14985603
