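I noticed a delay of about 120 seconds between the response_data and response_done events when fetching from a particular https site. I tried the same page in a normal web browser and did not see this slowness, so I suspect I am doing something wrong.
Here is what I did to trace the events (for some reason, use LWP::Debug qw(+) did nothing):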

    use WWW::Mechanize;
    use Time::HiRes qw(gettimeofday);
    use IO::Handle;

    my $mech = WWW::Mechanize->new(
        timeout     => 3,
        autocheck   => 1,     # check success of each query
        stack_depth => 0,     # no keeping history
        keep_alive  => 50,    # connection pool
    );
    $mech->agent_alias( 'Windows IE 6' );

    open my $debugfile, '>traffic.txt';
    $debugfile->autoflush(1);

    $mech->add_handler( request_send => sub {
            my $cur_time = gettimeofday();
            my $req = shift;
            print $debugfile "\n$cur_time === BEGIN HTTP REQUEST ===\n";
            print $debugfile $req->dump();
            print $debugfile "\n$cur_time === END HTTP REQUEST ===\n";
            return
        }
    );
    $mech->add_handler( response_header => sub {
            my $cur_time = gettimeofday();
            my $res = shift;
            print $debugfile "\n$cur_time === GOT RESPONSE HDRS ===\n";
            print $debugfile $res->dump();
            return
        }
    );
    $mech->add_handler( response_data => sub {
            my $cur_time = gettimeofday();
            my $res = shift;
            my $content_length = length($res->content);
            print $debugfile "$cur_time === Got response data chunk resp size = $content_length ===\n";
            return
        }
    );
    $mech->add_handler( response_done => sub {
            my $cur_time = gettimeofday();
            my $res = shift;
            print $debugfile "\n$cur_time === BEGIN HTTP RESPONSE ===\n";
            print $debugfile $res->dump();
            print $debugfile "\n=== END HTTP RESPONSE ===\n";
            return
        }
    );

Here is an excerpt of the trace (URLs and cookies are obfuscated):

    1347463214.24724 === BEGIN HTTP REQUEST ===
    GET https://...
    Accept-Encoding: gzip
    Referer: https://...
    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
    Cookie: ...
    Cookie2: $Version="1"
    (no content)
    1347463214.24724 === END HTTP REQUEST ===
    1347463216.13134 === GOT RESPONSE HDRS ===
    HTTP/1.1 200 OK
    Date: Wed, 12 Sep 2012 15:20:08 GMT
    Accept-Ranges: bytes
    ...
    Server: Lotus-Domino
    Content-Length: 377806
    Content-Type: application/octet-stream
    Last-Modified: Fri, 07 Sep 2012 06:25:33 GMT
    Client-Peer: ...
    Client-Response-Num: 1
    Client-SSL-Cert-Issuer: ...
    Client-SSL-Cert-Subject: ...
    Client-SSL-Cipher: DES-CBC3-SHA
    Client-SSL-Socket-Class: IO::Socket::SSL
    (no content)
    1347463216.48305 === Got response data chunk resp size = 4096 ===
    1347463337.98131 === BEGIN HTTP RESPONSE ===
    HTTP/1.1 200 OK
    Date: Wed, 12 Sep 2012 15:20:08 GMT
    Accept-Ranges: bytes
    ...
    Server: Lotus-Domino
    Content-Length: 377806
    Content-Type: application/octet-stream
    Last-Modified: Fri, 07 Sep 2012 06:25:33 GMT
    Client-Date: Wed, 12 Sep 2012 15:22:17 GMT
    Client-Peer: ...
    Client-Response-Num: 1
    Client-SSL-Cert-Issuer: ...
    Client-SSL-Cert-Subject: ...
    Client-SSL-Cipher: DES-CBC3-SHA
    Client-SSL-Socket-Class: IO::Socket::SSL
    PK\3\4\24\0\6\0\10\0\0\0!\0\x88\xBC\21Xi\2\0\0\x84\22\0\0\23\0\10\2[Content_Types].xml \xA2...
    (+ 377294 more bytes not shown)
    === END HTTP RESPONSE ===

You can see a gap of 121.5 seconds between the "Got response data chunk" and "BEGIN HTTP RESPONSE" messages. I have the feeling that LWP::UserAgent sometimes hangs for two minutes after it has received all of the data.
Do you know where that could come from?
EDIT: here is a Wireshark screenshot: I get the FIN/ACK message after 120 seconds...

Thanks
Answered 2012-09-12 16:59:30
Thanks to Borodin's answer, I found a way to fix this.
I modified the response_data event handler sub as follows:

    if($res->header('Content-Length') == length($res->content)) {
        die "OK"; # Got whole data, not waiting for server to end the communication channel.
    }
    return 1; # In other cases make sure the handler is called for subsequent chunks

Then, in the caller, ignore the error if the X-Died header equals OK.
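
For readers who want to see the handler and the caller side together, here is a minimal sketch of this workaround. It assumes a plain LWP::UserAgent (WWW::Mechanize is a subclass, so the same add_handler call should apply). The name fetch_ignoring_ok is a hypothetical helper, not part of LWP. The caller accepts any X-Died value that starts with OK, because die without a trailing newline appends file and line information to the message.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Stop reading as soon as the body is as long as Content-Length promises.
$ua->add_handler( response_data => sub {
    my ($res) = @_;
    my $expected = $res->header('Content-Length');
    if ( defined $expected && $expected == length( $res->content ) ) {
        die "OK\n";    # whole body received; do not wait for the server to close
    }
    return 1;          # keep the handler active for subsequent chunks
} );

# Hypothetical helper: treat the deliberate "OK" abort as success.
sub fetch_ignoring_ok {
    my ($url) = @_;
    my $res  = $ua->get($url);
    my $died = $res->header('X-Died');
    die "request aborted: $died\n"
        if defined $died && $died !~ /\AOK\b/;
    die "request failed: " . $res->status_line . "\n"
        unless $res->is_success;
    return $res;
}
```

A caller would then use fetch_ignoring_ok($url) in place of $ua->get($url) and read $res->content as usual.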
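Answered 2012-09-12 16:35:53
I think it is likely that your transaction really is taking that long. The documentation for LWP::UserAgent says

    The response_data handler needs to return a TRUE value to be called again for subsequent chunks for the same request.

So, because your handler doesn't return anything, only the first packet returned is traced.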
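
As a minimal sketch of what the documentation describes, the handler below returns 1 so that it keeps being called for every chunk. It uses LWP::UserAgent directly rather than WWW::Mechanize; @chunk_sizes is only an illustrative variable, and the chunk is read from the handler's fourth argument as described in the LWP::UserAgent handler documentation.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday);

my $ua = LWP::UserAgent->new;
my @chunk_sizes;    # illustrative: one entry per chunk seen by the handler

$ua->add_handler( response_data => sub {
    my ( $res, $agent, $handler, $data ) = @_;
    my $cur_time = gettimeofday();
    push @chunk_sizes, length $data;
    print STDERR "$cur_time === chunk of ", length($data), " bytes ===\n";
    return 1;    # a true value asks LWP to call the handler for the next chunk
} );
```

With this change the trace should show one line per chunk, which makes it possible to tell a slow transfer apart from a stall after the last chunk.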
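According to your output, the first 4KB of data arrives in 2.2 seconds, or about 2KB per second. The whole of the data is 369KB long, so you should expect to receive 92 more data packets, which at 2KB per second would take about three minutes to transfer. You are getting your response in two minutes, so I think your times are reasonable.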
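
The estimate above can be reproduced from the figures in the question's trace. This is only a worked check of the arithmetic; the variable names are illustrative, and the rate is rounded to 2KB per second as in the answer.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Figures taken from the trace in the question.
my $content_length = 377806;              # Content-Length header, in bytes
my $first_chunk    = 4096;                # size of the first chunk, in bytes
my $t_request      = 1347463214.24724;    # BEGIN HTTP REQUEST
my $t_first_chunk  = 1347463216.48305;    # first response data chunk
my $t_done         = 1347463337.98131;    # BEGIN HTTP RESPONSE

my $first_elapsed = $t_first_chunk - $t_request;
my $total_kb      = $content_length / 1024;
my $remaining     = $content_length - $first_chunk;
my $chunks_left   = ceil( $remaining / $first_chunk );

# The answer rounds the observed rate to 2KB per second; do the same here.
my $rate         = 2 * 1024;
my $estimate     = $remaining / $rate;
my $observed_gap = $t_done - $t_first_chunk;

printf "first chunk after %.1f s\n", $first_elapsed;
printf "total size %.0f KB, %d more 4KB chunks\n", $total_kb, $chunks_left;
printf "estimated %.0f s at 2KB/s, observed %.1f s\n", $estimate, $observed_gap;
```

This prints a first-chunk time of 2.2 s, a total of 369 KB with 92 more chunks, and an estimate of 182 s against the observed 121.5 s.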
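Answered 2016-11-29 17:28:50
I know this is a very old question now, but I recently had what looks like the same problem. It only occurred when the size of the unencrypted HTTPS response, including the headers, was exactly 1024 bytes. Benoit's response seems to have been exactly 4096 bytes, so multiples of 1024 may be significant. I don't have control of the server, so I can't produce test responses of arbitrary length, and I couldn't reproduce the problem on any other server. The occurrence at 1024 bytes was repeatable, though.
Looking around the LWP code (v6.05), I found that sysread is asked to read 1024 bytes at a time. So, the first time, it returns all 1024 bytes. It is then immediately called a second time and, instead of returning 0 to indicate that there is no more data, it returns undef, indicating an error, and sets errno to EAGAIN, meaning that there is more data but it isn't available yet. This leads to a select on the socket, which hangs because there isn't going to be any more data. It takes 120 seconds to time out, after which the data we have is returned, which happens to be the correct result. So we get no error, just a long delay.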
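
To make the three sysread outcomes mentioned above concrete, here is a small sketch on a plain local socket pair. It does not involve SSL or LWP and does not reproduce the bug; it only shows how a positive count, undef with EAGAIN, and 0 differ on a non-blocking socket on a Unix-like system. The helper classify_read is hypothetical.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Handle;
use Errno;    # makes the %! hash available

# Hypothetical helper: call it right after sysread, before $! can change.
sub classify_read {
    my ($n) = @_;
    return 'data'  if defined $n && $n > 0;
    return 'eof'   if defined $n;     # 0 means the peer closed the connection
    return 'retry' if $!{EAGAIN};     # nothing available yet, peer still open
    return 'error';
}

socketpair( my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";
$reader->blocking(0);

my $written = syswrite( $writer, 'x' x 1024 );
die "write failed: $!" unless defined $written && $written == 1024;

my $buf = '';
my @outcomes;

my $n = sysread( $reader, $buf, 1024 );                  # all 1024 bytes
push @outcomes, classify_read($n);

$n = sysread( $reader, $buf, 1024, length $buf );        # nothing left, still open
push @outcomes, classify_read($n);

close $writer;
$n = sysread( $reader, $buf, 1024, length $buf );        # peer has closed
push @outcomes, classify_read($n);

print join( ', ', @outcomes ), "\n";
```

On a typical Unix-like system this should print data, retry, eof. The delay described in the answer corresponds to the middle case: the reader is told to try again even though the complete response has already arrived.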
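I didn't have convenient enough access to use Benoit's solution. Instead, my workaround was to extend the HTTPS handling code to check for the situation above and return 0 instead of undef: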

    package LWP::Protocol::https::Socket;

    sub sysread {
        my $self = shift;
        my $result = $self->SUPER::sysread(@_);
        # If we get undef back then some error occurred. If it's EAGAIN
        # then that ought to mean that there is more data to read but
        # it's not available yet. We suspect the error may be false.
        # $_[2] is the offset, so if it's defined and non-zero we have
        # some data in the buffer.
        # $_[0] is the buffer, so check it for an entire HTTP response,
        # including the headers and the body. If the length specified
        # by Content-Length is exactly the length of the body we have in
        # the buffer, then take that as being complete and return a length
        # here instead. Since it's unlikely that anything was read, the
        # buffer will not have increased in size and the result will be zero
        # (which was the expected result anyway).
        if (!defined($result) &&
            $!{EAGAIN} &&
            $_[2] &&
            $_[0] =~ /^HTTP\/\d+\.\d+\s+\d+\s+.*\s+content-length\s*:\s*(\d+).*?\r?\n\r?\n(.*)$/si &&
            length($2) == $1) {
                return length($_[0]) - $_[2]; # bufferlen - offset
        }
        return $result;
    }

https://stackoverflow.com/questions/12391671