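When using Splash with Scrapy, the headers I get back come from the Splash server, not from the website that Splash rendered.
response.headers returns:
{b'Server': [b'TwistedWeb/19.7.0'], b'Date': [b'Sun, 11 Jul 2021 07:31:32 GMT'], b'Content-Type': [b'text/html; charset=utf-8']}
I am trying to get the headers of the actual website: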
Connection: Keep-Alive
Content-Length: 5
Content-Type: text/html
Date: Sun, 11 Jul 2021 07:05:49 GMT
Keep-Alive: timeout=5, max=100
Server: Apache
X-Cache: HIT
How can I get the website's headers instead of the Splash server's?
Posted on 2021-07-11 15:46:24
I got it working with the following code:
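splash_lua_script = """
function main(splash, args)
    -- Load the page and give it a moment to render.
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    -- splash:history() holds the page-load request/response pairs; use the last one.
    local entries = splash:history()
    local last_response = entries[#entries].response
    return {
        html = splash:html(),
        headers = last_response.headers
    }
end
"""
The headers returned by the script are then passed on to response.headers in Scrapy.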
https://stackoverflow.com/questions/68334175
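The Lua script in the answer returns the headers it takes from splash:history(), which records responses in HAR format, so each header is a name/value entry rather than a key in a mapping. If you read that value yourself instead of going through response.headers, a small helper along these lines may be convenient. This is only a sketch: the function name har_headers_to_dict and the sample data are made up for illustration and are not part of Splash or Scrapy.

```python
from typing import Dict, Iterable, List, Mapping


def har_headers_to_dict(
    har_headers: Iterable[Mapping[str, str]],
) -> Dict[str, List[str]]:
    """Group HAR-style header entries by name, keeping repeated headers."""
    grouped: Dict[str, List[str]] = {}
    for entry in har_headers:
        # Each HAR entry looks like {"name": "Server", "value": "Apache"}.
        grouped.setdefault(entry["name"], []).append(entry["value"])
    return grouped


# Hypothetical sample, shaped like the "headers" value the Lua script returns.
sample = [
    {"name": "Server", "value": "Apache"},
    {"name": "Content-Type", "value": "text/html"},
    {"name": "X-Cache", "value": "HIT"},
]

site_headers = har_headers_to_dict(sample)
print(site_headers["Server"])
```

Assuming the scrapy-splash package, the script would typically be sent with SplashRequest(url, callback, endpoint='execute', args={'lua_source': splash_lua_script}), and the decoded JSON result should then be reachable in the callback as response.data.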