I am trying to fetch the details of 80,000 items asynchronously, but when I run the code I get a "Server disconnected" error. Could someone tell me what is wrong with the code, or what I need to add to fix this?
import asyncio
import aiohttp
import requests
from bs4 import BeautifulSoup

async def get_page(session, url):
    async with session.get(url) as r:
        contenedor = requests.get(url).text
        soup = BeautifulSoup(await r.text(), "lxml")
        description_soup = soup.find(itemprop="description")
        text = description_soup.get_text().strip().replace("\r\n", " ")
        text = text.replace("DrivePart", "Drive Part")
        descripcion = text[:text.find("Part")]
        partnumber = text[text.find("Part"):]
        compatibility = soup.find("span", {"class": "roboto_black10"})
        if " is compatible with " in contenedor:
            compa = compatibility.get_text().strip()
        else:
            compa = ""
        return [url, descripcion, partnumber, compa]
async def get_all(session, urls):
    tasks = []
    for url in urls:
        task = asyncio.create_task(get_page(session, url))
        tasks.append(task)
    results = await asyncio.gather(*tasks)
    return results

async def main(urls):
    async with aiohttp.ClientSession() as session:
        data = await get_all(session, urls)
        return data
await main(url["url"].tolist())

Posted on 2022-09-03 05:19:39
I don't see anything wrong with the code as such (except that you fetch the same page twice, once with aiohttp and once with requests). I suspect the server simply doesn't like being hammered with that many HTTP requests at once. I would either add a delay between requests, or talk to the site operator about allowing a large number of connections. Another approach is to check the HTTP status code and apply retry logic whenever resp.status != 200. That would be the most robust solution.
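The throttle-and-retry approach suggested above can be sketched roughly as follows. This is a minimal sketch, not code from the question: the `retry` helper, the semaphore limit, and the back-off delays are all assumptions to be tuned for the target server.

```python
import asyncio

async def retry(coro_factory, max_retries=3, base_delay=1.0):
    # Call coro_factory() until it returns a non-None result,
    # backing off exponentially between failed attempts.
    for attempt in range(max_retries):
        result = await coro_factory()
        if result is not None:
            return result
        await asyncio.sleep(base_delay * 2 ** attempt)
    return None

async def fetch(session, url, semaphore):
    # Cap how many requests are in flight at once, and treat a
    # non-200 status or a connection error as a failure so that
    # retry() tries the page again.
    async def attempt():
        async with semaphore:
            try:
                async with session.get(url) as r:
                    if r.status == 200:
                        return await r.text()
            except Exception:  # e.g. aiohttp.ClientError
                pass
        return None
    return await retry(attempt)
```

With something like this in place, get_page would go through fetch() instead of calling session.get directly, and a single shared asyncio.Semaphore (say, Semaphore(20)) passed to every task keeps the flood of 80,000 concurrent connections from disconnecting the server.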
https://stackoverflow.com/questions/73589770