I have a script that downloads multiple URLs asynchronously and then continuously monitors them for changes via difflib:
import asyncio
import difflib
import aiohttp

urls = ['http://www.nytimes.com/',
        'http://www.time.com/',
        'http://www.economist.com/']

async def get_url(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            old = await resp.text()
            print('Initial -', url)
            while True:
                async with session.get(url) as resp1:
                    new = await resp.text()
                    print('Got -', url)
                    diff = difflib.unified_diff(old, new)
                    for line in diff:
                        print(line)
                    old = new

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    ops = []
    for url in urls:
        ops.append(get_url(url))
    loop.run_until_complete(asyncio.wait(ops))

When I run it with the following lines commented out:
    for line in diff:
        print(line)

the script behaves as expected, retrieving each URL roughly three times per second.
With those lines uncommented, the script slows down dramatically, far slower than fetching the URLs serially.
I don't understand why this happens. Does it have something to do with difflib returning a generator?
Posted on 2017-01-10 17:41:09
First, there is a bug in your code: it should be new = await resp1.text(), not new = await resp.text().

Second, unified_diff expects lists of strings, not raw strings. You can quickly split a string into lines with splitlines():

    diff = difflib.unified_diff(old.splitlines(), new.splitlines())

(As the code stands, every single character of those long strings is treated as a line!)
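To see why passing raw strings is so slow, compare the two calls on a small example (a minimal sketch; the sample strings here are made up):

```python
import difflib

old = "hello world\nsecond line\n"
new = "hello there\nsecond line\n"

# Passing raw strings: difflib treats each *character* as a "line",
# so the diff is computed character by character.
char_diff = list(difflib.unified_diff(old, new))
print(len(char_diff))  # many one-character entries

# Passing lists of lines: the diff is computed line by line.
line_diff = list(difflib.unified_diff(old.splitlines(), new.splitlines()))
for line in line_diff:
    print(line)
```

With long HTML documents the character-level comparison is what dominates the runtime, which explains why the script only slows down once the diff generator is actually consumed by the print loop.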
https://stackoverflow.com/questions/40543860