我刚刚开始使用Asyncio,我正试着用它来解析一个网站。
我试图解析站点的6个部分(self.signals),每个部分有N个页面,页面上有表格。所以基本上,我是在尝试异步地遍历各个部分,并在每个部分内部同步地解析其页面。这就是我到目前为止所拥有的代码。
class FinViz():
    """Scrape ticker symbols from the finviz.com stock screener.

    Each entry in ``self.signals`` names one screener section.  Sections
    are scraped concurrently on the asyncio event loop; within a section,
    the 20-row result pages are fetched on a thread pool, because
    ``urlopen`` is blocking and threads let the network waits overlap.
    Collected ticker symbols accumulate in ``self.ticks``.
    """

    def __init__(self):
        # Base screener URL; a signal code from self.signals is appended.
        self.url = 'https://finviz.com/screener.ashx?v=160&s='
        # Human-readable section name -> finviz signal query code.
        self.signals = {
            'Earnings_Before' : 'n_earningsbefore',
            'Earnings_After' : 'n_earningsafter',
            'Most_Active' : 'ta_mostactive',
            'Top_Gainers' : 'ta_topgainers',
            'Most_Volatile' : 'ta_mostvolatile',
            'News' : 'n_majornews',
            'Upgrade' : 'n_upgrades',
            'Unusual_Volume' : 'ta_unusualvolume'
        }
        # Flat list of every ticker text scraped across all sections.
        self.ticks = []

    def _parseTable(self, data):
        """Fetch one result page of a section and collect its tickers.

        data: ``(page_index, signal_name)`` tuple.  Page 0 uses the bare
        section URL; page i starts at row ``i * 20 + 1`` via the ``&r=``
        offset (finviz shows 20 rows per page).  Runs on a worker thread.
        """
        i, signal = data
        url = self.signals[signal] if i == 0 else self.signals[signal] + '&r={}'.format(str(i * 20 + 1))
        soup = BeautifulSoup(urlopen(self.url + url, timeout = 3).read(), 'html5lib')
        table = soup.find('div', {'id' : 'screener-content'}).find('table',
            {'width' : '100%', 'cellspacing': '1', 'cellpadding' : '3', 'border' : '0', 'bgcolor' : '#d3d3d3'})
        for row in table.findAll('tr'):
            # Column 1 holds the ticker link; header rows have no <a>.
            col = row.findAll('td')[1]
            if col.find('a'):
                # list.append is atomic under the GIL, so concurrent
                # worker threads may share self.ticks safely.
                self.ticks.append(col.find('a').text)

    async def parseSignal(self, signal):
        """Scrape every page of one screener section.

        Fetches the first page to read the total row count, then fans the
        remaining pages out over a thread pool and awaits them all.
        Network failures (URLError) skip the section silently.
        """
        try:
            soup = BeautifulSoup(urlopen(self.url + self.signals[signal], timeout = 3).read(), 'html5lib')
            # 'Total: N' counter -> number of 20-row pages to request.
            tot = int(soup.find('td', {'class' : 'count-text'}).text.split()[1])
            with concurrent.futures.ThreadPoolExecutor(max_workers = 20) as executor:
                loop = asyncio.get_event_loop()
                futures = [
                    loop.run_in_executor(executor, self._parseTable, (i, signal))
                    for i in range(tot // 20 + (tot % 20 > 0))
                ]
                # Await all page fetches; results are accumulated in
                # self.ticks by _parseTable, so return values are unused.
                await asyncio.gather(*futures)
        except URLError:
            # Best effort: a section that fails to load is skipped.
            pass

    async def getAll(self):
        """Scrape all sections concurrently and print the tickers.

        BUG FIX: the original passed ``self.parseSignal`` (a coroutine
        function) to ``run_in_executor`` — the thread only *created* the
        coroutine object without running its body — and ``await``-ed each
        executor call inside the loop, serializing the sections.  That is
        why the asyncio version ran slower than sequential code.  Gathering
        the coroutines directly runs every section concurrently on the
        event loop with no extra thread pool.
        """
        await asyncio.gather(*(self.parseSignal(signal) for signal in self.signals))
        print(self.ticks)
# Script entry point: build the scraper and run the full async scrape
# to completion on the default event loop (run_until_complete follows).
if __name__ == '__main__':
x = FinViz()
loop = asyncio.get_event_loop()
loop.run_until_complete(x.getAll())

这确实成功地完成了工作,但不知何故,它比我在不使用asyncio的情况下进行解析还要慢。
对异步菜鸟有什么建议吗?
编辑:添加完整代码
发布于 2019-02-03 17:43:14
请记住,Python有一个GIL,所以多线程代码对(CPU密集型)性能不会有帮助。要想潜在地提高速度,可以改用ProcessPoolExecutor,但请注意它会带来以下开销:
1. 启动子进程worker;
2. 在主进程与worker之间序列化/反序列化数据。
如果您在fork安全的环境中运行,并把数据存储在全局变量中,则可以避免第1项开销。
您还可以做一些别的事情,比如使用共享的内存映射文件;另外,共享原始字符串/字节是最快的。
https://stackoverflow.com/questions/50225173
复制相似问题