
Question: Python Threading/Thread implementation

Stack Overflow user

Asked 2016-07-29 22:02:26

1 answer, 2.8K views, 0 followers, score 2

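This is my first attempt at a threaded script. It will eventually become a web scraper that should hopefully run a bit faster than the original linear scraping script I wrote earlier.

After a few hours of reading and playing with example code, I'm still not sure what a correct implementation looks like.

Here is the code I've been experimenting with so far: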

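try:
    from Queue import Queue   # Python 2
except ImportError:
    from queue import Queue   # Python 3
import threading

lock = threading.Lock()  # guards the shared "workers" counter

def scrape(queue):
    global workers
    print(threading.current_thread().name)  # the thread running this call
    print(queue.get())
    queue.task_done()
    with lock:
        workers -= 1

queue = Queue(maxsize=0)
threads = 10
workers = 0


with open('test.txt') as in_file:
    for line in in_file:
        queue.put(line)

while not queue.empty():
    if threads != workers:
        worker = threading.Thread(target=scrape, args=(queue,))
        worker.daemon = True
        worker.start()
        with lock:
            workers += 1

queue.join()  # wait until every queued line has been processed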

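The idea is that I have a list of URLs in test.txt. I open the file and put every URL into the queue. From there I run 10 threads that pull from the queue and scrape a web page, or, in this example, simply print the line they pulled.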

Once the function finishes, I decrement the "workers" count and a new thread replaces the finished one, until the queue is empty.
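For comparison, here is a minimal sketch of the more common way to get "10 threads pulling from a queue": start a fixed number of long-lived worker threads once and have each loop over the queue, instead of starting a new thread per line. It assumes Python 3 (module `queue` rather than `Queue`); `scrape`, `worker` and `run` are hypothetical names, `scrape` is a placeholder that only strips the newline, and a literal list stands in for test.txt. A `None` sentinel per worker tells each thread to exit.

```python
import queue
import threading


def scrape(line):
    # Hypothetical placeholder for the real work: just strip the newline.
    return line.strip()


def worker(q, results, lock):
    # Each worker loops, so one thread handles many items instead of
    # a new thread being started for every line.
    while True:
        item = q.get()
        try:
            if item is None:  # sentinel: no more work for this thread
                return
            result = scrape(item)
            with lock:  # serialize access to the shared results list
                results.append((threading.current_thread().name, result))
        finally:
            q.task_done()  # lets q.join() know this item is finished


def run(lines, num_threads=10):
    q = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for line in lines:
        q.put(line)
    for _ in threads:
        q.put(None)  # exactly one sentinel per worker
    for t in threads:
        t.join()  # returns once every worker has consumed its sentinel
    return results


if __name__ == "__main__":
    # The question reads URLs from test.txt; a literal list keeps this runnable.
    urls = ["http://example.com/a\n", "http://example.com/b\n"]
    for name, url in run(urls, num_threads=2):
        print(name, url)
```

Because every line is queued before the sentinels, each worker only exits after the real work is drained, and no shared counter or busy-waiting loop is needed.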

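In my real implementation I will eventually need to take the data my scrape function collects and write it to a .csv file. For now, though, I just want to understand how to implement threading correctly.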
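One common way to handle the .csv step is to let the threads only produce rows and have the main thread do all the writing, so no two threads ever write to the file at once. The sketch below assumes Python 3 and uses `multiprocessing.dummy.Pool` (a thread pool); `scrape` and `scrape_to_csv` are hypothetical names, the "url"/"length" columns are made up for illustration, and an `io.StringIO` stands in for a real file.

```python
import csv
import io
from multiprocessing.dummy import Pool as ThreadPool


def scrape(url):
    # Hypothetical placeholder: a real scraper would fetch and parse the page.
    return {"url": url, "length": len(url)}


def scrape_to_csv(urls, out_file, threads=10):
    # The worker threads only compute rows; the main thread does all the
    # writing, so two threads never write to the file at the same time.
    with ThreadPool(threads) as pool:
        rows = pool.map(scrape, urls)  # map keeps the input order
    writer = csv.DictWriter(out_file, fieldnames=["url", "length"])
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # io.StringIO stands in for open("results.csv", "w", newline="").
    buf = io.StringIO()
    scrape_to_csv(["http://example.com/a", "http://example.com/b"], buf)
    print(buf.getvalue())
```

When writing to a real file, opening it with `newline=""` is what the `csv` module documentation recommends.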

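I've seen examples like the one above that use "Thread"... and I've also seen "threading" examples that subclass the Thread class. I'd just like to know which one I should use, and the proper way to manage it.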
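Both styles from the question are part of the standard `threading` API and can do the same job: pass a function as `target=`, or subclass `threading.Thread` and override `run()`. The minimal Python 3 sketch below shows both side by side; `scrape_into`, `ScrapeThread` and the upper-casing "work" are hypothetical placeholders. Passing `target=` is usually the simpler choice, while a subclass can be convenient when each thread should carry its own state, such as a result attribute.

```python
import threading


# Style 1: pass a plain function as target=
def scrape_into(results, index, url):
    # Each thread writes only its own slot, so no lock is needed.
    results[index] = url.upper()  # hypothetical placeholder work


def run_with_target(urls):
    results = [None] * len(urls)
    threads = [threading.Thread(target=scrape_into, args=(results, i, u))
               for i, u in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Style 2: subclass threading.Thread and override run()
class ScrapeThread(threading.Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url
        self.result = None

    def run(self):
        # start() calls run() in the new thread.
        self.result = self.url.upper()  # hypothetical placeholder work


def run_with_subclass(urls):
    threads = [ScrapeThread(u) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [t.result for t in threads]


if __name__ == "__main__":
    print(run_with_target(["a", "b"]))
    print(run_with_subclass(["a", "b"]))
```

Both versions start one thread per URL for brevity; with a long URL list a fixed-size pool of workers is usually preferable.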

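Go easy on me, I'm just a beginner trying to understand threads... and yes, I know it can get very complicated. Still, I think this should be simple enough for a first attempt...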

Tags: python, multithreading

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-07-29 22:24:12

On Python 2.x, multiprocessing.dummy (which uses threads) is a good choice because it is easy to use (it is also available on Python 3.x).

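If you find that your scraping is CPU-bound and you have multiple CPU cores, this approach lets you switch to real multiprocessing very easily, possibly with a large speedup.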

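(Because of the Global Interpreter Lock (GIL), Python usually cannot benefit from multiple CPUs with threads, only with multiple processes. You have to find out for yourself which is faster in your case.)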

With multiprocessing.dummy you can do this:

from multiprocessing.dummy import Pool
# from multiprocessing import Pool # if you want to use more cpus

def scrape(url):
    data = {"sorted": sorted(url)} # normally you would do something more interesting
    return (url, data)

urls = []
threads = 10

if __name__ == "__main__":
    with open('test.txt') as in_file:
        urls.extend(line.strip() for line in in_file) # one URL per line, newline stripped

    p = Pool(threads)
    results = list(p.imap_unordered(scrape, urls)) # results arrive in completion order
    p.close()
    p.join() # wait for the worker threads to exit
    print(results) # normally you would process your results here
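Since the answer leaves it to you to find out whether threads or processes are faster, a small timing helper makes that comparison concrete. This is a sketch under assumptions: Python 3, a hypothetical `fake_io_task` that sleeps to imitate waiting on the network (`time.sleep` releases the GIL, as blocking socket I/O does), and a hypothetical `timed_run` helper that accepts any pool class with the `Pool(size)` / `map` interface.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool
# from multiprocessing import Pool as ProcessPool  # real processes, for CPU-bound work


def fake_io_task(n):
    # Hypothetical stand-in for a network request: sleeping releases the GIL,
    # much like waiting on a socket does.
    time.sleep(0.05)
    return n * 2


def timed_run(pool_cls, func, items, size=10):
    # Runs func over items with the given pool class and returns
    # (results, elapsed_seconds) so different pool types can be compared.
    start = time.perf_counter()
    with pool_cls(size) as pool:
        results = pool.map(func, items)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run(ThreadPool, fake_io_task, list(range(20)))
    print(results[:3], "took %.2fs" % elapsed)
```

To time real processes, pass `multiprocessing.Pool` instead; keep the call under the `if __name__ == "__main__":` guard and the task function at module level so it can be pickled and sent to the worker processes.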

On Python 3.x, concurrent.futures may be the better choice.
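As a rough Python 3 equivalent of the multiprocessing.dummy example, here is a sketch using `concurrent.futures.ThreadPoolExecutor`. The `scrape` placeholder is copied from the answer; `scrape_all` is a hypothetical helper name, and a literal URL list replaces test.txt. `as_completed` yields futures as they finish, similar to `imap_unordered`, and `future.result()` re-raises any exception raised inside `scrape`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
# from concurrent.futures import ProcessPoolExecutor  # same interface, separate processes


def scrape(url):
    data = {"sorted": sorted(url)}  # placeholder, as in the answer's example
    return url, data


def scrape_all(urls, threads=10):
    results = []
    with ThreadPoolExecutor(max_workers=threads) as executor:
        futures = [executor.submit(scrape, url) for url in urls]
        for future in as_completed(futures):  # completion order, like imap_unordered
            results.append(future.result())  # re-raises exceptions from scrape
    return results


if __name__ == "__main__":
    # A literal list stands in for the lines of test.txt.
    urls = ["http://example.com/a", "http://example.com/b"]
    for url, data in scrape_all(urls):
        print(url, data)
```

Leaving the `with` block shuts the executor down and waits for its threads, which replaces the explicit `close()`/`join()` pair.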

Score: 3

Page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/38660815
