I have a list of keywords, and I want to check whether any of them appears in a file containing more than 100,000 domain names. To speed things up, I want to use multiprocessing so that each keyword is checked in parallel.
My code doesn't seem to work well, because running it single-process is much faster. What's wrong?:
import time
from multiprocessing import Pool

def multiprocessing_func(keyword):
    # File containing more than 100k domain names
    # URL: https://raw.githubusercontent.com/CERT-MZ/projects/master/Domain-squatting/domain-names.txt
    file_domains = open("domain-names.txt", "r")
    for domain in file_domains:
        if keyword in domain:
            print("similar domain identified:", domain)
    # Rewind the file, start from the beginning
    file_domains.seek(0)
if __name__ == '__main__':
    starttime = time.time()
    # Keywords to check
    keywords = ["google", "facebook", "amazon", "microsoft", "netflix"]
    # Create a multiprocessing Pool
    pool = Pool()
    for keyword in keywords:
        print("Checking keyword:", keyword)
        # Without multiprocessing pool
        #multiprocessing_func(keyword)
        # With multiprocessing pool
        pool.map(multiprocessing_func, keyword)
    # Total run time
    print('That took {} seconds'.format(time.time() - starttime))

Posted 2020-09-10 19:58:57
Consider why this program:
import multiprocessing as mp

def work(keyword):
    print("working on", repr(keyword))

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        pool.map(work, "google")

prints
working on 'g'
working on 'o'
working on 'o'
working on 'g'
working on 'l'
working on 'e'

map() works on sequences, and a string is a sequence. Instead of putting the map() call inside a loop, you probably want to call it just once, passing keywords (the whole list) as the second argument.
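To make that concrete, here is a minimal sketch of the suggested fix: one map() call over the whole keyword list, so each worker process receives one keyword. The in-memory DOMAINS list is a hypothetical stand-in for the domain-names.txt file, and find_similar is an illustrative name, not from the original code:

```python
import multiprocessing as mp

# Hypothetical stand-in for the 100k-line domain-names.txt file
DOMAINS = ["google-login.com", "faceb00k.net", "amazon-pay.org", "example.com"]

def find_similar(keyword):
    # Return every domain that contains the keyword as a substring
    return [d for d in DOMAINS if keyword in d]

if __name__ == "__main__":
    keywords = ["google", "facebook", "amazon", "microsoft", "netflix"]
    with mp.Pool() as pool:
        # One map() call over the whole list: the pool splits the
        # keywords among its workers, one keyword per task
        results = pool.map(find_similar, keywords)
    for kw, hits in zip(keywords, results):
        print(kw, "->", hits)
```

Returning matches instead of printing them in the worker also keeps the output ordered, since pool.map returns results in the same order as the input list.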
https://stackoverflow.com/questions/63836761