I have a list of keywords, and I want to check whether any of them appears in a file containing more than 100,000 domain names. To speed things up, I want to use multiprocessing so that each keyword is checked in parallel.
My code doesn't seem to work well, because running it single-process is much faster. What's wrong?:
import time
from multiprocessing import Pool

def multiprocessing_func(keyword):
    # File containing more than 100k domain names
    # URL: https://raw.githubusercontent.com/CERT-MZ/projects/master/Domain-squatting/domain-names.txt
    file_domains = open("domain-names.txt", "r")
    for domain in file_domains:
        if keyword in domain:
            print("similar domain identified:", domain)
    # Rewind the file, start from the beginning
    file_domains.seek(0)
if __name__ == '__main__':
    starttime = time.time()
    # Keywords to check
    keywords = ["google", "facebook", "amazon", "microsoft", "netflix"]
    # Create a multiprocessing Pool
    pool = Pool()
    for keyword in keywords:
        print("Checking keyword:", keyword)
        # Without multiprocessing pool
        #multiprocessing_func(keyword)
        # With multiprocessing pool
        pool.map(multiprocessing_func, keyword)
    # Total run time
    print('That took {} seconds'.format(time.time() - starttime))

Posted 2020-09-10 19:58:57
Consider why this program:
import multiprocessing as mp

def work(keyword):
    print("working on", repr(keyword))

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        pool.map(work, "google")

prints
working on 'g'
working on 'o'
working on 'o'
working on 'g'
working on 'l'
working on 'e'

map() works on sequences, and a string is a sequence. Instead of putting the map() call inside a loop, you probably want to call it just once, passing keywords (the whole list) as the second argument.
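To make that concrete, here is a minimal sketch of the suggested fix: one map() call over the whole keyword list, so each worker process receives one keyword. The in-memory DOMAINS list is a hypothetical stand-in for the domain-names.txt file, and find_similar is an illustrative name, not from the original code:

```python
import multiprocessing as mp

# Hypothetical stand-in for the 100k-line domain-names.txt file
DOMAINS = ["google-login.com", "faceb00k.net", "amazon-pay.org", "example.com"]

def find_similar(keyword):
    # Return every domain that contains the keyword as a substring
    return [d for d in DOMAINS if keyword in d]

if __name__ == "__main__":
    keywords = ["google", "facebook", "amazon", "microsoft", "netflix"]
    with mp.Pool() as pool:
        # One map() call over the whole list: the pool splits the
        # keywords among its workers, one keyword per task
        results = pool.map(find_similar, keywords)
    for kw, hits in zip(keywords, results):
        print(kw, "->", hits)
```

Returning matches instead of printing them in the worker also keeps the output ordered, since pool.map returns results in the same order as the input list.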
https://stackoverflow.com/questions/63836761