I recently discovered Disco, and compared to Hadoop I really like it, but I have a problem. My project is set up like this (I'm happy to cut/paste the real code if it helps):
myfile.py

    from disco.core import Job, result_iterator
    import collections, sys
    from disco.worker.classic.func import chain_reader, nop_map
    from disco.worker.classic.worker import Params

    def helper1():
        # do stuff

    def helper2():
        # do stuff
    .
    .
    .
    def helperN():
        # do stuff

    class A(Job):
        @staticmethod
        def map_reader(fd, params):
            # Read input file
            yield line

        def map(self, line, params):
            # Process lines into dictionary
            # Iterate dictionary
            yield k, v

        def reduce(self, iter, out, params):
            # Iterate iter
            # Process k, v into dictionary, aggregating values
            # Process dictionary
            # Iterate dictionary
            out.add(k, v)

    class B(Job):
        map_reader = staticmethod(chain_reader)
        map = staticmethod(nop_map)

        def reduce(self, iter, out, params):
            # Process iter
            # Iterate results
            out.add(k, v)

    if __name__ == '__main__':
        from myfile import A, B
        job1 = A().run(input=[input_filename], params=Params(k=k))
        job2 = B().run(input=[job1.wait()], params=Params(k=k))
        with open(output_filename, 'w') as fp:
            for count, line in result_iterator(job2.wait(show=True)):
                fp.write(str(count) + ',' + line + '\n')

My problem is that the workflow skips A's reduce entirely and drops straight into B's reduce.
Any idea what is going on here?
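To make the intended two-stage pipeline concrete, here is a minimal sketch of it in plain Python, without Disco. The stage functions and the word-count logic are hypothetical stand-ins chosen only to show the shape of A's map/reduce feeding into B's reduce; they are not the poster's real code.

```python
from collections import defaultdict

def stage_a_map(lines):
    # Mirrors A.map: turn each input line into (key, value) pairs.
    for line in lines:
        for word in line.split():
            yield word, 1

def stage_a_reduce(pairs):
    # Mirrors A.reduce: aggregate values per key into a dictionary.
    counts = defaultdict(int)
    for k, v in pairs:
        counts[k] += v
    return counts.items()

def stage_b_reduce(pairs):
    # Mirrors B.reduce: post-process the aggregated pairs
    # (here, arbitrarily, keep only keys seen more than once).
    return [(k, v) for k, v in pairs if v > 1]

lines = ["a b a", "b c b"]
result = dict(stage_b_reduce(stage_a_reduce(stage_a_map(lines))))
print(result)  # {'a': 2, 'b': 3}
```

The point of the chain is that B's reduce should only ever see the already-aggregated output of A's reduce, which is why A's reduce being skipped breaks the final result.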
Posted on 2016-01-02 20:26:25
It was a simple but subtle issue: I didn't have show=True for job1. For some reason, with show set for job2, it was showing me the map() and map-shuffle() steps of job1, so since I wasn't getting the final result I expected and the input going into job2's functions looked wrong, I jumped to the conclusion that the job1 steps weren't running properly (this was further supported by the fact that before I added job2, I had verified the accuracy of job1's output).
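In sketch form, the fix amounts to passing show=True to the first job's wait() as well, so that job1's own steps are displayed too. The FakeJob class below is a hypothetical stand-in for disco.core.Job (no real Disco cluster is involved); it only illustrates the call pattern:

```python
class FakeJob:
    """Stand-in for disco.core.Job, used only to illustrate the call pattern."""
    def __init__(self, name):
        self.name = name
        self.shown = False

    def run(self, input, params=None):
        self.input = input
        return self

    def wait(self, show=False):
        # With show=True, Disco prints each step (map, map-shuffle, reduce)
        # as the job runs, which is what makes the chained behavior visible.
        self.shown = show
        return ['results-of-' + self.name]

job1 = FakeJob('A').run(input=['input.txt'])
job2 = FakeJob('B').run(input=job1.wait(show=True))  # show=True here was the missing piece
results = job2.wait(show=True)
```

With show enabled on both waits, each job's steps are reported separately, so it is immediately clear whether A's reduce actually ran before B started.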
https://stackoverflow.com/questions/34541099