
How do I create a Hadoop runner?

Stack Overflow user
Asked 2013-08-27 09:30:24
Answers: 1 · Views: 494 · Followers: 0 · Votes: 0

I have the following simple mrjob script. It reads a large file line by line, performs an operation on each line, and prints the output:

#!/usr/bin/env python                                                                                                           

from mrjob.job import MRJob

class LineProcessor(MRJob):
    def mapper(self, _, line):
        yield (line.upper(), None)  # toy example: the mapper just uppercases the line

if __name__ == '__main__':
    # mr_job = LineProcessor(args=['-r', 'hadoop', '/path/to/input']) # error!
    mr_job = LineProcessor(args=['/path/to/input'])
    with mr_job.make_runner() as runner:
        runner.run() 
        for line in runner.stream_output():
            key, value = mr_job.parse_output_line(line)
            print key.encode('utf-8')  # don't care about value in my case

(This is just a toy example; in my real case, processing each line is expensive, which is why I want to run it distributed.)

However, it only works as a local process. If I try to use '-r', 'hadoop' (see the commented-out line above), I get the following strange error:

  File "mrjob/runner.py", line 727, in _get_steps
    'error getting step information: %s', stderr)
Exception: ('error getting step information: %s', 'Traceback (most recent call last):\n  File "script.py", line 11, in <module>\n    with mr_job.make_runner() as runner:\n  File "mrjob/job.py", line 515, in make_runner\n    " __main__, which doesn\'t work." % w)\nmrjob.job.UsageError: make_runner() was called with --steps. This probably means you tried to use it from __main__, which doesn\'t work.\n')

How can I actually run it on Hadoop, i.e. create a HadoopJobRunner?


1 Answer

Stack Overflow user

Answered 2013-10-05 02:53:41

Are you missing

def steps(self):
    return [self.mr(
        mapper_init=...,
        mapper=self.mapper,
        combiner=...,
        reducer=...,
    )]

in your LineProcessor?
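
The error text in the question points at a different cause: make_runner() was called from the script's own __main__. One layout that may avoid this, sketched here under assumptions, keeps the LineProcessor class in its own module and drives it from a separate script. The file names line_processor.py and driver.py and the helper name run_job are hypothetical; the calls make_runner(), run(), stream_output() and parse_output_line() are the ones already used in the question, and whether they exist in this form depends on the mrjob version.

```python
def run_job(job_cls, args):
    """Build a job from job_cls, run it through its runner, and yield output keys.

    job_cls is expected to be imported from a module other than the one
    that calls this function, not defined in the driver's __main__.
    """
    job = job_cls(args=args)
    with job.make_runner() as runner:
        runner.run()
        for line in runner.stream_output():
            # Values are ignored, as in the question.
            key, _value = job.parse_output_line(line)
            yield key


# Hypothetical usage from driver.py, assuming line_processor.py defines
# LineProcessor and still ends with the usual LineProcessor.run() guard:
#
#     from line_processor import LineProcessor
#     for key in run_job(LineProcessor, ['-r', 'hadoop', '/path/to/input']):
#         print(key)
```

Passing the job class in as a parameter keeps the driver logic independent of any particular job, so the same helper can be tried with '-r', 'local' before switching to '-r', 'hadoop'.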

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/18455538
