
Problem generating a new set of URLs to fetch in Nutch

Stack Overflow user
Asked on 2016-12-14 19:23:17

I am having trouble generating a new set of URLs to fetch with Nutch.

Here is the command I used:

# $NUTCH_HOME/runtime/local/bin/nutch generate -topN 10

Result:

Generator: starting at 2016-12-14 19:16:50
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: running in local mode, generating exactly one partition.
Generator: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/root/bob/-topN/current
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:862)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:589)
        at org.apache.nutch.crawl.Generator.run(Generator.java:764)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.crawl.Generator.main(Generator.java:717)

What am I missing?


1 Answer

Stack Overflow user

Accepted answer

Answered on 2016-12-15 17:03:13

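Your command is missing arguments. Take a look at https://wiki.apache.org/nutch/bin/nutch%20generate; basically, you need to provide the paths to the crawl database (crawldb) and the segments directory.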
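The error message in the question fits this: the path `file:/root/bob/-topN/current` suggests that `-topN` was read as the crawldb path, since it was the first argument after `generate`. A minimal sketch of a corrected call follows, assuming the crawldb and segments directories come first and options such as `-topN` come after them. The function name `run_generate`, the `NUTCH_BIN` override, and the `crawl/crawldb` and `crawl/segments` paths are illustrative assumptions, not taken from the question.

```shell
#!/bin/sh
# Sketch only: wraps "nutch generate" so the two positional arguments
# (crawldb, segments dir) are always passed before the options.
# run_generate and NUTCH_BIN are hypothetical names used for illustration.

run_generate() {
    crawldb=$1     # crawl database directory
    segments=$2    # directory under which the new segment is created
    topn=$3        # value passed to -topN

    # The Generator reads <crawldb>/current (see the path in the error
    # message), so stop early if that directory is missing.
    if [ ! -d "$crawldb/current" ]; then
        echo "crawldb not found: $crawldb/current" >&2
        return 1
    fi

    # Same launcher path as in the question unless NUTCH_BIN overrides it.
    nutch_bin=${NUTCH_BIN:-$NUTCH_HOME/runtime/local/bin/nutch}
    "$nutch_bin" generate "$crawldb" "$segments" -topN "$topn"
}

# Example call (paths are assumptions, adjust to your own layout):
# run_generate crawl/crawldb crawl/segments 10
```

With a layout like the one in the commented example, this amounts to running `$NUTCH_HOME/runtime/local/bin/nutch generate crawl/crawldb crawl/segments -topN 10`. If the check fails, the crawldb has probably not been created yet or the path points somewhere else.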

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/41150561
