I want to use Rubydoop to run a Hadoop job on Wikipedia history dump XML files. So far I have managed to load Cloud9's XMLInputFormat class and map it to a Ruby class:
```ruby
module Cloud9
  require 'java'
  require File.expand_path('../../cloud9-1.5.0.jar', __FILE__)
  require File.expand_path('../../hadoop-core-1.2.1.jar', __FILE__)
  require File.expand_path('../../commons-logging-1.1.1.jar', __FILE__)
  java_import 'edu.umd.cloud9.collection.XMLInputFormat'
end

module Wikipedia
  class XmlInputFormat < ::Cloud9::XMLInputFormat
  end
end
```

I then added XmlInputFormat to the Rubydoop job configuration block:
```ruby
input input_path, format: Wikipedia::XmlInputFormat
```

When I run the job, with splitting set up on the <page> and </page> tags, I get the following error:
```
java.lang.Exception: java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
    at edu.umd.cloud9.collection.XMLInputFormat$XMLRecordReader.initialize(XMLInputFormat.java:102)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:521)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
```

I'm using cloud9-1.5.0.jar and Rubydoop 1.1.0, running Hadoop 1.2.1 locally.
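So my question is: is this caused by a Hadoop version incompatibility (old vs. new Hadoop API) between the versions used by Cloud9, by Rubydoop, and by my local installation? How can I fix it?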
Posted on 2013-12-16 13:28:07
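This is an incompatibility between Hadoop 1.2.1 and Cloud9 1.5.0: newer Hadoop releases (2.x) define TaskAttemptContext as an interface instead of a class. Cloud9 1.5.0 expects the interface, so on Hadoop 1.2.1, where TaskAttemptContext is still a class, the JVM throws IncompatibleClassChangeError as soon as XMLRecordReader.initialize uses it.

It works for me now with cloud9-1.4.0.jar.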
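Following the fix above, one way to avoid hard-coding the jar name is to pick it from the local Hadoop version. The sketch below is a hypothetical helper (cloud9_jar_for is not part of Cloud9 or Rubydoop). Its mapping, Hadoop 1.x to cloud9-1.4.0.jar and Hadoop 2.x to cloud9-1.5.0.jar, is an assumption drawn only from this thread, so check it against the Cloud9 release notes before relying on it.

```ruby
# Hypothetical helper, not part of Cloud9 or Rubydoop.
# ASSUMPTION (based only on this thread): cloud9-1.4.0.jar works with the
# Hadoop 1.x API (TaskAttemptContext is a class), while cloud9-1.5.0.jar
# expects the Hadoop 2.x API (TaskAttemptContext is an interface).
CLOUD9_JAR_BY_HADOOP_MAJOR = {
  1 => 'cloud9-1.4.0.jar',
  2 => 'cloud9-1.5.0.jar'
}.freeze

# Returns the Cloud9 jar file name for a Hadoop version string like "1.2.1".
# Raises ArgumentError for malformed versions or unknown major versions.
def cloud9_jar_for(hadoop_version)
  major = Integer(hadoop_version.split('.').first)
  CLOUD9_JAR_BY_HADOOP_MAJOR.fetch(major) do
    raise ArgumentError, "no known Cloud9 jar for Hadoop #{hadoop_version}"
  end
end

puts cloud9_jar_for('1.2.1') # prints cloud9-1.4.0.jar
```

With the helper defined before the Cloud9 module, the jar line could become require File.expand_path("../../#{cloud9_jar_for('1.2.1')}", __FILE__), keeping the version string in sync with the hadoop-core jar that is loaded next to it.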
https://stackoverflow.com/questions/20572224