
Unable to write a SequenceFile with Pig
Asked by a Stack Overflow user on 2014-10-28 14:38:56
1 answer · 748 views · 0 followers · 0 votes

I want to store some Pig variables in a Hadoop SequenceFile, in order to run an external MapReduce job on them.

Suppose my data has the schema (chararray, int):

(hello,1)
(test,2)
(example,3)

I wrote this store function:

import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;


public class StoreTest extends StoreFunc {

    private String storeLocation;
    private RecordWriter writer;
    private Job job;

    public StoreTest(){

    }

    @Override
    public OutputFormat getOutputFormat() throws IOException {
        //return new TextOutputFormat();
        return new SequenceFileOutputFormat();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        this.storeLocation = location;
        this.job = job;
        System.out.println("Load location is " + storeLocation);
        FileOutputFormat.setOutputPath(job, new Path(location));        
        System.out.println("Out path " + FileOutputFormat.getOutputPath(job));
    }

    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple tuple) throws IOException {
        try {
            Text k = new Text(((String)tuple.get(0)));
            IntWritable v = new IntWritable((Integer)tuple.get(1));
            writer.write(k, v);
        } catch (InterruptedException ex) {
            Logger.getLogger(StoreTest.class.getName()).log(Level.SEVERE, null, ex);
        }

    }
}

And this Pig code:

register MyUDFs.jar;
x = load '/user/pinoli/input' as (a:chararray,b:int);
store x into '/user/pinoli/output/' using StoreTest(); 

However, the store fails with the following error:

ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.LongWritable

Is there any way to fix this?


1 Answer

Stack Overflow user

Accepted answer

Posted on 2014-10-28 15:29:23

The problem is that you never set the output key/value classes on the job. You can do that in the setStoreLocation() method:

@Override
public void setStoreLocation(String location, Job job) throws IOException {
    this.storeLocation = location;
    this.job = job;
    this.job.setOutputKeyClass(Text.class);   // !!!
    this.job.setOutputValueClass(IntWritable.class);  // !!!
    ...

}

I guess you will also want to use your storer with other key/value types. In that case you can pass the types to the constructor. For example:

private Class<? extends WritableComparable> keyClass;
private Class<? extends Writable> valueClass;

...

public StoreTest() {...}

@SuppressWarnings({ "unchecked", "rawtypes" })
public StoreTest(String keyClass, String valueClass) {
    try {
        this.keyClass = (Class<? extends WritableComparable>) Class.forName(keyClass);
        this.valueClass = (Class<? extends Writable>) Class.forName(valueClass);
    }
    catch (Exception e) {
        throw new RuntimeException("Invalid key/value type", e);
    }
}

...

@Override
public void setStoreLocation(String location, Job job) throws IOException {
    this.storeLocation = location;
    this.job = job;
    this.job.setOutputKeyClass(keyClass);
    this.job.setOutputValueClass(valueClass);
    ...
}

Then, set the correct types in the Pig script:

register MyUDFs.jar;
DEFINE myStorer StoreTest('org.apache.hadoop.io.Text', 'org.apache.hadoop.io.IntWritable');
x = load '/user/pinoli/input' as (a:chararray,b:int);
store x into '/user/pinoli/output/' using myStorer(); 
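Once the store succeeds, the external MapReduce job mentioned in the question can consume the output directly with SequenceFileInputFormat. For a quick sanity check outside MapReduce, the (Text, IntWritable) pairs can also be read back with SequenceFile.Reader. A minimal sketch, assuming the output lands in a part file such as part-m-00000 (the path and file name here are assumptions, not from the original post):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ReadBack {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumed part file name produced by the store above.
        Path part = new Path("/user/pinoli/output/part-m-00000");
        SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(part));
        try {
            // Key/value instances are reused across calls to next().
            Text key = new Text();
            IntWritable value = new IntWritable();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value.get());
            }
        } finally {
            reader.close();
        }
    }
}
```

Alternatively, `hadoop fs -text /user/pinoli/output/part-m-00000` prints the decoded key/value pairs from the command line.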
Votes: 1
Original page content provided by Stack Overflow; translation supported by Tencent Cloud's IT-domain translation engine.
Original link: https://stackoverflow.com/questions/26611113
