
Q: HBase bulk delete using a MapReduce job

Stack Overflow user

Asked on 2014-04-24 06:49:29

2 answers, 2.8K views, 0 followers, 3 votes

I am trying to delete rows from an HBase table using a MapReduce job.

I get the following error:

java.lang.ClassCastException: org.apache.hadoop.hbase.client.Delete cannot be cast to org.apache.hadoop.hbase.KeyValue
        at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.write(HFileOutputFormat.java:124)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:551)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
        at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
        at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:144)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.

It looks like this is caused by configureIncrementalLoad setting the output value class to KeyValue. There are only PutSortReducer and KeyValueSortReducer; there is no DeleteSortReducer.

My code:

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class DeleteRows extends Configured implements Tool {

    // Reads one row key per input line and emits a whole-row Delete for it.
    public static class Map extends
            Mapper<LongWritable, Text, ImmutableBytesWritable, Delete> {

        ImmutableBytesWritable hKey = new ImmutableBytesWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Text.getBytes() returns the backing buffer, which can be longer
            // than the current line; copy only the first getLength() bytes.
            byte[] rowKey = Arrays.copyOf(value.getBytes(), value.getLength());
            hKey.set(rowKey);
            context.write(hKey, new Delete(rowKey));
            // Update counters
            context.getCounter("RowsDeleted", "Success").increment(1);
        }
    }


    // Arguments: <input path> <HFile output path> <table name>
    @SuppressWarnings("deprecation")
    public int run(String[] args) throws Exception {
        // ToolRunner has already applied the generic options to getConf()
        // and passes only the remaining arguments to run().
        Configuration conf = getConf();
        HBaseConfiguration.addHbaseResources(conf);

        Job job = new Job(conf, "Delete stuff!");
        job.setJarByClass(DeleteRows.class);

        job.setMapperClass(Map.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Delete.class);

        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        HTable hTable = new HTable(conf, args[2]);
        // Auto configure partitioner and reducer (it only provides sorting
        // reducers for Put and KeyValue map outputs)
        HFileOutputFormat.configureIncrementalLoad(job, hTable);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new DeleteRows(), args);
        System.exit(exitCode);
    }
}

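Is there a better or faster way to delete a large number of rows by row key? Deleting each row from inside the mapper is obviously possible, but I expect that to be slower than pushing the deletes in bulk to the correct region servers.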

Tags: hbase, scalability, java, hadoop, mapreduce

2 answers

Stack Overflow user

Posted on 2014-04-26 07:26:18

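Your goal is to generate HFiles that contain a stream of Deletes (internally, delete markers stored as KeyValues). The standard way to do this is with HFileOutputFormat. However, you can only feed a stream of KeyValue changes into this format, and there are two standard reducers for it: PutSortReducer and KeyValueSortReducer. Setting the number of reduce tasks to 0 would pass all the Deletes straight to the output format, which of course cannot work.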
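The idea of writing delete markers as KeyValues can be sketched as a mapper that emits a `KeyValue` of type `DeleteFamily` for every input row key instead of a `Delete`. This is a minimal sketch assuming HBase 0.94-era APIs (matching the `HFileOutputFormat` in the stack trace) and a table with a single column family; the class name `DeleteMarkerMapper` and the configuration keys `deleterows.family` and `deleterows.timestamp` are hypothetical. Because the map output value class is then `KeyValue`, `configureIncrementalLoad` should pick `KeyValueSortReducer` for the sort.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Sketch (HBase 0.94-era API, hypothetical class): turns each input line,
 * interpreted as a row key, into a DeleteFamily marker that
 * HFileOutputFormat can write into an HFile.
 */
public class DeleteMarkerMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

    // Hypothetical configuration keys; set both once in the driver.
    public static final String FAMILY_KEY = "deleterows.family";
    public static final String TS_KEY = "deleterows.timestamp";

    private byte[] family;
    private long timestamp;

    @Override
    protected void setup(Context context) {
        family = Bytes.toBytes(context.getConfiguration().get(FAMILY_KEY, "cf"));
        // A DeleteFamily marker at time T hides every cell in that family and
        // row with a timestamp <= T. The fallback only keeps the sketch
        // runnable; the driver should fix one value for all mappers.
        timestamp = context.getConfiguration()
                .getLong(TS_KEY, System.currentTimeMillis());
    }

    /** Builds a whole-family delete marker for one row (null qualifier). */
    public static KeyValue deleteFamilyMarker(byte[] row, byte[] family, long ts) {
        return new KeyValue(row, family, null, ts, KeyValue.Type.DeleteFamily);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Copy only the valid bytes; Text's backing array may be longer.
        byte[] row = Arrays.copyOf(line.getBytes(), line.getLength());
        context.write(new ImmutableBytesWritable(row),
                deleteFamilyMarker(row, family, timestamp));
    }
}
```

In the question's driver this would mean `job.setMapOutputValueClass(KeyValue.class)` plus setting both configuration keys before calling `configureIncrementalLoad`; the resulting HFiles would then be loaded with `completebulkload`. A table with several column families would need one marker per family for each row.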

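Your most obvious options are:

1. Add your own reducer, DeleteSortReducer. Such a reducer is very simple and can be almost copied: you only need to extract the individual KeyValues from each Delete and sort them. PutSortReducer is a good example to follow. The KeyValues inside Put mutations are not sorted either, which is why that reducer is needed.
2. Instead of emitting a stream of Deletes, emit a stream of the corresponding KeyValues containing delete markers. This is probably the best option for speed.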
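The first option can be sketched as follows, modelled loosely on `PutSortReducer` from HBase 0.94 (the version the stack trace points to). The name `DeleteSortReducer` comes from the answer, but this implementation is only an assumption: it flattens each row's `Delete`s into their `KeyValue` delete markers via `getFamilyMap()`, sorts them with `KeyValue.COMPARATOR`, and leaves out `PutSortReducer`'s memory threshold. One catch, as far as I know: a bare `new Delete(row)`, as in the question's mapper, carries an empty family map (the region server expands it into per-family markers only when the delete arrives over RPC), so the mapper would have to call `deleteFamily(...)` explicitly for this reducer to emit anything.

```java
import java.io.IOException;
import java.util.List;
import java.util.TreeSet;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Hypothetical DeleteSortReducer (HBase 0.94-era API), modelled on
 * PutSortReducer: turns each row's Deletes into KeyValue delete markers
 * and writes them in KeyValue order, as HFileOutputFormat expects.
 */
public class DeleteSortReducer
        extends Reducer<ImmutableBytesWritable, Delete, ImmutableBytesWritable, KeyValue> {

    /** Collects the delete markers carried by the given Deletes, sorted. */
    public static TreeSet<KeyValue> sortedMarkers(Iterable<Delete> deletes) {
        TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
        for (Delete d : deletes) {
            // A Delete built with new Delete(row) alone has an empty family
            // map and contributes no markers here.
            for (List<KeyValue> kvs : d.getFamilyMap().values()) {
                sorted.addAll(kvs);
            }
        }
        return sorted;
    }

    @Override
    protected void reduce(ImmutableBytesWritable row, Iterable<Delete> deletes,
                          Context context) throws IOException, InterruptedException {
        // Unlike PutSortReducer, this buffers the whole row in memory.
        for (KeyValue kv : sortedMarkers(deletes)) {
            context.write(row, kv);
        }
    }
}
```

Wiring it in would mean keeping `Delete` as the map output value class and calling `job.setReducerClass(DeleteSortReducer.class)` after `configureIncrementalLoad`. As far as I can tell from the 0.94 source, for a map output type it does not recognise, `configureIncrementalLoad` still sets up the partitioner, output format and reducer count but leaves the reducer class unset, which is why the identity reducer in the stack trace hands the `Delete` straight to the KeyValue writer.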

Votes: 2

Stack Overflow user

Posted on 2014-04-25 13:32:19

It turns out that the code works fine if the reducer is set up with TableMapReduceUtil.initTableReducerJob instead of HFileOutputFormat.configureIncrementalLoad:

TableMapReduceUtil.initTableReducerJob(tableName, null, job);
job.setNumReduceTasks(0);

However, this still does not produce deletes for the completebulkload utility; it just executes delete RPCs.

Votes: 0
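A complete map-only driver along the lines of this answer might look like the sketch below. It assumes the same HBase 0.94-era API as the question; the class names `DeleteRowsViaRpc` and `RowKeyToDelete` and the argument order are hypothetical. With a `null` reducer, `initTableReducerJob` sets up `TableOutputFormat` for the target table without installing a reducer, and with zero reduce tasks each `Delete` emitted by the mapper is executed against the table as a delete RPC rather than written to an HFile.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

/** Sketch: map-only job that sends one whole-row Delete per input line. */
public class DeleteRowsViaRpc {

    /** Hypothetical mapper: one row key per line becomes one Delete. */
    public static class RowKeyToDelete
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Delete> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            byte[] row = Arrays.copyOf(line.getBytes(), line.getLength());
            context.write(new ImmutableBytesWritable(row), new Delete(row));
        }
    }

    @SuppressWarnings("deprecation")
    public static Job createJob(Configuration base, String input, String table)
            throws IOException {
        Job job = new Job(HBaseConfiguration.create(base), "Delete rows from " + table);
        job.setJarByClass(DeleteRowsViaRpc.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(input));
        job.setMapperClass(RowKeyToDelete.class);
        // null reducer: only TableOutputFormat and the output table are set.
        TableMapReduceUtil.initTableReducerJob(table, null, job);
        // Map-only: Deletes go from the mapper straight to TableOutputFormat.
        job.setNumReduceTasks(0);
        return job;
    }

    // Arguments: <input path> <table name>
    public static void main(String[] args) throws Exception {
        Job job = createJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This trades the HFile route for simplicity: the region servers handle each whole-row `Delete` themselves, which is exactly why nothing is left for `completebulkload` to load.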

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/23256629
