Amazon EMR-4.5, Hadoop 2.7.2, Pig 0.14
The results do not seem to make sense. Example:
tagfile-test.txt (tab-delimited)
AAA 123 2016
BBB 456 2016
CCC 789 2016

Load and dump:
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
DUMP test;
(tagfile-test.txt,AAA,123,2016)
(tagfile-test.txt,BBB,456,2016)
(tagfile-test.txt,CCC,789,2016)

Correct: generate f0, f1, f2
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
project = FOREACH test GENERATE f0, f1, f2;
DUMP project;
(tagfile-test.txt,AAA,123)
(tagfile-test.txt,BBB,456)
(tagfile-test.txt,CCC,789)

Wrong: generate f0, f1, f3 (same result as above)
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
project = FOREACH test GENERATE f0, f1, f3;
DUMP project;
(tagfile-test.txt,AAA,123)
(tagfile-test.txt,BBB,456)
(tagfile-test.txt,CCC,789)

Wrong: generate f0, f2, f3 (confirmation)
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
project = FOREACH test GENERATE f0, f2, f3;
DUMP project;
(tagfile-test.txt,AAA,2016)
(tagfile-test.txt,BBB,2016)
(tagfile-test.txt,CCC,2016)

Pig does not seem to resolve the field names correctly. I tried using field positions ($0, $1, $2, $3) instead, with the same results.

Posted on 2017-07-20 00:02:56
I ran into the same problem when using the -tagFile option with PigStorage, and solved it by adding the following line to the Pig script:
set pig.optimizer.rules.disabled 'ColumnMapKeyPrune';
There is a good explanation of ColumnMapKeyPrune at http://chimera.labs.oreilly.com/books/1234000001811/ch07.html#debugging_tips
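
For context, here is a minimal sketch of how that line might sit in the script from the question. The relation and field names are taken from the question. Putting the set statement at the very top is an assumption on my part (it seemed the safest place), and I have not re-run this on the exact EMR release mentioned above.

```pig
-- Assumption: placed before any LOAD so the setting is in effect
-- when the logical plan is optimized.
set pig.optimizer.rules.disabled 'ColumnMapKeyPrune';

-- Same load as in the question: tab-delimited input, with -tagFile
-- prepending the source file name as the first field (f0).
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);

-- One of the projections that came back wrong in the question.
project = FOREACH test GENERATE f0, f1, f3;
DUMP project;
```

With the rule disabled, Pig loads every column and projects afterwards, so the third column of the dump should come from f3 rather than f2. The cost is that unused columns are no longer pruned at load time, which may matter for wide inputs. The same rule can, as far as I know, also be switched off from the command line with the -t (-optimizer_off) option instead of editing the script.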

Posted on 2016-04-13 04:13:15
It looks like the fields are separated by ',' but you are using '\t' as the delimiter in PigStorage. Also, specify the data types for the fields.
Try changing this
test = LOAD 'tagfile-test.txt' USING PigStorage('\t','-tagFile') AS (f0, f1, f2, f3);
to
test = LOAD 'tagfile-test.txt' USING PigStorage(',','-tagFile') AS (f0:chararray, f1:chararray, f2:int, f3:int);

https://stackoverflow.com/questions/36582239