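How do I write a Pig query to count the values present in each field?
For example:
FieldA|FieldB
20|ABC;
21|XYZ;
25|null;
99|WER;
45|null;
89|FOY;
Required output: count of field A = 6, count of field B = 4
Posted on 2015-01-12 07:32:54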
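Pig does not treat the input above as null. The value is just the chararray 'null', so the built-in null checks (is null, is not null) do not work in this case. You need to group all the records, filter out the 'null' strings, and then take the counts. Can you try the script below?
Input: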
20|ABC;
21|XYZ;
25|null;
99|WER;
45|null;
89|FOY;
PigScript:
A = LOAD 'input' USING PigStorage('|') AS (f1:int,f2:chararray);
B = GROUP A ALL;
C = FOREACH B {
    -- f2 keeps its trailing ';', so the literal to exclude is 'null;'
    filterNull = FILTER A BY (f2 != 'null;');
    GENERATE COUNT(A.f1) AS fieldA, COUNT(filterNull.f2) AS fieldB;
}
DUMP C;
Output:
(6,4)
Posted on 2015-01-12 07:51:12
Here are the steps to get the output:
fieldcount = load '/user/examples/stackoverflow/count.txt' using PigStorage('|') as (a:int, b:chararray);
-- strip the trailing ';' so the second field can be compared with 'null'
fieldcount1 = FOREACH fieldcount GENERATE a, REPLACE(b,';','') as b;
fieldcount2 = GROUP fieldcount1 ALL;
fieldcount3 = FOREACH fieldcount2 {
    a_cnt = FILTER fieldcount1 BY a is not null;
    b_cnt = FILTER fieldcount1 BY b is not null and b != 'null';
    GENERATE COUNT(a_cnt) as a_count, COUNT(b_cnt) as b_count;
}
DUMP fieldcount3;
Posted on 2015-06-23 18:17:09
Here is the answer. My sample data is:
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
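011  Banglore
Everything is separated by spaces. The last row has no name (note the two consecutive spaces) and stops after the city, so its name, country, and salary load as null.
The code is as follows: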
A = load '/edata' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);
s = group A ALL ;
result = foreach s generate COUNT(A.eid),COUNT(A.name),COUNT(A.country),COUNT(A.salary);
dump result;
You will get the following result:
(10,9,9,9)
https://stackoverflow.com/questions/27871858
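A note on why the counts in the last answer differ per column: Pig's built-in COUNT skips tuples whose first field is null, while COUNT_STAR counts every tuple. The sketch below, which assumes the same space-separated '/edata' file and schema as the script above, puts the two side by side. The aliases salary_present and total_rows are illustrative names only.

```pig
-- Assumes the same sample file and schema as the answer above.
A = LOAD '/edata' USING PigStorage(' ') AS (eid:int, name:chararray, city:chararray, country:chararray, salary:int);
s = GROUP A ALL;
-- COUNT ignores tuples whose first field is null; COUNT_STAR includes them.
result = FOREACH s GENERATE COUNT(A.salary) AS salary_present, COUNT_STAR(A) AS total_rows;
DUMP result;
```

With the sample data, salary_present should match the 9 reported above, and total_rows should be the full 10 rows. This is also why the first two answers need an explicit FILTER: there the missing values are the literal string 'null' (or 'null;'), which COUNT treats as an ordinary value.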