I am unable to generate SUM(km) per trip_id with the code below:
twa = LOAD 'hdfs://localhost:54310/sImport_20170508100625/t_waypoint_actual.txt' USING PigStorage('|') as
(id:int, trip_id:chararray, address_id:int, timestamp_utc:chararray, driver_id:int, ETA:chararray,event_id:int, imei_number:chararray, vehicle_imei_id:int,
km:double, avg_speed:double, duration:chararray, signal_strength:float, battery_strength:float, event_type:chararray);
twa_group = GROUP twa BY (id,trip_id,km);
twa_foreach = FOREACH twa_group GENERATE FLATTEN(group), twa.trip_id AS trip_id, (SUM(twa.km)) AS km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');
DUMP twa_filter;
Error:
In alias twa_filter, incompatible types in Equal Operator left hand side:bag :tuple(trip_id:chararray) right hand side:chararray
I have tried several approaches but got no output. Can anyone suggest the correct solution? Thanks in advance.
Input:
id,trip_id,km
1,466,1.4
2,466,2.3
Expected Output:
trip_id,km
466,3.7
Posted on 2017-05-31 08:57:05
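When you select a column from grouped data, the result is always a bag. That is why twa.trip_id cannot be compared with a chararray. If you grouped by that column, however, you can select it from the group key instead.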
twa_foreach = FOREACH twa_group GENERATE group.id AS id, group.km AS km,
group.trip_id AS trip_id, (SUM(twa.km)) AS total_km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');
If you need to use a column that is not in the key, you need to use LIMIT 1 plus FLATTEN.
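The LIMIT 1 plus FLATTEN approach mentioned above could look roughly like the sketch below. It assumes a four-column, pipe-delimited file like the testdata/7.csv sample used elsewhere in this thread. The aliases first_row and total_km are made up for illustration. Which row LIMIT keeps is not guaranteed, so this only makes sense when the non-key column has the same value across the whole group.

```pig
-- Assumed input layout: id|trip_id|km|event_type
twa = LOAD 'testdata/7.csv' USING PigStorage('|') AS
    (id:int, trip_id:chararray, km:double, event_type:chararray);

twa_group = GROUP twa BY trip_id;

-- event_type is not part of the group key, so twa.event_type would be a bag.
-- Keep a single tuple from the bag and flatten it into a plain field.
twa_foreach = FOREACH twa_group {
    first_row = LIMIT twa 1;  -- arbitrary row from this group
    GENERATE group AS trip_id,
             FLATTEN(first_row.event_type) AS event_type,
             SUM(twa.km) AS total_km;
};

DUMP twa_foreach;
```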
OK, I checked your code. It looks like you want the sum of km for each id, trip_id pair. Suppose this is the input:
cat testdata/7.csv
1|456|2.5|somedata1
2|466|2.7|somedata2
2|466|2.7|somedata2
4|456|2.8|somedata3
4|456|2.9|somedata4
4|456|2.9|somedata4
5|466|2.5|somedata5
5|466|2.5|somedata5
Pig script:
twa = LOAD 'testdata/7.csv' USING PigStorage('|') as
(id:int, trip_id:chararray, km:double, event_type:chararray);
twa_group = GROUP twa BY (trip_id);
twa_foreach = FOREACH twa_group GENERATE group AS trip_id, (SUM(twa.km)) AS km;
twa_filter = FILTER twa_foreach BY (trip_id == '466');
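DUMP twa_filter;
The result is:
(466,10.4)
If this does not work for you, you are doing something wrong. Also consider filtering before grouping, because "group operations are expensive".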
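As a rough sketch of the filter-before-group suggestion, the same script can be reordered so that only the rows for trip 466 reach the GROUP step. It assumes the same testdata/7.csv layout as above. The aliases twa_466 and twa_sum are illustrative names, not part of the original script.

```pig
-- Assumed input layout: id|trip_id|km|event_type
twa = LOAD 'testdata/7.csv' USING PigStorage('|') AS
    (id:int, trip_id:chararray, km:double, event_type:chararray);

-- Filter first: trip_id is still a plain chararray here, not a bag,
-- so the comparison with '466' type-checks.
twa_466 = FILTER twa BY trip_id == '466';

-- Group only the rows that survived the filter, then sum km per trip_id.
twa_group = GROUP twa_466 BY trip_id;
twa_sum = FOREACH twa_group GENERATE group AS trip_id, SUM(twa_466.km) AS km;

DUMP twa_sum;
```

Filtering on the raw relation also avoids the original error entirely, because the comparison happens before any bag is created.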
https://stackoverflow.com/questions/44278191