我有一个样本表如下所示:
caller receiver
100 200
100 300
400 100
100 200我需要找到每个号码的唯一连接数。对于例: 100将有200,300和400这样的连接。
我的产出应该是:
100 3
200 1
300 1
400 1我正在用蜂巢来尝试这个。如果蜂巢做不到,那就可以用猪来做吗?
发布于 2015-06-10 07:42:49
这会解决你的问题。
select q1.caller,count(distinct(q1.receiver)) from
(select caller , receiver from test_1 group by caller,receiver
union all
select receiver as caller,caller as receiver from test_1 group by receiver,caller)q1 group by q1.caller;发布于 2015-06-10 03:23:53
下面是实现所需的方法(虽然我并不完全相信它是最优的,但我会让您来进行优化)。您将需要这个罐子,它非常直接地如何构建。
查询:
add jar ./brickhouse-0.7.1.jar; -- name and path of yours will be different
create temporary function combine_unique as 'brickhouse.udf.collect.CombineUniqueUDAF';
select connection
, size(combine_unique(arr)) c
from (
select connection, arr
from (
select caller as connection
, collect_set(receiver) arr
from some_table
group by caller ) x
union all
select connection, arr
from (
select receiver as connection
, collect_set(caller) arr
from some_table
group by receiver ) y ) f
group by connection输出:
connection c
100 3
200 1
300 1
400 1https://stackoverflow.com/questions/30745223
复制相似问题