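I have the two Hive tables below, referred to as a and b, and I need to build descriptive summary statistics for them. I want the summary to look like this:
Expected output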
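Category  Sum of Amount  Count  Sum of Fraud Amount  Count of Fraud
0-100     120            2      70                   1
100-500   610            3      410                  2
>500      1300           2      700                  1
I need the sum of Amount and the count for each amount category: 0-100, 100-500, and >500. I also need the sum of the fraud amount (where Fraud = 1) and the count of fraud rows. Getting the Fraud column requires a left join.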
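For example, for the 0-100 category the sum of Amount is 120 (50 + 70) and the count is 2. The sum of the fraud amount is 70, from the row where Fraud is 1. I need the same calculation for the other categories.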
Table a
ID  Amount  Date
1   110     01-01-2020
2   200     02-01-2020
3   50      03-01-2020
4   600     04-01-2020
5   700     05-01-2020
6   70      06-01-2020
7   300     07-01-2020
Table b
ID  Fraud
1   1
2   0
3   0
4   0
5   1
6   1
7   0
My approach gets the total count and the sum of Amount, but I need them broken down by category (0-100, 100-500, and >500):
select sum(a.Amount), Count(*), count(b.Fraud)
from sample.data a
left join (select id, fraud from sample.label) b
on a.id = b.id
where date between "2020-01-01" and "2020-01-07"
group by fraud;
Posted on 2020-12-09 20:12:25
If I understand correctly, you just need to aggregate with a case expression:
-- Bucket each row by amount, then aggregate per bucket.
select (case when d.amount <= 100 then '0-100'
             when d.amount <= 500 then '101-500'
             else '> 500'
        end) as grp,
       sum(d.Amount), Count(*),
       sum(l.Fraud)  -- Fraud is 0 or 1, so this is the number of fraud rows
from sample.data d left join
     sample.label l
     on d.id = l.id  -- the table alias is d; "a" is not defined in this query
where d.date between '2020-01-01' and '2020-01-07'
group by (case when d.amount <= 100 then '0-100'
               when d.amount <= 500 then '101-500'
               else '> 500'
          end);
https://stackoverflow.com/questions/65216488
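The query above returns the number of fraud rows per bucket but not the sum of the fraud amount that the question also asks for. Here is a minimal sketch of one way to add it, using conditional aggregation (sum over a case expression that keeps the amount only when fraud = 1). So that it can be run without a Hive cluster, the sketch loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. The table names data and label, the column aliases, and the ISO-formatted date strings are assumptions made for illustration. The select itself uses only standard SQL and should carry over to sample.data and sample.label in Hive, though that has not been checked here.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption).
ROWS_A = [
    (1, 110, "2020-01-01"),
    (2, 200, "2020-01-02"),
    (3, 50, "2020-01-03"),
    (4, 600, "2020-01-04"),
    (5, 700, "2020-01-05"),
    (6, 70, "2020-01-06"),
    (7, 300, "2020-01-07"),
]
ROWS_B = [(1, 1), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0)]

# Same bucketing as the answer, plus conditional sums for the fraud columns.
# Column aliases are hypothetical names chosen for this sketch.
QUERY = """
select (case when d.amount <= 100 then '0-100'
             when d.amount <= 500 then '101-500'
             else '> 500'
        end) as grp,
       sum(d.amount) as sum_amount,
       count(*) as cnt,
       sum(case when l.fraud = 1 then d.amount else 0 end) as sum_fraud_amount,
       sum(case when l.fraud = 1 then 1 else 0 end) as cnt_fraud
from data d
left join label l
  on d.id = l.id
where d.date between '2020-01-01' and '2020-01-07'
group by (case when d.amount <= 100 then '0-100'
               when d.amount <= 500 then '101-500'
               else '> 500'
          end)
"""


def summarize():
    """Run the bucketed summary on an in-memory SQLite copy of the sample."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table data (id integer, amount integer, date text)")
        conn.execute("create table label (id integer, fraud integer)")
        conn.executemany("insert into data values (?, ?, ?)", ROWS_A)
        conn.executemany("insert into label values (?, ?)", ROWS_B)
        # Key each result row by its bucket label.
        return {row[0]: row[1:] for row in conn.execute(QUERY)}
    finally:
        conn.close()


if __name__ == "__main__":
    for bucket, stats in summarize().items():
        print(bucket, *stats)
```

With the sample rows as posted, the middle bucket comes out as 110 and 1 for the two fraud columns, not the 410 and 2 shown in the expected output. Those figures would match if ID 7 (amount 300) were also flagged as fraud.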