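I have two tables that I need to join on date and id. The first table has the columns date, id, and name. Each name is associated with several ids. The data looks like this: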
date id name
7/11 1 A
7/11 1 A
7/11 1 A
7/11 1 A
7/11 1 A
7/11 2 A
7/11 2 A
7/11 2 A
7/11 2 A
7/11 2 A

The other table has date, id, and shares. It has no name associated with the id. It looks like this:
date id shares
7/11 1 5
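7/11 2 4

The end goal is to get the total shares for each name, or more precisely, the sum of shares over the list of ids associated with each name. Here is the code: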
SELECT t1.date, t1.name,
COALESCE(SUM(t2.shares), 0) shares
FROM table1 t1 LEFT JOIN table2 t2
ON t2.date = t1.date AND t2.id = t1.id
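GROUP BY t1.date, t1.name

This works, but because table1 lists the same id 5 times, the sum is 5 times what it should be. So I need the join to use only the first row for each id from table1. The desired output is: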
date name shares
7/11 A 9

Posted on 2021-08-05 12:48:12
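The fan-out described in the question can be reproduced with a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for the asker's database and the sample rows shown above. Each table2 row matches five table1 rows, so the query returns 5*5 + 5*4 = 45 instead of 9.

```python
import sqlite3

# In-memory stand-ins for the question's tables (SQLite is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date TEXT, id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (date TEXT, id INTEGER, shares INTEGER)")

# table1 repeats each (date, id, name) row five times, as in the sample.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("7/11", 1, "A")] * 5 + [("7/11", 2, "A")] * 5,
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [("7/11", 1, 5), ("7/11", 2, 4)],
)

# The question's query: each table2 row joins to five duplicate
# table1 rows, so every shares value is added five times.
naive = conn.execute(
    """
    SELECT t1.date, t1.name,
           COALESCE(SUM(t2.shares), 0) AS shares
    FROM table1 t1 LEFT JOIN table2 t2
      ON t2.date = t1.date AND t2.id = t1.id
    GROUP BY t1.date, t1.name
    """
).fetchall()
print(naive)  # [('7/11', 'A', 45)]
```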
I think you should fix your data model so that there are no duplicates. One option is to remove the duplicates before joining:
SELECT t1.date, t1.name,
COALESCE(SUM(t2.shares), 0) as shares
FROM (SELECT DISTINCT t1.date, t1.id, t1.name
FROM table1 t1
) t1 LEFT JOIN
table2 t2
ON t2.date = t1.date AND t2.id = t1.id
GROUP BY t1.date, t1.name

Posted on 2021-08-05 19:37:50
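The first answer's query (a DISTINCT subquery on table1 before the join) can be checked the same way. This sketch again assumes SQLite via Python's sqlite3 module and the question's sample rows, and runs that query unchanged: after deduplication each id contributes one row, so the sum is 5 + 4 = 9.

```python
import sqlite3


def make_db():
    """Build in-memory copies of the question's sample tables (SQLite assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (date TEXT, id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE table2 (date TEXT, id INTEGER, shares INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [("7/11", 1, "A")] * 5 + [("7/11", 2, "A")] * 5,
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?, ?)",
        [("7/11", 1, 5), ("7/11", 2, 4)],
    )
    return conn


conn = make_db()

# Deduplicate table1 in a derived table first, then join and aggregate.
deduped = conn.execute(
    """
    SELECT t1.date, t1.name,
           COALESCE(SUM(t2.shares), 0) AS shares
    FROM (SELECT DISTINCT t1.date, t1.id, t1.name
          FROM table1 t1
         ) t1 LEFT JOIN
         table2 t2
      ON t2.date = t1.date AND t2.id = t1.id
    GROUP BY t1.date, t1.name
    """
).fetchall()
print(deduped)  # [('7/11', 'A', 9)]
```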
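If you can't fix the underlying data to remove the duplicates, it may be a good idea to deduplicate with a CTE (or a subquery) before the join.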
with
table_a as (select * from `project.dataset.table_a`),
table_b as (select * from `project.dataset.table_b`),
deduped_a as (select distinct date, id, name from table_a)
select
date,
name,
sum(coalesce(shares,0)) as shares
from deduped_a
left join table_b using(id, date)
group by 1,2

https://stackoverflow.com/questions/68666837
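The CTE version from the second answer can be sketched the same way. SQLite stands in for BigQuery here, so the hypothetical tables src_a and src_b replace the `project.dataset.*` names. A hypothetical name B, whose only id (3) has no row in table_b, is added to show that the LEFT JOIN plus coalesce reports 0 for it instead of dropping it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source tables standing in for `project.dataset.table_a/_b`.
conn.execute("CREATE TABLE src_a (date TEXT, id INTEGER, name TEXT)")
conn.execute("CREATE TABLE src_b (date TEXT, id INTEGER, shares INTEGER)")
conn.executemany(
    "INSERT INTO src_a VALUES (?, ?, ?)",
    [("7/11", 1, "A")] * 5
    + [("7/11", 2, "A")] * 5
    + [("7/11", 3, "B")] * 2,  # hypothetical name with no shares row
)
conn.executemany(
    "INSERT INTO src_b VALUES (?, ?, ?)",
    [("7/11", 1, 5), ("7/11", 2, 4)],
)

# Same shape as the answer: pass-through CTEs, a DISTINCT CTE, then
# a LEFT JOIN ... USING and a positional GROUP BY.
rows = conn.execute(
    """
    WITH
      table_a AS (SELECT * FROM src_a),
      table_b AS (SELECT * FROM src_b),
      deduped_a AS (SELECT DISTINCT date, id, name FROM table_a)
    SELECT
      date,
      name,
      SUM(COALESCE(shares, 0)) AS shares
    FROM deduped_a
    LEFT JOIN table_b USING (id, date)
    GROUP BY 1, 2
    ORDER BY 1, 2  -- added so the output order is deterministic
    """
).fetchall()
print(rows)  # [('7/11', 'A', 9), ('7/11', 'B', 0)]
```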