我有两个表产品表和类别表。下面是category表的外观-
Product_ID Category sub-category
123 Furniture Bookcases
123 Furniture Chairs
456 Office supplies Labels
456 Office supplies Binders
456 Office supplies Binders产品表如下所示-
Product_ID Quantities_sold
123 300
456 400当我内部连接产品和类别表时,我看到许多重复项,因为有不同的子类别,如下所示-
Product_ID Category sub-category Quantities_sold
123 Furniture Bookcases 300
123 Furniture Chairs 300
456 Office supplies Labels 400
456 Office supplies Binders 400
456 Office supplies Binders 400预期的结果将类似于-
Product_ID Category sub-category Quantities_sold
123 Furniture Bookcases 300
Chairs
456 Office supplies Labels 400
Binders 有没有办法既有所有子类别又没有重复记录?
注意-我有一个巨大的数据集,有数百万条记录,26个类别和135个子类别。
发布于 2020-02-08 05:32:57
下面是针对BigQuery标准SQL的说明
#standardSQL
SELECT Product_ID, Category, SubCategories, Quantities_sold
FROM `project.dataset.product` p
LEFT JOIN (
SELECT Product_ID, Category, STRING_AGG(DISTINCT SubCategory ) SubCategories
FROM `project.dataset.category`
GROUP BY Product_ID, Category
) c
USING (Product_ID) 如果要应用于问题结果中的样本数据,则
Row Product_ID Category SubCategories Quantities_sold
1 123 Furniture Bookcases,Chairs 300
2 456 Office supplies Labels,Binders 400 或者使用ARRAY_AGG而不是STRING_AGG来获取数组形式的子类别,如下例所示
#standardSQL
SELECT Product_ID, Category, SubCategories, Quantities_sold
FROM `project.dataset.product` p
LEFT JOIN (
SELECT Product_ID, Category, ARRAY_AGG(DISTINCT SubCategory ) SubCategories
FROM `project.dataset.category`
GROUP BY Product_ID, Category
) c
USING (Product_ID) 在这种情况下,结果将如下所示
Row Product_ID Category SubCategories Quantities_sold
1 123 Furniture Bookcases 300
Chairs
2 456 Office supplies Labels 400
Binders 发布于 2020-02-08 05:40:52
您可以使用SELECT DISTINCT,它不会向您显示重复记录,如下所示:
SELECT DISTINCT Product_ID, Category, sub-category, Quantities_sold
FROM product, category
INNER JOIN category ON category.Product_ID = product.Product_IDhttps://stackoverflow.com/questions/60121252
复制相似问题