我有一个表格-供应网络,包括四栏:
CustomerID,SupplierID,Supplier_productID,Purchase_Year
。
我想建立一个客户对,其中两个客户购买同一产品从同一供应商在一个重点年。我用self-join在BigQuery.But中做这件事--太慢了。有别的选择吗?
select distinct
a.CustomerID as focal_CustomerID,
b.CustomerID as linked_CustomerID,
a.Purchase_Year,
a.Supplier_productID
from
supplynetwork as a,
supplynetwork as b
where
a.CustomerID<>b.CustomerID and
a.Purchase_Year=b.Purchase_Year and
a.Supplier_productID=b.Supplier_productID and
a.SupplierID=b.SupplierID发布于 2019-08-21 08:30:31
使用联接语法并执行索引CustomerID列
select distinct
a.CustomerID as focal_CustomerID,
b.CustomerID as linked_CustomerID,
a.Purchase_Year,
a.Supplier_productID
from
supplynetwork as a join
supplynetwork as b
on
a.Purchase_Year=b.Purchase_Year and
a.Supplier_productID=b.Supplier_productID and
a.SupplierID=b.SupplierID
where a.CustomerID<>b.CustomerID 发布于 2019-08-21 12:02:46
您可以使用聚合来获取满足单行条件的所有客户:
select Purchase_Year, Supplier_productID, SupplierID,
array_agg(distinct CustomerID) as customers
from supplynetwork sn
group by Purchase_Year, Supplier_productID, SupplierID;然后可以使用数组操作获得对:
with pss as (
select Purchase_Year, Supplier_productID, SupplierID,
array_agg(distinct CustomerID) as customers
from supplynetwork sn
group by Purchase_Year, Supplier_productID, SupplierID
)
select c1, c2, pss.*
from pss cross join
unnest(pss.customers) c1 cross join
unnest(pss.customers) c2
where c1 < c2;发布于 2019-08-21 15:24:22
您可以使用CROSS JOIN,这可能会给您带来简单的好处(即使使用笛卡儿)。试试下面的查询,看看它是否比基线更便宜:
select
focal_CustomerID,
linked_CustomerID,
Purchase_Year,
Supplier_ProductID
from (
select
SupplierID,
Supplier_ProductID,
Purchase_Year,
array_agg(distinct CustomerID) as Customers
from `mydataset.mytable`
group by 1,2,3
), unnest(Customers) focal_CustomerID
cross join unnest(Customers) linked_CustomerID
where focal_CustomerID != linked_CustomerIDhttps://stackoverflow.com/questions/57587421
复制相似问题