我在雪花表中有一个基于键值的记录,其中对于给定的product_id,有几十个键值对记录。见下面的例子:

with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * from t1;给定此数据模型,所需的是获取符合键值筛选条件的product_ids。
示例1:在key=(年级)有value=(高或中等)和key=(expense_cost)有value=(高)的情况下,获取所有product_id
示例2:获取key=(年级)有value=(高)和key=(maintenance_cost)有value=(高或中等)的所有product_id
我可以使用雪花PIVOT函数来解决这个需求,它首先将键值数据结构转换为列数据结构,然后使用WHERE子句应用Filter条件。有没有更好的方法来解决这个问题而不使用枢轴,例如使用一些窗口功能等?
基于枢轴的解决方案,例如1:
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * from (
select product_id, key, value
from t1
where key in ('grade','expense_cost','maintenance_cost')
) pivot(min(value) for key in ('grade','expense_cost','maintenance_cost'))
as p (product_id, grade, expense_cost, maintenance_cost)
where grade in ('high','medium')
and expense_cost in ('high');注意:如果使用Window函数,输出只能包含限定或传递筛选条件的行。输出不能包含限定分区中的所有记录。
发布于 2022-05-05 19:01:25
在下面添加我的解决方案,这是对@Lukasz的解决方案的改进,该解决方案有以下两个问题:
对于筛选标准中的每个属性,他的代码需要添加一个额外的窗口函数,从而线性地增加雪花计算时间。
有几十个属性需要从结果集中删除,并且为每个属性添加如此多的窗口函数将使SQL查询非常臃肿。
下面是我改进的代码:
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * ,
(CASE
WHEN key = 'grade' AND value IN ('medium', 'high') THEN TRUE
WHEN key = 'expense_cost' AND value = 'high' THEN TRUE
ELSE FALSE
END) AS is_allowed
from t1
WHERE key IN ('grade', 'expense_cost')
QUALIFY COUNT_IF(is_allowed=TRUE) OVER(PARTITION BY product_id) = 2;发布于 2022-04-22 14:13:15
要使用的模式是QUALIFY和COUNT_IF相结合。如有必要,可在下列情况下对其进行旋转:
“要求1:取所有等级为(高或中等)的product_id & expense_cost =高”
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select *
from t1
qualify COUNT_IF(key='grade' AND value IN ('medium', 'high'))
OVER(PARTITION BY product_id) > 0
AND COUNT_IF(key = 'expense_cost' AND value = 'high')
OVER(PARTITION BY product_id) > 0;输出:

需求2:在key=(年级)有value=(高)和key=(maintenance_cost)有value=(高或中等)的情况下,获取所有的value=
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select *
from t1
qualify COUNT_IF(key='grade' AND value IN ('high'))
OVER(PARTITION BY product_id) > 0
AND COUNT_IF(key = 'maintenance_cost' AND value IN ('medium','high'))
OVER(PARTITION BY product_id) > 0
AND COUNT_IF(key='expense_cost' AND value IN ('high'))
OVER(PARTITION BY product_id) = 0 -- explicitly excluding输出:

发布于 2022-04-22 16:59:17
我认为,如果您使用条件聚合子选择符合条件的product_ids,然后在符合条件的product_ids上筛选加上where子句中的条件,就会更容易、更自文档化。下面是需求2的演示,但您可以轻松地将其修改为#1。
with t (product_id, keys, value) as
(select 101, 'grade','high' union all
select 101, 'expense_cost','high' union all
select 101, 'maintenance_cost','medium' union all
select 102, 'grade','medium' union all
select 102, 'expense_cost','high' union all
select 103, 'expense_cost','high' union all
select 103, 'maintenance_cost','medium' ),
product_ids as
(select product_id
from t
group by product_id
having sum(case when keys='grade' and value ='high' then 1 end)>0 and
sum(case when keys='maintenance_cost' and value in ('high', 'medium') then 1 end)>0)
select *
from t
where product_id in (select product_id from product_ids) and
((keys='grade' and value ='high') or (keys='maintenance_cost' and value in ('high', 'medium')))https://stackoverflow.com/questions/71970128
复制相似问题