首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >雪花查询带有筛选条件的分区

雪花查询带有筛选条件的分区
EN

Stack Overflow用户
提问于 2022-04-22 14:08:55
回答 3查看 1.5K关注 0票数 0

我在雪花表中有一个基于键值的记录,其中对于给定的product_id,有几十个键值对记录。见下面的例子:

代码语言:javascript
复制
with t1 (product_id, key, value) as 
    (
    select 101, 'grade',                'high' union all
    select 101, 'expense_cost',     'high' union all
    select 101, 'maintenance_cost', 'medium' union all
    select 102, 'grade',                'medium' union all
    select 102, 'expense_cost',     'high' union all
    select 103, 'expense_cost',     'high' union all
    select 103, 'maintenance_cost', 'medium' 
    )
    select * from t1;

给定此数据模型,所需的是获取符合键值筛选条件的product_ids。

示例1:在key=(年级)有value=(高或中等)和key=(expense_cost)有value=(高)的情况下,获取所有product_id

示例2:获取key=(年级)有value=(高)和key=(maintenance_cost)有value=(高或中等)的所有product_id

我可以使用雪花PIVOT函数来解决这个需求,它首先将键值数据结构转换为列数据结构,然后使用WHERE子句应用Filter条件。有没有更好的方法来解决这个问题而不使用枢轴,例如使用一些窗口功能等?

基于枢轴的解决方案,例如1:

代码语言:javascript
复制
with t1 (product_id, key, value) as 
(
select 101, 'grade',                'high' union all
select 101, 'expense_cost',     'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade',                'medium' union all
select 102, 'expense_cost',     'high' union all
select 103, 'expense_cost',     'high' union all
select 103, 'maintenance_cost', 'medium' 
)
select * from (
select product_id, key, value 
  from t1 
  where key in ('grade','expense_cost','maintenance_cost')
) pivot(min(value) for key in ('grade','expense_cost','maintenance_cost'))
as p (product_id, grade, expense_cost, maintenance_cost)
where grade in ('high','medium')
and expense_cost in ('high');

注意:如果使用Window函数,输出只能包含限定或传递筛选条件的行。输出不能包含限定分区中的所有记录。

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2022-05-05 19:01:25

在下面添加我的解决方案,这是对@Lukasz的解决方案的改进,该解决方案有以下两个问题:

对于筛选标准中的每个属性,他的代码需要添加一个额外的窗口函数,从而线性地增加雪花计算时间。

有几十个属性需要从结果集中删除,并且为每个属性添加如此多的窗口函数将使SQL查询非常臃肿。

下面是我改进的代码:

代码语言:javascript
复制
with t1 (product_id, key, value) as 
(
select 101, 'grade',                'high' union all
select 101, 'expense_cost',     'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade',                'medium' union all
select 102, 'expense_cost',     'high' union all
select 103, 'expense_cost',     'high' union all
select 103, 'maintenance_cost', 'medium' 
)
select * , 
 (CASE 
      WHEN key = 'grade' AND value IN ('medium', 'high') THEN TRUE
      WHEN key = 'expense_cost' AND value = 'high' THEN TRUE
      ELSE FALSE
  END) AS is_allowed
from t1
WHERE key IN ('grade', 'expense_cost')
QUALIFY COUNT_IF(is_allowed=TRUE) OVER(PARTITION BY product_id) = 2;
票数 0
EN

Stack Overflow用户

发布于 2022-04-22 14:13:15

要使用的模式是QUALIFYCOUNT_IF相结合。如有必要,可在下列情况下对其进行旋转:

“要求1:取所有等级为(高或中等)的product_id & expense_cost =高”

代码语言:javascript
复制
with t1 (product_id, key, value) as 
    (
    select 101, 'grade',                'high' union all
    select 101, 'expense_cost',     'high' union all
    select 101, 'maintenance_cost', 'medium' union all
    select 102, 'grade',                'medium' union all
    select 102, 'expense_cost',     'high' union all
    select 103, 'expense_cost',     'high' union all
    select 103, 'maintenance_cost', 'medium' 
    )
select * 
from t1
qualify COUNT_IF(key='grade' AND value IN ('medium', 'high')) 
            OVER(PARTITION BY product_id) > 0
    AND COUNT_IF(key = 'expense_cost' AND value = 'high') 
            OVER(PARTITION BY product_id) > 0;

输出:

需求2:在key=(年级)有value=(高)和key=(maintenance_cost)有value=(高或中等)的情况下,获取所有的value=

代码语言:javascript
复制
with t1 (product_id, key, value) as 
    (
    select 101, 'grade',                'high' union all
    select 101, 'expense_cost',     'high' union all
    select 101, 'maintenance_cost', 'medium' union all
    select 102, 'grade',                'medium' union all
    select 102, 'expense_cost',     'high' union all
    select 103, 'expense_cost',     'high' union all
    select 103, 'maintenance_cost', 'medium' 
    )
select * 
from t1
qualify COUNT_IF(key='grade' AND value IN ('high')) 
            OVER(PARTITION BY product_id) > 0
    AND COUNT_IF(key = 'maintenance_cost' AND value IN ('medium','high'))
            OVER(PARTITION BY product_id) > 0
    AND COUNT_IF(key='expense_cost' AND value IN ('high')) 
           OVER(PARTITION BY product_id) = 0  -- explicitly excluding

输出:

票数 1
EN

Stack Overflow用户

发布于 2022-04-22 16:59:17

我认为,如果您使用条件聚合子选择符合条件的product_ids,然后在符合条件的product_ids上筛选加上where子句中的条件,就会更容易、更自文档化。下面是需求2的演示,但您可以轻松地将其修改为#1。

代码语言:javascript
复制
with t (product_id, keys, value) as 

(select 101, 'grade','high' union all
 select 101, 'expense_cost','high' union all
 select 101, 'maintenance_cost','medium' union all
 select 102, 'grade','medium' union all
 select 102, 'expense_cost','high' union all
 select 103, 'expense_cost','high' union all
 select 103, 'maintenance_cost','medium' ),

product_ids as

(select product_id
 from t
 group by product_id
 having sum(case when keys='grade' and value ='high' then 1 end)>0 and
        sum(case when keys='maintenance_cost' and value in ('high', 'medium') then 1 end)>0)
                            
select * 
from t
where product_id in (select product_id from product_ids) and
    ((keys='grade' and value ='high') or (keys='maintenance_cost' and value in ('high', 'medium')))
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/71970128

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档