文章/答案/技术大牛

发布

社区首页 >问答首页 >在TSQL中对连续值计数一次

问在TSQL中对连续值计数一次
EN

Stack Overflow用户

提问于 2021-06-28 14:22:30

回答 4查看 107关注 0票数 0

我有一个表，里面有快递数据。我每天都有送货安排。我想统计一下失败的投递次数。如果连续几天投递失败，则计入1。例如，投递安排在2021年4月的每一天。4月15日交货失败。在那之后，从4月18日到4月20日，交货再次失败。虽然交付失败的天数是4天，但我希望将失败的天数计为2，因为连续失败的天数将被计为1。

DeliveryId     DeliveryDate     Status
1              2021-04-14       Success
2              2021-04-15       Failure
3              2021-04-16       Success
4              2021-04-17       Success
5              2021-04-18       Failure
6              2021-04-19       Failure
7              2021-04-20       Failure
8              2021-04-21       Success

我想要一个新的列，它将显示第一次失败的deliveryId，如下所示。

DeliveryId     DeliveryDate     Status     FailedDeliveryId
1              2021-04-14       Success     
2              2021-04-15       Failure     2
3              2021-04-16       Success     
4              2021-04-17       Success     
5              2021-04-18       Failure     5
6              2021-04-19       Failure     5
7              2021-04-20       Failure     5
8              2021-04-21       Success

我已经尝试了几个选项，但未能达到上述结果。我在查询中使用了LAG函数来查找以前的交付状态。但问题是，如果交付失败超过3或4天，那么我将不得不使用滞后功能，以3或4天后，并检查状态。我想让它成为一个动态查询。下面是我用过的

SELECT *, 
       CASE WHEN Status='Failure' AND Prev_Status='Success' THEN DeliveryId 
            WHEN Status='Failure' AND Prev_Status='Failure' THEN Prev_DeliveryId 
            END AS FailureInstance 
FROM (
         SELECT *, 
                LAG(Status,1) OVER(ORDER BY DeliveryDate ASC) Prev_Status,
                LAG(DeliveryId,1) OVER(ORDER BY DeliveryDate ASC) Prev_DeliveryId
         FROM   table1 
     ) A

tsql

sql

sql-server

回答 4

Stack Overflow用户

回答已采纳

发布于 2021-06-28 17:15:52

…

declare @t table(DeliveryId int, DeliveryDate date, Status varchar(10));

insert into @t(DeliveryId, DeliveryDate, Status)
values
(10, '2021-04-14', 'Success'),
(20, '2021-04-15', 'Failure'),
(30, '2021-04-16', 'Success'),
(40, '2021-04-17', 'Success'),
(70, '2021-04-18', 'Failure'),
(60, '2021-04-19', 'Failure'),
(50, '2021-04-20', 'Failure'),
(80, '2021-04-21', 'Success');


select *, 
case when Status='Failure' then min(grpDeliveryId) over(partition by grp) end as FailedDeliveryId,
case when Status='Failure' then datediff(minute, min(DeliveryDate) over(partition by grp), min(grpSuccessDate) over(partition by grp) ) end as MinutesDiffFailSuccess
from
(
select *, 
sum(addorcountme) over(order by DeliveryDate) as grp
from
(
select *, 
case when Status='Failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else 1 end as addorcountme,
case when Status='Failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else DeliveryId end as grpDeliveryId,
case when Status='Failure' and lead(Status) over(order by DeliveryDate)='Success' then lead(DeliveryDate) over(order by DeliveryDate) end as grpSuccessDate
from @t
) as t
) as g;



/*
select *, case when Status='Failure' then min(DeliveryId) over(partition by grp) end as FailedDeliveryId
from
(
select *, 
sum(sumorcountme) over(order by DeliveryDate) as grp
from
(
select *, case when Status='failure' and lag(Status) over(order by DeliveryDate)='Failure' then null else 1 end as sumorcountme
from @t
) as t
) as g; */

票数 1

Stack Overflow用户

发布于 2021-06-28 18:44:49

一种方法只使用窗口函数。您可以根据每行的成功次数为每组故障分配一个组。然后，对于每个故障组，只需获取故障的最小id：

select t.*,
       (case when status = 'Failure'
             then min(case when status = 'Failure' then DeliveryId end) over (partition by grp)
        end) as first_failureId
from (select t.*,
             sum(case when status = 'Success' then 1 else 0 end) over (order by DeliveryDate) as grp
      from t
     ) t;

Here是一个db<>fiddle。

实际上，稍微简单一点的版本以相反的顺序分配组，因此第一条记录是失败的，而不是成功的，因此min()中的case是不必要的：

select t.*,
       (case when status = 'Failure'
             then min(DeliveryId) over (partition by grp)
        end) as first_failureId
from (select t.*,
             sum(case when status = 'Success' then 1 else 0 end) over (order by DeliveryDate desc) as grp
      from t
     ) t
order by DeliveryId;

嗯。..。..。另一种方法使用lag()来检测状态何时发生变化。然后，仅对更改使用累积最大值：

select t.*,
       (case when status = 'Failure'
             then max(case when prev_status is null or prev_status <> status then DeliveryId end) over (order by DeliveryDate)
        end) as first_failureId
from (select t.*,
             lag(status) over (order by DeliveryDate) as prev_status
      from t
     ) t
order by DeliveryId;

票数 1

Stack Overflow用户

发布于 2021-06-28 16:32:09

这是一个“差距和岛屿”的问题。我最喜欢的文章是https://blog.jooq.org/2016/04/25/10-sql-tricks-that-you-didnt-think-were-possible/，技巧#4。

解析孤岛的关键是使用key (即date) - row_number，这将使用相同的编号对孤岛进行分组。结果与您的类似，但它可以在不使用lag的情况下处理任何数量的故障。您希望在此处仅使用故障：

select *,DeliveryDate-row_number() over (order by DeliveryId asc) as grp
from table1
where Status='Failure'

现在你有了这个，你可以在上面添加dense_rank() (order by grp)来获得失败的次数，也许还可以合并成功的结果：

;with cte as
(
    select *,DeliveryDate-row_number() over (order by DeliveryId asc) as grp
    from table1
    where Status='Failure'
)
    select 
        cte.*,dense_rank() over (order by grp) as FailureNum
    from cte
union all
    select *,null as grp,null as FailureNum
    from table1
    where Status='Success'

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/68158445

复制

相似问题

问在TSQL中对连续值计数一次
EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问在TSQL中对连续值计数一次EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问在TSQL中对连续值计数一次
EN