我的任务是编写一份不包括周末和假日的TAT报告。这篇文章的目的是确定我所做的解决方案是否可以更好地优化,通过改进或使用另一种选择。
当我最初着手这项任务时,我想加入一个data表,但这似乎有问题,因为如果我有这样的数据集:
create table #example (ID varchar(50), startdate datetime2, enddate datetime2)
insert into #example
values
('4CF662A3-57EE-446B-9FC3-3F105219F253', '2021-05-07 21:45:00.000', '2021-05-15 08:46:32.000')
,('8B58294D-C20E-4E87-8B9C-186EFB3DCF19', '2021-04-29 08:52:00.000', '2021-05-06 20:02:58.000')
,('6177A53C-B285-4087-9F03-BF3324992DEE', '2021-05-20 10:47:00.000', '2021-05-27 22:05:30.000')
,('26E3A73E-9D5A-472C-890A-57D51B038C79', '2021-05-04 20:24:00.000', '2021-05-12 09:11:47.000')
,('ECC7DB03-A2D9-4D03-AC8D-47B5204C5512', '2021-05-13 20:24:00.000', '2021-05-21 10:29:03.000')
,('0A673471-CA22-4AA5-8024-EEF7CEBA8A0D', '2021-05-10 20:24:00.000', '2021-05-18 11:03:41.000')
,('27ADB272-A9E0-42ED-AE3C-E42F051C60DF', '2021-05-03 20:06:00.000', '2021-05-11 10:59:29.000')
,('5B9D97EC-A994-45E3-BB18-E583C9702389', '2021-05-11 18:39:00.000', '2021-05-19 09:56:18.000')
,('DE88A793-4F1C-479D-BB28-4245D9D30606', '2021-05-10 20:07:00.000', '2021-05-18 11:48:17.000')
,('42962CBB-79AF-44CB-8EB9-565A75EEA565', '2021-04-26 21:35:00.000', '2021-05-04 13:16:25.000')
,('62A980BF-D6B5-47F3-A671-D7F3DE046747', '2021-05-20 18:08:00.000', '2021-05-28 10:06:56.000')
,('E1D89979-7D77-4D63-983E-8127A1CB81FA', '2021-05-18 20:28:00.000', '2021-05-26 12:27:32.000')
,('D3D923AC-21E0-401E-A9DB-7BE1BE87A6B1', '2021-05-13 19:56:00.000', '2021-05-21 11:55:22.000')
,('364F5BF9-55A3-430E-9F41-F3EB41D30633', '2021-05-13 19:58:00.000', '2021-05-21 12:09:43.000')
select * from #example drop table #example然后加入一个日期表,只需告诉我加入的日期是否是工作日、假日等等。
(注:我见过有人试图通过加入日期表来创建一个Business报告,但它们似乎不像预期的那样准确,或者有一些奇怪的问题。由于这些原因,我本人没有深入探讨这个选择。)
接下来,我使用一个循环查看了一个标量UDF。我的灵感主要来自于本文:https://anyonconsulting.com/business_intelligence/how-to-count-the-number-of-business-days-between-two-dates-which-exclude-weekends-and-holidays/。这是我最终开发的查询:
declare @DateFrom as datetime2 = '2021-05-17 19:06:00.000';
declare @DateTo as datetime2 = '2021-05-31 17:59:51.000';
-- Switches things around
if @DateFrom > @DateTo
begin
declare
@T datetime2 = @DateTo
, @F datetime2 = @DateFrom
select @DateFrom = @T
, @DateTo = @F
end
declare @temptable table (
CalendarDate datetime2 not null primary key
, DayID as (datepart(weekday,[CalendarDate]))
, isHoliday as
(case
-- New Year's Day, Observed 1st Day of Year
when month([CalendarDate]) = 1 and day([CalendarDate]) = 1 and datepart(weekday,[CalendarDate]) in (2,3,4,5,6) then 1 --New Year’s Day
when month([CalendarDate]) = 12 and day([CalendarDate]) = 31 and datepart(weekday,[CalendarDate]) =6 then 1 --New Year’s Day
when month([CalendarDate]) = 1 and day([CalendarDate]) = 2 and datepart(weekday,[CalendarDate]) =2 then 1 --New Year’s Day
-- Martin Luther King Jr. Day, Observed 3rd Monday of January
when month([CalendarDate]) = 1 and day([CalendarDate]) >= 15 and day([CalendarDate]) <= 21 and datepart(weekday,[CalendarDate]) = 2 then 1
-- Presidents' Day, Officially Washington’s Birthday, Observed 3rd Monday of February
when month([CalendarDate]) = 2 and day([CalendarDate]) >= 15 and day([CalendarDate]) <= 21 and datepart(weekday,[CalendarDate]) = 2 then 1
-- Memorial Day, Observed Last Monday of May
when month([CalendarDate]) = 5 and day([CalendarDate]) >= 25 and datepart(weekday,[CalendarDate]) = 2 then 1
-- Independence Day, July 4th
when month([CalendarDate]) = 7 and day([CalendarDate]) = 4 and datepart(weekday,[CalendarDate]) IN (2,3,4,5,6) then 1
when month([CalendarDate]) = 7 and day([CalendarDate]) = 3 and datepart(weekday,[CalendarDate]) =6 then 1
when month([CalendarDate]) = 7 and day([CalendarDate]) = 5 and datepart(weekday,[CalendarDate]) =2 then 1
-- Labor Day, Observed First Monday of September
when month([CalendarDate]) = 9 and day([CalendarDate]) <= 7 and datepart(weekday,[CalendarDate]) = 2 then 1
-- Thanksgiving Day, Observed Fourth Thursday of November
when month([CalendarDate]) = 11 and day([CalendarDate]) >= 22 and day([CalendarDate]) <= 28 AND datepart(weekday,[CalendarDate]) = 5 then 1
-- Christmas Day, Observed 25th of December
when month([CalendarDate]) = 12 and day([CalendarDate]) = 25 and datepart(weekday,[CalendarDate]) IN (2,3,4,5,6) then 1
when month([CalendarDate]) = 12 and day([CalendarDate]) = 24 and datepart(weekday,[CalendarDate]) =6 then 1
when month([CalendarDate]) = 12 and day([CalendarDate]) = 26 and datepart(weekday,[CalendarDate]) =2 then 1
else 0
end))
-- Fill the Calendar table
declare @D datetime2 = @DateFrom
while @D <= @DateTo
begin
insert into @temptable (CalendarDate)
select @D
set @D = dateadd(hour, 1, @D)
end
declare
@Count as int = 0
, @Date as datetime2 = @DateFrom
while
@Date <= @DateTo
begin
if ((datepart(weekday, @Date) in (1, 7))
or exists
(select *
from @temptable
where CalendarDate = @Date
and isHoliday = 1
and (DayID <> 7 or DayID <> 1)))
begin
select @Count = @Count + 1
end
select @Date = dateadd(hour, 1, @Date)
end
select convert(decimal(38,2) , (datediff(hour, @DateFrom, @DateTo) - (@Count)) / 24.0) as [TAT]这个脚本,当被放入一个函数中时,是相当准确的,但是性能很慢。在一组43,057行的测试中,它花费了1分19秒(让它执行得很好对我来说很重要,因为查询已经足够长了)。由于性能不佳,我开始考虑其他方法来做到这一点。我研究了一段时间,并对此表示赞赏,其中一个是:https://www.sqlservercentral.com/articles/calculating-work-days。但不幸的是,它对我来说还不够精确,因为我想要小数天的数字(所以我想要4.68,而不是5)。
然后,我开始考虑某种基于集合的函数,灵感来自于这个https://www.sqlservercentral.com/articles/the-joy-of-numbers以及Itzik的based查询,第216-218页关于填充数字表。在进行了一些实验之后,我得到了以下查询:
declare
@TotalTimeDiff int
, @low int
, @DateFrom datetime2
, @DateTo datetime2;
set @DateFrom = '2021-05-17 19:06:00.000';
set @DateTo = '2021-05-31 17:59:51.000';
set @low = 1;
set @TotalTimeDiff = (datediff(hour, @DateFrom, @DateTo))
;with
L0 as (select C from (values(1),(1)) as D(c)),
L1 as (select 1 as c from L0 as A cross join L0 as B),
L2 as (select 1 as c from L1 as A cross join L1 as B),
L3 as (select 1 as c from L2 as A cross join L2 as B),
L4 as (select 1 as c from L3 as A cross join L3 as B),
L5 as (select 1 as c from L4 as A cross join L4 as B), -- Up to 4,294,967,296 rows
Nums as (select ROW_NUMBER() over(order by (select null)) as rownum from L5),
Times as (select top (@TotalTimeDiff) @low + rownum - 1 as n from Nums)
select
[TAT] = isnull(((count(BusinessHours.n) - sum(BusinessHours.NonBusinessHours)) / 24.0), datediff(minute, @DateFrom, @DateTo) / 1440.0)
from (
select tms.n
, [NonBusinessHours] =
case
when datename(dw, dateadd(hour, tms.n, @DateFrom)) in ('Sunday', 'Saturday') then 1
-- New Year's Day, Observed 1st Day of Year
when month(dateadd(hour, tms.n, @DateFrom)) = 1 and day(dateadd(hour, tms.n, @DateFrom)) = 1 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) in (2,3,4,5,6) then 1 -- New Year’s Day
when month(dateadd(hour, tms.n, @DateFrom)) = 12 and day(dateadd(hour, tms.n, @DateFrom)) = 31 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =6 then 1 --New Year’s Day
when month(dateadd(hour, tms.n, @DateFrom)) = 1 and day(dateadd(hour, tms.n, @DateFrom)) = 2 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =2 then 1 --New Year’s Day
-- Martin Luther King Jr. Day, Observed 3rd Monday of January
when month(dateadd(hour, tms.n, @DateFrom)) = 1 and day(dateadd(hour, tms.n, @DateFrom)) >= 15 and day(dateadd(hour, tms.n, @DateFrom)) <= 21 and datepart(weekday,dateadd (hour, tms.n, @DateFrom)) = 2 then 1
-- Presidents' Day, Officially Washington’s Birthday, Observed 3rd Monday of February
when month(dateadd(hour, tms.n, @DateFrom)) = 2 and day(dateadd(hour, tms.n, @DateFrom)) >= 15 and day(dateadd(hour, tms.n, @DateFrom)) <= 21 and datepart(weekday,dateadd (hour, tms.n, @DateFrom)) = 2 then 1
-- Memorial Day, Observed Last Monday of May
when month(dateadd(hour, tms.n, @DateFrom)) = 5 and day(dateadd(hour, tms.n, @DateFrom)) >= 25 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) = 2 then 1
-- Independence Day, July 4th
when month(dateadd(hour, tms.n, @DateFrom)) = 7 and day(dateadd(hour, tms.n, @DateFrom)) = 4 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) IN (2,3,4,5,6) then 1
when month(dateadd(hour, tms.n, @DateFrom)) = 7 and day(dateadd(hour, tms.n, @DateFrom)) = 3 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =6 then 1
when month(dateadd(hour, tms.n, @DateFrom)) = 7 and day(dateadd(hour, tms.n, @DateFrom)) = 5 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =2 then 1
-- Labor Day, Observed First Monday of September
when month(dateadd(hour, tms.n, @DateFrom)) = 9 and day(dateadd(hour, tms.n, @DateFrom)) <= 7 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) = 2 then 1
-- Thanksgiving Day, Observed Fourth Thursday of November
when month(dateadd(hour, tms.n, @DateFrom)) = 11 and day(dateadd(hour, tms.n, @DateFrom)) >= 22 and day(dateadd(hour, tms.n, @DateFrom)) <= 28 AND datepart(weekday,dateadd (hour, tms.n, @DateFrom)) = 5 then 1
-- Christmas Day, Observed 25th of December
when month(dateadd(hour, tms.n, @DateFrom)) = 12 and day(dateadd(hour, tms.n, @DateFrom)) = 25 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) IN (2,3,4,5,6) then 1
when month(dateadd(hour, tms.n, @DateFrom)) = 12 and day(dateadd(hour, tms.n, @DateFrom)) = 24 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =6 then 1
when month(dateadd(hour, tms.n, @DateFrom)) = 12 and day(dateadd(hour, tms.n, @DateFrom)) = 26 and datepart(weekday,dateadd(hour, tms.n, @DateFrom)) =2 then 1
else 0 end
from Times as tms) as [BusinessHours]此查询在转换为标量UDF后,在17秒钟内计算出43,057行的业务TAT,这是对循环查询的明显改进。(顺便说一句,这就是Server 2017。)
首先,我想知道最后一个解决方案是否是人们可以考虑的“基于集”的解决方案。它似乎比明显的过程/迭代循环更基于集合,但它不是根据数据集本身计算TAT,而是从表中逐项输入(@DateFrom和@DateTo)。我认为纯“基于集”的解决方案意味着使用从数据集本身到日期表的复杂连接逻辑。然而,这个函数是基于设置的,也就是说,一旦函数本身运行,它就不会经过一个循环,一个接一个地确定每一个小时,而是给出一个数字列表,在相关的日期/时间集中推入逐项列出的小时,以及每个小时是否是一个工作小时。
第二,有人知道有什么方法可以进一步改进吗?或者是另一种方法,以达到这种精确度(将两个日期之间的每一个小时划分为一个营业小时)?比这更快的东西?
谢谢。如果这是不恰当的发布(在这里看起来确实不那么具体),我会感谢重定向,所以我可以在那里张贴这个。我主要是尝试众包这个功能,看看它能走多远,以及把这个函数放在那里,因为它是有趣的收敛后,看看多个解决方案。
发布于 2021-07-12 17:51:06
多亏@Charlieface,我提出了一个非常精确的解决方案,只使用一个日期表。因为我只在生产数据上做了这个解决方案,所以我不会在这里发布,但这是可能的。话虽如此,我只能在一定程度上改善这种解决办法的问题。引起的问题是:
这个解决方案确实比我上面的第二个解决方案执行得更快(大约快5-7秒),但是,考虑到一些不准确的地方,我选择了上面的第二个解决方案。希望这篇文章能帮助那些希望根据自己的自定义业务逻辑计算业务TAT的人提供指导。
https://stackoverflow.com/questions/68135326
复制相似问题