我有七个表需要合并。每个表都有三个键和一个日期。我已经尝试了几种合并数据的方法,包括只将所有记录插入到单个表中,并运行重复数据消除T-SQL CTE语句,如下所示,由堆栈溢出的另一个成员给出:
;WITH CTE AS(
SELECT [key1], [key2], [key3], [date],
RN = ROW_NUMBER()OVER(PARTITION BY [key1], [key2], [key3] ORDER BY isnull([date], '19000101' desc)
FROM dbo.Table1 t1
WHERE EXISTS ( SELECT *
FROM dbo.Table1 t2
WHERE t1.key1=t2.key1
and t1.key2=t2.key2
and t1.key3=t2.key3
and t1.[date] IS NOT NULL
)
)
DELETE FROM CTE WHERE RN > 1我的问题是,每个表都有80Mil条记录,这会导致重复数据消除,从而使SQL Server不堪重负。
我的规则如下:
CASE 1: before dedupication:
key1 key2 key3 date
1 A 1 null
1 A 1 null
1 A 1 null
after deduplication:
key1 key2 key3 date
1 A 1 null
1 A 1 null
1 A 1 null
CASE 2: before dedupication:
key1 key2 key3 date
1 A 1 1/1/2016
1 A 1 1/1/2016
1 A 1 1/1/2016
after deduplication:
key1 key2 key3 date
1 A 1 1/1/2016
CASE 3: before dedupication:
key1 key2 key3 date
1 A 1 1/1/2016
1 A 1 1/2/2016
1 A 1 1/3/2016
after deduplication:
key1 key2 key3 date
1 A 1 1/3/2016
CASE 4: before deduplication
1 A 1 1/1/2016
1 A 1 1/1/2016
1 A 1 null
after deduplication:
key1 key2 key3 date
1 A 1 1/1/2016
CASE 5: before deduplication
1 A 1 1/1/2016
1 A 1 1/2/2016
1 A 1 null
after deduplication:
key1 key2 key3 date
1 A 1 1/2/2016正如您所看到的,这几乎是一个SCD2场景加上空值处理问题。
我的下一次尝试是使用重复数据删除功能将一个表合并到另一个表中,然后重复操作,直到完成所有操作。
我正在考虑合并解决方案,但无法理解如何对这么多记录执行此操作。我正在为SSIS游说,但无济于事。
我正在寻找任何可能的想法来执行这项任务。谢谢。
发布于 2016-09-05 04:37:02
我希望我正确地理解了你的目的。
很抱歉,格式无法保留,或者至少我无法保留。我建议您尝试在sql server管理器中保留该内容。
考虑到数据负载,我建议您考虑实施一些中间步骤,并将相关数据合并到临时表中。如果可能,还要考虑使用NoLock提示。
创建表#tmp1 (key1 int not null,key2 char(1) not null,key3 int not null,key4 date null)
create table #tmp2 (key1 int not null,key2字符(1) not null,key3 int not null,key4 date null)
创建表#tmp3 (key1 int not null,key2 char(1) not null,key3 int not null,key4 date null)
create table #tmp4 (key1 int not null,key2字符(1) not null,key3 int not null,key4 date null)
create table #tmp5 (key1 int not null,key2字符(1) not null,key3 int not null,key4 date null)
插入#tmp1值(1,'A',1,null),(1,'A',1,null),(1,'A',1,null)
插入#tmp2值(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-01')
插入#tmp3值(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-02'),(1,'A',1,'2016-01-03')
插入#tmp4值(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-01'),(1,'A',1,null)
插入#tmp5值(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-02'),(1,'A',1,null)
**-第一种退货类型(我对您的要求表示怀疑)
选择key1、key2、key3、key4
从( select key1,key2,key3,key4 = max(key4) )
from #tmp1
where key4 is not null
group by key1, key2, key3
union all
select key1, key2, key3, key4 = max(key4)
from #tmp2
where key4 is not null
group by key1, key2, key3
union all
select key1, key2, key3, key4 = max(key4)
from #tmp3
where key4 is not null
group by key1, key2, key3
union all
select key1, key2, key3, key4 = max(key4)
from #tmp4
where key4 is not null
group by key1, key2, key3
union all
select key1, key2, key3, key4 = max(key4)
from #tmp5
where key4 is not null
group by key1, key2, key3
-- nulls, do not deduplicate
union all
select a.key1, a.key2, a.key3, a.key4
from #tmp1 a
left join #tmp1 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
where a.key4 is null and b.key4 is null
union all
select a.key1, a.key2, a.key3, a.key4
from #tmp2 a
left join #tmp2 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
where a.key4 is null and b.key4 is null
union all
select a.key1, a.key2, a.key3, a.key4
from #tmp3 a
left join #tmp3 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
where a.key4 is null and b.key4 is null
union all
select a.key1, a.key2, a.key3, a.key4
from #tmp4 a
left join #tmp4 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
where a.key4 is null and b.key4 is null
union all
select a.key1, a.key2, a.key3, a.key4
from #tmp5 a
left join #tmp5 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
where a.key4 is null and b.key4 is null
) a**-第二种退货类型(我对您的要求表示怀疑)
select key1,key2,key3,key4 = max( key4 ) from ( select key1,key2,key3,key4 from #tmp1 where key4 is not null union all select key1,key2,key3,key4 from #tmp2 where key4 is not null union all select key1,key2,key3,key4 from #tmp3 where key4 is not null union all select key1,key2,key3,key4 from #tmp4 where key4 is not null union all select key1,key2,key3,key4 from #tmp5 where key4不为null )非null group by key1,key2,key3 union all -- null,不删除重复数据select a.key1,a.key2,a.key3,a.key4 from #tmp1 a左连接#tmp1 b on a.key1 = b.key1和a.key2 = b.key2,a.key3 = b.key3和b.key4 is not null其中a.key4为null,b.key4为null联合all select a.key1,a.key2,a.key3,a.key4从#tmp2 a左连接#tmp2 b on a.key1 = b.key1和a.key2 = b.key2以及a.key3 = b.key3和b.key4 is not null其中a.key4为null,b.key4为null联合all select a.key1,a.key2,a.key3,a.key4
从#tmp3a左连接#tmp3b on a.key1 = b.key1和a.key2 = b.key2以及a.key3 = b.key3和b.key4 is not null其中a.key4为null而b.key4为null联合所有选择a.key1,a.key2,a.key3,从#tmp4 a左连接#tmp4 b on a.key1 = b.key1 a.key4 a.key2 = b.key2和a.key3 = b.key3 and b.key4 is not null where a.key4 is null和b.key4 is null联合all select a.key1,a.key2,a.key3,从#tmp5 a左连接#tmp5 b on a.key1 = b.key1,a.key2 = b.key2和a.key3 = b.key3,b.key4不为null,其中a.key4为null,b.key4为null
https://stackoverflow.com/questions/39309198
复制相似问题