首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >合并非常大的表

合并非常大的表
EN

Stack Overflow用户
提问于 2016-09-04 00:44:43
回答 1查看 35关注 0票数 1

我有七个表需要合并。每个表都有三个键和一个日期。我已经尝试了几种合并数据的方法,包括只将所有记录插入到单个表中,并运行重复数据消除T-SQL CTE语句,如下所示,由堆栈溢出的另一个成员给出:

代码语言:javascript
复制
 ;WITH CTE AS(
        SELECT [key1], [key2], [key3], [date],
           RN = ROW_NUMBER()OVER(PARTITION BY [key1], [key2], [key3] ORDER BY isnull([date], '19000101' desc)
        FROM dbo.Table1 t1
        WHERE EXISTS ( SELECT * 
                       FROM dbo.Table1 t2 
                       WHERE t1.key1=t2.key1 
                         and t1.key2=t2.key2 
                         and t1.key3=t2.key3 
                         and t1.[date] IS NOT NULL
                     )
    )
    DELETE FROM CTE WHERE RN > 1

我的问题是,每个表都有80Mil条记录,这会导致重复数据消除,从而使SQL Server不堪重负。

我的规则如下:

代码语言:javascript
复制
CASE 1: before dedupication:            
key1    key2    key3    date
1   A   1   null
1   A   1   null
1   A   1   null

after deduplication:            
key1    key2    key3    date
1   A   1   null
1   A   1   null
1   A   1   null

CASE 2: before dedupication:            
key1    key2    key3    date
1   A   1   1/1/2016
1   A   1   1/1/2016
1   A   1   1/1/2016

after deduplication:            
key1    key2    key3    date
1   A   1   1/1/2016

CASE 3: before dedupication:            
key1    key2    key3    date
1   A   1   1/1/2016
1   A   1   1/2/2016
1   A   1   1/3/2016

after deduplication:            
key1    key2    key3    date
1   A   1   1/3/2016

CASE 4: before deduplication            
1   A   1   1/1/2016
1   A   1   1/1/2016
1   A   1   null

after deduplication:            
key1    key2    key3    date
1   A   1   1/1/2016


CASE 5: before deduplication            
1   A   1   1/1/2016
1   A   1   1/2/2016
1   A   1   null

after deduplication:            
key1    key2    key3    date
1   A   1   1/2/2016

正如您所看到的,这几乎是一个SCD2场景加上空值处理问题。

我的下一次尝试是使用重复数据删除功能将一个表合并到另一个表中,然后重复操作,直到完成所有操作。

我正在考虑合并解决方案,但无法理解如何对这么多记录执行此操作。我正在为SSIS游说,但无济于事。

我正在寻找任何可能的想法来执行这项任务。谢谢。

EN

回答 1

Stack Overflow用户

发布于 2016-09-05 04:37:02

我希望我正确地理解了你的目的。

很抱歉,格式无法保留,或者至少我无法保留。我建议您尝试在sql server管理器中保留该内容。

考虑到数据负载,我建议您考虑实施一些中间步骤,并将相关数据合并到临时表中。如果可能,还要考虑使用NoLock提示。

  • First步骤-让我们模拟您的表(5、7或10不会产生difference)

创建表#tmp1 (key1 int not null,key2 char(1) not null,key3 int not null,key4 date null)

create table #tmp2 (key1 int not null,key2字符(1) not null,key3 int not null,key4 date null)

创建表#tmp3 (key1 int not null,key2 char(1) not null,key3 int not null,key4 date null)

create table #tmp4 (key1 int not null,key2字符(1) not null,key3 int not null,key4 date null)

create table #tmp5 (key1 int not null,key2字符(1) not null,key3 int not null,key4 date null)

插入#tmp1值(1,'A',1,null),(1,'A',1,null),(1,'A',1,null)

插入#tmp2值(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-01')

插入#tmp3值(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-02'),(1,'A',1,'2016-01-03')

插入#tmp4值(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-01'),(1,'A',1,null)

插入#tmp5值(1,'A',1,'2016-01-01'),(1,'A',1,'2016-01-02'),(1,'A',1,null)

**-第一种退货类型(我对您的要求表示怀疑)

选择key1、key2、key3、key4

从( select key1,key2,key3,key4 = max(key4) )

代码语言:javascript
复制
        from    #tmp1

        where   key4 is not null

        group by key1, key2, key3

        union all

        select  key1, key2, key3, key4 = max(key4)

        from    #tmp2

        where   key4 is not null

        group by key1, key2, key3

        union all

        select  key1, key2, key3, key4 = max(key4)

        from    #tmp3

        where   key4 is not null

        group by key1, key2, key3

        union all

        select  key1, key2, key3, key4 = max(key4)

        from    #tmp4

        where   key4 is not null

        group by key1, key2, key3

        union all

        select  key1, key2, key3, key4 = max(key4)

        from    #tmp5

        where   key4 is not null

        group by key1, key2, key3

        -- nulls, do not deduplicate
        union all
        select  a.key1, a.key2, a.key3, a.key4
        from    #tmp1           a
                left join #tmp1 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
        where   a.key4 is null and b.key4 is null
        union all
        select  a.key1, a.key2, a.key3, a.key4
        from    #tmp2           a
                left join #tmp2 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
        where   a.key4 is null and b.key4 is null
                    union all
        select  a.key1, a.key2, a.key3, a.key4
        from    #tmp3           a
                left join #tmp3 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
        where   a.key4 is null and b.key4 is null
        union all
        select  a.key1, a.key2, a.key3, a.key4
        from    #tmp4           a
                left join #tmp4 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
        where   a.key4 is null and b.key4 is null
        union all
        select  a.key1, a.key2, a.key3, a.key4
        from    #tmp5           a
                left join #tmp5 b on a.key1 = b.key1 and a.key2 = b.key2 and a.key3 = b.key3 and b.key4 is not null
        where   a.key4 is null and b.key4 is null

    )       a

**-第二种退货类型(我对您的要求表示怀疑)

select key1,key2,key3,key4 = max( key4 ) from ( select key1,key2,key3,key4 from #tmp1 where key4 is not null union all select key1,key2,key3,key4 from #tmp2 where key4 is not null union all select key1,key2,key3,key4 from #tmp3 where key4 is not null union all select key1,key2,key3,key4 from #tmp4 where key4 is not null union all select key1,key2,key3,key4 from #tmp5 where key4不为null )非null group by key1,key2,key3 union all -- null,不删除重复数据select a.key1,a.key2,a.key3,a.key4 from #tmp1 a左连接#tmp1 b on a.key1 = b.key1和a.key2 = b.key2,a.key3 = b.key3和b.key4 is not null其中a.key4为null,b.key4为null联合all select a.key1,a.key2,a.key3,a.key4从#tmp2 a左连接#tmp2 b on a.key1 = b.key1和a.key2 = b.key2以及a.key3 = b.key3和b.key4 is not null其中a.key4为null,b.key4为null联合all select a.key1,a.key2,a.key3,a.key4

从#tmp3a左连接#tmp3b on a.key1 = b.key1和a.key2 = b.key2以及a.key3 = b.key3和b.key4 is not null其中a.key4为null而b.key4为null联合所有选择a.key1,a.key2,a.key3,从#tmp4 a左连接#tmp4 b on a.key1 = b.key1 a.key4 a.key2 = b.key2和a.key3 = b.key3 and b.key4 is not null where a.key4 is null和b.key4 is null联合all select a.key1,a.key2,a.key3,从#tmp5 a左连接#tmp5 b on a.key1 = b.key1,a.key2 = b.key2和a.key3 = b.key3,b.key4不为null,其中a.key4为null,b.key4为null

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/39309198

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档