Transitive matching in SQL

Stack Overflow user
Asked on 2019-03-26 11:20:52
Answers: 2 · Views: 270 · Followers: 0 · Votes: 0

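I am working on a requirement where I need to match a set of records within a group (G1) on certain fields, and regroup the matching records into new, unique groups (NG1, NG2, ...). The requirement is as follows:

Sample data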

DECLARE @table TABLE ([Group] varchar(3), Member varchar(3), Address varchar(3), Phone varchar(3), Email varchar(3)) 

insert @table values
('G1', 'M1', 'A1', 'P1', 'E1'),
('G1', 'M2', 'A2', 'P2', 'E2'),
('G1', 'M3', 'A1', 'P3', 'E1'),
('G1', 'M4', 'A4', 'P3', 'E4'),
('G1', 'M5', 'A5', 'P5', 'E2'),
('G1', 'M6', 'A6', 'P6', 'E6'),
('G1', 'M7', 'A7', 'P6', 'E7'),
('G1', 'M8', 'A8', 'P8', 'E4'),
('G1', 'M9', 'A9', 'P9', 'E7'),
('G1', 'M10', 'A10', 'P10', 'E10')

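In the sample data, M1, M3, M4 and M8 should end up in the same group: M1 and M3 match on Address and Email; M3 in turn matches M4 on Phone; and M4 matches M8 on Email. They are linked through one or more attributes.

Similarly, M6, M7 and M9 should be in another unique group, and M2 and M5 belong together (they match on Email).

M10 will be in a group by itself, because it has no matching records.

Like G1, there will be other main groups as well.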
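
Stated in graph terms, the requirement asks for the connected components of each main group, where two members are linked whenever they share an address, phone, or email. As a rough cross-check for any SQL solution, here is a minimal Python sketch using a union-find structure over the sample data. It is not part of the original question; the `regroup` function and the numbering of new groups by first appearance are assumptions made for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (group, member, address, phone, email).
ROWS = [
    ("G1", "M1", "A1", "P1", "E1"),
    ("G1", "M2", "A2", "P2", "E2"),
    ("G1", "M3", "A1", "P3", "E1"),
    ("G1", "M4", "A4", "P3", "E4"),
    ("G1", "M5", "A5", "P5", "E2"),
    ("G1", "M6", "A6", "P6", "E6"),
    ("G1", "M7", "A7", "P6", "E7"),
    ("G1", "M8", "A8", "P8", "E4"),
    ("G1", "M9", "A9", "P9", "E7"),
    ("G1", "M10", "A10", "P10", "E10"),
]


def regroup(rows):
    """Return {(group, member): new_group_number} using union-find.

    Assumes every field is populated; SQL would not treat NULLs as equal.
    """
    parent = {}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (group, field, value) -> first member key holding it
    for group, member, address, phone, email in rows:
        key = (group, member)
        parent[key] = key
        fields = (("address", address), ("phone", phone), ("email", email))
        for field, value in fields:
            attr = (group, field, value)
            if attr in first_seen:
                union(first_seen[attr], key)
            else:
                first_seen[attr] = key

    # Number the components per main group in order of first appearance.
    numbers = {}
    counters = defaultdict(int)
    result = {}
    for group, member, *_ in rows:
        root = find((group, member))
        if root not in numbers:
            counters[group] += 1
            numbers[root] = counters[group]
        result[(group, member)] = numbers[root]
    return result


GROUPS = regroup(ROWS)
for (group, member), number in GROUPS.items():
    print(group, member, f"NG{number}")
```

On the sample data this puts M1, M3, M4 and M8 in NG1, M2 and M5 in NG2, M6, M7 and M9 in NG3, and M10 alone in NG4.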

Can anyone help?

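2 Answers

Stack Overflow user

Accepted answer

Posted on 2019-03-27 08:50:55

This took me three coffees and a good while, but here you go. My main concern is something I read in the comments:

"This is a repeatable task. There will be several groups, and we will have to do this for each of them. The total number of records across all groups may be in the millions."

I would not run this as a repeatable task, because the resource consumption will be high. I suggest using it once to normalize the groups, and then adding logic to your application or a stored procedure so that new data is stored in the right group.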

DECLARE @table TABLE (id int not null identity, [Group] varchar(3), Member varchar(3), Address varchar(3), Phone varchar(3), Email varchar(3)) 

insert @table values
('G1', 'M1', 'A1', 'P1', 'E1'),
('G1', 'M2', 'A2', 'P2', 'E2'),
('G1', 'M3', 'A1', 'P3', 'E1'),
('G1', 'M4', 'A4', 'P3', 'E4'),
('G1', 'M5', 'A5', 'P5', 'E2'),
('G1', 'M6', 'A6', 'P6', 'E6'),
('G1', 'M7', 'A7', 'P6', 'E7'),
('G1', 'M8', 'A8', 'P8', 'E4'),
('G1', 'M9', 'A9', 'P9', 'E7'),
('G1', 'M10', 'A10', 'P10', 'E10');

with 
/* Find all matches
id  Member  MatchWith
1   M1  M3
2   M2  M5
3   M3  M1
3   M3  M4 ...
*/
matches as (
    SELECT t.id, t.[Group], t.Member, a.member as MatchWith
    from 
    @table t
    outer apply (
        select distinct member 
        from @table 
        where member <> t.member and [group] = t.[group] and (Address = t.Address OR Phone = t.Phone OR Email = t.Email)
    ) a
)
/* Stuffing the matches per member
id  Member  AllMatches
1   M1  M1,M3
2   M2  M2,M5
3   M3  M1,M3,M4 .....
*/
, matchsummary as (
    SELECT DISTINCT id, [Group], Member, STUFF((
                SELECT ',' + Member FROM (
                SELECT m.Member
                UNION ALL
                SELECT DISTINCT MatchWith
                FROM matches
                WHERE Member = m.Member) U
                ORDER BY Member
                FOR XML PATH('')
                ), 1, 1, '') as AllMatches
    FROM matches m
)
/* Recursive CTE to find "cousins" records (M1, M3 matches on Address and Email; M3 in turn matches with M4 on Phone)
id  Member  AllMatches  gr
1   M1  M1,M3   1
2   M2  M2,M5   2
3   M3  M1,M3,M4    1
4   M4  M3,M4,M8    1
*/
, tree as (
    select *, ROW_NUMBER() over (order by id) as gr
    from matchsummary where AllMatches LIKE member+'%'
    /* The groups are created using the Members who are the first one in their matches 
    id  Member  AllMatches  gr
    1   M1  M1,M3   1
    2   M2  M2,M5   2
    6   M6  M6,M7   3
    10  M10 M10 4
    */
    union all
    select s.*, t.gr 
    from matchsummary s
    join tree t on s.Member <> t.Member and s.[Group] = t.[Group] and s.AllMatches NOT LIKE s.member+'%' and t.AllMatches like '%' + s.Member
)
select * from tree
order by id
option(maxrecursion 0)

Output:

id  Group  Member  NewGroup
1   G1     M1      1
2   G1     M2      2
3   G1     M3      1
4   G1     M4      1
5   G1     M5      2
6   G1     M6      3
7   G1     M7      3
8   G1     M8      1
9   G1     M9      3
10  G1     M10     4

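Second option

Given the size of your table, I recommend this one. I am not a big fan of loops, but here I think they are worth it, because you do not have to process all the data at once.

First, you need to add a new column to the table to store the new group. My first thought was that it would be better to change the application logic to compute the group when a new record is inserted. On reflection, though, an insert can cause several groups to become one, and you probably need fast responses in the application. So you can instead set up a job that regroups the data as needed. If your table has an UpdatedDate field, you can also improve this solution with a log table and reprocess only the groups modified since the last run.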
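
To make the point about inserts concrete, here is a minimal Python sketch, not taken from the original answer, of what computing the group at insert time would involve. The `insert_member` function and the new member M11 are hypothetical. The sketch assumes the existing `newGroup` values are already correct, so only the groups that the new row touches directly need to be merged.

```python
def insert_member(table, new_row):
    """Append new_row to table (a list of dicts) and fix up 'newGroup' in place.

    Each stored row has keys: Group, Member, Address, Phone, Email, newGroup.
    Returns the new group number assigned to new_row.
    """
    fields = ("Address", "Phone", "Email")
    same_group = [r for r in table if r["Group"] == new_row["Group"]]

    # New groups that already hold a row sharing at least one field value.
    touched = sorted({
        r["newGroup"]
        for r in same_group
        if any(r[f] == new_row[f] for f in fields)
    })

    if touched:
        target = touched[0]
        # The new row bridges every touched group, so fold them into one.
        for r in same_group:
            if r["newGroup"] in touched:
                r["newGroup"] = target
    else:
        # No match at all: open a new group after the highest number in use.
        target = max((r["newGroup"] for r in same_group), default=0) + 1

    table.append({**new_row, "newGroup": target})
    return target


TABLE = [
    {"Group": "G1", "Member": "M1", "Address": "A1", "Phone": "P1",
     "Email": "E1", "newGroup": 1},
    {"Group": "G1", "Member": "M2", "Address": "A2", "Phone": "P2",
     "Email": "E2", "newGroup": 2},
    {"Group": "G1", "Member": "M10", "Address": "A10", "Phone": "P10",
     "Email": "E10", "newGroup": 3},
]

# Hypothetical new member: shares an address with M1 and a phone with M2.
ASSIGNED = insert_member(
    TABLE,
    {"Group": "G1", "Member": "M11", "Address": "A1", "Phone": "P2",
     "Email": "E11"},
)
print(ASSIGNED, [(r["Member"], r["newGroup"]) for r in TABLE])
```

Here a single insert rewrites the `newGroup` of M2 as well, which is the kind of side effect that makes a scheduled regrouping job attractive on a large table.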

 IF OBJECT_ID('tempdb..#table') IS NOT NULL
    DROP TABLE #table;
CREATE TABLE #table ([Group] varchar(3), Member varchar(3), Address varchar(3), Phone varchar(3), Email varchar(3)) 

INSERT #table ([Group], Member, Address, Phone, Email)
VALUES
('G1', 'M1', 'A1', 'P1', 'E1'),
('G1', 'M2', 'A2', 'P2', 'E2'),
('G1', 'M3', 'A1', 'P3', 'E1'),
('G1', 'M4', 'A4', 'P3', 'E4'),
('G1', 'M5', 'A5', 'P5', 'E2'),
('G1', 'M6', 'A6', 'P6', 'E6'),
('G1', 'M7', 'A7', 'P6', 'E7'),
('G1', 'M8', 'A8', 'P8', 'E4'),
('G1', 'M9', 'A9', 'P9', 'E7'),
('G1', 'M10', 'A10', 'P10', 'E10');

ALTER TABLE #table ADD newGroup INT

/******************************************************************
START HERE
******************************************************************/

IF OBJECT_ID('tempdb..#Groups') IS NOT NULL
    DROP TABLE #Groups;

SELECT DISTINCT [Group] INTO #Groups FROM #table

DECLARE @Group VARCHAR(3)

WHILE EXISTS (SELECT 1 FROM #Groups)
BEGIN

    SELECT TOP 1 @Group = [Group] FROM #Groups

    UPDATE #table SET newGroup = NULL 
    WHERE [Group] = @Group

    DECLARE @newGroup INT = 1
    DECLARE @member varchar(3)

    WHILE EXISTS (SELECT 1 FROM #table WHERE [Group] = @Group AND newGroup IS NULL)
    BEGIN

        SELECT TOP 1 @member = member FROM #table WHERE [group] = @group AND newGroup IS NULL
    
        -- Restrict to the current main group: member names may repeat across groups.
        UPDATE #table SET newGroup = @newGroup
        WHERE [Group] = @Group AND Member = @member

        WHILE @@ROWCOUNT > 0
        BEGIN
            UPDATE T
            SET newGroup = @newGroup
            FROM #table T
            WHERE [Group] = @group AND newGroup IS NULL
            AND EXISTS (
                SELECT 1 FROM #table 
                WHERE [Group] = @Group AND newGroup = @newGroup  -- newGroup numbers restart per main group
                AND (Address = t.Address OR Phone = t.Phone OR Email = t.Email)
            )
        END
        SET @newGroup += 1
    END
    DELETE #Groups WHERE [Group] = @Group
END

SELECT * FROM #table
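
For readers who want to trace the loop logic outside SQL Server, the nested WHILE loops can be mirrored in a few lines of Python. This is only a sketch under assumptions: the `regroup_by_expansion` name is made up, the seed is taken in row order (the SQL `TOP 1` has no ORDER BY, so its choice is not guaranteed), and every step is restricted to the current main group.

```python
# Sample rows: (group, member, address, phone, email).
ROWS = [
    ("G1", "M1", "A1", "P1", "E1"),
    ("G1", "M2", "A2", "P2", "E2"),
    ("G1", "M3", "A1", "P3", "E1"),
    ("G1", "M4", "A4", "P3", "E4"),
    ("G1", "M5", "A5", "P5", "E2"),
    ("G1", "M6", "A6", "P6", "E6"),
    ("G1", "M7", "A7", "P6", "E7"),
    ("G1", "M8", "A8", "P8", "E4"),
    ("G1", "M9", "A9", "P9", "E7"),
    ("G1", "M10", "A10", "P10", "E10"),
]


def regroup_by_expansion(rows):
    """Seed a new group with one unassigned member, then grow it until stable."""
    new_group = {}  # (group, member) -> new group number

    # Outer loop: one main group at a time, in order of first appearance.
    for main_group in dict.fromkeys(r[0] for r in rows):
        members = [r for r in rows if r[0] == main_group]
        number = 1

        # Middle loop: keep going while some member has no new group yet.
        while True:
            unassigned = [r for r in members if (r[0], r[1]) not in new_group]
            if not unassigned:
                break
            seed = unassigned[0]
            new_group[(seed[0], seed[1])] = number

            # Inner loop: like WHILE @@ROWCOUNT > 0, repeat until a pass
            # assigns nobody.
            changed = True
            while changed:
                changed = False
                # Snapshot of the group at the start of the pass, as a single
                # UPDATE statement would see it.
                current = [r for r in members
                           if new_group.get((r[0], r[1])) == number]
                for r in members:
                    if (r[0], r[1]) in new_group:
                        continue
                    if any(r[2] == c[2] or r[3] == c[3] or r[4] == c[4]
                           for c in current):
                        new_group[(r[0], r[1])] = number
                        changed = True
            number += 1
    return new_group


EXPANDED = regroup_by_expansion(ROWS)
for (group, member), number in EXPANDED.items():
    print(group, member, number)
```

Each pass of the inner loop pulls in the members one link further away from the seed, so the number of passes grows with the length of the longest chain in a group (M1 → M3 → M4 → M8 needs three productive passes).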
Votes: 0

Stack Overflow user

Posted on 2019-03-26 11:54:16

Assuming the data is in a table named "DataTable", I would do the following:

WITH
    [Matches] AS
    (
        SELECT
            D1.[Group],
            D1.[Member],
            D2.[Member] AS [PreviousMatchingMember]
        FROM
            [DataTable] AS D1
            OUTER APPLY (SELECT TOP (1) [Member]
                         FROM [DataTable]
                         WHERE
                             [Group] = D1.[Group] AND
                             [Member] < D1.[Member] AND
                             ([Address] = D1.[Address] OR
                              [Phone] = D1.[Phone] OR
                              [Email] = D1.[Email])
                         ORDER BY
                             [Member]) AS D2
    ),
    [Groups] AS
    (
        SELECT
            [Group],
            [Member],
            [PreviousMatchingMember],
            'NG' + LTRIM(ROW_NUMBER() OVER (ORDER BY [Group], [Member])) AS [NewGroup]
        FROM
            [Matches]
        WHERE
            [PreviousMatchingMember] IS NULL
    UNION ALL
        SELECT
            M.[Group],
            M.[Member],
            M.[PreviousMatchingMember],
            G.[NewGroup]
        FROM
            [Groups] AS G
            INNER JOIN [Matches] AS M ON
                M.[Group] = G.[Group] AND
                M.[PreviousMatchingMember] = G.[Member]
    )
SELECT
    G.[NewGroup],
    G.[Member],
    D.[Address],
    D.[Phone],
    D.[Email]
FROM
    [Groups] AS G
    INNER JOIN [DataTable] AS D ON
        D.[Group] = G.[Group] AND
        D.[Member] = G.[Member]
ORDER BY
    G.[NewGroup],
    G.[Member];

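Edit:

As APC pointed out in his comment on your question, if a record is related to several other records (through different Address/Phone/Email fields), you have a (huge) problem. You may end up with records that could belong to different groups. You might decide to treat those groups as one, but my solution here is not suited to a problem that complex.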
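
The limitation described in this edit can be reproduced on a tiny example. The Python sketch below is an illustration, not part of the original answer: the three rows are hypothetical, and `group_by_first_previous_match` only imitates the idea of the Matches/Groups CTEs, where each member follows a single link to its smallest earlier matching member. It assumes Python string ordering agrees with the database collation for these member names.

```python
# Hypothetical rows within one main group: (member, address, phone, email).
ROWS = [
    ("M1", "A1", "P1", "E1"),
    ("M2", "A2", "P2", "E2"),
    ("M3", "A1", "P2", "E3"),  # shares an address with M1 AND a phone with M2
]


def group_by_first_previous_match(rows):
    """Map each member to the root reached by following one earlier match."""
    ordered = sorted(rows, key=lambda r: r[0])  # like ORDER BY [Member]
    root_of = {}
    for i, (member, address, phone, email) in enumerate(ordered):
        # Smallest earlier member sharing any field, like the TOP (1) lookup.
        previous = next(
            (m for m, a, p, e in ordered[:i]
             if a == address or p == phone or e == email),
            None,
        )
        # No earlier match: the member starts its own group.
        root_of[member] = member if previous is None else root_of[previous]
    return root_of


RESULT = group_by_first_previous_match(ROWS)
print(RESULT)
```

M3 is attached to M1 through its address, so its phone link to M2 is never followed: M2 stays in a group of its own even though a fully transitive grouping would put all three members together.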

Votes: 1
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/55355914
