Exclude subsequent duplicate records in Redshift

Stack Overflow user

Asked on 2017-08-08 06:25:13

Answers: 1, Views: 69, Followers: 0, Votes: 1

I have a simple SQL problem that I cannot solve (I am using Amazon Redshift).

Suppose I have the following sample:

id,  type,  channel, date, column1, column2, column3, column4
1,   visit, seo,  07/08/2017: 11:11:22
1,   hit, seo,  07/08/2017: 11:12:34
1,   hit, seo,  07/08/2017: 11:13:22
1,   visit, sem,   07/08/2017: 11:15:11
1,   scarf, display,   07/08/2017: 11:15:45
1,   hit, display,   07/08/2017: 11:15:37
1,   hit, seo,  07/08/2017: 11:18:22
1,   hit, display,   07/08/2017: 11:18:23
1,   hit, referal,   07/08/2017: 11:19:55

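I want to select all visits (in my table's logic, a visit marks the start of each group of rows for a given ID) and also exclude consecutive duplicates of 'channel'. For my example, the query should return: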

1,   visit, seo,  07/08/2017: 11:11:22
**1,   hit, seo,  07/08/2017: 11:12:34** (exclude because it follows seo and it's not a visit)
**1,   hit, seo,  07/08/2017: 11:13:22** (exclude because it follows seo and it's not a visit)
1,   visit, sem,   07/08/2017: 11:15:11 (include, new channel)
1,   scarf, display,   07/08/2017: 11:15:45 (include, new channel)
**1,   hit, display,   07/08/2017: 11:15:37** (exclude because it follows display and it's not a visit)
1,   hit, seo,  07/08/2017: 11:18:22 (include because it doesn't follow seo directly, even if seo is already present)
1,   hit, display,   07/08/2017: 11:18:23 (include because it doesn't follow display directly, even if display is already present)
1,   hit, referal,   07/08/2017: 11:19:55 (include, new channel)

I tried using a row number (since I am on Redshift):

select type, date, id, ROW_NUMBER() OVER (PARTITION BY id, channel ORDER BY date) as rn

and then adding a filter:

Where type='visit' or rn=1

But this does not solve the problem, because it does not return rows 7 and 8:

1, hit, seo, 07/08/2017: 11:18:22 (will be rn=4 for 'id=1, channel=seo' combination)
1, hit, display, 07/08/2017: 11:18:23 (will be rn=3 for 'id=1, channel=display' combination)
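
To make the failure concrete, here is a small sketch that lists rn next to each row. It inlines a trimmed subset of the sample rows in a CTE; the CTE name sample_rows, the column name event_time, and the ISO-style timestamp strings (day/month order assumed) are illustrative assumptions, not part of the real table.

```sql
-- Trimmed subset of the sample rows, inlined so the query is self-contained.
-- Assumption: timestamps are rewritten as ISO-style strings, which sort
-- correctly as text.
with sample_rows as (
    select 1 as id, 'visit' as type, 'seo' as channel,
           '2017-08-07 11:11:22' as event_time
    union all select 1, 'hit',   'seo', '2017-08-07 11:12:34'
    union all select 1, 'visit', 'sem', '2017-08-07 11:15:11'
    union all select 1, 'hit',   'seo', '2017-08-07 11:18:22'
)
select id, type, channel, event_time,
       -- numbering restarts per (id, channel), not per consecutive run
       row_number() over (partition by id, channel order by event_time) as rn
from sample_rows
order by id, event_time;
```

The last seo row gets rn = 3 because the numbering counts every seo row for the id, not only the rows since the channel last changed, so the filter type = 'visit' or rn = 1 drops it even though it directly follows a sem row.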

Can anyone give me a hint so that I can solve this?

row-number

amazon-web-services

amazon-redshift

1 Answer

Stack Overflow user

Accepted answer

Posted on 2017-08-08 06:35:37

You can use lag to select only the rows where the previous channel is different or the type is visit:

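select * from (
    select *,
        -- channel of the preceding row for the same id, in date order
        lag(channel) over (partition by id order by date) as prev_channel
    from mytable
) t
-- keep a row if it is the first for its id, starts a new channel run, or is a visit
where prev_channel is null or prev_channel <> channel or type = 'visit'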
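
As a minimal sketch of the lag approach, the query below inlines the question's rows in a CTE so it can be run without creating a table. The column name event_time and the ISO-style timestamp strings are assumptions made for illustration (the original values look like day/month/year, which is a guess), and the CTE named mytable merely stands in for the real table.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
-- Assumption: timestamps are rewritten as ISO-style strings, which sort
-- correctly as text; day/month order is assumed.
with mytable as (
    select 1 as id, 'visit' as type, 'seo' as channel,
           '2017-08-07 11:11:22' as event_time
    union all select 1, 'hit',   'seo',     '2017-08-07 11:12:34'
    union all select 1, 'hit',   'seo',     '2017-08-07 11:13:22'
    union all select 1, 'visit', 'sem',     '2017-08-07 11:15:11'
    union all select 1, 'scarf', 'display', '2017-08-07 11:15:45'
    union all select 1, 'hit',   'display', '2017-08-07 11:15:37'
    union all select 1, 'hit',   'seo',     '2017-08-07 11:18:22'
    union all select 1, 'hit',   'display', '2017-08-07 11:18:23'
    union all select 1, 'hit',   'referal', '2017-08-07 11:19:55'
)
select id, type, channel, event_time
from (
    select m.*,
           -- channel of the previous row for the same id, by time
           lag(channel) over (partition by id order by event_time) as prev_channel
    from mytable m
) t
-- first row per id, a change of channel, or any visit
where prev_channel is null
   or prev_channel <> channel
   or type = 'visit'
order by id, event_time;
```

Because the rows are ordered by timestamp rather than by their position in the listing, the display hit at 11:15:37 sorts ahead of the 11:15:45 row in this sample, so the 11:15:37 row is the one kept for that run of display rows.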

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/45556580
