我有一个数据库表,其中有重复的地址行。对于重复的地址行,我希望从一列中保留具有最高值的单行,如果仍然有重复的行,则只需选择一个(无关紧要,无法知道哪个更好,因为这些是任意字符串值,max/min应该足够了)。例如,假设我们有下面的表,其中有两个地址,其中一个重复四次:
ADDR1 CITY STATE APN CONFIDENCE
1 Main St. Dover IL apn-1 7
1 Main St. Dover IL apn-2 8
1 Main St. Dover IL apn-3 5
1 Main St. Dover IL apn-4 8
7 Ajax Rd. Dover IL apn-8 9我需要使用地址列上的group by将重复的行过滤掉,仅为地址的置信度得分最高的行,然后从“子选择”中选择未分组列的最大值或最小值(本例中为APN),以获得以下所需的结果:
ADDR1 CITY STATE APN CONFIDENCE
1 Main St. Dover IL apn-4 8
7 Ajax Rd. Dover IL apn-8 9我想出了一个有效的SQL查询:
select sub.ADDR1, sub.CITY, sub.STATE, MAX(APN), sub.CONFIDENCE
from example_db.example_table as table1
left join (
select ADDR1,
CITY,
STATE,
MAX(CONFIDENCE) as CONFIDENCE
from example_db.example_table
group by ADDR1,
CITY,
STATE
) as sub
on (table1.ADDR1 = sub.ADDR1)
and (table1.CITY = sub.CITY)
and (table1.STATE = sub.STATE)
group by sub.ADDR1, sub.CITY, sub.STATE, sub.CONFIDENCE上面是一个用于演示我的用例的简化示例,但是我将在其上运行此SQL代码的数据很大(大约130M行),并且SQL Server实例很少--如果我可以优化上面的内容或使用另一种更有效的方法,我就不想让它过载。
发布于 2020-09-30 03:42:35
您可以使用窗口函数:
select *
from (
select t.*,
row_number() over(partition by addr1, city, state order by confidence desc) rn
from mytable t
) t
where rn = 1发布于 2020-09-30 03:44:38
另一种选择是使用WITH TIES
不完全清楚order by中是否需要APN
示例
Select top 1 with ties *
From YourTable
Order by row_number() over (partition by addr1,city,state order by confidence desc, apn)返回
ADDR1 CITY STATE APN CONFIDENCE
1 Main St. Dover IL apn-2 8
7 Ajax Rd. Dover IL apn-8 9https://stackoverflow.com/questions/64126503
复制相似问题