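I thought I had solved this problem, but it turns out I was only deleting the first record. The query below returns the duplicated rows. Each of them has a count of 2. I just want to delete the first record of each duplicate.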
select scorestudentid, scoreadvisor, scorecorrect, count(*)
from scores
where scoretestid = 3284
group by scorestudentid, scoreadvisor, scorecorrect
having count(scorestudentid) > 1
It returns:
scorestudentid scoreadvisor scorecorrect (No column name)
13033719 28059 3.0 2
13033777 28086 3.0 2
13033826 28147 3.0 2
13033960 28023 3.0 2
So I put this together, thinking it would work:
set rowcount 1
delete
from scores
where scoretestid = 3284
and scorestudentid in (
select scorestudentid
from scores
where scoretestid = 3284
group by scorestudentid
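having count(scorestudentid) > 1)
It really seems like it should be a simple concept, but I am not getting it.
Based on Thomas's script, I updated the query to fit my table, but it still does not work.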
Delete Scores
Where Exists (
Select 1
From Scores As S2
Where S2.ScoreStudentId = Scores.ScoreStudentId
And S2.ScoreAdvisor = Scores.ScoreAdvisor
And S2.ScoreCorrect = Scores.ScoreCorrect
Group By S2.ScoreStudentId, S2.ScoreAdvisor, S2.ScoreCorrect
Having Count(*) > 1
And Min(S2.NewScoreID) = Scores.NewScoreID
)
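And Scores.ScoreTestId = 3284
Posted on 2010-04-22 05:49:12
The trick is to use the primary key column (you do have one, right?) and simply find the first primary key value that matches the criteria you want. If for some crazy reason you have no primary key column, add an identity column, make it the primary key, and then run the delete.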
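As a minimal sketch of the "add an identity column" step, the statement below adds a surrogate key to Scores. It assumes the table has no primary key yet. The column name NewScoreID is borrowed from the asker's updated query, and the constraint name PK_Scores_NewScoreID is made up for illustration.

```sql
-- Sketch: give every row a distinct key so duplicates can be told apart.
-- Assumes Scores has no primary key yet; the constraint name is hypothetical.
ALTER TABLE Scores
    ADD NewScoreID INT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Scores_NewScoreID PRIMARY KEY;
```

SQL Server fills the identity values for the existing rows when the column is added, so each duplicate ends up with its own key value.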
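Edited to make the query more general. If you remove the final filter on ScoreTestId, it will delete the first row of every duplicate group based on ScoreStudentId, ScoreAdvisor and ScoreCorrect.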
Delete Scores
Where Exists (
Select 1
From Scores As S2
Where S2.ScoreStudentId = Scores.ScoreStudentId
And S2.ScoreAdvisor = Scores.ScoreAdvisor
And S2.ScoreCorrect = Scores.ScoreCorrect
Group By S2.ScoreStudentId, S2.ScoreAdvisor, S2.ScoreCorrect
Having Count(*) > 1
And Min(S2.PrimaryKeyColumn) = Scores.PrimaryKeyColumn
)
And Scores.ScoreTestId = 3284
Posted on 2011-02-17 22:05:46
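I believe Thomas's solution does not work when the primary key is a uniqueidentifier. Also, if a record is duplicated more than once in the table (i.e. it appears 3, 4, 5+ times), only one record is deleted.
This is what we use:
declare @col1 uniqueidentifier
declare @col2 varchar(256)
declare @col3 datetime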
DECLARE C CURSOR
FOR
select col1, col2, col3
from MyTable
where IsDeleted = 0
group by col1, col2, col3
having count(*) > 1
OPEN C
FETCH NEXT FROM C
INTO @col1, @col2, @col3
WHILE @@FETCH_STATUS = 0
BEGIN
declare @primaryKey uniqueidentifier
set @primaryKey = (select top 1 primaryKey from MyTable
where col1 = @col1 and col2= @col2 and col3 = @col3)
update MyTable
set IsDeleted = 1, DeleteDt = GETDATE()
where col1 = @col1
and col2 = @col2
and col3 = @col3
and PrimaryKey<> @primaryKey
FETCH NEXT FROM C
INTO @col1, @col2, @col3
END
CLOSE C
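DEALLOCATE C
What this cursor does: for each group of rows that share col1, col2 and col3, it picks one row to keep and marks every other row in that group as deleted (IsDeleted = 1, DeleteDt set to the current time).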
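For comparison, here is a set-based sketch of the same soft delete. It is not part of the answer above and assumes SQL Server 2005 or later, where ROW_NUMBER() and updatable common table expressions are available. It keeps the row with the lowest PrimaryKey value in each group, whereas the cursor keeps whichever row "top 1" happens to return, and it only ever touches rows that are not yet marked as deleted.

```sql
-- Sketch only: number the live rows within each duplicate group, then
-- soft-delete everything except the first row of each group.
-- If this is not the first statement in the batch, the statement before it
-- must end with a semicolon.
WITH Ranked AS (
    SELECT IsDeleted,
           DeleteDt,
           ROW_NUMBER() OVER (
               PARTITION BY col1, col2, col3
               ORDER BY PrimaryKey
           ) AS RowNum
    FROM MyTable
    WHERE IsDeleted = 0
)
UPDATE Ranked
SET IsDeleted = 1,
    DeleteDt = GETDATE()
WHERE RowNum > 1;
```

Because the rows are ordered rather than aggregated with MIN, this also works when PrimaryKey is a uniqueidentifier, and it handles groups of any size in one pass.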
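Posted on 2014-02-02 23:26:59
I am going to talk about an interesting topic in the SQL world. If you search for it, you will find many ways to remove duplicate data from a table. I am not going to write anything very new, but I will discuss the performance problem you hit when removing duplicates the traditional way.
Removing duplicate rows in SQL 2000: I have created a table DuplicateData and inserted some rows that are duplicated on EmpId.
Create table DuplicateData(EmpId int,Name varchar(100))--> table creation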
insert into DuplicateData values(4,'Akshay')
insert into DuplicateData values(4,'Akshay')
insert into DuplicateData values(5,'ankit')
insert into DuplicateData values(3,'Vikas')
insert into DuplicateData values(3,'Vikas')
insert into DuplicateData values(3,'Vikas')
insert into DuplicateData values(3,'Vikas')
insert into DuplicateData values(2,'Raj')
insert into DuplicateData values(2,'Raj')
insert into DuplicateData values(1,'Neeraj')
insert into DuplicateData values(1,'Neeraj')
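insert into DuplicateData values(1,'Neeraj')
The traditional way to remove duplicate rows from a table in SQL 2000: if we run the following batch in Query Analyzer, it removes the extra copies of every duplicated value from the table DuplicateData. This query is "OK" if you run it in a test environment or on dummy data. But if you have millions of records or otherwise large data, it will be the worst possible query in terms of performance. It can take hours or even days, depending on the amount of data in the table.
Reason: the query below uses a correlated subquery. The subquery is executed for every EmpId in the table to check whether the count for that EmpId is greater than 1, and the rows are then deleted one at a time. That is why it performs so slowly.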
set rowcount 1
delete from DuplicateData where (select count(EmpId) from DuplicateData a where a.EmpId=DuplicateData.EmpId)>1
while @@rowcount>0
delete from DuplicateData where (select count(EmpId) from DuplicateData a where a.EmpId=DuplicateData.EmpId)>1
set rowcount 0
We can create a stored procedure to overcome this performance problem. Here is an example.
declare @tmp table(empid int,cnt int, rowid int identity)--> declare table variable
declare @maxcounter as integer--> Declaration of variables
declare @mincounter as integer
declare @rowcnt as integer
declare @empid as int-->End of Declaration
insert into @tmp(empid,cnt)-->Inserting duplicate empid along with no of duplicate entries
select empid,count(empid) from duplicatedata
group by empid having count(empid)>1
select @mincounter=min(rowid),@maxcounter=max(rowid) from @tmp -->assigning minimum and maximum rowid to variables.
while @mincounter <=@maxcounter
begin
select @rowcnt=cnt,@empid=empid from @tmp where rowid=@mincounter
set @rowcnt =@rowcnt-1
set rowcount @rowcnt
delete from duplicatedata where empid=@empid
set rowcount 0
set @mincounter=@mincounter +1
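end
Let's walk through the while loop above. The @tmp table holds every duplicated EmpId together with its number of duplicate entries. We want to iterate over each record in @tmp, which is why we assigned the minimum and maximum row IDs to the variables @mincounter and @maxcounter.
In the body of the while loop, we assign the number of duplicate entries to the variable @rowcnt and the EmpId to the variable @empid.
In the next statement we set @rowcnt = @rowcnt - 1, because the variable holds the number of duplicate records for a particular EmpId, but we want to keep one of them. The statement after that, set rowcount @rowcnt, limits the delete that follows to one row fewer than the number of duplicates for that EmpId.
The next statement resets rowcount to 0, and the last statement increments @mincounter so that the next record is fetched from the @tmp table.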
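One way to check the result of the loop is sketched below. It assumes the sample rows inserted earlier and that the loop has already run against DuplicateData. The alias Copies is only for illustration.

```sql
-- Sketch: after the loop, no EmpId should appear more than once.
-- With the sample data this query should return no rows.
select EmpId, count(*) as Copies
from DuplicateData
group by EmpId
having count(*) > 1

-- One row per EmpId should remain; with the sample data that is
-- EmpId 1 through 5, so the count should be 5.
select count(*) from DuplicateData
```

Since @tmp only contains EmpId values with a count greater than 1, @rowcnt - 1 is always at least 1, so set rowcount never receives 0 (which would remove the limit) inside the loop.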
https://stackoverflow.com/questions/2686741