我有一个表(比如tableB),它有大约4000万行(总是在增加)。归档工作目前是通过DELETE .. OUTPUT .. INTO .. FROM ..方式完成的。对1000行进行归档,最初需要3~5秒。但是,随着越来越多的行被删除,这需要更多的时间。例如,删除10M行后,现在需要35~40秒才能删除1000行。
是什么引起的?如何改进这种情况(我至少需要归档3000万行)?如果分区是唯一的方法,那么如何在最短的停机时间内做到这一点呢?
其他信息:
tableB有两个外键列(tableAId,tableCId)tableA中的日期-时间字段执行的(在DELETE语句中使用了一个inner-join )。with (index=ix_time)用于tableA,则计划在索引上显示100%的成本--在tableB的外键颜色上查找索引。tableA和tableB都有自动递增bigint作为主键.tableB有4个指标tableA有5个指标tableA有30M+行。查询计划摘录:

剧本:
DECLARE @older_than datetime2(0) = '2015-10-01';
DECLARE @i int = 1;
DECLARE @j int = 0;
DECLARE @imax int = 1000;
DECLARE @jmax int = 50;
DECLARE @total int = 0;
DECLARE @t1 DATETIME2(3);
DECLARE @t2 DATETIME2(3);
DECLARE @timetook int;
WHILE @i > 0 AND @j < @jmax
BEGIN
SET @t1 = GETDATE();
DELETE TOP (@imax) ss
OUTPUT deleted.[Id]
,deleted.[columnA]
,deleted.[columnB]
INTO [MyArchive_Data].dbo.tableB([Id]
,columnA
,columnB)
FROM [MyLive_Data].dbo.tableB ss
INNER JOIN [MyLive_Data].dbo.tableA s ON s.Id = ss.tableAID
WHERE s.Time < @older_than;
SET @i = @@rowcount;
SET @j = @j + 1;
SET @total = @total + @i;
SET @t2 = GETDATE();
SET @timetook = datediff(second,@t1,@t2);
RAISERROR('LOOP %d COMPLETE [%d rows][%d sec]',10,1,@j, @total, @timetook) with nowait;
WAITFOR DELAY '00:00:03';
END更新
如果我排除循环构造(WHILE @i > 0 AND @j < @jmax),并且只分别运行DELETE语句,则需要10~12秒。我观察了查询计划。那些不一样。在启用循环构造的情况下,它使用了index-seek,但是没有它,就使用了index-scan。为什么?
发布于 2015-12-01 09:58:41
没有看到带有索引的完整表模式,我会说SQL server查询优化器发现扫描聚集索引更有效,因为表上的统计数据显示ID值与时间之间的相关性,如果它从聚集索引的末尾开始并按反向顺序处理,则必须读取较少的行才能找到满足查询的TOP(x)行。布伦特·奥扎尔关于扫描、搜寻和统计的文章
而且,循环中有WAITFOR语句,这可能会导致表上的锁升级,因为删除是在隐式事务中完成的,因此只有在循环结束后才会提交删除。尝试在delete语句之前添加一个BEGIN TRANSACTION,在它之后添加一个COMMIT TRANSACTION。如果可能的话,删除WAITFOR语句,因为它会导致处理上的延迟。
DECLARE @older_than datetime2(0) = '2015-10-01';
DECLARE @i int = 1;
DECLARE @j int = 0;
DECLARE @imax int = 1000;
DECLARE @jmax int = 50;
DECLARE @total int = 0;
DECLARE @t1 DATETIME2(3);
DECLARE @t2 DATETIME2(3);
DECLARE @timetook int;
WHILE @i > 0 AND @j < @jmax
BEGIN
SET @t1 = GETDATE();
BEGIN TRANSACTION
DELETE TOP (@imax) ss
OUTPUT deleted.[Id]
,deleted.[columnA]
,deleted.[columnB]
INTO [MyArchive_Data].dbo.tableB([Id]
,columnA
,columnB)
FROM [MyLive_Data].dbo.tableB ss
INNER JOIN [MyLive_Data].dbo.tableA s ON s.Id = ss.tableAID
WHERE s.Time < @older_than;
COMMIT TRANSACTION
SET @i = @@rowcount;
SET @j = @j + 1;
SET @total = @total + @i;
SET @t2 = GETDATE();
SET @timetook = datediff(second,@t1,@t2);
RAISERROR('LOOP %d COMPLETE [%d rows][%d sec]',10,1,@j, @total, @timetook) with nowait;
WAITFOR DELAY '00:00:03';
END发布于 2015-12-01 13:44:01
下面是一个我期望对非常大的归档处理更有效的例子。此方法按ID范围而不是顶部以批方式删除。您可以根据您的性能和并发需要调整批处理大小。
DECLARE
@older_than datetime2(0) = '2015-10-01'
, @i int = 1
, @j int = 0
, @total int = 0
, @t1 DATETIME2(3)
, @t2 DATETIME2(3)
, @timetook int
, @MinID int
, @MaxId int
, @BatchFirstId int
, @BatchLastId int
, @BatchSize int =100000;
SELECT @MinID = MIN(Id), @MaxID = MAX(Id) FROM dbo.TableA;
SET @BatchFirstId = @MinID;
WHILE @BatchFirstId <= @MaxId
BEGIN
SET @BatchLastID = @BatchFirstId + @BatchSize - 1;
SET @t1 = GETDATE();
DELETE ss
OUTPUT deleted.[Id]
,deleted.[columnA]
,deleted.[columnB]
INTO [MyArchive_Data].dbo.tableB([Id]
,columnA
,columnB)
FROM [MyLive_Data].dbo.tableB ss
INNER JOIN [MyLive_Data].dbo.tableA s ON s.Id = ss.tableAID
WHERE s.Time < @older_than
AND s.Id BETWEEN @BatchFirstID AND @BatchLastID;
SET @i = @@ROWCOUNT;
SET @BatchFirstId += @BatchSize;
SET @j = @j + 1;
SET @total = @total + @i;
SET @t2 = GETDATE();
SET @timetook = datediff(second,@t1,@t2);
RAISERROR('LOOP %d COMPLETE [%d rows][%d sec]',10,1,@j, @total, @timetook) with nowait;
WAITFOR DELAY '00:00:03';
END;https://stackoverflow.com/questions/34011945
复制相似问题