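This topic has come up here several times, but I still find it hard to understand how to delete duplicate rows from a MySQL database. Yes, I have seen many examples, but they are simplified to two columns or so, and most of the answers are just a query with no explanation. I want to clean up our database safely. I have backed up my table, and now I want to delete the duplicate rows (there are a lot of them). Below is the query fragment I tried. It is very simple: if a row is duplicated, select the latest one and show its ID.
SELECT MAX(id) FROM Raw_Validated_backup GROUP BY col1, col2, col3, ... col_n-1 having COUNT(*) > 1;
Now that I have selected the latest row of each duplicate group, I need to delete the duplicates. That is easier said than done, if you ask me. This is the query written by @Jose Rui Santos: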
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
He posted it in reply to this thread: MySQL delete duplicate records but keep latest
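The question is: how exactly do I delete the duplicate rows and keep the latest one? I would prefer an answer in a pseudo-code-like format so that others can learn from it too. For example:
DELETE table FROM database
rather than something specific such as:
DELETE email FROM city
MySQL queries
Example 1: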
DELETE t1 FROM Raw_Validated_backup AS t1 INNER JOIN Raw_Validated_backup AS t2
ON t1.time_start=t2.time_start
AND t1.time_end=t2.time_end
AND t1.first_temp_lpn=t2.first_temp_lpn
AND t1.first_WL=t2.first_WL
AND t1.first_temp_lpn_validated=t2.first_temp_lpn_validated
AND t1.second_temp_lpn=t2.second_temp_lpn
AND t1.second_WL=t2.second_WL
AND t1.second_temp_lpn_validated=t2.second_temp_lpn_validated
AND t1.third_temp_lpn=t2.third_temp_lpn
AND t1.third_WL=t2.third_WL
AND t1.third_temp_lpn_validated=t2.third_temp_lpn_validated
AND t1.first_temp_rising=t2.first_temp_rising
AND t1.first_WR=t2.first_WR
AND t1.first_temp_rising_validated=t2.first_temp_rising_validated
AND t1.second_temp_rising=t2.second_temp_rising
AND t1.second_WR=t2.second_WR
AND t1.second_temp_rising_validated=t2.second_temp_rising_validated
AND t1.third_temp_rising=t2.third_temp_rising
AND t1.third_WR=t2.third_WR
AND t1.third_temp_rising_validated=t2.third_temp_rising_validated
AND t1.id<t2.id;
Example 2:
DELETE FROM Raw_Validated_backup WHERE id NOT IN (
SELECT max(id) FROM Raw_Validated_backup
GROUP BY time_start, time_end, first_temp_lpn, first_WL, first_temp_lpn_validated, second_temp_lpn, second_WL, second_temp_lpn_validated, third_temp_lpn, third_WL, third_temp_lpn_validated, first_temp_rising, first_WR, first_temp_rising_validated, second_temp_rising, second_WR, second_temp_rising_validated, third_temp_rising, third_WR, third_temp_rising_validated );
Posted on 2020-01-02 16:29:33
I would do it like this:
mysql> select * from test;
+----+------------------+------+------+------+
| id | email            | col2 | col3 | col4 |
+----+------------------+------+------+------+
|  1 | user@example.org |    2 |    3 |    4 |
|  2 | user@example.org |    2 |    3 |    4 |
|  3 | user@example.org |    2 |    3 |    4 |
|  4 | user@example.org |    2 |    3 |    4 |
+----+------------------+------+------+------+
mysql> DELETE t1 FROM test AS t1 INNER JOIN test AS t2
ON t1.email=t2.email AND t1.col2=t2.col2 AND t1.col3=t2.col3 AND t1.col4=t2.col4
AND t1.id<t2.id;
mysql> select * from test;
+----+------------------+------+------+------+
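| id | email            | col2 | col3 | col4 |
+----+------------------+------+------+------+
|  4 | user@example.org |    2 |    3 |    4 |
+----+------------------+------+------+------+
Explanation: the join condition looks for a row t2 that has a larger id and is equal to t1 in all other columns. If such a row exists, t1 cannot be the row with the largest id among its duplicates, so t1 should be deleted.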
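The join condition above can be checked before anything is deleted. The sketch below reuses the `test` table and column names from this answer and first lists the rows the DELETE would remove, then runs the DELETE inside a transaction so it can be rolled back. Two things here are assumptions and not part of the original answer: the table is assumed to use a transactional engine such as InnoDB, and the comparisons use MySQL's NULL-safe operator `<=>` instead of `=`, on the assumption that two rows with NULL in the same column should count as duplicates (with plain `=`, a NULL never matches, so such rows would not be joined).

```sql
-- Dry run: list the rows the join-based DELETE would remove.
-- DISTINCT is needed because one row can match several later duplicates.
SELECT DISTINCT t1.*
FROM test AS t1
INNER JOIN test AS t2
   ON t1.email <=> t2.email
  AND t1.col2  <=> t2.col2
  AND t1.col3  <=> t2.col3
  AND t1.col4  <=> t2.col4
  AND t1.id < t2.id
ORDER BY t1.id;

-- Same condition as a DELETE, inside a transaction so it can be undone
-- (assumes a transactional storage engine such as InnoDB).
START TRANSACTION;

DELETE t1
FROM test AS t1
INNER JOIN test AS t2
   ON t1.email <=> t2.email
  AND t1.col2  <=> t2.col2
  AND t1.col3  <=> t2.col3
  AND t1.col4  <=> t2.col4
  AND t1.id < t2.id;

-- Inspect what is left before deciding.
SELECT * FROM test ORDER BY id;

ROLLBACK;  -- replace with COMMIT once the remaining rows look right
```

With the four sample rows shown above, the dry run would list ids 1, 2 and 3, which are the rows the DELETE removes.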
Posted on 2020-01-02 16:30:00
In @'s answer, the last solution will do exactly what you need.
DELETE
FROM table_name
WHERE id NOT IN (
SELECT max(id)
FROM table_name
GROUP BY email
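)
It selects the ID of the latest record in each group (grouped by email in this example, but you can also group by all columns) and then deletes every record whose ID is not in that result.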
In fact, you can make sure the right rows will be deleted by running only the select first:
SELECT max(id)
FROM table_name
GROUP BY email
This shows you everything that will not be deleted.
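One caveat when running this on MySQL: its documentation states that you cannot delete from a table and select from the same table in a subquery, and a statement of this shape is typically rejected with error 1093 ("You can't specify target table ... for update in FROM clause"). A common workaround, sketched below, is to wrap the inner query in a derived table so that MySQL materializes the list of ids first. `table_name` and `email` are the placeholders from this answer; the aliases `keep_rows` and `keep_id` are hypothetical names introduced only for illustration, and `id` is assumed to be a NOT NULL key.

```sql
-- Dry run: rows that WOULD be deleted
-- (everything except the newest id per email).
SELECT *
FROM table_name
WHERE id NOT IN (
    SELECT MAX(id)
    FROM table_name
    GROUP BY email
);

-- Delete variant: the grouped query is wrapped in a derived table
-- (hypothetical alias keep_rows), so the subquery no longer reads
-- directly from the table being deleted from.
DELETE FROM table_name
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MAX(id) AS keep_id
        FROM table_name
        GROUP BY email
    ) AS keep_rows
);
```

To treat rows as duplicates only when every column matches, list all of those columns in the GROUP BY instead of `email` alone, as in Example 2 of the question.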
https://stackoverflow.com/questions/59566579