我有一个表属性
source| address | price | size | created_at |duplicate
file1 |Fleet St., 1 | 230.00 | 13 | 2019-12-01 | 0
file2 |Help St.43 | 90.00 | 4 | 2018-5-5 | 0
file1 |Fleet St., 1 | 230.00 | 13 | 2019-10-01 | 0
file1 |Fleet St., 1 | 230.00 | 13 | 2017-10-01 | 0我需要找到副本的来源,地址,价格和大小,并标志他们所有的副本,但最近的。
低于期望的输出。
source| address | price | size | created_at |duplicate
file1 |Fleet St., 1 | 230.00 | 13 | 2019-12-01 | 0
file2 |Help St.43 | 90.00 | 4 | 2018-5-5 | 0
file1 |Fleet St., 1 | 230.00 | 13 | 2019-10-01 | 1
file1 |Fleet St., 1 | 230.00 | 13 | 2017-10-01 | 1我已经想出了下面的查询来识别副本,但是我不知道如何继续。
SELECT
source,
address,
COUNT(address),
price,
COUNT(price),
size,
COUNT(size),
MAX(created_at)
FROM properties
GROUP BY
source,
address,
price,
size
HAVING
COUNT(address) > 1 AND
COUNT(price) > 1 AND
COUNT(size) > 1 AND
COUNT(source) > 1任何帮助都将不胜感激。
发布于 2019-08-25 11:37:35
如果要更改值,请使用update。在本例中,使用from和聚合查询:
update properties p join
(select source, address, price, size,
max(created_at) as max_created_at
from properties
group by source, address, price, size
) pp
using (source, address, price, size)
set p.is_duplicate = 1
where p.created_at < pp.max_created_at;请注意,这并不会将最近的复制值设置为0。如果您的数据以NULL值开头,则使用:
update properties p join
(select source, address, price, size,
max(created_at) as max_created_at
from properties
group by source, address, price, size
) pp
using (source, address, price, size)
set p.is_duplicate = (p.created_at < pp.max_created_at);发布于 2019-08-25 10:48:58
假设您使用的是RANK 8+,那么在这里我会使用它:
SELECT
source,
address,
price,
size,
created_at,
CASE WHEN RANK() OVER (PARTITION BY source, address, price, size
ORDER BY created_at DESC) = 1
THEN 0 ELSE 1 END AS duplicate
FROM properties;在早期版本的MySQL中,我们可以尝试连接到子查询,该查询为每个组查找最新的created_at:
SELECT
p1.source,
p1.address,
p1.price,
p1.size,
p1.created_at,
CASE WHEN p2.source IS NOT NULL THEN 0 ELSE 1 END AS duplicate
FROM properties p1
LEFT JOIN
(
SELECT source, address, price, size, MAX(created_at) AS max_created_at
FROM properties
GROUP BY source, address, price, size
) p2
ON p1.source = p2.source AND
p1.address = p2.address AND
p1.price = p2.price AND
p1.size = p2.size AND
p1.created_at = p2.max_created_at;发布于 2019-08-25 11:02:01
您可以使用“存在”(在任何版本的MySQL中都可以使用):
select t.*,
exists(
select 1 from properties
where source = t.source and address = t.address and price = t.price and size = t.size
and created_at > t.created_at
) duplicate
from properties t见演示。
如果要更新列duplicate
update properties p inner join (
select t.*,
exists(
select 1 from properties
where source = t.source and address = t.address and price = t.price and size = t.size
and created_at > t.created_at
) dup
from properties t
) t on p.source = t.source and p.address = t.address and p.price = t.price and p.size = t.size and p.created_at = t.created_at
set p.duplicate = t.dup;见演示。
结果:
| source | address | price | size | created_at | duplicate |
| ------ | ------------ | ----- | ---- | ------------------- | --------- |
| file1 | Fleet St., 1 | 230 | 13 | 2019-12-01 00:00:00 | 0 |
| file2 | Help St.43 | 90 | 4 | 2018-05-05 00:00:00 | 0 |
| file1 | Fleet St., 1 | 230 | 13 | 2019-10-01 00:00:00 | 1 |
| file1 | Fleet St., 1 | 230 | 13 | 2017-10-01 00:00:00 | 1 |https://stackoverflow.com/questions/57645301
复制相似问题