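I want to delete duplicate rows, but only where all three columns (name, price and new price) hold the same values. I want to do this in a separate Python script.
The data can keep being inserted into the database as before, but I want another Python script, run as a cron job, to delete these duplicates.
So in this case: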
cur.execute("INSERT INTO cars VALUES(8,'Hummer',41400, 49747)")
cur.execute("INSERT INTO cars VALUES(9,'Volkswagen',21600, 36456)")
are both duplicates. Sample script that inserts the data:
import sys

import psycopg2

con = None

try:
    con = psycopg2.connect(database='testdb', user='janbodnar')
    cur = con.cursor()
    # A column name that contains a space must be double-quoted in SQL.
    cur.execute('CREATE TABLE cars(id INT PRIMARY KEY, name VARCHAR(20), price INT, "new price" INT)')
    cur.execute("INSERT INTO cars VALUES(1,'Audi',52642, 98484)")
    cur.execute("INSERT INTO cars VALUES(2,'Mercedes',57127, 874897)")
    cur.execute("INSERT INTO cars VALUES(3,'Skoda',9000, 439788)")
    cur.execute("INSERT INTO cars VALUES(4,'Volvo',29000, 743878)")
    cur.execute("INSERT INTO cars VALUES(5,'Bentley',350000, 434684)")
    cur.execute("INSERT INTO cars VALUES(6,'Citroen',21000, 43874)")
    cur.execute("INSERT INTO cars VALUES(7,'Hummer',41400, 49747)")
    cur.execute("INSERT INTO cars VALUES(8,'Hummer',41400, 49747)")
    cur.execute("INSERT INTO cars VALUES(9,'Volkswagen',21600, 36456)")
    cur.execute("INSERT INTO cars VALUES(10,'Volkswagen',21600, 36456)")
    con.commit()
except psycopg2.DatabaseError as e:
    # Undo the partial transaction before exiting with an error code.
    if con:
        con.rollback()
    print('Error %s' % e)
    sys.exit(1)
finally:
    if con:
        con.close()

Posted on 2012-09-02 22:26:03
You can do this in a single statement, without any extra round trips to the server.
DELETE FROM cars
USING (
SELECT id, row_number() OVER (PARTITION BY name, price, new_price
ORDER BY id) AS rn
FROM cars
) x
WHERE cars.id = x.id
AND x.rn > 1;
The window function row_number() requires PostgreSQL 8.4 or later.
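Within each set of duplicates, the row with the smallest id survives.
Note that I changed "new price" to new_price, so the column name no longer has to be double-quoted.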
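To try the row_number() approach without a PostgreSQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption for illustration, not the asker's setup). SQLite has no DELETE ... USING, so the sketch expresses the same idea with WHERE id IN (...); window functions need SQLite 3.25 or later. The rows are the question's sample data, with the column named new_price.

```python
import sqlite3

# Sample data from the question: ids 8 and 10 duplicate ids 7 and 9.
ROWS = [
    (1, 'Audi', 52642, 98484),
    (2, 'Mercedes', 57127, 874897),
    (3, 'Skoda', 9000, 439788),
    (4, 'Volvo', 29000, 743878),
    (5, 'Bentley', 350000, 434684),
    (6, 'Citroen', 21000, 43874),
    (7, 'Hummer', 41400, 49747),
    (8, 'Hummer', 41400, 49747),
    (9, 'Volkswagen', 21600, 36456),
    (10, 'Volkswagen', 21600, 36456),
]

# Number the rows of each (name, price, new_price) group by id and delete
# every row that is not the first (lowest id) in its group.
DELETE_DUPES = """
DELETE FROM cars
WHERE id IN (
    SELECT id FROM (
        SELECT id, row_number() OVER (PARTITION BY name, price, new_price
                                      ORDER BY id) AS rn
        FROM cars
    ) AS x
    WHERE x.rn > 1
)
"""

con = sqlite3.connect(':memory:')  # in-memory stand-in for testdb
con.execute('CREATE TABLE cars(id INTEGER PRIMARY KEY, name TEXT, '
            'price INTEGER, new_price INTEGER)')
con.executemany('INSERT INTO cars VALUES (?, ?, ?, ?)', ROWS)
con.execute(DELETE_DUPES)
con.commit()

remaining = [row[0] for row in con.execute('SELECT id FROM cars ORDER BY id')]
print(remaining)  # [1, 2, 3, 4, 5, 6, 7, 9]
```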
Alternatively, use an EXISTS semi-join, as @wildplasser posted in a comment; it has the same effect.
Or, at the special request of CTE devotee @wildplasser, with a CTE instead of a subquery... :)
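The comment itself is not reproduced here; the following is only a sketch of what such an EXISTS semi-join could look like, again run against SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. A row is deleted when an older row (smaller id) with the same name, price and new_price exists, so the smallest id survives, matching the row_number() version.

```python
import sqlite3

ROWS = [
    (1, 'Audi', 52642, 98484),
    (2, 'Mercedes', 57127, 874897),
    (3, 'Skoda', 9000, 439788),
    (4, 'Volvo', 29000, 743878),
    (5, 'Bentley', 350000, 434684),
    (6, 'Citroen', 21000, 43874),
    (7, 'Hummer', 41400, 49747),
    (8, 'Hummer', 41400, 49747),
    (9, 'Volkswagen', 21600, 36456),
    (10, 'Volkswagen', 21600, 36456),
]

# Semi-join: delete a row if an older copy with identical values exists.
# Inside the subquery, "cars" refers to the outer row being deleted,
# because the inner instance of the table is aliased as "older".
DELETE_DUPES = """
DELETE FROM cars
WHERE EXISTS (
    SELECT 1
    FROM cars AS older
    WHERE older.name = cars.name
      AND older.price = cars.price
      AND older.new_price = cars.new_price
      AND older.id < cars.id
)
"""

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE cars(id INTEGER PRIMARY KEY, name TEXT, '
            'price INTEGER, new_price INTEGER)')
con.executemany('INSERT INTO cars VALUES (?, ?, ?, ?)', ROWS)
con.execute(DELETE_DUPES)
con.commit()

remaining = [row[0] for row in con.execute('SELECT id FROM cars ORDER BY id')]
print(remaining)  # [1, 2, 3, 4, 5, 6, 7, 9]
```

One difference worth knowing: = never matches NULL, so rows with NULL in any of the three columns are not treated as duplicates here, whereas PARTITION BY groups NULLs together. IS NOT DISTINCT FROM (PostgreSQL) or IS (SQLite) makes the comparison NULL-safe.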
WITH x AS (
SELECT id, row_number() OVER (PARTITION BY name, price, new_price
ORDER BY id) AS rn
FROM cars
)
DELETE FROM cars
USING x
WHERE cars.id = x.id
AND x.rn > 1;
Attaching a WITH clause (CTE) to DELETE requires PostgreSQL 9.1 or later.
This form performs about the same as the form with the subquery.
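Since the goal is a separate script run from cron, it helps that this statement is idempotent: once the duplicates are gone, running it again deletes nothing. A minimal sketch of that, with sqlite3 standing in for PostgreSQL and remove_duplicates a hypothetical helper name. SQLite accepts a WITH clause in front of DELETE but not DELETE ... USING, so the CTE is joined through WHERE id IN (...).

```python
import sqlite3

ROWS = [
    (1, 'Audi', 52642, 98484),
    (2, 'Mercedes', 57127, 874897),
    (3, 'Skoda', 9000, 439788),
    (4, 'Volvo', 29000, 743878),
    (5, 'Bentley', 350000, 434684),
    (6, 'Citroen', 21000, 43874),
    (7, 'Hummer', 41400, 49747),
    (8, 'Hummer', 41400, 49747),
    (9, 'Volkswagen', 21600, 36456),
    (10, 'Volkswagen', 21600, 36456),
]

# CTE form: number the rows of each (name, price, new_price) group by id,
# then delete every row after the first (lowest id) in its group.
DEDUPE_SQL = """
WITH x AS (
    SELECT id, row_number() OVER (PARTITION BY name, price, new_price
                                  ORDER BY id) AS rn
    FROM cars
)
DELETE FROM cars
WHERE id IN (SELECT id FROM x WHERE rn > 1)
"""


def remove_duplicates(con):
    """Hypothetical helper for the cron script; returns the rows deleted."""
    before = con.total_changes
    with con:  # commits on success, rolls back if the DELETE raises
        con.execute(DEDUPE_SQL)
    return con.total_changes - before


con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE cars(id INTEGER PRIMARY KEY, name TEXT, '
            'price INTEGER, new_price INTEGER)')
con.executemany('INSERT INTO cars VALUES (?, ?, ?, ?)', ROWS)
con.commit()

first_run = remove_duplicates(con)   # removes ids 8 and 10
second_run = remove_duplicates(con)  # nothing left to remove
remaining = [row[0] for row in con.execute('SELECT id FROM cars ORDER BY id')]
print(first_run, second_run, remaining)  # 2 0 [1, 2, 3, 4, 5, 6, 7, 9]
```

The count comes from Connection.total_changes rather than cursor.rowcount, because some Python versions report rowcount as -1 for statements that begin with WITH.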
Posted on 2012-09-02 22:09:22
Use an SQL GROUP BY together with the original primary key to identify the rows:
duplicate_query = '''\
SELECT MIN(id), "name", price, "new price"
FROM cars
GROUP BY "name", price, "new price"
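HAVING COUNT(id) > 1
'''
The query above selects the lowest primary key id for each group of (name, price, "new price") rows that has more than one id. For your sample data, this returns: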
7, 'Hummer', 41400, 49747
9, 'Volkswagen', 21600, 36456
You can then use the returned data to delete the duplicates:
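delete_dupes = '''
DELETE
FROM cars
WHERE
    "name" = %(name)s AND price = %(price)s AND "new price" = %(newprice)s AND
    id > %(id)s
'''

cur.execute(duplicate_query)
dupes = cur.fetchall()
# Each row of dupes is (lowest id, name, price, new price).
cur.executemany(delete_dupes, [
    dict(name=r[1], price=r[2], newprice=r[3], id=r[0])
    for r in dupes])
con.commit()  # psycopg2 does not autocommit by default

Note that we delete any row whose primary key id is greater than the first id with the same three columns. For the first duplicate only the row with id 8 matches; for the second, the row with id 10.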
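To check the two-step approach end to end, here is a minimal sketch with Python's sqlite3 module standing in for psycopg2 (an assumption for illustration). The SQL is unchanged apart from the placeholders: sqlite3 uses the :name style where psycopg2 uses %(name)s. The column keeps its quoted "new price" name.

```python
import sqlite3

ROWS = [
    (1, 'Audi', 52642, 98484),
    (2, 'Mercedes', 57127, 874897),
    (3, 'Skoda', 9000, 439788),
    (4, 'Volvo', 29000, 743878),
    (5, 'Bentley', 350000, 434684),
    (6, 'Citroen', 21000, 43874),
    (7, 'Hummer', 41400, 49747),
    (8, 'Hummer', 41400, 49747),
    (9, 'Volkswagen', 21600, 36456),
    (10, 'Volkswagen', 21600, 36456),
]

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE cars(id INTEGER PRIMARY KEY, name TEXT, '
            'price INTEGER, "new price" INTEGER)')
con.executemany('INSERT INTO cars VALUES (?, ?, ?, ?)', ROWS)

# Step 1: lowest id of every (name, price, "new price") group with copies.
duplicate_query = '''
SELECT MIN(id), "name", price, "new price"
FROM cars
GROUP BY "name", price, "new price"
HAVING COUNT(id) > 1
'''

# Step 2: delete the later copies of one group (named placeholders).
delete_dupes = '''
DELETE FROM cars
WHERE "name" = :name AND price = :price AND "new price" = :newprice
  AND id > :id
'''

cur = con.cursor()
cur.execute(duplicate_query)
dupes = sorted(cur.fetchall())  # sorted, since GROUP BY order is unspecified
cur.executemany(delete_dupes, [
    dict(name=r[1], price=r[2], newprice=r[3], id=r[0])
    for r in dupes])
con.commit()

remaining = [row[0] for row in con.execute('SELECT id FROM cars ORDER BY id')]
print(dupes)      # [(7, 'Hummer', 41400, 49747), (9, 'Volkswagen', 21600, 36456)]
print(remaining)  # [1, 2, 3, 4, 5, 6, 7, 9]
```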
This does run a separate DELETE for each duplicate group found. You can combine it into a single statement with a WHERE EXISTS sub-select:
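delete_dupes = '''\
DELETE FROM cars cdel
WHERE EXISTS (
    SELECT *
    FROM cars cex
    WHERE
        cex."name" = cdel."name" AND
        cex.price = cdel.price AND
        cex."new price" = cdel."new price" AND
        cex.id > cdel.id
)
'''
cur.execute(delete_dupes)
con.commit()

This tells PostgreSQL to delete any row for which another row exists with the same name, price and new price but a higher primary key. As a result, this version keeps the row with the highest id in each group (8 and 10 for the sample data), whereas the two-step version above keeps the lowest (7 and 9); change the last condition to cex.id < cdel.id to keep the lowest instead.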
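A quick way to confirm which copy survives is to run the same statement against the sample data. This minimal sketch uses Python's sqlite3 module as a stand-in for PostgreSQL; for portability the outer alias cdel is dropped and the outer row is referred to as cars, with the condition otherwise unchanged. It shows the highest id of each group being kept.

```python
import sqlite3

ROWS = [
    (1, 'Audi', 52642, 98484),
    (2, 'Mercedes', 57127, 874897),
    (3, 'Skoda', 9000, 439788),
    (4, 'Volvo', 29000, 743878),
    (5, 'Bentley', 350000, 434684),
    (6, 'Citroen', 21000, 43874),
    (7, 'Hummer', 41400, 49747),
    (8, 'Hummer', 41400, 49747),
    (9, 'Volkswagen', 21600, 36456),
    (10, 'Volkswagen', 21600, 36456),
]

# Same condition as the answer: delete a row if a copy with a higher id
# exists. "cars" inside the subquery is the outer row, because the inner
# instance of the table is aliased as cex.
delete_dupes = '''
DELETE FROM cars
WHERE EXISTS (
    SELECT *
    FROM cars cex
    WHERE
        cex."name" = cars."name" AND
        cex.price = cars.price AND
        cex."new price" = cars."new price" AND
        cex.id > cars.id
)
'''

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE cars(id INTEGER PRIMARY KEY, name TEXT, '
            'price INTEGER, "new price" INTEGER)')
con.executemany('INSERT INTO cars VALUES (?, ?, ?, ?)', ROWS)
con.execute(delete_dupes)
con.commit()

remaining = [row[0] for row in con.execute('SELECT id FROM cars ORDER BY id')]
print(remaining)  # [1, 2, 3, 4, 5, 6, 8, 10]
```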
https://stackoverflow.com/questions/12236473