I have the following tables:
Orders
order_id
9
10
11
Order_details
order_id, product_id
9, 7
10, 5
10, 6
11, 6
11, 7
Products
product_id, product_name, price
5, potato, 4.99
6, potato *, 7.5
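7, orange, 7.99

I have already received feedback on how to find orders in which a product name is duplicated, but the situation is now a bit more complicated: the duplicate carries an extra marker, " *", after the name, as shown above.
How can I extend the query below so that it returns only orders that contain a product once without the extra character and once with it?
For example, orders with "potato" and "potato" should be ignored, and orders with "potato *" and "potato *" should also be ignored, but orders with "potato" and "potato *" should appear in the results.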
select od.order_id
from order_details od join
     products p
     on od.product_id = p.product_id
group by od.order_id
having count(p.product_name) > count(distinct p.product_name)

Posted on 2019-01-30 13:55:04
One option might be a simple replacement that removes the " *" marker from the product name:
SELECT
od.order_id
FROM order_details od
INNER JOIN products p
ON od.product_id = p.product_id
GROUP BY
od.order_id
HAVING
COUNT(DISTINCT p.product_name) <>
COUNT(DISTINCT REPLACE(p.product_name, ' *', ''));
The demo is for MySQL, but the same query should run on at least a few other databases.
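As a quick check of the REPLACE-based query against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's `sqlite3` module. SQLite is an assumption made only to keep the example self-contained; it is not the database used in the question, and the column types are guesses.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, price REAL);
CREATE TABLE order_details (order_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (5, 'potato', 4.99), (6, 'potato *', 7.5), (7, 'orange', 7.99);
INSERT INTO order_details VALUES (9, 7), (10, 5), (10, 6), (11, 6), (11, 7);
""")

# An order qualifies when stripping ' *' reduces the number of distinct names,
# i.e. it holds both the plain and the starred variant of the same product.
query = """
SELECT od.order_id
FROM order_details od
INNER JOIN products p ON od.product_id = p.product_id
GROUP BY od.order_id
HAVING COUNT(DISTINCT p.product_name) <>
       COUNT(DISTINCT REPLACE(p.product_name, ' *', ''))
"""
matching_orders = [row[0] for row in conn.execute(query)]
print(matching_orders)  # [10]
```

Only order 10 holds both "potato" and "potato *". Order 11 holds "potato *" and "orange", so both counts are 2 and it is skipped.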
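Ideally, a regex replacement on the product name would be better, because it avoids the possibility that a space followed by * appears elsewhere as a legitimate part of a product name.
Edit:
Since you are using Postgres, we can actually do a more targeted regex replacement: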
SELECT
od.order_id
FROM order_details od
INNER JOIN products p
ON od.product_id = p.product_id
GROUP BY
od.order_id
HAVING
COUNT(DISTINCT p.product_name) <>
COUNT(DISTINCT REGEXP_REPLACE(p.product_name, ' \*$', ''));
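To illustrate why the anchored pattern is safer than the plain replacement, the following sketch mimics both expressions in Python rather than in SQL. The names "salt * pepper" and "salt pepper" are hypothetical and do not come from the question; the helper functions are likewise only illustrative stand-ins for `REPLACE` and `REGEXP_REPLACE`.

```python
import re

def strip_plain(name: str) -> str:
    # Stand-in for REPLACE(name, ' *', ''): removes " *" wherever it occurs.
    return name.replace(" *", "")

def strip_anchored(name: str) -> str:
    # Stand-in for REGEXP_REPLACE(name, ' \*$', ''): removes " *" only at the end.
    return re.sub(r" \*$", "", name)

# The last two names are hypothetical products with " *" in the middle.
names = ["potato", "potato *", "salt * pepper", "salt pepper"]
plain = {n: strip_plain(n) for n in names}
anchored = {n: strip_anchored(n) for n in names}

for n in names:
    print(f"{n!r:18} plain={plain[n]!r:15} anchored={anchored[n]!r}")
```

With the plain replacement, "salt * pepper" collapses into "salt pepper" and would be counted as a duplicate of that product. The anchored pattern leaves it untouched while still mapping "potato *" to "potato".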
Posted on 2019-01-30 16:37:33
You could link products on their (longest) initial substring:
CREATE TABLE products (
product_id INTEGER NOT NULL PRIMARY KEY
, product_name text
, price DECIMAL(8,2)
);
INSERT INTO products(product_id, product_name, price) VALUES
(5, 'potato', 4.99)
,(6, 'potato *', 7.5)
,(1, 'potatoes', 7.48) -- added these
,(2, 'potatoe', 7.49) --
,(7, 'orange', 7.99)
;
ALTER TABLE products
ADD COLUMN parent_id INTEGER REFERENCES products(product_id)
, ADD COLUMN canonical_id INTEGER REFERENCES products(product_id);
UPDATE products
SET canonical_id = product_id;
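SELECT * FROM products;

-- Step 1: set parent_id to the longest other product name (more than 4 characters)
-- that is a proper prefix of this product's name.
WITH xxx AS ( select product_id, product_name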
, length(product_name) AS len
FROM products
)
UPDATE products dst
SET parent_id = src.product_id
FROM xxx src
-- WHERE position (src.product_name IN dst.product_name) = 1
WHERE dst.product_name LIKE src.product_name ||'%'::text
AND src.len > 4
AND src.len < length(dst.product_name)
AND NOT EXISTS (
SELECT * FROM xxx nx
WHERE dst.product_name LIKE nx.product_name|| '%'::text
AND nx.len < length(dst.product_name)
AND nx.len > src.len
AND nx.product_id <> dst.product_id
)
;
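SELECT * FROM products;

-- Step 2: set canonical_id to a proper prefix (more than 4 characters) for which
-- no shorter product name is also a prefix, i.e. the root of the parent chain.
WITH yyy AS ( select product_id, product_name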
, length(product_name) AS len
FROM products
)
UPDATE products dst
SET canonical_id = src.product_id
FROM yyy src
WHERE dst.product_name LIKE src.product_name ||'%'::text
AND src.len > 4
AND src.len < length(dst.product_name)
AND NOT EXISTS (
SELECT * FROM yyy nx
WHERE dst.product_name LIKE nx.product_name|| '%'::text
AND nx.len < src.len
)
;
SELECT * FROM products;

Result:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 5
ALTER TABLE
UPDATE 5
 product_id | product_name | price | parent_id | canonical_id
------------+--------------+-------+-----------+--------------
          5 | potato       |  4.99 |           |            5
          6 | potato *     |  7.50 |           |            6
          1 | potatoes     |  7.48 |           |            1
          2 | potatoe      |  7.49 |           |            2
          7 | orange       |  7.99 |           |            7
(5 rows)
UPDATE 3
 product_id | product_name | price | parent_id | canonical_id
------------+--------------+-------+-----------+--------------
          5 | potato       |  4.99 |           |            5
          7 | orange       |  7.99 |           |            7
          6 | potato *     |  7.50 |         5 |            6
          2 | potatoe      |  7.49 |         5 |            2
          1 | potatoes     |  7.48 |         2 |            1
(5 rows)
UPDATE 3
 product_id | product_name | price | parent_id | canonical_id
------------+--------------+-------+-----------+--------------
          5 | potato       |  4.99 |           |            5
          7 | orange       |  7.99 |           |            7
          6 | potato *     |  7.50 |         5 |            5
          2 | potatoe      |  7.49 |         5 |            5
          1 | potatoes     |  7.48 |         2 |            5
(5 rows)

Note: this may need some extra heuristic tuning (or even manual editing).
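The answer stops at populating `canonical_id`. One way the column might then be used for the original question is sketched below; this is an assumption, not something the answer states. The sketch uses SQLite through Python's `sqlite3` purely for self-containment, copies the `canonical_id` values from the final result table, and takes the `order_details` rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    canonical_id INTEGER
);
CREATE TABLE order_details (order_id INTEGER, product_id INTEGER);
-- canonical_id values copied from the final result table
INSERT INTO products VALUES
    (5, 'potato', 5), (6, 'potato *', 5), (1, 'potatoes', 5),
    (2, 'potatoe', 5), (7, 'orange', 7);
-- order rows taken from the question
INSERT INTO order_details VALUES (9, 7), (10, 5), (10, 6), (11, 6), (11, 7);
""")

# An order qualifies when two different products share one canonical product.
query = """
SELECT od.order_id
FROM order_details od
JOIN products p ON od.product_id = p.product_id
GROUP BY od.order_id
HAVING COUNT(DISTINCT p.product_id) > COUNT(DISTINCT p.canonical_id)
"""
orders_with_variants = [row[0] for row in conn.execute(query)]
print(orders_with_variants)  # [10]
```

Note that under this scheme "potatoes" and "potatoe" also count as variants of "potato", which may or may not be wanted; that is part of the heuristic tuning mentioned in the note.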
https://stackoverflow.com/questions/54442109