I have two simple MySQL tables: an index table, t_id, which holds a unique primary id, and a pivot table, t_data, which spreads those ids across different data fields:
CREATE TABLE `t_id` (
`id` bigint(12) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `t_data` (
`id` int(11) NOT NULL,
`field` varchar(50) CHARACTER SET cp1251 NOT NULL,
`value` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci
DEFAULT NULL,
UNIQUE KEY `idxfield` (`id`,`field`),
KEY `value` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Here is some sample data:
+----+--------------+-------------------+
| id | field | value |
+----+--------------+-------------------+
| 1 | organization | Apple Inc. |
| 1 | state | CA |
| 2 | organization | Adobe Inc. |
| 2 | state | CA |
| 3 | organization | Alphabet Inc. |
| 3 | state | CA |
| 4 | organization | Rockwell Collins |
| 4 | state | IA |
| 5 | organization | GEICO |
| 5 | state | MD |
| 6 | organization | Anheuser-Busch |
| 6 | state | MO |
| 7 | organization | Bank of America |
| 7 | state | NC |
+----+--------------+-------------------+

This can be reported with a standard pivot-table select query:
select
i.id,
ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',
ifnull (max(case when d.field = 'state' then d.value end),'') 'state'
from `t_id` i
left join `t_data` d
on i.id = d.id
group by i.id
limit 0,10

This simple example shows only two "virtual" fields (organization and state) and 7 unique ids:
+----+------------------+-------+
| id | organization | state |
+----+------------------+-------+
| 1 | Apple Inc. | CA |
| 2 | Adobe Inc. | CA |
| 3 | Alphabet Inc. | CA |
| 4 | Rockwell Collins | IA |
| 5 | GEICO | MD |
| 6 | Anheuser-Busch | MO |
| 7 | Bank of America | NC |
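+----+------------------+-------+

In our real production environment we have dozens of "virtual" fields (not just 2) and millions of unique ids (not just 7). The database performs well on CRUD-type queries against a single id (under a second), and even when listing a limited set of ids at a time (also under a second). The problem appears when we try to constrain the select with a where clause (the query takes tens of seconds). For example, to find all organizations in California: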
select
x.id,
x.organization,
x.state
from
(
select
i.id,
ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',
ifnull (max(case when d.field = 'state' then d.value end),'') 'state'
from `t_id` i
left join `t_data` d
on i.id = d.id
group by i.id
) as x
where x.state='CA'
limit 0,10
+----+---------------+-------+
| id | organization | state |
+----+---------------+-------+
| 1 | Apple Inc. | CA |
| 2 | Adobe Inc. | CA |
| 3 | Alphabet Inc. | CA |
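+----+---------------+-------+

This works, but it takes a very long time (again, tens of seconds)! What is the best practice here? Is there a better way to write these kinds of queries? How can these pivot-table queries be optimized for a where clause?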
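Answered 2019-10-06 02:51:12

For large data sets, this should be considerably faster. It also extends easily to any number of "virtual" fields. You can put any search criteria between the %% wildcards.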
select
i.id,
coalesce(max(case when field = 'organization' then value end), '') as organization,
coalesce(max(case when field = 'state' then value end), '') as state
from t_id i
left join t_data d
on i.id = d.id
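-- filter in WHERE: conditions on i inside the LEFT JOIN's ON clause would not remove any t_id rows
where i.id like '%%'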
and i.id in (
select id
from `t_data`
where `field` = 'organization'
and `value` like '%%'
and id in (
select id
from `t_data`
where `field` = 'state'
and `value` like '%%'
)
)
group by i.id

Answered 2019-10-04 11:38:27
If you want to find organizations operating in California, you don't actually need a subquery:
SELECT
i.id,
COALESCE(MAX(CASE WHEN field = 'organization' THEN value END), '') AS organization,
COALESCE(MAX(CASE WHEN field = 'state' THEN value END), '') AS state
FROM t_id i
LEFT JOIN t_data d
ON i.id = d.id
GROUP BY
i.id
HAVING
COUNT(CASE WHEN field = 'state' AND value = 'CA' THEN 1 END) > 0;

The trick here is to assert, in the HAVING clause, that a matching id group must contain a state record whose value is CA.
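To see that the HAVING form returns the same rows as the derived-table form from the question, here is a minimal sketch that runs both against the sample data. It assumes an in-memory SQLite database as a stand-in for MySQL (chosen only so the snippet is runnable on its own), with column types simplified; it says nothing about timing on the production data set.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL/InnoDB; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_id (id INTEGER PRIMARY KEY);
    CREATE TABLE t_data (
        id    INTEGER NOT NULL,
        field TEXT    NOT NULL,
        value TEXT,
        UNIQUE (id, field)
    );
""")

# Sample rows from the question: (id, organization, state).
SAMPLE = [
    (1, "Apple Inc.", "CA"), (2, "Adobe Inc.", "CA"),
    (3, "Alphabet Inc.", "CA"), (4, "Rockwell Collins", "IA"),
    (5, "GEICO", "MD"), (6, "Anheuser-Busch", "MO"),
    (7, "Bank of America", "NC"),
]
for row_id, org, state in SAMPLE:
    conn.execute("INSERT INTO t_id (id) VALUES (?)", (row_id,))
    conn.executemany(
        "INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)",
        [(row_id, "organization", org), (row_id, "state", state)],
    )

PIVOT = """
    SELECT i.id,
           COALESCE(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
           COALESCE(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
    FROM t_id i
    LEFT JOIN t_data d ON i.id = d.id
    GROUP BY i.id
"""

# Form used in the question: pivot everything, then filter the derived table.
derived_rows = conn.execute(
    f"SELECT x.id, x.organization, x.state FROM ({PIVOT}) AS x "
    "WHERE x.state = 'CA' ORDER BY x.id"
).fetchall()

# Form used in this answer: filter the groups directly with HAVING.
having_rows = conn.execute(
    PIVOT
    + " HAVING COUNT(CASE WHEN d.field = 'state' AND d.value = 'CA' THEN 1 END) > 0"
    + " ORDER BY i.id"
).fetchall()

for row in having_rows:
    print(row)
```

Both queries return ids 1 to 3 on this data. Note that either form still groups every id before filtering, so any speed-up on millions of ids would need to be confirmed with EXPLAIN on the real tables.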
Answered 2019-10-14 09:31:58

This is EAV, not a pivot, so the solution lies in a "self join".
SELECT a.id,
a.value AS organization,
b.value AS state
FROM t_data AS a
JOIN t_data AS b ON a.id = b.id
WHERE a.field = 'organization'
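AND b.field = 'state';

If you need t_id to control which ids are included, slip in

JOIN t_id AS i ON i.id = a.id

If you want to restrict the results to CA, add

AND b.value = 'CA'

and add

INDEX(field, value)

so that it does not have to scan as many rows to find the CA entries.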
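Putting those fragments together, the assembled query might look like the sketch below. It again assumes an in-memory SQLite database as a stand-in for MySQL so that it runs on its own; the index name idx_field_value is hypothetical, and the MySQL equivalent of the index statement is given in a comment.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL/InnoDB; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_id (id INTEGER PRIMARY KEY);
    CREATE TABLE t_data (
        id    INTEGER NOT NULL,
        field TEXT    NOT NULL,
        value TEXT,
        UNIQUE (id, field)
    );
    -- Hypothetical index name. In MySQL this would be:
    --   ALTER TABLE t_data ADD INDEX idx_field_value (field, value);
    CREATE INDEX idx_field_value ON t_data (field, value);
""")

# Sample rows from the question: (id, field, value).
rows = [
    (1, "organization", "Apple Inc."), (1, "state", "CA"),
    (2, "organization", "Adobe Inc."), (2, "state", "CA"),
    (3, "organization", "Alphabet Inc."), (3, "state", "CA"),
    (4, "organization", "Rockwell Collins"), (4, "state", "IA"),
    (5, "organization", "GEICO"), (5, "state", "MD"),
    (6, "organization", "Anheuser-Busch"), (6, "state", "MO"),
    (7, "organization", "Bank of America"), (7, "state", "NC"),
]
conn.executemany("INSERT OR IGNORE INTO t_id (id) VALUES (?)", [(r[0],) for r in rows])
conn.executemany("INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)", rows)

# Self join: alias a supplies the organization row, alias b the state row.
SELF_JOIN = """
    SELECT a.id,
           a.value AS organization,
           b.value AS state
    FROM t_data AS a
    JOIN t_data AS b ON a.id = b.id
    JOIN t_id   AS i ON i.id = a.id
    WHERE a.field = 'organization'
      AND b.field = 'state'
      AND b.value = ?
    ORDER BY a.id
"""
ca_rows = conn.execute(SELF_JOIN, ("CA",)).fetchall()
for row in ca_rows:
    print(row)
```

One design difference from the LEFT JOIN pivot is worth keeping in mind: because these are inner joins, an id that lacks either the organization row or the state row is dropped from the result rather than shown with an empty string. Each additional "virtual" field needs one more alias of t_data joined on id.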
https://stackoverflow.com/questions/58229470