How to optimize/refactor MySQL pivot table performance when using a WHERE clause

Stack Overflow user

Asked 2019-10-04 11:17:49

Answers: 3 · Views: 314 · Score: 1

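I have two simple MySQL tables. One is an index table, t_id, which holds a unique primary id. The other is a pivot table, t_data, which spreads those ids across different data fields: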

CREATE TABLE `t_id` (  
  `id` bigint(12) NOT NULL AUTO_INCREMENT,  
  PRIMARY KEY (`id`)  
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;  

CREATE TABLE `t_data` (  
  `id` int(11) NOT NULL,  
  `field` varchar(50) CHARACTER SET cp1251 NOT NULL,  
  `value` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,  
  UNIQUE KEY `idxfield` (`id`,`field`),  
  KEY `value` (`value`)  
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Here is some sample data:

+----+--------------+-------------------+  
| id | field        | value             |  
+----+--------------+-------------------+  
| 1  | organization | Apple Inc.        |  
| 1  | state        | CA                |  
| 2  | organization | Adobe Inc.        |  
| 2  | state        | CA                |  
| 3  | organization | Alphabet Inc.     |  
| 3  | state        | CA                |  
| 4  | organization | Rockwell Collins  |  
| 4  | state        | IA                |  
| 5  | organization | GEICO             |  
| 5  | state        | MD                |  
| 6  | organization | Anheuser-Busch    |  
| 6  | state        | MO                |  
| 7  | organization | Bank of America   |  
| 7  | state        | NC                |  
+----+--------------+-------------------+

This can be reported with a standard pivot-table SELECT query:

select  
  i.id,  
  ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',  
  ifnull (max(case when d.field = 'state' then d.value end),'') 'state'  
from `t_id` i  
left join `t_data` d  
on i.id = d.id  
group by i.id  
limit 0,10

This simple example shows only two "virtual" fields (organization and state) and 7 unique ids:

+----+------------------+-------+  
| id | organization     | state |  
+----+------------------+-------+  
| 1  | Apple Inc.       |  CA   |  
| 2  | Adobe Inc.       |  CA   |  
| 3  | Alphabet Inc.    |  CA   |  
| 4  | Rockwell Collins |  IA   |  
| 5  | GEICO            |  MD   |  
| 6  | Anheuser-Busch   |  MO   |  
| 7  | Bank of America  |  NC   |  
+----+------------------+-------+
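The sketch below runs the conditional-aggregation pivot above against the sample rows. It makes these assumptions:

* An in-memory SQLite database stands in for MySQL.
* The ENGINE/CHARSET clauses are dropped and the columns are mapped to INTEGER/TEXT.
* The helper `make_db`, the `SAMPLE` list and the index name `idx_value` are hypothetical.

It checks results only, since SQLite's planner is not MySQL's.

```python
import sqlite3

# Sample rows from the question as (id, field, value).
SAMPLE = [
    (1, "organization", "Apple Inc."), (1, "state", "CA"),
    (2, "organization", "Adobe Inc."), (2, "state", "CA"),
    (3, "organization", "Alphabet Inc."), (3, "state", "CA"),
    (4, "organization", "Rockwell Collins"), (4, "state", "IA"),
    (5, "organization", "GEICO"), (5, "state", "MD"),
    (6, "organization", "Anheuser-Busch"), (6, "state", "MO"),
    (7, "organization", "Bank of America"), (7, "state", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id    INTEGER NOT NULL,
            field TEXT    NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    conn.executemany("INSERT INTO t_id (id) VALUES (?)", [(i,) for i in range(1, 8)])
    conn.executemany("INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)", SAMPLE)
    return conn


# The question's pivot: one output row per id, one MAX(CASE ...) per virtual field.
PIVOT_SQL = """
    SELECT i.id,
           IFNULL(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
           IFNULL(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
    FROM t_id i
    LEFT JOIN t_data d ON i.id = d.id
    GROUP BY i.id
    LIMIT 0, 10
"""

# Sorted in Python because GROUP BY alone does not guarantee row order.
rows = sorted(make_db().execute(PIVOT_SQL).fetchall())
for row in rows:
    print(row)
```

`GROUP BY i.id` collapses each id's t_data rows into one output row. For each field, `CASE` yields NULL on every row that belongs to another field, and `MAX` ignores those NULLs, so it picks the single matching value.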

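In our actual production environment we have dozens of "virtual" fields (not just 2) and millions of unique ids (not just 7). CRUD-type queries on a single id finish in under a second, and so does listing a limited page of ids at a time. The problem appears when we try to constrain the SELECT with a WHERE clause: the query then takes tens of seconds. For example, to find all organizations in California: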

select  
 x.id,  
 x.organization,  
 x.state  
from  
(  
select  
  i.id,  
  ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',  
  ifnull (max(case when d.field = 'state' then d.value end),'') 'state'  
from `t_id` i  
left join `t_data` d  
on i.id = d.id  
group by i.id  
) as x  
where x.state='CA'  
limit 0,10  


+----+---------------+-------+  
| id | organization  | state |  
+----+---------------+-------+  
| 1  | Apple Inc.    |  CA   |  
| 2  | Adobe Inc.    |  CA   |  
| 3  | Alphabet Inc. |  CA   |  
+----+---------------+-------+

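This works, but it takes a very long time (again, tens of seconds)! What is the best practice here? Is there a better way to write this kind of query, and how can these pivot-table queries be optimized for WHERE clauses?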
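Why this is slow: `x.state` is computed by `MAX(CASE ...)` inside the derived table. As written, the filter cannot be applied until every id has been joined and grouped, so the index on `value` gets no chance to narrow the rows first.

The sketch below reproduces the filtered query on the sample data to confirm what it returns. It uses the same assumptions as before: SQLite as a stand-in for MySQL and a hypothetical `make_db` helper. It does not model MySQL's timing.

```python
import sqlite3

# Sample rows from the question as (id, field, value).
SAMPLE = [
    (1, "organization", "Apple Inc."), (1, "state", "CA"),
    (2, "organization", "Adobe Inc."), (2, "state", "CA"),
    (3, "organization", "Alphabet Inc."), (3, "state", "CA"),
    (4, "organization", "Rockwell Collins"), (4, "state", "IA"),
    (5, "organization", "GEICO"), (5, "state", "MD"),
    (6, "organization", "Anheuser-Busch"), (6, "state", "MO"),
    (7, "organization", "Bank of America"), (7, "state", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id    INTEGER NOT NULL,
            field TEXT    NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    conn.executemany("INSERT INTO t_id (id) VALUES (?)", [(i,) for i in range(1, 8)])
    conn.executemany("INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)", SAMPLE)
    return conn


# The question's query: pivot every id first, then filter on the aggregated column.
FILTERED_SQL = """
    SELECT x.id, x.organization, x.state
    FROM (
        SELECT i.id,
               IFNULL(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
               IFNULL(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
        FROM t_id i
        LEFT JOIN t_data d ON i.id = d.id
        GROUP BY i.id
    ) AS x
    WHERE x.state = 'CA'
    LIMIT 0, 10
"""

rows = sorted(make_db().execute(FILTERED_SQL).fetchall())
for row in rows:
    print(row)
```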

optimization

pivot-table

entity-attribute-value

mysql

3 Answers

Stack Overflow user

Accepted answer

Answered 2019-10-06 02:51:12

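For large datasets this should be much faster. It also extends easily to any number of "virtual" fields. You can put any search condition between the %% wildcards.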

select  
    i.id,  
    coalesce(max(case when field = 'organization' then value end), '') as organization,  
    coalesce(max(case when field = 'state' then value end), '') as state  
from t_id i  
left join t_data d  
on i.id = d.id  
-- filters go in WHERE; in the LEFT JOIN's ON clause they would not remove any t_id rows  
where i.id like '%%'  
and i.id in (  
  select id  
  from `t_data`  
  where `field` = 'organization'   
  and `value` like '%%'  
  and  id  in (  
    select id  
    from `t_data`  
    where `field` = 'state'  
    and `value` like '%%'  
  )  
)  
group by i.id
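With a LEFT JOIN, the placement of the `IN (...)` filter changes the result:

* In the `ON` clause, it only decides which t_data rows attach to each t_id row. Every id is still returned, and non-matching ids come back as empty strings.
* In `WHERE`, it removes the non-matching ids before grouping.

The sketch below runs both placements with the patterns `%%` (any organization) and `%CA%` (state). It assumes SQLite as a stand-in for MySQL, and the names `make_db`, `MATCHING_IDS` and `PIVOT_HEAD` are hypothetical.

```python
import sqlite3

# Sample rows from the question as (id, field, value).
SAMPLE = [
    (1, "organization", "Apple Inc."), (1, "state", "CA"),
    (2, "organization", "Adobe Inc."), (2, "state", "CA"),
    (3, "organization", "Alphabet Inc."), (3, "state", "CA"),
    (4, "organization", "Rockwell Collins"), (4, "state", "IA"),
    (5, "organization", "GEICO"), (5, "state", "MD"),
    (6, "organization", "Anheuser-Busch"), (6, "state", "MO"),
    (7, "organization", "Bank of America"), (7, "state", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id    INTEGER NOT NULL,
            field TEXT    NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    conn.executemany("INSERT INTO t_id (id) VALUES (?)", [(i,) for i in range(1, 8)])
    conn.executemany("INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)", SAMPLE)
    return conn


# The answer's nested IN: ids whose organization and state match the LIKE patterns.
MATCHING_IDS = """
    SELECT id FROM t_data
    WHERE field = 'organization' AND value LIKE ?
      AND id IN (SELECT id FROM t_data WHERE field = 'state' AND value LIKE ?)
"""
PIVOT_HEAD = """
    SELECT i.id,
           COALESCE(MAX(CASE WHEN field = 'organization' THEN value END), '') AS organization,
           COALESCE(MAX(CASE WHEN field = 'state' THEN value END), '') AS state
    FROM t_id i
    LEFT JOIN t_data d ON i.id = d.id
"""
# Filter inside ON: only controls which t_data rows attach to each t_id row.
ON_SQL = PIVOT_HEAD + " AND i.id IN (" + MATCHING_IDS + ") GROUP BY i.id"
# Filter in WHERE: drops non-matching t_id rows before grouping.
WHERE_SQL = PIVOT_HEAD + " WHERE i.id IN (" + MATCHING_IDS + ") GROUP BY i.id"

params = ("%%", "%CA%")  # any organization; state containing "CA"
conn = make_db()
on_rows = sorted(conn.execute(ON_SQL, params).fetchall())
where_rows = sorted(conn.execute(WHERE_SQL, params).fetchall())
print(len(on_rows), "rows with the filter in ON")
print(len(where_rows), "rows with the filter in WHERE")
```

A pattern that begins with `%` cannot be served by a B-tree range seek. In MySQL, `value LIKE '%...%'` therefore cannot use the `value` index the way `value = 'CA'` can.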

Score: 1

Stack Overflow user

Answered 2019-10-04 11:38:27

If you want to find organizations operating in California, you don't actually need a subquery:

SELECT
    i.id,
    COALESCE(MAX(CASE WHEN field = 'organization' THEN value END), '') AS organization,
    COALESCE(MAX(CASE WHEN field = 'state' THEN value END), '') AS state
FROM t_id i
LEFT JOIN t_data d
    ON i.id = d.id
GROUP BY
    i.id
HAVING
    COUNT(CASE WHEN field = 'state' AND value = 'CA' THEN 1 END) > 0;

The trick is to assert in the HAVING clause that each matching id group must contain a `state` record with the value CA.
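The HAVING version can be checked the same way, using the same assumptions: SQLite as a stand-in for MySQL and a hypothetical `make_db` helper. HAVING is evaluated after GROUP BY, so every id is still joined and aggregated. This form simplifies the query, but it does not let an index narrow the rows first.

```python
import sqlite3

# Sample rows from the question as (id, field, value).
SAMPLE = [
    (1, "organization", "Apple Inc."), (1, "state", "CA"),
    (2, "organization", "Adobe Inc."), (2, "state", "CA"),
    (3, "organization", "Alphabet Inc."), (3, "state", "CA"),
    (4, "organization", "Rockwell Collins"), (4, "state", "IA"),
    (5, "organization", "GEICO"), (5, "state", "MD"),
    (6, "organization", "Anheuser-Busch"), (6, "state", "MO"),
    (7, "organization", "Bank of America"), (7, "state", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id    INTEGER NOT NULL,
            field TEXT    NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    conn.executemany("INSERT INTO t_id (id) VALUES (?)", [(i,) for i in range(1, 8)])
    conn.executemany("INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)", SAMPLE)
    return conn


# Group every id, then keep only groups that contain a state = 'CA' row.
HAVING_SQL = """
    SELECT i.id,
           COALESCE(MAX(CASE WHEN field = 'organization' THEN value END), '') AS organization,
           COALESCE(MAX(CASE WHEN field = 'state' THEN value END), '') AS state
    FROM t_id i
    LEFT JOIN t_data d ON i.id = d.id
    GROUP BY i.id
    HAVING COUNT(CASE WHEN field = 'state' AND value = 'CA' THEN 1 END) > 0
"""

rows = sorted(make_db().execute(HAVING_SQL).fetchall())
for row in rows:
    print(row)
```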

Score: 2

Stack Overflow user

Answered 2019-10-14 09:31:58

This is EAV, not a pivot, so the solution lies in a self-join.

SELECT  a.id,
        a.value AS organization,
        b.value AS state
    FROM t_data AS a
    JOIN t_data AS b  ON a.id = b.id
    WHERE a.field = 'organization'
      AND b.field = 'state';
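Here is the self-join on the sample data, again assuming SQLite as a stand-in for MySQL and a hypothetical `make_db` helper. Each additional "virtual" field adds another alias of t_data. Because these are inner joins, an id that lacks either field is left out entirely, whereas the LEFT JOIN pivot returns it with an empty string.

```python
import sqlite3

# Sample rows from the question as (id, field, value).
SAMPLE = [
    (1, "organization", "Apple Inc."), (1, "state", "CA"),
    (2, "organization", "Adobe Inc."), (2, "state", "CA"),
    (3, "organization", "Alphabet Inc."), (3, "state", "CA"),
    (4, "organization", "Rockwell Collins"), (4, "state", "IA"),
    (5, "organization", "GEICO"), (5, "state", "MD"),
    (6, "organization", "Anheuser-Busch"), (6, "state", "MO"),
    (7, "organization", "Bank of America"), (7, "state", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id    INTEGER NOT NULL,
            field TEXT    NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    conn.executemany("INSERT INTO t_id (id) VALUES (?)", [(i,) for i in range(1, 8)])
    conn.executemany("INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)", SAMPLE)
    return conn


# One alias of t_data per virtual field, matched on the shared id.
SELF_JOIN_SQL = """
    SELECT a.id, a.value AS organization, b.value AS state
    FROM t_data AS a
    JOIN t_data AS b ON a.id = b.id
    WHERE a.field = 'organization'
      AND b.field = 'state'
"""

rows = sorted(make_db().execute(SELF_JOIN_SQL).fetchall())
for row in rows:
    print(row)
```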

If you need t_id to control which ids are included, slip in

    JOIN t_id AS i  ON i.id = a.id

If you want to restrict the results to CA, add

      AND b.value = 'CA'

and add

INDEX(field, value)

so that it does not have to scan as many rows to find the CA entries.
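This sketch puts the pieces together: the self-join restricted to CA, joined to t_id, with the suggested `(field, value)` index. It assumes SQLite as a stand-in for MySQL. `make_db` and the index name `idx_field_value` are hypothetical; in MySQL the equivalent would be something like `ALTER TABLE t_data ADD INDEX idx_field_value (field, value)`. The plan printout is for inspection only, and its wording varies between SQLite versions.

```python
import sqlite3

# Sample rows from the question as (id, field, value).
SAMPLE = [
    (1, "organization", "Apple Inc."), (1, "state", "CA"),
    (2, "organization", "Adobe Inc."), (2, "state", "CA"),
    (3, "organization", "Alphabet Inc."), (3, "state", "CA"),
    (4, "organization", "Rockwell Collins"), (4, "state", "IA"),
    (5, "organization", "GEICO"), (5, "state", "MD"),
    (6, "organization", "Anheuser-Busch"), (6, "state", "MO"),
    (7, "organization", "Bank of America"), (7, "state", "NC"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_id (id INTEGER PRIMARY KEY);
        CREATE TABLE t_data (
            id    INTEGER NOT NULL,
            field TEXT    NOT NULL,
            value TEXT,
            UNIQUE (id, field)
        );
        CREATE INDEX idx_value ON t_data (value);
    """)
    conn.executemany("INSERT INTO t_id (id) VALUES (?)", [(i,) for i in range(1, 8)])
    conn.executemany("INSERT INTO t_data (id, field, value) VALUES (?, ?, ?)", SAMPLE)
    return conn


conn = make_db()
# The answer's INDEX(field, value); the index name is hypothetical.
conn.execute("CREATE INDEX idx_field_value ON t_data (field, value)")

CA_SQL = """
    SELECT a.id, a.value AS organization, b.value AS state
    FROM t_data AS a
    JOIN t_data AS b ON a.id = b.id
    JOIN t_id AS i ON i.id = a.id
    WHERE a.field = 'organization'
      AND b.field = 'state'
      AND b.value = 'CA'
"""

rows = sorted(conn.execute(CA_SQL).fetchall())
for row in rows:
    print(row)

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
for step in conn.execute("EXPLAIN QUERY PLAN " + CA_SQL):
    print(step[-1])
```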

Score: 1

Original page content provided by Stack Overflow; translated by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/58229470
