文章/答案/技术大牛

发布

社区首页 >问答首页 >它是更好的复合索引(项目，item_type)或单列索引的item_type_1，item_type_2，.？

问它是更好的复合索引(项目，item_type)或单列索引的item_type_1，item_type_2，.？
EN

Database Administration用户

提问于 2019-07-23 09:31:56

回答 4查看 5K关注 0票数 4

根据一些业务逻辑，我们有三种类型的项要存储和查询。

create table a_table
(
   item_a   varchar2(30),
   item_b   varchar2(16),
   item_c   varchar2(2),
   -- other columns
);

以及各项指标

create unique index idx_1 on a_table (item_a);
create unique index idx_2 on a_table (item_b);
create unique index idx_3 on a_table (item_c);

特定类型的特定项目的数据将简单地理解为：

-- reading item_a
select ... from a_table where item_a = '...';

-- reading item_b
select ... from a_table where item_b = '...';

-- ...

如果项的类型为a，则不能为b类型，因此每一行必须只有一个带有值的item_a、item_b或item_c，其他行必须为null。(这可以以任何方式强制执行，它是否影响insert离子并不重要。)

理论上，项目类型的数量可能会增加(也许第四种项目类型可以在将来的某个地方添加)。

这一事实推动了另一种解决方案，它也避免了丑陋的互斥列(在给定的示例中不加检查)：

create table a_table
(
   item       varchar2(30) not null,
   item_type  varchar2(10) not null,
   -- other columns
);

create unique index idx_1 on a_table (item_type, item);

-- reading item_X
select ... from a_table where item='...' and item_type='item_X';

忽略了第一种解决方案在项目类型增加时的明显缺点和其他可能的设计缺陷，在访问(使用索引列选择行)、访问第一种情况下的表和访问第二种情况下的表之间是否存在重要的性能差异？

item_a和item_b的基数可达数百万，item_c的基数肯定在200以下。

Notes

假设item_a、item_b和item_c的选择数相等
在第一个解决方案中，考虑一个复合索引(item_c, item_a, item_b)当且仅当它将比三个单独的索引更好(从性能上讲)，但不要忘记，必须与第二个解决方案进行比较，其中两个索引列都将在where条件下使用:假设这两种情况都使用最好的“正确”索引/索引。

oracle

database-design

index

回答 4

Database Administration用户

回答已采纳

发布于 2019-07-24 10:21:02

我认为，从绩效的角度来看，这将是非常接近的。

但有些想法：

一个列索引不存储空值。因此，单个索引将更小。综合指数(item_type，项目)大小将大于单个指标之和。在极端情况下，甚至指数的高度也可能不同。
三个单独索引中的两个将是唯一的，因此与非唯一索引相比，读取的一致性要少一次。
对于item_c列，甲骨文可以建立精确的直方图。
在(item_type，item )的情况下，oracle将正确计算带有项目类型A和B的查询的基数，但对于" item _type='C‘和item=:X“则可能低估。这可能不是问题，但是在复杂的查询中，使用联接可以给出无效的计划。所以你需要检查一下。
即将将索引(item_a，item_b，item_c)与(item_type，item )进行比较--就像其他人说的那样，它几乎只对"item_a=:X“这样的查询起作用，而当"item_b=:Y”时就不能工作了。因此，对于第一种情况，使用索引(item_a、item_b、item_c)只是一个坏主意，因为我们对item_b和item_c列有相同数量的查询。

    create table hr.tt nologging
    as
    select case when mod(rownum,3) = 0 then rownum end AS item_a,
           case when mod(rownum,3) = 1 then rownum end AS item_b,
           case when mod(rownum,3) = 2 then round(dbms_random.value(1,200)) end AS item_c,
           o.* 
     from all_objects o, 
    (select * from dual connect by level < 100) d 
     where rownum <= 5e6; 

    create unique index hr.item_a_idx on hr.tt(item_a) nologging;

    create unique index hr.item_b_idx on hr.tt(item_b) nologging;      

    create index hr.item_c_idx on hr.tt(item_c) nologging;


    create table hr.tt2 nologging
    as
    select case when mod(rownum,3) in (0,1) then rownum
                else round(dbms_random.value(1,200)) 
                end AS item,
           case when mod(rownum,3) = 0 then 'A' 
                when mod(rownum,3) = 1 then 'B' 
                when mod(rownum,3) = 2 then 'C' end AS item_type,
           o.* 
     from all_objects o, 
    (select * from dual connect by level < 100) d 
     where rownum <= 5e6;

    create index hr.item_item_type_idx on hr.tt2(item_type, item) nologging;

一些疑问

SYS@mydb> select * from hr.tt where item_a = 300;


------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |   197 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT         |     1 |   197 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | ITEM_A_IDX |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
...
          4  consistent gets

SYS@mydb> select * from hr.tt2 where item_type = 'A'  and item = 300;

1 row selected.


--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |   115 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT2                |     1 |   115 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ITEM_ITEM_TYPE_IDX |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          5  consistent gets

下面是一个优化器计算错误的例子(oracle11.2)。优化器认为在“索引范围扫描”中只有一行。实际上我们看到了8438行。这两列的扩展优化器统计信息可能有帮助，但可能没有帮助。

SYS@mydb> select * from hr.tt2 where item_type = 'C'  and item = 150;

8438 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4063424880

--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |   115 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT2                |     1 |   115 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ITEM_ITEM_TYPE_IDX |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ITEM_TYPE"='C' AND "ITEM"=150)


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       8618  consistent gets
       . . .
       8438  rows processed

下面是单个item_c列的相同情况。这里的优化器计算是正确的- 8133行估计，8290行选择。

SYS@mydb> select * from hr.tt where item_c = 150;

8290 rows selected.


Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |  8133 |   826K|  7737   (1)| 00:01:33 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TT         |  8133 |   826K|  7737   (1)| 00:01:33 |
|*  2 |   INDEX RANGE SCAN          | ITEM_C_IDX |  8133 |       |    19   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ITEM_C"=150)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       8499  consistent gets
       . . .
       8290  rows processed

票数 5

Database Administration用户

发布于 2019-07-24 07:28:46

我不会说甲骨文，但我看不出第一个场景有什么好处，无论是逻辑上还是性能上。正如@brian-leach在他的评论中所建议的那样，你应该强制执行相互排他性，但这会减缓插入速度。

使用这个场景，我不仅会担心将item_d添加到集合中，而且当item_x获得一个附加属性时会发生什么？模型2很容易以一种体面的方式扩展，模型1最终会变成一个空属性的地狱。

显然，您确信第二个选项更好(我非常同意)，以使其他人相信您应该在比较它们的地方设置一个测试。在这两个模型中生成1000000 item_a、1000000 item_b和1000 item_c应该是一个简单的任务。

票数 3

Database Administration用户

发布于 2019-07-23 13:34:39

与以往一样，这一切都取决于您将如何访问数据。

如果您只通过item_1查询表，那么对另外两个表进行索引将是浪费时间。

如果您通常通过item_1 (有时也通过item_2 )查询表，那么在item_1和item_2上创建一个复合索引可能是个好主意。您的数据库也可以为item_1只查询使用复合索引，但您必须使用最常使用的字段“启动”该索引。

找到答案的最好方法？试试看。

创建一个测试数据库，加载一些有代表性的数据，创建一些索引，并为您的典型查询获取执行计划。

您的item/item_type表使用了一个名为“实体/属性值”的概念，通常情况下，这个概念的缩放非常糟糕。适当的索引应该会给您更好的性能。

票数 1

页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://dba.stackexchange.com/questions/243517

复制

相似问题

问它是更好的复合索引(项目，item_type)或单列索引的item_type_1，item_type_2，.？
EN

Notes

回答 4

Database Administration用户

Database Administration用户

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问它是更好的复合索引(项目，item_type)或单列索引的item_type_1，item_type_2，.？EN

Notes

回答 4

Database Administration用户

Database Administration用户

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问它是更好的复合索引(项目，item_type)或单列索引的item_type_1，item_type_2，.？
EN