文章/答案/技术大牛

发布

社区首页 >问答首页 >使用数据和大(数百万行)部分索引重新检查cond的选择性能差

问使用数据和大(数百万行)部分索引重新检查cond的选择性能差
EN

Database Administration用户

提问于 2016-05-11 22:02:40

回答 1查看 1.1K关注 0票数 3

查询：

EXPLAIN (ANALYZE, BUFFERS) SELECT
    COUNT (*) AS "count"
FROM
    "Posts" AS "Post"
WHERE
    "Post"."createdAt" > '2015-08-19 14:55:50.398'
AND "Post"."new" = TRUE;

索引：

CREATE INDEX posts_new_createdat_idx ON "Posts" ("createdAt")
WHERE
    NEW = TRUE

计划：

Aggregate  (cost=389915.59..389915.60 rows=1 width=0) (actual time=4234.772..4234.773 rows=1 loops=1)
  Buffers: shared hit=254427
  ->  Bitmap Heap Scan on "Posts" "Post"  (cost=14415.81..387990.63 rows=769985 width=0) (actual time=123.805..3859.150 rows=1138854 loops=1)
        Recheck Cond: (("createdAt" > '2015-08-19 14:55:50.398+00'::timestamp with time zone) AND new)
        Rows Removed by Index Recheck: 8238790
        Buffers: shared hit=254427
        ->  Bitmap Index Scan on posts_new_createdat_idx  (cost=0.00..14223.32 rows=769985 width=0) (actual time=122.601..122.601 rows=1138854 loops=1)
              Index Cond: ("createdAt" > '2015-08-19 14:55:50.398+00'::timestamp with time zone)
              Buffers: shared hit=3114
Total runtime: 4234.989 ms

模式：

CREATE TABLE "public"."Posts" (
    "id" int4 NOT NULL DEFAULT nextval('"Posts_id_seq"'::regclass),
    "actionId" int4,
    "commentCount" int4 DEFAULT 0,
    "facebook" bool,
    "featurePostOnDate" timestamp(6) WITH TIME ZONE,
    "forcedPrivate" bool,
    "instagram" bool,
    "isReported" bool,
    "likeCount" int4 DEFAULT 0,
    "note" text COLLATE "default",
    "photo" varchar(255) COLLATE "default",
    "private" bool,
    "new" bool,
    "popular" bool,
    "twitter" bool,
    "userId" int4,
    "objectId" varchar(255) COLLATE "default",
    "createdAt" timestamp(6) WITH TIME ZONE,
    "updatedAt" timestamp(6) WITH TIME ZONE,
    "activityLogId" int4,
    "weightLogId" int4,
    "workoutId" int4,
    "workoutLogId" int4,
    "thumbnail" varchar(255) COLLATE "default"
)

数据：

new = true为99%或记录
任何超过2周的帖子都可以被忽略(从计数和索引)。

数据库详细信息：

Host           = Amazon AWS
Engine         = PostgreSQL 9.3.10
Instance Class = db.r3.8xlarge
Storage Type   = SSD
IOPS           = 3000
StorageAmount  = 500 GB

正如您所看到的，部分索引非常大。是否有更好的索引方法，以便重新检查条件不会太重？

postgresql

index

optimization

回答 1

Database Administration用户

发布于 2016-05-12 19:17:15

在这个特定查询中进行重新检查的唯一原因是位图太大，无法适应work_mem，因此必须将其降级为有损。因此，为了避免再次检查，如果你有能力的话，试着增加work_mem。你不应该为了持有1138854元组而把它增加很多。

新版本的PostgreSQL通过在“解释(分析)”中包括这样的行使这一点更加清晰：

Rows Removed by Index Recheck: 33
Heap Blocks: exact=558 lossy=15682

但是，一旦你摆脱了复核，就不要指望这里有什么神奇的东西。与其他事情相比，您可能高估了在重新检查上花费的时间。

票数 1

页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://dba.stackexchange.com/questions/138276

复制

相似问题

问使用数据和大(数百万行)部分索引重新检查cond的选择性能差
EN

回答 1

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问使用数据和大(数百万行)部分索引重新检查cond的选择性能差EN

回答 1

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问使用数据和大(数百万行)部分索引重新检查cond的选择性能差
EN