文章/答案/技术大牛

发布

社区首页 >问答首页 >需要帮助优化缓慢的Postgres查询

问需要帮助优化缓慢的Postgres查询
EN

Database Administration用户

提问于 2018-08-28 03:27:59

回答 1查看 69关注 0票数 0

感觉好像我们已经创建了所有适当的索引，但是有些东西让人感觉很不舒服，需要很长时间。尽管在这两个字段上都有索引，但它仍在对代码段执行顺序扫描。

snippets中有47M条记录，abuse_types_snippets中有510万条记录。

SELECT count(1)
    FROM snippets
    WHERE snippets.id IN (
        SELECT abuse_types_snippets.snippet_id
        FROM abuse_types_snippets
        WHERE abuse_types_snippets.abuse_type_id = 1047122679
    )
    AND snippets.status IN (3,4,5) 
    AND snippets.created_at BETWEEN '2018-01-01' AND '2018-08-22'

Aggregate  (cost=1053185.69..1053185.70 rows=1 width=0) (actual time=52116.440..52116.441 rows=1 loops=1)
  ->  Hash Join  (cost=40922.22..1052044.88 rows=456325 width=0) (actual time=3779.600..51979.138 rows=500147 loops=1)
        Hash Cond: (snippets.id = abuse_types_snippets.snippet_id)
        ->  Seq Scan on snippets  (cost=0.00..948983.91 rows=3321771 width=4) (actual time=0.009..45231.634 rows=3277699 loops=1)
              Filter: ((created_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2018-08-22 00:00:00'::timestamp without time zone) AND (status = ANY ('{3,4,5}'::integer[])))
              Rows Removed by Filter: 1502893
        ->  Hash  (cost=30179.02..30179.02 rows=654816 width=4) (actual time=3778.402..3778.402 rows=668040 loops=1)
              Buckets: 16384  Batches: 8  Memory Usage: 3283kB
              ->  Index Only Scan using index_abuse_types_snippets_on_abuse_type_id_and_snippet_id on abuse_types_snippets  (cost=0.43..30179.02 rows=654816 width=4) (actual time=0.916..3457.052 rows=668040 loops=1)
                    Index Cond: (abuse_type_id = 1047122656)
                    Heap Fetches: 398150
Planning time: 0.319 ms
Execution time: 52117.034 ms

添加，替代方法(加入而不是子选择，没有更好的性能)：

EXPLAIN ANALYZE SELECT count(1)
    FROM snippets
    join abuse_types_snippets on abuse_types_snippets.snippet_id = snippets.id
    WHERE abuse_types_snippets.abuse_type_id = 1047122656
    AND snippets.status IN (3,4,5) 
    AND snippets.created_at BETWEEN '2018-01-01' AND '2018-08-22'

Aggregate  (cost=1061553.74..1061553.75 rows=1 width=0) (actual time=53597.015..53597.015 rows=1 loops=1)
  ->  Hash Join  (cost=1003482.48..1060412.92 rows=456325 width=0) (actual time=49697.488..53445.218 rows=500147 loops=1)
        Hash Cond: (abuse_types_snippets.snippet_id = snippets.id)
        ->  Index Only Scan using index_abuse_types_snippets_on_abuse_type_id_and_snippet_id on abuse_types_snippets  (cost=0.43..30183.02 rows=654816 width=4) (actual time=0.977..3060.787 rows=668108 loops=1)
              Index Cond: (abuse_type_id = 1047122656)
              Heap Fetches: 396932
        ->  Hash  (cost=948983.91..948983.91 rows=3321771 width=4) (actual time=49233.187..49233.187 rows=3277699 loops=1)
              Buckets: 16384  Batches: 64  Memory Usage: 1819kB
              ->  Seq Scan on snippets  (cost=0.00..948983.91 rows=3321771 width=4) (actual time=0.010..47723.895 rows=3277699 loops=1)
                    Filter: ((created_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2018-08-22 00:00:00'::timestamp without time zone) AND (status = ANY ('{3,4,5}'::integer[])))
                    Rows Removed by Filter: 1503495
Planning time: 0.324 ms
Execution time: 53597.528 ms

表模式：

CREATE TABLE "public"."snippets" (
    "id" int4 NOT NULL DEFAULT nextval('snippets_id_seq'::regclass),
    "status" int4 DEFAULT 0,
    "provider" varchar NOT NULL,
    "score" jsonb DEFAULT '{}'::jsonb,
    "connection_id" int4,
    "child_id" int4,
    "flagged" bool DEFAULT false,
    "created_at" timestamp(6) NOT NULL,
    "updated_at" timestamp(6) NOT NULL,
    "posted_at" timestamp(6) NOT NULL,
    "other_id" text,
    "other_name" text,
    "content_hash" varchar,
    "locked_at" timestamp(6) NULL,
    "activity_ids_for_dynamodb" varchar[] DEFAULT '{}'::character varying[],
    "filter_media" bool DEFAULT false,
    "severity" float8 DEFAULT 1.0,
    "migrated" bool DEFAULT false,
    "notified_authorities_at" timestamp(6) NULL,
    CONSTRAINT "snippets_pkey" PRIMARY KEY ("id") NOT DEFERRABLE INITIALLY IMMEDIATE,
    CONSTRAINT "fk_rails_2c8d6dd3ab" FOREIGN KEY ("connection_id") REFERENCES "public"."connections" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE,
    CONSTRAINT "fk_rails_bf28d24ffc" FOREIGN KEY ("child_id") REFERENCES "public"."children" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
ALTER TABLE "public"."snippets" OWNER TO "bark";
CREATE INDEX "index_snippets_on_child_id" ON "public"."snippets" USING btree(child_id ASC NULLS LAST);
CREATE INDEX "index_snippets_on_child_id_and_status" ON "public"."snippets" USING btree(child_id ASC NULLS LAST, status ASC NULLS LAST);
CREATE INDEX "index_snippets_on_connection_id" ON "public"."snippets" USING btree(connection_id ASC NULLS LAST);
CREATE INDEX "index_snippets_on_content_hash" ON "public"."snippets" USING btree(content_hash ASC NULLS LAST);
CREATE INDEX "index_snippets_on_created_at" ON "public"."snippets" USING btree(created_at ASC NULLS LAST);
CREATE INDEX "index_snippets_on_created_at_and_status" ON "public"."snippets" USING btree(created_at ASC NULLS LAST, status ASC NULLS LAST);
CREATE INDEX "index_snippets_on_locked_at" ON "public"."snippets" USING btree(locked_at ASC NULLS LAST);
CREATE INDEX "index_snippets_on_posted_at" ON "public"."snippets" USING btree(posted_at ASC NULLS LAST);
CREATE INDEX "index_snippets_on_status" ON "public"."snippets" USING btree(status ASC NULLS LAST);

CREATE TABLE "public"."abuse_types_snippets" (
    "abuse_type_id" int4 NOT NULL,
    "snippet_id" int4 NOT NULL,
    CONSTRAINT "fk_rails_4e19f07aa3" FOREIGN KEY ("abuse_type_id") REFERENCES "public"."abuse_types" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE,
    CONSTRAINT "fk_rails_c0ca4cc2fc" FOREIGN KEY ("snippet_id") REFERENCES "public"."snippets" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
ALTER TABLE "public"."abuse_types_snippets" OWNER TO "bark";
CREATE INDEX "index_abuse_types_snippets_on_abuse_type_id" ON "public"."abuse_types_snippets" USING btree(abuse_type_id ASC NULLS LAST);
CREATE UNIQUE INDEX "index_abuse_types_snippets_on_abuse_type_id_and_snippet_id" ON "public"."abuse_types_snippets" USING btree(abuse_type_id ASC NULLS LAST, snippet_id ASC NULLS LAST);
CREATE INDEX "index_abuse_types_snippets_on_snippet_id" ON "public"."abuse_types_snippets" USING btree(snippet_id ASC NULLS LAST);

postgresql

index

performance

postgresql-performance

回答 1

Database Administration用户

回答已采纳

发布于 2018-08-28 04:55:49

正如您所说，这里的问题是不顾索引运行的顺序扫描。当查询规划者认为使用索引无法获得足够的数据时，就会发生这种情况。

它应该在选择数据时使用索引index_snippets_on_created_at_and_status，但是它没有，很可能是因为基数错误。简而言之，您需要在status和created_at上切换位置

如果您不知道btree索引是如何工作的，这里有一个(糟糕的)尝试来解释它。

索引的布局就像一棵树，第一列作为顶叶(根/茎)，第二列作为叶的子叶(或枝条)。这意味着在遍历索引时，它会查找第一列，然后选择与第二列匹配的子列。

假设您在created_at列中有一个时间戳，并且假设有5个不同的状态，那么每天在最上面的行中有864000个可能的位置需要查找，然后筛选出那些不匹配状态的行。这将导致很大的成本，特别是因为您已经将表中的大部分包含在所选内容中。

另一方面，如果status是第一个，它只需在索引中查找3个位置，然后对第二列进行范围扫描，这将降低成本。

因此，改变立场可能有帮助，但这当然不能保证。由于您从4.7M中选择了320万行，规划者可能选择不使用索引，因为主要成本可能是行的实际获取。

如果发生这种情况，您也可以尝试在索引中包括id (作为最后一列)，因为那时规划者可能可以使用它从abuse_types_snippets中筛选，直到最后一分钟才查找行。

票数 1

页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://dba.stackexchange.com/questions/216009

复制

相似问题

问需要帮助优化缓慢的Postgres查询
EN

回答 1

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问需要帮助优化缓慢的Postgres查询EN

回答 1

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问需要帮助优化缓慢的Postgres查询
EN