感觉好像我们已经创建了所有适当的索引,但是有些东西让人感觉很不舒服,需要很长时间。尽管在这两个字段上都有索引,但它仍在对代码段执行顺序扫描。
snippets中有47M条记录,abuse_types_snippets中有510万条记录。
SELECT count(1)
FROM snippets
WHERE snippets.id IN (
SELECT abuse_types_snippets.snippet_id
FROM abuse_types_snippets
WHERE abuse_types_snippets.abuse_type_id = 1047122679
)
AND snippets.status IN (3,4,5)
AND snippets.created_at BETWEEN '2018-01-01' AND '2018-08-22'
Aggregate (cost=1053185.69..1053185.70 rows=1 width=0) (actual time=52116.440..52116.441 rows=1 loops=1)
-> Hash Join (cost=40922.22..1052044.88 rows=456325 width=0) (actual time=3779.600..51979.138 rows=500147 loops=1)
Hash Cond: (snippets.id = abuse_types_snippets.snippet_id)
-> Seq Scan on snippets (cost=0.00..948983.91 rows=3321771 width=4) (actual time=0.009..45231.634 rows=3277699 loops=1)
Filter: ((created_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2018-08-22 00:00:00'::timestamp without time zone) AND (status = ANY ('{3,4,5}'::integer[])))
Rows Removed by Filter: 1502893
-> Hash (cost=30179.02..30179.02 rows=654816 width=4) (actual time=3778.402..3778.402 rows=668040 loops=1)
Buckets: 16384 Batches: 8 Memory Usage: 3283kB
-> Index Only Scan using index_abuse_types_snippets_on_abuse_type_id_and_snippet_id on abuse_types_snippets (cost=0.43..30179.02 rows=654816 width=4) (actual time=0.916..3457.052 rows=668040 loops=1)
Index Cond: (abuse_type_id = 1047122656)
Heap Fetches: 398150
Planning time: 0.319 ms
Execution time: 52117.034 ms添加,替代方法(加入而不是子选择,没有更好的性能):
EXPLAIN ANALYZE SELECT count(1)
FROM snippets
join abuse_types_snippets on abuse_types_snippets.snippet_id = snippets.id
WHERE abuse_types_snippets.abuse_type_id = 1047122656
AND snippets.status IN (3,4,5)
AND snippets.created_at BETWEEN '2018-01-01' AND '2018-08-22'
Aggregate (cost=1061553.74..1061553.75 rows=1 width=0) (actual time=53597.015..53597.015 rows=1 loops=1)
-> Hash Join (cost=1003482.48..1060412.92 rows=456325 width=0) (actual time=49697.488..53445.218 rows=500147 loops=1)
Hash Cond: (abuse_types_snippets.snippet_id = snippets.id)
-> Index Only Scan using index_abuse_types_snippets_on_abuse_type_id_and_snippet_id on abuse_types_snippets (cost=0.43..30183.02 rows=654816 width=4) (actual time=0.977..3060.787 rows=668108 loops=1)
Index Cond: (abuse_type_id = 1047122656)
Heap Fetches: 396932
-> Hash (cost=948983.91..948983.91 rows=3321771 width=4) (actual time=49233.187..49233.187 rows=3277699 loops=1)
Buckets: 16384 Batches: 64 Memory Usage: 1819kB
-> Seq Scan on snippets (cost=0.00..948983.91 rows=3321771 width=4) (actual time=0.010..47723.895 rows=3277699 loops=1)
Filter: ((created_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2018-08-22 00:00:00'::timestamp without time zone) AND (status = ANY ('{3,4,5}'::integer[])))
Rows Removed by Filter: 1503495
Planning time: 0.324 ms
Execution time: 53597.528 ms表模式:
CREATE TABLE "public"."snippets" (
"id" int4 NOT NULL DEFAULT nextval('snippets_id_seq'::regclass),
"status" int4 DEFAULT 0,
"provider" varchar NOT NULL,
"score" jsonb DEFAULT '{}'::jsonb,
"connection_id" int4,
"child_id" int4,
"flagged" bool DEFAULT false,
"created_at" timestamp(6) NOT NULL,
"updated_at" timestamp(6) NOT NULL,
"posted_at" timestamp(6) NOT NULL,
"other_id" text,
"other_name" text,
"content_hash" varchar,
"locked_at" timestamp(6) NULL,
"activity_ids_for_dynamodb" varchar[] DEFAULT '{}'::character varying[],
"filter_media" bool DEFAULT false,
"severity" float8 DEFAULT 1.0,
"migrated" bool DEFAULT false,
"notified_authorities_at" timestamp(6) NULL,
CONSTRAINT "snippets_pkey" PRIMARY KEY ("id") NOT DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT "fk_rails_2c8d6dd3ab" FOREIGN KEY ("connection_id") REFERENCES "public"."connections" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT "fk_rails_bf28d24ffc" FOREIGN KEY ("child_id") REFERENCES "public"."children" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
ALTER TABLE "public"."snippets" OWNER TO "bark";
CREATE INDEX "index_snippets_on_child_id" ON "public"."snippets" USING btree(child_id ASC NULLS LAST);
CREATE INDEX "index_snippets_on_child_id_and_status" ON "public"."snippets" USING btree(child_id ASC NULLS LAST, status ASC NULLS LAST);
CREATE INDEX "index_snippets_on_connection_id" ON "public"."snippets" USING btree(connection_id ASC NULLS LAST);
CREATE INDEX "index_snippets_on_content_hash" ON "public"."snippets" USING btree(content_hash ASC NULLS LAST);
CREATE INDEX "index_snippets_on_created_at" ON "public"."snippets" USING btree(created_at ASC NULLS LAST);
CREATE INDEX "index_snippets_on_created_at_and_status" ON "public"."snippets" USING btree(created_at ASC NULLS LAST, status ASC NULLS LAST);
CREATE INDEX "index_snippets_on_locked_at" ON "public"."snippets" USING btree(locked_at ASC NULLS LAST);
CREATE INDEX "index_snippets_on_posted_at" ON "public"."snippets" USING btree(posted_at ASC NULLS LAST);
CREATE INDEX "index_snippets_on_status" ON "public"."snippets" USING btree(status ASC NULLS LAST);
CREATE TABLE "public"."abuse_types_snippets" (
"abuse_type_id" int4 NOT NULL,
"snippet_id" int4 NOT NULL,
CONSTRAINT "fk_rails_4e19f07aa3" FOREIGN KEY ("abuse_type_id") REFERENCES "public"."abuse_types" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT "fk_rails_c0ca4cc2fc" FOREIGN KEY ("snippet_id") REFERENCES "public"."snippets" ("id") ON UPDATE NO ACTION ON DELETE NO ACTION NOT DEFERRABLE INITIALLY IMMEDIATE
)
WITH (OIDS=FALSE);
ALTER TABLE "public"."abuse_types_snippets" OWNER TO "bark";
CREATE INDEX "index_abuse_types_snippets_on_abuse_type_id" ON "public"."abuse_types_snippets" USING btree(abuse_type_id ASC NULLS LAST);
CREATE UNIQUE INDEX "index_abuse_types_snippets_on_abuse_type_id_and_snippet_id" ON "public"."abuse_types_snippets" USING btree(abuse_type_id ASC NULLS LAST, snippet_id ASC NULLS LAST);
CREATE INDEX "index_abuse_types_snippets_on_snippet_id" ON "public"."abuse_types_snippets" USING btree(snippet_id ASC NULLS LAST);发布于 2018-08-28 04:55:49
正如您所说,这里的问题是不顾索引运行的顺序扫描。当查询规划者认为使用索引无法获得足够的数据时,就会发生这种情况。
它应该在选择数据时使用索引index_snippets_on_created_at_and_status,但是它没有,很可能是因为基数错误。简而言之,您需要在status和created_at上切换位置
如果您不知道btree索引是如何工作的,这里有一个(糟糕的)尝试来解释它。
索引的布局就像一棵树,第一列作为顶叶(根/茎),第二列作为叶的子叶(或枝条)。这意味着在遍历索引时,它会查找第一列,然后选择与第二列匹配的子列。
假设您在created_at列中有一个时间戳,并且假设有5个不同的状态,那么每天在最上面的行中有864000个可能的位置需要查找,然后筛选出那些不匹配状态的行。这将导致很大的成本,特别是因为您已经将表中的大部分包含在所选内容中。
另一方面,如果status是第一个,它只需在索引中查找3个位置,然后对第二列进行范围扫描,这将降低成本。
因此,改变立场可能有帮助,但这当然不能保证。由于您从4.7M中选择了320万行,规划者可能选择不使用索引,因为主要成本可能是行的实际获取。
如果发生这种情况,您也可以尝试在索引中包括id (作为最后一列),因为那时规划者可能可以使用它从abuse_types_snippets中筛选,直到最后一分钟才查找行。
https://dba.stackexchange.com/questions/216009
复制相似问题