首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >PostgreSQL中的慢速查询,从定义在两列中的范围之间选择一行

PostgreSQL中的慢速查询,从定义在两列中的范围之间选择一行
EN

Database Administration用户
提问于 2017-11-09 14:03:50
回答 2查看 1.2K关注 0票数 6

我已经导入了一个包含3,319,097行的ip2location_db11 lite数据库副本,并且我正在优化一个数值范围查询,其中低值和高值位于表的单独列中(ip_fromip_to)。

导入数据库:

代码语言:javascript
复制
CREATE TABLE ip2location_db11
(
  ip_from bigint NOT NULL, -- First IP address in netblock.
  ip_to bigint NOT NULL, -- Last IP address in netblock.
  country_code character(2) NOT NULL, -- Two-character country code based on ISO 3166.
  country_name character varying(64) NOT NULL, -- Country name based on ISO 3166.
  region_name character varying(128) NOT NULL, -- Region or state name.
  city_name character varying(128) NOT NULL, -- City name.
  latitude real NOT NULL, -- City latitude. Default to capital city latitude if city is unknown.
  longitude real NOT NULL, -- City longitude. Default to capital city longitude if city is unknown.
  zip_code character varying(30) NOT NULL, -- ZIP/Postal code.
  time_zone character varying(8) NOT NULL, -- UTC time zone (with DST supported).
  CONSTRAINT ip2location_db11_pkey PRIMARY KEY (ip_from, ip_to)
);
\copy ip2location_db11 FROM 'IP2LOCATION-LITE-DB11.CSV' WITH CSV QUOTE AS '"';

我第一个天真的索引尝试是在每一列上创建单独的索引,结果是连续扫描,查询时间为400 My:

代码语言:javascript
复制
account=> CREATE INDEX ip_from_db11_idx ON ip2location_db11 (ip_from);
account=> CREATE INDEX ip_to_db11_idx ON ip2location_db11 (ip_to);

account=> EXPLAIN ANALYZE VERBOSE SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;

                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.ip2location_db11  (cost=0.00..48930.99 rows=43111 width=842) (actual time=286.714..401.805 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Filter: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
   Rows Removed by Filter: 3319096
 Planning time: 0.155 ms
 Execution time: 401.834 ms
(6 rows)

account=> \d ip2location_db11
          Table "public.ip2location_db11"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 ip_from      | bigint                 | not null
 ip_to        | bigint                 | not null
 country_code | character(2)           | not null
 country_name | character varying(64)  | not null
 region_name  | character varying(128) | not null
 city_name    | character varying(128) | not null
 latitude     | real                   | not null
 longitude    | real                   | not null
 zip_code     | character varying(30)  | not null
 time_zone    | character varying(8)   | not null
Indexes:
    "ip2location_db11_pkey" PRIMARY KEY, btree (ip_from, ip_to)
    "ip_from_db11_idx" btree (ip_from)
    "ip_to_db11_idx" btree (ip_to)

我的第二次尝试是创建一个多列btree索引,这会导致查询次数为290 My的索引扫描:

代码语言:javascript
复制
account=> CREATE INDEX ip_range_db11_idx ON ip2location_db11 (ip_from,ip_to);

account=> EXPLAIN ANALYZE VERBOSE SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ip_to_db11_idx on public.ip2location_db11 (cost=0.43..51334.91 rows=756866 width=69) (actual time=1.109..289.143 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Index Cond: ('2538629520'::bigint <= ip2location_db11.ip_to)
   Filter: ('2538629520'::bigint >= ip2location_db11.ip_from)
   Rows Removed by Filter: 1160706
 Planning time: 0.324 ms
 Execution time: 289.172 ms
(7 rows)

n4l_account=> \d ip2location_db11
          Table "public.ip2location_db11"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 ip_from      | bigint                 | not null
 ip_to        | bigint                 | not null
 country_code | character(2)           | not null
 country_name | character varying(64)  | not null
 region_name  | character varying(128) | not null
 city_name    | character varying(128) | not null
 latitude     | real                   | not null
 longitude    | real                   | not null
 zip_code     | character varying(30)  | not null
 time_zone    | character varying(8)   | not null
Indexes:
    "ip2location_db11_pkey" PRIMARY KEY, btree (ip_from, ip_to)
    "ip_from_db11_idx" btree (ip_from)
    "ip_range_db11_idx" btree (ip_from, ip_to)
    "ip_to_db11_idx" btree (ip_to)

更新:根据评论中的要求,我重新做了上面的查询.重新创建表(165 of、65 of、86 of、83 of、86 of、64 of、85 of、811 of、868 of、845 of、810 of、781 of、797 of、890 of、806 of)后的前15个查询的时间:

代码语言:javascript
复制
account=> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, TIMING) SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.ip2location_db11  (cost=28200.29..76843.12 rows=368789 width=842) (actual time=64.866..64.866 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Recheck Cond: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
   Heap Blocks: exact=1
   Buffers: shared hit=8273
   ->  Bitmap Index Scan on ip_range_db11_idx  (cost=0.00..28108.09 rows=368789 width=0) (actual time=64.859..64.859 rows=1 loops=1)
         Index Cond: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
         Buffers: shared hit=8272
 Planning time: 0.099 ms
 Execution time: 64.907 ms
(10 rows)

account=> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, TIMING) SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.ip2location_db11  (cost=0.00..92906.18 rows=754776 width=69) (actual time=577.234..811.757 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Filter: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
   Rows Removed by Filter: 3319096
   Buffers: shared hit=33 read=43078
 Planning time: 0.667 ms
 Execution time: 811.783 ms
(7 rows)

从导入的CSV文件中的示例行:

代码语言:javascript
复制
"0","16777215","-","-","-","-","0.000000","0.000000","-","-"
"16777216","16777471","AU","Australia","Queensland","Brisbane","-27.467940","153.028090","4000","+10:00"
"16777472","16778239","CN","China","Fujian","Fuzhou","26.061390","119.306110","350004","+08:00"

是否有更好的方法来索引这个表来改进查询,或者是否有一个更有效的查询可以获得相同的结果?

EN

回答 2

Database Administration用户

发布于 2017-11-09 14:28:13

由于有了注释,我有了一个解决方案,通过使用gist空间索引并相应地调整查询,将查询时间缩短到0.073ms:

代码语言:javascript
复制
account=> DROP INDEX ip_to_db11_idx;
account=> DROP INDEX ip_from_db11_idx;
account=> DROP INDEX ip_range_db11_idx;
account=> CREATE INDEX ip2location_db11_gist ON ip2location_db11 USING gist ((box(point(ip_from,ip_from),point(ip_to,ip_to))) box_ops);

account=> EXPLAIN ANALYZE VERBOSE SELECT * FROM ip2location_db11 WHERE  box(point(ip_from,ip_from),point(ip_to,ip_to)) @> box(point (2538629520,2538629520), point(2538629520,2538629520));


              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.ip2location_db11  (cost=190.14..10463.13 rows=3319 width=69) (actual time=0.032..0.033 rows=1 loops=1)
   Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
   Recheck Cond: (box(point((ip2location_db11.ip_from)::double precision, (ip2location_db11.ip_from)::double precision),
 point((ip2location_db11.ip_to)::double precision, (ip2location_db11.ip_to)::double precision)) @> '(2538629520,2538629520),(2538629520,2538629520)'::box)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip2location_db11_gist  (cost=0.00..189.31 rows=3319 width=0) (actual time=0.022..0.022 rows=1 loops=1)
         Index Cond: (box(point((ip2location_db11.ip_from)::double precision, (ip2location_db11.ip_from)::double precision), point((ip2location_db11.ip_to)::double precision, (ip2location_db11.ip_to)::double precision)) @> '(2538629520,2538629520),(2538629520,2538629520)'::box)
 Planning time: 2.119 ms
 Execution time: 0.073 ms
(8 rows)

引用:

http://www.siafoo.net/article/53#comment_二百八十八

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks#Fast_间隔_.28of_时间_或_ip地址_addresses.29_搜索_使用_空间_索引

票数 1
EN

Database Administration用户

发布于 2017-11-09 15:33:16

ip4r

首先,将扩展(更好的说明)添加到Github上。

代码语言:javascript
复制
CREATE EXTENSION ip4r;

让我们从与之前几乎相同的事情开始,创建ip类型为ip4。不设置PRIMARY KEY,也不对类型添加索引。我们会在一次又一次更换桌子。

代码语言:javascript
复制
CREATE TABLE ip2location_db11
(
  ip_from ip4 NOT NULL,   -- First IP address in netblock.
  ip_to   ip4 NOT NULL, -- Last IP address in netblock.
  ....
);
\copy ip2location_db11 FROM 'IP2LOCATION-LITE-DB11.CSV' WITH CSV QUOTE AS '"';

现在让我们将它们升级到ip4r

代码语言:javascript
复制
BEGIN;
  ALTER TABLE ip2location_db11
    ADD iploc_range ip4r;
  UPDATE ip2location_db11
    SET iploc_range = ip4r(ip_from,ip_to);
  ALTER TABLE ip2location_db11
    DROP COLUMN ip_from,
    DROP COLUMN ip_to;
COMMIT;

现在让我们把它编入索引

代码语言:javascript
复制
CREATE INDEX ON ip2location_db11
   USING gist (iploc_range);
VACUUM ANALYZE ip2location_db11;

对此提出质疑,

代码语言:javascript
复制
SELECT *
FROM ip2location_db11
WHERE iploc_range >>= '1.2.3.4';
票数 1
EN
页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://dba.stackexchange.com/questions/190533

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档