我已经导入了一个包含3,319,097行的ip2location_db11 lite数据库副本,并且我正在优化一个数值范围查询,其中低值和高值位于表的单独列中(ip_from,ip_to)。
导入数据库:
CREATE TABLE ip2location_db11
(
ip_from bigint NOT NULL, -- First IP address in netblock.
ip_to bigint NOT NULL, -- Last IP address in netblock.
country_code character(2) NOT NULL, -- Two-character country code based on ISO 3166.
country_name character varying(64) NOT NULL, -- Country name based on ISO 3166.
region_name character varying(128) NOT NULL, -- Region or state name.
city_name character varying(128) NOT NULL, -- City name.
latitude real NOT NULL, -- City latitude. Default to capital city latitude if city is unknown.
longitude real NOT NULL, -- City longitude. Default to capital city longitude if city is unknown.
zip_code character varying(30) NOT NULL, -- ZIP/Postal code.
time_zone character varying(8) NOT NULL, -- UTC time zone (with DST supported).
CONSTRAINT ip2location_db11_pkey PRIMARY KEY (ip_from, ip_to)
);
\copy ip2location_db11 FROM 'IP2LOCATION-LITE-DB11.CSV' WITH CSV QUOTE AS '"';我第一个天真的索引尝试是在每一列上创建单独的索引,结果是连续扫描,查询时间为400 My:
account=> CREATE INDEX ip_from_db11_idx ON ip2location_db11 (ip_from);
account=> CREATE INDEX ip_to_db11_idx ON ip2location_db11 (ip_to);
account=> EXPLAIN ANALYZE VERBOSE SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.ip2location_db11 (cost=0.00..48930.99 rows=43111 width=842) (actual time=286.714..401.805 rows=1 loops=1)
Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
Filter: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
Rows Removed by Filter: 3319096
Planning time: 0.155 ms
Execution time: 401.834 ms
(6 rows)
account=> \d ip2location_db11
Table "public.ip2location_db11"
Column | Type | Modifiers
--------------+------------------------+-----------
ip_from | bigint | not null
ip_to | bigint | not null
country_code | character(2) | not null
country_name | character varying(64) | not null
region_name | character varying(128) | not null
city_name | character varying(128) | not null
latitude | real | not null
longitude | real | not null
zip_code | character varying(30) | not null
time_zone | character varying(8) | not null
Indexes:
"ip2location_db11_pkey" PRIMARY KEY, btree (ip_from, ip_to)
"ip_from_db11_idx" btree (ip_from)
"ip_to_db11_idx" btree (ip_to)我的第二次尝试是创建一个多列btree索引,这会导致查询次数为290 My的索引扫描:
account=> CREATE INDEX ip_range_db11_idx ON ip2location_db11 (ip_from,ip_to);
account=> EXPLAIN ANALYZE VERBOSE SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ip_to_db11_idx on public.ip2location_db11 (cost=0.43..51334.91 rows=756866 width=69) (actual time=1.109..289.143 rows=1 loops=1)
Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
Index Cond: ('2538629520'::bigint <= ip2location_db11.ip_to)
Filter: ('2538629520'::bigint >= ip2location_db11.ip_from)
Rows Removed by Filter: 1160706
Planning time: 0.324 ms
Execution time: 289.172 ms
(7 rows)
n4l_account=> \d ip2location_db11
Table "public.ip2location_db11"
Column | Type | Modifiers
--------------+------------------------+-----------
ip_from | bigint | not null
ip_to | bigint | not null
country_code | character(2) | not null
country_name | character varying(64) | not null
region_name | character varying(128) | not null
city_name | character varying(128) | not null
latitude | real | not null
longitude | real | not null
zip_code | character varying(30) | not null
time_zone | character varying(8) | not null
Indexes:
"ip2location_db11_pkey" PRIMARY KEY, btree (ip_from, ip_to)
"ip_from_db11_idx" btree (ip_from)
"ip_range_db11_idx" btree (ip_from, ip_to)
"ip_to_db11_idx" btree (ip_to)更新:根据评论中的要求,我重新做了上面的查询.重新创建表(165 of、65 of、86 of、83 of、86 of、64 of、85 of、811 of、868 of、845 of、810 of、781 of、797 of、890 of、806 of)后的前15个查询的时间:
account=> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, TIMING) SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.ip2location_db11 (cost=28200.29..76843.12 rows=368789 width=842) (actual time=64.866..64.866 rows=1 loops=1)
Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
Recheck Cond: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
Heap Blocks: exact=1
Buffers: shared hit=8273
-> Bitmap Index Scan on ip_range_db11_idx (cost=0.00..28108.09 rows=368789 width=0) (actual time=64.859..64.859 rows=1 loops=1)
Index Cond: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
Buffers: shared hit=8272
Planning time: 0.099 ms
Execution time: 64.907 ms
(10 rows)
account=> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, TIMING) SELECT * FROM ip2location_db11 WHERE 2538629520 BETWEEN ip_from AND ip_to;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.ip2location_db11 (cost=0.00..92906.18 rows=754776 width=69) (actual time=577.234..811.757 rows=1 loops=1)
Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
Filter: (('2538629520'::bigint >= ip2location_db11.ip_from) AND ('2538629520'::bigint <= ip2location_db11.ip_to))
Rows Removed by Filter: 3319096
Buffers: shared hit=33 read=43078
Planning time: 0.667 ms
Execution time: 811.783 ms
(7 rows)从导入的CSV文件中的示例行:
"0","16777215","-","-","-","-","0.000000","0.000000","-","-"
"16777216","16777471","AU","Australia","Queensland","Brisbane","-27.467940","153.028090","4000","+10:00"
"16777472","16778239","CN","China","Fujian","Fuzhou","26.061390","119.306110","350004","+08:00"是否有更好的方法来索引这个表来改进查询,或者是否有一个更有效的查询可以获得相同的结果?
发布于 2017-11-09 14:28:13
由于有了注释,我有了一个解决方案,通过使用gist空间索引并相应地调整查询,将查询时间缩短到0.073ms:
account=> DROP INDEX ip_to_db11_idx;
account=> DROP INDEX ip_from_db11_idx;
account=> DROP INDEX ip_range_db11_idx;
account=> CREATE INDEX ip2location_db11_gist ON ip2location_db11 USING gist ((box(point(ip_from,ip_from),point(ip_to,ip_to))) box_ops);
account=> EXPLAIN ANALYZE VERBOSE SELECT * FROM ip2location_db11 WHERE box(point(ip_from,ip_from),point(ip_to,ip_to)) @> box(point (2538629520,2538629520), point(2538629520,2538629520));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.ip2location_db11 (cost=190.14..10463.13 rows=3319 width=69) (actual time=0.032..0.033 rows=1 loops=1)
Output: ip_from, ip_to, country_code, country_name, region_name, city_name, latitude, longitude, zip_code, time_zone
Recheck Cond: (box(point((ip2location_db11.ip_from)::double precision, (ip2location_db11.ip_from)::double precision),
point((ip2location_db11.ip_to)::double precision, (ip2location_db11.ip_to)::double precision)) @> '(2538629520,2538629520),(2538629520,2538629520)'::box)
Heap Blocks: exact=1
-> Bitmap Index Scan on ip2location_db11_gist (cost=0.00..189.31 rows=3319 width=0) (actual time=0.022..0.022 rows=1 loops=1)
Index Cond: (box(point((ip2location_db11.ip_from)::double precision, (ip2location_db11.ip_from)::double precision), point((ip2location_db11.ip_to)::double precision, (ip2location_db11.ip_to)::double precision)) @> '(2538629520,2538629520),(2538629520,2538629520)'::box)
Planning time: 2.119 ms
Execution time: 0.073 ms
(8 rows)引用:
http://www.siafoo.net/article/53#comment_二百八十八
http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks#Fast_间隔_.28of_时间_或_ip地址_addresses.29_搜索_使用_空间_索引
发布于 2017-11-09 15:33:16
ip4r首先,将扩展(更好的说明)添加到Github上。
CREATE EXTENSION ip4r;让我们从与之前几乎相同的事情开始,创建ip类型为ip4。不设置PRIMARY KEY,也不对类型添加索引。我们会在一次又一次更换桌子。
CREATE TABLE ip2location_db11
(
ip_from ip4 NOT NULL, -- First IP address in netblock.
ip_to ip4 NOT NULL, -- Last IP address in netblock.
....
);
\copy ip2location_db11 FROM 'IP2LOCATION-LITE-DB11.CSV' WITH CSV QUOTE AS '"';现在让我们将它们升级到ip4r
BEGIN;
ALTER TABLE ip2location_db11
ADD iploc_range ip4r;
UPDATE ip2location_db11
SET iploc_range = ip4r(ip_from,ip_to);
ALTER TABLE ip2location_db11
DROP COLUMN ip_from,
DROP COLUMN ip_to;
COMMIT;现在让我们把它编入索引
CREATE INDEX ON ip2location_db11
USING gist (iploc_range);
VACUUM ANALYZE ip2location_db11;对此提出质疑,
SELECT *
FROM ip2location_db11
WHERE iploc_range >>= '1.2.3.4';https://dba.stackexchange.com/questions/190533
复制相似问题