文章/答案/技术大牛

发布

社区首页 >问答首页 >int + tstzrange列上的简单联接--仅在大约100万行上缓慢得令人痛苦

问int + tstzrange列上的简单联接--仅在大约100万行上缓慢得令人痛苦
EN

Stack Overflow用户

提问于 2017-08-09 13:24:27

回答 1查看 29关注 0票数 0

我正在为查询的性能而挣扎，该查询涉及int-column和tstzrange上的“简单”左联接

SELECT
      table_1.id_col
    , table_1.time_range
    , table_1.other_col_1
    , table_2.other_col_2
FROM table_1
LEFT JOIN table_2
ON table_1.id_col = table_2.id_col
AND table_1.time_range = table_2.time_range

这个查询需要~80-100秒才能运行，最终结果集为~100万行 (table_1和table_2的顺序相同)。

这个查询是一个更复杂的CTE查询的一部分(实际上它选择了这些1 million行的一个小子集)，但是我已经取消了这个部分，这是一个瓶颈。

我已经为这些列的组合添加了适当的索引(GIST-index)，但是从解释中我猜，当我加入几乎所有的行时，这种索引就会被丢弃。

有什么方法可以提高业绩吗？

就像在物理上为顺序扫描预先设定行？

My table：

CREATE TABLE data.table_1 (
    table_1_id SERIAL NOT NULL, 
    id_col INTEGER NOT NULL, 
    time_range TSTZRANGE NOT NULL, 
    other_col_1 INTEGER, 
    PRIMARY KEY (table_1_id), 
);

CREATE INDEX idx_table_1_id_col ON data.table_1 (id_col);
CREATE INDEX idx_table_1_time_range ON data.table_1 USING gist (time_range);
CREATE INDEX idx_table_1_id_col_time_range ON data.table_1 USING gist (id_col, time_range);

CREATE TABLE data.table_2 (
    table_2_id SERIAL NOT NULL, 
    id_col INTEGER NOT NULL, 
    time_range TSTZRANGE NOT NULL, 
    other_col_2 DOUBLE PRECISION, 
    PRIMARY KEY (table_2_id), 
);

CREATE INDEX idx_table_2_id_col ON data.table_2 (id_col);
CREATE INDEX idx_table_2_time_range ON data.table_2 USING gist (time_range);
CREATE INDEX idx_table_2_id_col_time_range ON data.table_2 USING gist (id_col, time_range);

以下是详细的解释：

[
  {
    "Plan": {
      "Node Type": "Hash Join",
      "Join Type": "Left",
      "Startup Cost": 198185.10,
      "Total Cost": 4163704.54,
      "Plan Rows": 73508636,
      "Plan Width": 20,
      "Actual Startup Time": 31055.086,
      "Actual Total Time": 89488.540,
      "Actual Rows": 1015568,
      "Actual Loops": 1,
      "Output": ["table_1.id_col", "table_1.other_col_1", "table_2.other_col_2"],
      "Hash Cond": "((table_1.id_col = table_2.id_col) AND (table_1.time_range = table_2.time_range))",
      "Shared Hit Blocks": 165149,
      "Shared Read Blocks": 632793,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 38220,
      "Temp Written Blocks": 37966,
      "I/O Read Time": 0.000,
      "I/O Write Time": 0.000,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Parent Relationship": "Outer",
          "Relation Name": "table_1",
          "Schema": "data",
          "Alias": "table_1",
          "Startup Cost": 0.00,
          "Total Cost": 1492907.36,
          "Plan Rows": 73508636,
          "Plan Width": 34,
          "Actual Startup Time": 24827.453,
          "Actual Total Time": 77143.930,
          "Actual Rows": 904431,
          "Actual Loops": 1,
          "Output": ["table_1.id_col", "table_1.other_col_1", "table_1.time_range"],
          "Shared Hit Blocks": 165147,
          "Shared Read Blocks": 592674,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 0,
          "Local Hit Blocks": 0,
          "Local Read Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Written Blocks": 0,
          "Temp Read Blocks": 0,
          "Temp Written Blocks": 0,
          "I/O Read Time": 0.000,
          "I/O Write Time": 0.000
        },
        {
          "Node Type": "Hash",
          "Parent Relationship": "Inner",
          "Startup Cost": 88292.64,
          "Total Cost": 88292.64,
          "Plan Rows": 4817164,
          "Plan Width": 34,
          "Actual Startup Time": 6204.927,
          "Actual Total Time": 6204.927,
          "Actual Rows": 4817085,
          "Actual Loops": 1,
          "Output": ["table_2.other_col_2", "table_2.id_col", "table_2.time_range"],
          "Hash Buckets": 65536,
          "Original Hash Buckets": 65536,
          "Hash Batches": 128,
          "Original Hash Batches": 128,
          "Peak Memory Usage": 2930,
          "Shared Hit Blocks": 2,
          "Shared Read Blocks": 40119,
          "Shared Dirtied Blocks": 0,
          "Shared Written Blocks": 0,
          "Local Hit Blocks": 0,
          "Local Read Blocks": 0,
          "Local Dirtied Blocks": 0,
          "Local Written Blocks": 0,
          "Temp Read Blocks": 0,
          "Temp Written Blocks": 31422,
          "I/O Read Time": 0.000,
          "I/O Write Time": 0.000,
          "Plans": [
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Outer",
              "Relation Name": "table_2",
              "Schema": "data",
              "Alias": "table_2",
              "Startup Cost": 0.00,
              "Total Cost": 88292.64,
              "Plan Rows": 4817164,
              "Plan Width": 34,
              "Actual Startup Time": 0.650,
              "Actual Total Time": 3769.157,
              "Actual Rows": 4817085,
              "Actual Loops": 1,
              "Output": ["table_2.other_col_2", "table_2.id_col", "table_2.time_range"],
              "Shared Hit Blocks": 2,
              "Shared Read Blocks": 40119,
              "Shared Dirtied Blocks": 0,
              "Shared Written Blocks": 0,
              "Local Hit Blocks": 0,
              "Local Read Blocks": 0,
              "Local Dirtied Blocks": 0,
              "Local Written Blocks": 0,
              "Temp Read Blocks": 0,
              "Temp Written Blocks": 0,
              "I/O Read Time": 0.000,
              "I/O Write Time": 0.000
            }
          ]
        }
      ]
    },
    "Planning Time": 0.350,
    "Triggers": [
    ],
    "Execution Time": 89689.809
  }
]

postgresql

date-range

回答 1

Stack Overflow用户

发布于 2017-08-09 13:59:06

使用CLUSTER对数据进行物理排序，将查询时间降至~5秒，这是可以的，考虑到我将进一步选择行的一个子集：

CLUSTER table_1 USING idx_table_1_id_col_time_range;
CLUSTER table_2 USING idx_table_2_id_col_time_range;

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/45592013

复制

相似问题

问int + tstzrange列上的简单联接--仅在大约100万行上缓慢得令人痛苦
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问int + tstzrange列上的简单联接--仅在大约100万行上缓慢得令人痛苦EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问int + tstzrange列上的简单联接--仅在大约100万行上缓慢得令人痛苦
EN