我有一个由1000万行组成的表,其中我试图找出谁是某些机器(id)的第一个/最后一个维护者,这取决于某些日期,也取决于机器的状态。我的查询使用六个联接,还有其他首选选项吗?编辑:原始表有索引,试图优化查询替换联接--如果可能的话?使用示例的SQL:
SQL Fiddle
编辑(在下面添加了补充信息):
示例表:
CREATE TABLE vendor_info (
id INT,
datestamp INT,
statuz INT,
maintainer VARCHAR(25));
INSERT INTO vendor_info VALUES (1, 20180101, 0, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180101, 0, 'Eric');
INSERT INTO vendor_info VALUES (3, 20180101, 1, 'David');
INSERT INTO vendor_info VALUES (1, 20180201, 1, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180201, 0, 'Jay');
INSERT INTO vendor_info VALUES (3, 20180201, 1, 'Jay');
INSERT INTO vendor_info VALUES (1, 20180301, 1, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180301, 1, 'David');
INSERT INTO vendor_info VALUES (3, 20180301, 1, 'Eric');查询和期望的输出:
SELECT
id
, MIN(datestamp) AS min_datestamp
, MAX(datestamp) AS max_datestamp
, MAX(case when statuz = 0 then datestamp end) AS max_s0_date
, MAX(case when statuz = 1 then datestamp end) AS max_s1_date
, MIN(case when statuz = 0 then datestamp end) AS min_s0_date
, MIN(case when statuz = 1 then datestamp end) AS min_s1_date
INTO vendor_dates
FROM vendor_info
GROUP BY id;
SELECT
vd.id
, v1.maintainer AS first_maintainer
, v2.maintainer AS last_maintainer
, v3.maintainer AS last_s0_maintainer
, v4.maintainer AS last_s1_maintainer
, v5.maintainer AS first_s0_maintainer
, v6.maintainer AS first_s1_maintainer
FROM vendor_dates vd
LEFT JOIN vendor_info v1 ON vd.id = v1.id AND vd.min_datestamp = v1.datestamp
LEFT JOIN vendor_info v2 ON vd.id = v2.id AND vd.max_datestamp = v2.datestamp
LEFT JOIN vendor_info v3 ON vd.id = v3.id AND vd.max_s0_date = v3.datestamp
LEFT JOIN vendor_info v4 ON vd.id = v4.id AND vd.max_s1_date = v4.datestamp
LEFT JOIN vendor_info v5 ON vd.id = v5.id AND vd.min_s0_date = v5.datestamp
LEFT JOIN vendor_info v6 ON vd.id = v6.id AND vd.min_s1_date = v6.datestamp;发布于 2018-09-04 08:50:15
添加信息将第二次查询的持续时间从超过300 of缩短到重复运行时的平均30 of以下。
主键群集(id、日期标记)
将2步过程转换为CTE可使总持续时间更多在重复运行期间减少到15 to以下。
CTE方法允许查询优化器使用新的主键。
CREATE TABLE vendor_info (
id INT,
datestamp INT,
statuz INT,
maintainer VARCHAR(25)
PRIMARY KEY CLUSTERED (id, datestamp)
);
INSERT INTO vendor_info VALUES (1, 20180101, 0, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180101, 0, 'Eric');
INSERT INTO vendor_info VALUES (3, 20180101, 1, 'David');
INSERT INTO vendor_info VALUES (1, 20180201, 1, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180201, 0, 'Jay');
INSERT INTO vendor_info VALUES (3, 20180201, 1, 'Jay');
INSERT INTO vendor_info VALUES (1, 20180301, 1, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180301, 1, 'David');
INSERT INTO vendor_info VALUES (3, 20180301, 1, 'Eric');
WITH vendor_dates AS
(SELECT
id
, MIN(datestamp) AS min_datestamp
, MAX(datestamp) AS max_datestamp
, MAX(case when statuz = 0 then datestamp end) AS max_s0_date
, MAX(case when statuz = 1 then datestamp end) AS max_s1_date
, MIN(case when statuz = 0 then datestamp end) AS min_s0_date
, MIN(case when statuz = 1 then datestamp end) AS min_s1_date
FROM vendor_info
GROUP BY id
)
SELECT
vd.id
, v1.maintainer AS first_maintainer
, v2.maintainer AS last_maintainer
, v3.maintainer AS last_s0_maintainer
, v4.maintainer AS last_s1_maintainer
, v5.maintainer AS first_s0_maintainer
, v6.maintainer AS first_s1_maintainer
FROM vendor_dates vd
LEFT JOIN vendor_info v1 ON vd.id = v1.id AND vd.min_datestamp = v1.datestamp
LEFT JOIN vendor_info v2 ON vd.id = v2.id AND vd.max_datestamp = v2.datestamp
LEFT JOIN vendor_info v3 ON vd.id = v3.id AND vd.max_s0_date = v3.datestamp
LEFT JOIN vendor_info v4 ON vd.id = v4.id AND vd.max_s1_date = v4.datestamp
LEFT JOIN vendor_info v5 ON vd.id = v5.id AND vd.min_s0_date = v5.datestamp
LEFT JOIN vendor_info v6 ON vd.id = v6.id AND vd.min_s1_date = v6.datestamp;发布于 2018-09-04 10:40:17
检查以下查询。
WITH
a AS (
SELECT
id, datestamp, maintainer, statuz,
MIN(datestamp) OVER(PARTITION BY id) AS fm,
MAX(datestamp) OVER(PARTITION BY id) AS lm,
MIN(datestamp) OVER(PARTITION BY id, statuz) AS fZm,
MAX(datestamp) OVER(PARTITION BY id, statuz) AS lZm
FROM vendor_info
)
SELECT
id,
MIN(IIF(datestamp = fm, maintainer, NULL)) AS first_maintainer,
MAX(IIF(datestamp = lm, maintainer, NULL)) AS last_maintainer,
MAX(IIF(datestamp = lZm AND statuz = 0, maintainer, NULL)) AS last_s0_maintainer,
MAX(IIF(datestamp = lZm AND statuz = 1, maintainer, NULL)) AS last_s1_maintainer,
MIN(IIF(datestamp = fZm AND statuz = 0, maintainer, NULL)) AS first_s0_maintainer,
MIN(IIF(datestamp = fZm AND statuz = 1, maintainer, NULL)) AS first_s1_maintainer
FROM a
GROUP BY id;它可以在SQL Fiddle上测试。
发布于 2018-09-04 09:45:44
我还没来得及生成1000万条测试记录,但尝试使用id上的索引、日期标记--我对它抱有希望--执行计划看起来很好--编辑了我生成的50多万条记录,只要(id,datestamp)索引(或其他合适的索引)存在,它看起来就会很快。
SELECT tID.id, V1.first_maintainer, V2.last_maintainer, V3.last_s0_maintainer, V4.last_s1_maintainer, V5.first_s0_maintainer, V6.first_s1_maintainer
FROM (SELECT DISTINCT ID from vendor_info) tID
OUTER APPLY
(SELECT TOP 1 vi1.maintainer first_maintainer
FROM vendor_info vi1
WHERE vi1.id = tID.id
ORDER BY vi1.datestamp ASC) V1
OUTER APPLY
(SELECT TOP 1 vi2.maintainer last_maintainer
FROM vendor_info vi2
WHERE vi2.id = tID.id
ORDER BY vi2.datestamp DESC) V2
OUTER APPLY
(SELECT TOP 1 vi3.maintainer last_s0_maintainer
FROM vendor_info vi3
WHERE vi3.statuz = 0 AND vi3.id = tID.id
ORDER BY vi3.datestamp DESC) V3
OUTER APPLY
(SELECT TOP 1 vi4.maintainer last_s1_maintainer
FROM vendor_info vi4
WHERE vi4.statuz = 1 AND vi4.id = tID.id
ORDER BY vi4.datestamp DESC) V4
OUTER APPLY
(SELECT TOP 1 vi5.maintainer first_s0_maintainer
FROM vendor_info vi5
WHERE vi5.statuz = 0 AND vi5.id = tID.id
ORDER BY vi5.datestamp ASC) V5
OUTER APPLY
(SELECT TOP 1 vi6.maintainer first_s1_maintainer
FROM vendor_info vi6
WHERE vi6.statuz = 1 AND vi6.id = tID.id
ORDER BY vi6.datestamp ASC) V6https://stackoverflow.com/questions/52162194
复制相似问题