我正在为Gardening.SE上的一篇元帖子做一些研究(在那里我是临时主持人)。我正在使用Stack Exchange数据资源管理器(SEDE)查找有关第一次在鉴定标记中发布问题的用户的信息,以及他们随后如何与回答问题的人进行交互,是否是帖子上的评论、关于答案的评论、编辑帖子以提供更多信息等等。
我有下面的查询,它给出了没有答案的人的列表,还有一个包含鉴定标记的问题。我计划通过添加额外的标准来对其进行改进,例如,查找OP将评论留给其他人回复的实例。
-- Find users who have exactly one post
-- with that post being a question
-- and the question has the [identification] tag
SELECT
Users.Id AS [User Link], Posts.Id AS [Post Link]
FROM
Users INNER JOIN Posts ON Posts.OwnerUserId = Users.Id
INNER JOIN PostTags ON Posts.Id = PostTags.PostId
INNER JOIN Tags ON Tags.Id = PostTags.TagId
WHERE
Tags.TagName LIKE 'identification' AND
Posts.ParentId IS NULL AND -- ParentId is NULL if post is a question
Users.Id in (
-- Users with exactly one question
SELECT Users.Id
FROM
Users INNER JOIN Posts ON Posts.OwnerUserId = Users.Id
WHERE
Posts.ParentId IS NULL
GROUP BY
Users.id
HAVING
Count(Posts.Id) = 1
) AND
Users.Id NOT IN (
-- Users with 1 or more answers
SELECT Users.Id
FROM
Users INNER JOIN Posts ON Posts.OwnerUserId = Users.Id
WHERE
Posts.ParentId IS NOT NULL
GROUP BY
Users.id
HAVING
Count(Posts.Id) > 0
)
ORDER BY
Users.Id DESC -- newest users first.我担心的是,这个SQL远远不是最优的。我特别关注的是用户和Posts表之间重复的内部连接,以及查找一个问题的子查询和至少一个答案的用户之间的相似之处。
Gardening.SE是一个小站点,所以这个查询所用的时间总是少于75 is才能返回它的101个记录,但即便如此,如果可能的话,我还是想让它更高效地运行。正如我说过的,我计划增强这个查询,这意味着要拉出更多的表,所以我也担心它会以这种方式扩展。
数据资源管理器链接:
发布于 2014-04-10 05:33:22
优化SQL的一半技巧是了解数据库模式。
模式文档可以在这篇Meta文章中找到:公共数据转储和SEDE的数据库模式文档。
我在您的查询中看到了四个功能问题,并在此之后提出了一些建议:
ClosedDate)Tags以获得一条记录。这可以独立解决。至于建议:
使用这些建议,我重新构造了您的查询,以完成以下几项工作:
identification我把你的问题分叉了... ...对我来说,它用了33毫秒的时间
-- Find users who have exactly one post
-- with that post being a question
-- and the question has the [xxxx] tag (default identification)
declare @tag as nvarchar(25) = ##tag:string?identification##
;
declare @tagid as int
;
-- get the tag id here so it is not needed as a join table
select @tagid = Id
from Tags
where TagName = lower(@tag)
;
print 'Tag for ' + @tag + ' is ' + Convert(NVarchar(10), @tagid)
;
with Answerers as (
select distinct OwnerUserId as [UserId]
from Posts p
where PostTypeId = 2 -- answers
and CommunityOwnedDate is null -- not CW
and ClosedDate is null -- not closed
), SingleQ as (
Select OwnerUserId as [UserId],
Min(Id) as [FirstPost],
Sum(Score) as [Score],
count (*) as [QCount]
from Posts p
where PostTypeId = 1 -- question
and CommunityOwnedDate is null -- not CW
and ClosedDate is null -- not closed
and not exists (
select UserId
from Answerers
where UserId = OwnerUserId) -- active answerers.
group by OwnerUserId
having count(*) = 1
)
SELECT
ROW_NUMBER() OVER (order by Users.Id Desc) as [Recent],
Users.Id AS [User Link],
Users.DisplayName AS [Sort By Name],
SingleQ.FirstPost AS [Post Link],
SingleQ.Score AS [Score]
FROM Users
INNER JOIN SingleQ ON Users.Id = SingleQ.UserId
INNER JOIN PostTags ON SingleQ.FirstPost = PostTags.PostId
WHERE PostTags.TagId = @tagid
ORDER BY
Users.Id DESC -- newest users first.https://codereview.stackexchange.com/questions/46780
复制相似问题