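Someone on Meta suggested building a bot for the July 2015 community challenge that detects questions with poor titles. I wrote this query to help others find examples to work from while they write their code.
I documented it as I wrote it, so that people who are new to SQL can still change it easily. Is there any room for improvement?
Here is the query on SEDE: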
DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');
SELECT
/* Uncomment below if you want to limit the results */
/* TOP 100 */
Posts.Id AS [Post Link]
/* only for sanity check in case the filter is removed from WHERE clause */
, CASE
WHEN Posts.PostTypeId = @Question THEN 'Question'
WHEN Posts.PostTypeId = @Answer THEN 'Answer'
ELSE 'Other' END
AS [Post Type]
, Posts.Score
, Posts.CreationDate
, Users.Id AS [User Link]
, Users.Reputation
FROM Posts
INNER JOIN Users
ON Posts.OwnerUserId = Users.Id
WHERE
/* Filter out non-question posts */
Posts.PostTypeId = @Question
/* Generic title */
AND (
Posts.Title LIKE '%please%' OR
Posts.Title LIKE '%review%' OR
Posts.Title LIKE '%improve%' OR
Posts.Title LIKE '%better%' OR
Posts.Title LIKE '%how to%' OR
Posts.Title LIKE '%how can i%' OR
Posts.Title LIKE '%how do i%'
)
/* Post not closed yet */
AND Posts.ClosedDate IS NULL
/* Question doesn't have positive score */
/* AND Posts.Score <= 0 */
/* User has low reputation */
AND Users.Reputation <= 500
ORDER BY
/* Lowest scores first - Change to DESC if you want Highest first */
Posts.Score ASC
/* Newest first - Change to ASC if you want oldest first */
, Posts.CreationDate DESC

Posted on 2015-06-11 20:48:12
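Your query is well structured and consistent, but there is one problem you did not account for: not every question has a user. When a question is migrated, or its user is deleted, the question may have no link back to the Users table.
This calls for an outer join to Users.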
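To illustrate the difference, here is a minimal sketch. The two table variables are hypothetical, cut-down stand-ins for Posts and Users, and the sample rows are invented for the example.

```sql
-- Hypothetical miniature stand-ins for the SEDE Posts and Users tables.
DECLARE @Posts TABLE (Id INT, OwnerUserId INT NULL, Title NVARCHAR(250));
DECLARE @Users TABLE (Id INT, Reputation INT);

INSERT INTO @Posts (Id, OwnerUserId, Title) VALUES
    (1, 10,   N'Please review my code'),
    (2, NULL, N'How to improve this loop');  -- owner deleted or post migrated

INSERT INTO @Users (Id, Reputation) VALUES (10, 120);

-- INNER JOIN: post 2 has no matching user row, so it is dropped.
SELECT p.Id AS PostId, u.Reputation
FROM @Posts AS p
INNER JOIN @Users AS u ON p.OwnerUserId = u.Id;

-- LEFT OUTER JOIN: post 2 is kept, with Reputation returned as NULL.
-- The explicit NULL check keeps the reputation filter from
-- discarding the ownerless post again.
SELECT p.Id AS PostId, u.Reputation
FROM @Posts AS p
LEFT OUTER JOIN @Users AS u ON p.OwnerUserId = u.Id
WHERE u.Id IS NULL OR u.Reputation <= 500;
```

The first SELECT returns only post 1, while the second returns both posts.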
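Also, I find that CTEs are a great way to alias column names. For example, the Posts and Users tables both have an Id column, and you need Posts.Id and Users.Id in different places. Both have a CreationDate as well. When those qualified names become long and complicated, they hurt the readability of the query.
Oh, and on the subject of readability: the CASE statement is dead code. Don't keep code that serves no purpose, and remove "debug code" before deploying. If you cannot trust yourself to have the right conditions in the query, you cannot trust yourself to have the right debug code either.
So, if you alias those values in a CTE, and use the CTE to keep the outer join tidy, you can reduce the query to: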
SELECT PostID as [Post Link],
Score,
PostDate,
UserID as [User Link],
Reputation
FROM BadTitles
LEFT OUTER JOIN Users on UserId = Id
ORDER BY Score ASC, PostDate DESC

OK, that is cheating, because BadTitles is a CTE. With the comments expanded, the full query is:
DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');
with BadTitles as (
SELECT Id as PostId,
OwnerUserId as UserId,
Score,
CreationDate as PostDate
FROM Posts
WHERE
/* Filter out non-question posts */
Posts.PostTypeId = @Question
/* Post not closed yet */
AND Posts.ClosedDate IS NULL
/* Generic title */
AND (
Posts.Title LIKE '%please%'
OR Posts.Title LIKE '%review%'
OR Posts.Title LIKE '%improve%'
OR Posts.Title LIKE '%better%'
OR Posts.Title LIKE '%how to%'
OR Posts.Title LIKE '%how can i%'
OR Posts.Title LIKE '%how do i%'
)
)
SELECT PostID as [Post Link],
Score,
PostDate,
UserID as [User Link],
Reputation
FROM BadTitles
LEFT OUTER JOIN Users on UserId = Id
WHERE (Reputation IS NULL OR Reputation <= 500)
ORDER BY
/* Lowest scores first - Change to DESC if you want Highest first */
Score ASC,
/* Newest first - Change to ASC if you want oldest first */
PostDate DESC

Note that this produces a hundred or so additional questions, from migrations and the like.
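Posted on 2015-06-11 20:28:18
I noticed that you put your commas first in the SELECT statement. I do that as well, and I also put my conjunctions first in my WHERE conditions; it has similar advantages.
AND (
Posts.Title LIKE '%please%' OR
Posts.Title LIKE '%review%' OR
Posts.Title LIKE '%improve%' OR
Posts.Title LIKE '%better%' OR
Posts.Title LIKE '%how to%' OR
Posts.Title LIKE '%how can i%' OR
Posts.Title LIKE '%how do i%'
)
I would write it like this: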
AND (
Posts.Title LIKE '%please%'
OR Posts.Title LIKE '%review%'
OR Posts.Title LIKE '%improve%'
OR Posts.Title LIKE '%better%'
OR Posts.Title LIKE '%how to%'
OR Posts.Title LIKE '%how can i%'
OR Posts.Title LIKE '%how do i%'
)

Posted on 2019-08-24 12:54:56
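Let me address a possible DRY issue in your query, in exchange for possibly worse performance.
Specifically, I am talking about this part: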
/* Generic title */
AND (
Posts.Title LIKE '%please%' OR
Posts.Title LIKE '%review%' OR
Posts.Title LIKE '%improve%' OR
Posts.Title LIKE '%better%' OR
Posts.Title LIKE '%how to%' OR
Posts.Title LIKE '%how can i%' OR
Posts.Title LIKE '%how do i%'
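)

At compatibility level 130, SQL Server offers the STRING_SPLIT function. You give it a string and a separator character, and it splits the string into a table. For example:
STRING_SPLIT('please|review|improve|better|how to|how can i|how do i','|')
gives us a table whose value column holds one row for each phrase separated by |.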
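As a quick way to see what the function produces on its own, here is a minimal sketch. It uses a shortened phrase list, and the alias Phrases is just an illustrative name.

```sql
-- STRING_SPLIT is a table-valued function, so it belongs in the FROM clause.
-- Its single output column is named value.
-- Requires SQL Server 2016 or later with database compatibility level 130+.
SELECT Phrases.value
FROM STRING_SPLIT('please|review|improve', '|') AS Phrases;
```

This returns one row per phrase. The order of the rows is not guaranteed, so add an ORDER BY if the order matters.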
We can CROSS APPLY that table to the Posts table, which lets us reduce the series of LIKE conditions to:
/* Generic title */
AND CHARINDEX(
Phrases.Value
, Posts.Title COLLATE Latin1_General_CI_AI
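) > 0

Note that I replaced LIKE with CHARINDEX. That in itself does not matter much, because neither construct benefits from an index.
I also added a case- and accent-insensitive collation with COLLATE, so that titles written with capital letters are not missed.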
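To show the effect of the collation in isolation, here is a small sketch that runs CHARINDEX on an invented title under two explicit collations. Without an explicit COLLATE, the result depends on the collation of the column or database, which is not assumed here.

```sql
-- CHARINDEX compares using the collation of its input, so an explicit
-- COLLATE on the searched string decides whether case and accents matter.
SELECT
    CHARINDEX(N'please', N'PLEASE review my code' COLLATE Latin1_General_CS_AS)
        AS CaseSensitive,             -- 0: 'please' does not match 'PLEASE'
    CHARINDEX(N'please', N'PLEASE review my code' COLLATE Latin1_General_CI_AI)
        AS CaseAccentInsensitive;     -- 1: match found at position 1
```

The CI_AI collation is the same one applied to Posts.Title in the condition above.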
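Furthermore, I suggest removing the CASE/WHEN/END expression for the post type. You can join the PostTypes table and use its Name column in the projection instead.
Applying all of this to the query gives this final result: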
DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');
DECLARE @Phrases NVARCHAR(200) ='please|review|improve|better|how to|how can i|how do i';
SELECT DISTINCT -- the distinct is a bit bogus here if you keep the Phrase column
/* Uncomment below if you want to limit the results */
TOP 1000
Posts.Id AS [Post Link]
, PostTypes.Name AS [Post Type]
, Posts.Score
, Posts.CreationDate
, Users.Id AS [User Link]
, Users.Reputation
, Phrases.Value
FROM Posts
INNER JOIN PostTypes
ON PostTypes.Id = Posts.PostTypeId
CROSS APPLY
STRING_SPLIT(@Phrases, '|') Phrases
LEFT OUTER JOIN Users
ON Posts.OwnerUserId = Users.Id
WHERE
/* Filter out non-question posts */
Posts.PostTypeId = @Question
/* Generic title */
AND CHARINDEX(
Phrases.Value
, Posts.Title COLLATE Latin1_General_CI_AI
) > 0
/* Post not closed yet */
AND Posts.ClosedDate IS NULL
/* Question doesn't have positive score */
/* AND Posts.Score <= 0 */
/* User has low reputation - Un-comment to limit query further */
AND (Users.Id is NULL OR Users.Reputation <= 500)
ORDER BY
/* Lowest scores first - Change to DESC if you want Highest first */
Posts.Score ASC
/* Newest first - Change to ASC if you want oldest first */
, Posts.CreationDate DESC

Note how it shows an extra column, Value, indicating which phrase qualified the title for inclusion.

https://codereview.stackexchange.com/questions/93356