"How can I make this SEDE query better?" (bad title finder)

Code Review user
Asked 2015-06-11 18:52:50
3 answers, 425 views, 0 followers, score 29

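It was suggested on Meta that, for the July 2015 community challenge, someone build a bot that finds questions with poor titles. I wrote this query to help others find examples to work from while they write their code.

I documented it as I wrote it, so that SQL beginners can still change it easily. Is there room for improvement?

Here is the query on SEDE: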

DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');

SELECT 
/* Uncomment below if you want to limit the results */
/* TOP 100 */
    Posts.Id AS [Post Link]
  /* only for sanity check in case the filter is removed from WHERE clause */
  , CASE 
      WHEN Posts.PostTypeId = @Question THEN 'Question'
      WHEN Posts.PostTypeId = @Answer THEN 'Answer'
      ELSE 'Other' END 
      AS [Post Type]
  , Posts.Score
  , Posts.CreationDate
  , Users.Id AS [User Link]
  , Users.Reputation
FROM Posts 
INNER JOIN Users
  ON Posts.OwnerUserId = Users.Id
WHERE
  /* Filter out non-question posts */
  Posts.PostTypeId = @Question

  /* Generic title */
  AND (
    Posts.Title LIKE '%please%' OR
    Posts.Title LIKE '%review%' OR
    Posts.Title LIKE '%improve%' OR
    Posts.Title LIKE '%better%' OR
    Posts.Title LIKE '%how to%'OR
    Posts.Title LIKE '%how can i%' OR
    Posts.Title LIKE '%how do i%'
  )

  /* Post not closed yet */
  AND Posts.ClosedDate IS NULL

  /* Question doesn't have positive score */
  /* AND Posts.Score <= 0 */

  /* User has low reputation */
  AND Users.Reputation <= 500

ORDER BY

  /* Lowest scores first - Change to DESC if you want Highest first */
  Posts.Score ASC

  /* Newest first - Change to ASC if you want oldest first */
  , Posts.CreationDate DESC
3 Answers

Code Review user

Accepted answer

Posted 2015-06-11 20:48:12

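Your query is well structured and consistent, but there is one issue you have failed to account for: not all questions have users. When a question is migrated, or its user is deleted, the question may have no link back to the Users table.

This requires an outer join to Users.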
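To illustrate the difference, here is a minimal sketch on made-up data. The table variables `@Posts` and `@Users` and their rows are hypothetical stand-ins for the SEDE tables, not real results:

```sql
-- Hypothetical sample data standing in for the SEDE Posts and Users tables.
DECLARE @Posts TABLE (Id INT, OwnerUserId INT NULL, Title NVARCHAR(250));
DECLARE @Users TABLE (Id INT, Reputation INT);

INSERT INTO @Posts (Id, OwnerUserId, Title) VALUES
    (1, 10,   N'Please review my code'),
    (2, NULL, N'How to improve this loop');  -- owner deleted or post migrated
INSERT INTO @Users (Id, Reputation) VALUES (10, 101);

-- INNER JOIN: post 2 disappears because it has no matching user row.
SELECT p.Id, u.Reputation
FROM @Posts AS p
INNER JOIN @Users AS u
  ON p.OwnerUserId = u.Id;

-- LEFT OUTER JOIN: post 2 is kept, with NULL in the user columns.
-- The reputation filter has to allow NULL, or the post is dropped again.
SELECT p.Id, u.Reputation
FROM @Posts AS p
LEFT OUTER JOIN @Users AS u
  ON p.OwnerUserId = u.Id
WHERE u.Reputation IS NULL OR u.Reputation <= 500;
```

The first SELECT returns only post 1; the second returns both posts.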

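Additionally, I find that CTEs are great for aliasing column names. For example, both the Posts and Users tables have an Id column, so you need to write Posts.Id and Users.Id in various places. Both have a CreationDate as well. When these qualified names become long and complicated, they hurt the readability of the query.

Oh, and on readability: the CASE statement is dead code. Don't keep code that has no use, and remove "debug code" before deploying. If you can't trust yourself to have the right conditions in the query, you can't trust yourself to have the right debug code either.

So, if you alias those values in a CTE, and use the CTE to keep the outer join neat, you can reduce your query to: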

SELECT PostID as [Post Link],
       Score,
       PostDate,
       UserID as [User Link],
       Reputation
FROM BadTitles
LEFT OUTER JOIN Users on UserId = Id
ORDER BY Score ASC, PostDate DESC

OK, that's a cheat, because BadTitles is a CTE. With the comments expanded, the full query is:

DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');

with BadTitles as (
    SELECT Id as PostId,
           OwnerUserId as UserId,
           Score,
           CreationDate as PostDate
    FROM Posts
    WHERE 
      /* Filter out non-question posts */
          Posts.PostTypeId = @Question

      /* Post not closed yet */
      AND Posts.ClosedDate IS NULL

      /* Generic title */
      AND (
            Posts.Title LIKE '%please%'
        OR  Posts.Title LIKE '%review%'
        OR  Posts.Title LIKE '%improve%'
        OR  Posts.Title LIKE '%better%'
        OR  Posts.Title LIKE '%how to%'
        OR  Posts.Title LIKE '%how can i%'
        OR  Posts.Title LIKE '%how do i%'
        )
)
SELECT PostID as [Post Link],
       Score,
       PostDate,
       UserID as [User Link],
       Reputation
FROM BadTitles
LEFT OUTER JOIN Users on UserId = Id
WHERE (Reputation IS NULL OR Reputation <= 500)
ORDER BY

  /* Lowest scores first - Change to DESC if you want Highest first */
  Score ASC,

  /* Newest first - Change to ASC if you want oldest first */
  PostDate DESC

Note that this returns a hundred or so additional questions, from migrations and the like.

Score 14

Code Review user

Posted 2015-06-11 20:28:18

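I noticed that you put the commas first in your SELECT list. I do this as well, and I also put the conjunctions first in my WHERE conditions, which has similar advantages.

AND ( Posts.Title LIKE '%please%' OR Posts.Title LIKE '%review%' OR Posts.Title LIKE ...

I would write it like this: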

  AND (
    Posts.Title LIKE '%please%' 
    OR Posts.Title LIKE '%review%'
    OR Posts.Title LIKE '%improve%'
    OR Posts.Title LIKE '%better%'
    OR Posts.Title LIKE '%how to%'
    OR Posts.Title LIKE '%how can i%'
    OR Posts.Title LIKE '%how do i%'
  )
Score 7

Code Review user

Posted 2019-08-24 12:54:56

Let me address a possible DRY issue in the query, in exchange for possibly worse performance.

I'm talking specifically about this:

 /* Generic title */
  AND (
    Posts.Title LIKE '%please%' OR
    Posts.Title LIKE '%review%' OR
    Posts.Title LIKE '%improve%' OR
    Posts.Title LIKE '%better%' OR
    Posts.Title LIKE '%how to%'OR
    Posts.Title LIKE '%how can i%' OR
    Posts.Title LIKE '%how do i%'
  )

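As of compatibility level 130, SQL Server offers a STRING_SPLIT function. You give it a string and a single-character separator, and it splits the string into a table. For example: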

STRING_SPLIT('please|review|improve|better|how to|how can i|how do i','|') 

will give us a table whose value column holds one row for each phrase separated by |.
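As a quick check, the function can be queried on its own. This is a minimal sketch and assumes a SQL Server 2016 or later database at compatibility level 130 or higher:

```sql
-- STRING_SPLIT is a table-valued function, so it goes in the FROM clause.
-- Its single output column is named "value".
SELECT value
FROM STRING_SPLIT('please|review|improve|better|how to|how can i|how do i', '|');
```

This returns seven rows, one per phrase. The order of the rows is not guaranteed, which does not matter for the matching done here.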

We can CROSS APPLY that table to the Posts table, which lets us reduce the list of LIKE clauses to:

  /* Generic title */
  AND CHARINDEX(
          Phrases.Value
        , Posts.Title COLLATE Latin1_General_CI_AI
      ) > 0 

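Note that I replaced LIKE with CHARINDEX. That in itself doesn't matter, as neither construct will benefit from an index.

I also added a COLLATE clause with a case- and accent-insensitive collation, so that titles using capital letters are not missed.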
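The effect of the collation can be seen in isolation. The sketch below uses made-up string literals, and `Latin1_General_CS_AS` is included only as a case-sensitive contrast; it is not part of the query under review:

```sql
-- CHARINDEX returns the 1-based position of the match, or 0 when there is none.
-- The explicit COLLATE on the searched string decides how characters compare.
SELECT
    CHARINDEX('please', 'PLEASE review my code' COLLATE Latin1_General_CI_AI) AS CaseInsensitive, -- 1
    CHARINDEX('please', 'PLEASE review my code' COLLATE Latin1_General_CS_AS) AS CaseSensitive;   -- 0
```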

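Furthermore, I suggest dropping the CASE/WHEN/END expression for the post type. You can join the PostTypes table and then use its Name column in the projection.

Applying all of this, the query ends up as: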

DECLARE @Question INT = (SELECT Id FROM PostTypes WHERE Name = 'Question');
DECLARE @Answer INT = (SELECT Id FROM PostTypes WHERE Name = 'Answer');

DECLARE @Phrases NVARCHAR(200) ='please|review|improve|better|how to|how can i|how do i';

SELECT DISTINCT -- the distinct is a bit bogus here if you keep the Phrase column
/* Limits the results - remove TOP if you want all rows */
  TOP 1000 
    Posts.Id AS [Post Link]
  , PostTypes.Name AS [Post Type]
  , Posts.Score
  , Posts.CreationDate
  , Users.Id AS [User Link]
  , Users.Reputation
  , Phrases.Value
FROM Posts 
INNER JOIN PostTypes 
  ON PostTypes.Id = Posts.PostTypeId
CROSS APPLY 
  STRING_SPLIT(@Phrases, '|') Phrases
LEFT OUTER JOIN Users
  ON Posts.OwnerUserId = Users.Id
WHERE
  /* Filter out non-question posts */
  Posts.PostTypeId = @Question

  /* Generic title */
  AND CHARINDEX(
          Phrases.Value
        , Posts.Title COLLATE Latin1_General_CI_AI
      ) > 0 

  /* Post not closed yet */
  AND Posts.ClosedDate IS NULL

  /* Question doesn't have positive score */
  /* AND Posts.Score <= 0 */

  /* User has low reputation, or the post has no owner (deleted user, migration) */
  AND (Users.Id is NULL OR Users.Reputation <= 500)

ORDER BY

  /* Lowest scores first - Change to DESC if you want Highest first */
  Posts.Score ASC

  /* Newest first - Change to ASC if you want oldest first */
  , Posts.CreationDate DESC

Note how it shows the extra column Value, which indicates the phrase that qualified the title for inclusion.
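If the matched phrase is not needed, one possible alternative (a swapped-in technique, not what the answer above uses) is an EXISTS subquery, which yields one row per post without DISTINCT. The sketch below runs on hypothetical sample data in a table variable `@Posts` rather than on the SEDE tables:

```sql
DECLARE @Phrases NVARCHAR(200) = N'please|review|improve';

-- Hypothetical sample titles standing in for the SEDE Posts table.
DECLARE @Posts TABLE (Id INT, Title NVARCHAR(250));
INSERT INTO @Posts (Id, Title) VALUES
    (1, N'Please review and improve my parser'),  -- matches three phrases
    (2, N'Binary search tree in C');              -- matches none

-- CROSS APPLY: one row per (post, matching phrase) pair, so post 1 appears three times.
SELECT p.Id, Phrases.value
FROM @Posts AS p
CROSS APPLY STRING_SPLIT(@Phrases, N'|') AS Phrases
WHERE CHARINDEX(Phrases.value, p.Title COLLATE Latin1_General_CI_AI) > 0;

-- EXISTS: post 1 appears once, but the matching phrase is no longer available.
SELECT p.Id
FROM @Posts AS p
WHERE EXISTS (
    SELECT 1
    FROM STRING_SPLIT(@Phrases, N'|') AS Phrases
    WHERE CHARINDEX(Phrases.value, p.Title COLLATE Latin1_General_CI_AI) > 0
);
```

Which form fits better depends on whether the bot needs to know which phrase triggered the match.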

Score 5

Original page content provided by Code Review.
Original link:

https://codereview.stackexchange.com/questions/93356
