我有一个收集SERP(搜索引擎结果)并将它们存储在数据库中的系统。我正在使用MSSql2008R2。
对于每个给定的关键字,都有一个包含10个SERP元素的SERP结果组。
Keyword ->
SERP Group ->
- SERP item
- SERP item
- ... (3-10)实体
public class SerpResultsGroup
{
public SerpResultsGroup()
{
SerpResults = new List<SerpResult>();
}
public int Id { get; set; }
public int KeywordId { get; set; }
public Keyword Keyword { get; set; }
public ICollection<SerpResult> SerpResults { get; set; }
}
public class SerpResult
{
public int Id { get; set; }
public int Position { get; set; }
public string Url { get; set; }
public int SerpResultsGroupId { get; set; }
public SerpResultsGroup SerpResultsGroup { get; set; }
}DB模式图片
现在,我需要组合所有SERP组(或SERP项),并找到至少N(3) SERP项相同的此类组。
这里有一幅图来说明这种逻辑。分组算法,插图。图片说明了相关组是如何匹配的。为了简单起见,图中只有3个组,但实际上数据库中可能有N组SERP项。
假设我有一个组/关键字(group =关键字),并且我希望找到所有匹配的组,其中至少有3 (N)个与来自给定组的urls匹配。这些组之间还应该至少有3 (N)个匹配的urls。
同样的标准,但是现在我们没有一个起始组,我们想把所有的组组合在一起。
我怎样才能完成这项任务?我需要使用交叉连接吗?
更新
下面的查询似乎返回一组id为11的给定组的匹配组,但它不检查返回的组中的每个组是否彼此匹配。
SELECT sr.SerpResultsGroupId, sr1.SerpResultsGroupId
FROM SerpResults sr
INNER JOIN SerpResults sr1 ON sr.Url = sr1.Url
WHERE sr.SerpResultsGroupId != sr1.SerpResultsGroupId
AND sr.SerpResultsGroupId = 11
GROUP BY sr.SerpResultsGroupId, sr1.SerpResultsGroupId
HAVING COUNT(sr1.SerpResultsGroupId) >= 3发布于 2016-11-02 16:03:12
用例的简化解决方案--3种常用的项目。相关群体将拥有相同的rnk。
select t.*
from (select count (*) over (partition by s1.SERP_item,s2.SERP_item,s3.SERP_item) as cnt
,rank () over (order by s1.SERP_item,s2.SERP_item,s3.SERP_item) as rnk
,s1.SERP_Group
,s1.SERP_item as SERP_item1
,s2.SERP_item as SERP_item2
,s3.SERP_item as SERP_item3
from SERP as s1
join SERP as s2
on s2.SERP_Group = s1.SERP_Group
and s2.SERP_item > s1.SERP_item
join SERP as s3
on s3.SERP_Group = s2.SERP_Group
and s3.SERP_item > s2.SERP_item
) t
where t.cnt > 1
order by t.rnk
;cnt rnk SERP_Group SERP_item1 SERP_item2 SERP_item3
--- --- ---------- ---------- ---------- ----------
2 30 139 4 17 19
2 30 744 4 17 19
4 36 372 4 31 33
4 36 375 4 31 33
4 36 391 4 31 33
4 36 408 4 31 33n公共项的通用解决方案。相关群体将拥有相同的rnk。
with t (SERP_Group,SERP_item,n,SERP_items)
as
(
select SERP_Group,SERP_item,1 as n,cast (SERP_item as varchar(max))
from SERP
union all
select s.SERP_Group,s.SERP_item,t.n+1,SERP_items + ',' + cast (s.SERP_item as varchar(max))
from t
join SERP s
on s.SERP_Group = t.SERP_Group
and s.SERP_item > t.SERP_item
and t.n < 3
)
select cnt
,rnk
,SERP_Group
,SERP_items
from (select t.*
,rank () over (order by SERP_items) as rnk
,count (*) over (partition by SERP_items) as cnt
from t
where t.n = 3
) t
where t.cnt > 1
order by t.rnk
;cnt rnk SERP_Group SERP_items
--- --- ---------- ----------
2 107 139 4,17,19
2 107 744 4,17,19
4 113 372 4,17,19
4 113 375 4,31,33
4 113 391 4,31,33
4 113 408 4,31,33 发布于 2016-11-02 15:08:42
使用Group by和Having子句
Group by SERP_Group,SERP_item
Having count(1) >= 3https://stackoverflow.com/questions/40383311
复制相似问题