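Suppose you have a table with the columns (id, userid, timestamp).
From this data set, the result I am after is as follows:
So the question is:
Posted on 2010-12-02 10:07:42
You can join the table to itself and look for entries that match your condition. Something like this (untested):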
SELECT l1.userid FROM Logs AS l1
INNER JOIN Logs AS l2
ON l2.timestamp > l1.timestamp
AND l2.timestamp < l1.timestamp + INTERVAL 1 WEEK  -- MySQL interval syntax
AND l1.userid = l2.userid

Edit:
A count of the matches might help:
SELECT COUNT(l1.userid) AS matches, l1.userid FROM Logs AS l1
INNER JOIN Logs AS l2
ON l1.userid = l2.userid
AND l2.timestampFake > l1.timestampFake
AND l2.timestampFake < l1.timestampFake + @interval
WHERE l1.timestampFake > @start AND l1.timestampFake < @end
GROUP BY l1.userid

Posted on 2010-12-02 10:19:44
Assuming a table activity (user_id, ts) and a period of interest between @ts_start and @ts_end,
you could try:
1) Weekly activity
SELECT user_id
FROM activity
WHERE CEILING(DATEDIFF(@ts_end,@ts_start)/7) <
(SELECT COUNT(*)
 FROM (SELECT 1
       FROM activity sub
       WHERE ts BETWEEN @ts_start AND @ts_end
       AND sub.user_id = activity.user_id
       GROUP BY YEAR(ts), WEEK(ts)) x
)

2) Fortnightly activity
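SELECT user_id
FROM activity
WHERE CEILING(DATEDIFF(@ts_end,@ts_start)/14) <
(SELECT COUNT(*)
 FROM (SELECT 1
       FROM activity sub  -- alias added; the condition below refers to sub
       WHERE ts BETWEEN @ts_start AND @ts_end
       AND sub.user_id = activity.user_id
       GROUP BY YEAR(ts), WEEK(ts) DIV 2) x)

This is just a first idea and it is untested. Also, as written the queries check for more than weekly and more than fortnightly activity; replacing < with = should change them to exactly weekly and exactly fortnightly, respectively.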
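The week-counting idea behind the two queries above can be sketched outside SQL. The following is only an illustration in Python, not the answer's method: the function names and sample rows are made up, ISO week numbers stand in for MySQL's WEEK(), and the final comparison uses >= as one possible reading of the condition.

```python
from collections import defaultdict
from datetime import date
from math import ceil


def active_week_counts(rows, start, end):
    """Map each user to the number of distinct (ISO year, ISO week) pairs with activity."""
    weeks = defaultdict(set)
    for user_id, day in rows:
        if start <= day <= end:
            iso_year, iso_week, _ = day.isocalendar()
            weeks[user_id].add((iso_year, iso_week))
    return {user: len(seen) for user, seen in weeks.items()}


def weeks_in_range(start, end):
    # Same arithmetic as CEILING(DATEDIFF(@ts_end, @ts_start) / 7).
    return ceil((end - start).days / 7)


# Hypothetical sample data: (user_id, activity date).
rows = [
    (1, date(2010, 11, 1)), (1, date(2010, 11, 8)), (1, date(2010, 11, 15)),
    (2, date(2010, 11, 1)), (2, date(2010, 11, 15)),
]
start, end = date(2010, 11, 1), date(2010, 11, 21)

counts = active_week_counts(rows, start, end)
required = weeks_in_range(start, end)
# Assumption: "active every week" means at least one active week per week in the range.
weekly_users = sorted(user for user, n in counts.items() if n >= required)
```

With this sample, user 1 has activity in three distinct weeks and user 2 in only two, so only user 1 passes the check.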
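Edit: there was an error in the queries above; they have been edited.
Another idea is to turn the requirement around: being active every week (or more often!) means there is no week without activity. That in turn means finding the largest gap between consecutive activity timestamps and checking whether it is at most 7 days. Consecutive timestamps can be found by self-joining each timestamp to all later ones and taking the minimum of those.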
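SELECT user_id
FROM activity
WHERE 7 >=
(SELECT MAX(DATEDIFF(ts2,ts1))
 FROM (SELECT a1.ts AS ts1, MIN(a2.ts) AS ts2
       FROM activity a1
       INNER JOIN activity a2 ON
         a1.user_id = a2.user_id AND a1.ts < a2.ts
       WHERE activity.user_id = a1.user_id AND
         a1.ts BETWEEN @ts_start AND @ts_end AND
         a2.ts BETWEEN @ts_start AND @ts_end
       GROUP BY a1.ts) x )  -- GROUP BY added so MIN(a2.ts) is taken per a1.ts

For fortnightly activity, replace 7 with 14. Inverting the condition (7 < ..., which means there is a gap longer than a week, so the user was not active in every week) turns the query from "active every week (fortnight)" into "not active every week (fortnight)".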
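To make the gap-based reasoning concrete, here is a minimal Python sketch of the same idea. It is an illustration under assumptions, not a translation of the query: the helper names and sample dates are invented, and a user with fewer than two timestamps gets a gap of 0 here, whereas in the SQL the MAX would be NULL and the user would not be selected.

```python
from datetime import date


def max_gap_days(days):
    """Largest number of days between consecutive activity dates (0 if fewer than two)."""
    ordered = sorted(days)
    # Pairing each date with its successor plays the role of
    # MIN(a2.ts) over all a2.ts > a1.ts in the self-join.
    return max(((b - a).days for a, b in zip(ordered, ordered[1:])), default=0)


def active_within(days, limit_days=7):
    """True when no gap between consecutive activity dates exceeds limit_days."""
    return max_gap_days(days) <= limit_days


# Hypothetical activity dates for two users.
user_a = [date(2010, 11, 1), date(2010, 11, 6), date(2010, 11, 13)]
user_b = [date(2010, 11, 1), date(2010, 11, 11)]

weekly_a = active_within(user_a)                      # largest gap is 7 days
weekly_b = active_within(user_b)                      # largest gap is 10 days
fortnightly_b = active_within(user_b, limit_days=14)
```

Passing limit_days=14 corresponds to replacing 7 with 14, and negating the result corresponds to inverting the condition.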
EDIT2: It should be easy to change the query above so that it returns the maximum period of inactivity for each user.
SELECT user_id, MAX(DATEDIFF(ts2,ts1))
FROM (SELECT a1.ts AS ts1, MIN(a2.ts) AS ts2, a1.user_id AS user_id
      FROM activity a1
      INNER JOIN activity a2 ON
        a1.user_id = a2.user_id AND a1.ts < a2.ts
      WHERE a1.ts BETWEEN @ts_start AND @ts_end AND
        a2.ts BETWEEN @ts_start AND @ts_end
      GROUP BY a1.user_id, a1.ts) x  -- one row per timestamp and its successor
GROUP BY user_id

The result can then be sorted or grouped for reporting purposes.
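As an illustration of that last step, the sketch below sorts and groups a result set in Python. Everything in it is hypothetical: the rows stand in for what such a query might return, and the bucket labels and thresholds are chosen only for the example.

```python
from collections import Counter

# Hypothetical (user_id, max_gap_days) rows, shaped like the query's output.
result = [(1, 8), (2, 28), (3, 3), (4, 13)]

# Sorting: users with the longest inactivity first.
ranked = sorted(result, key=lambda row: row[1], reverse=True)


def bucket(gap_days):
    """Assumed reporting buckets; the thresholds are illustrative only."""
    if gap_days <= 7:
        return "weekly"
    if gap_days <= 14:
        return "fortnightly"
    return "less often"


# Grouping: how many users fall into each bucket.
summary = Counter(bucket(gap) for _, gap in result)
```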
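EDIT3: The queries above seem to bother MySQL (?). It has a problem with the correlation in the WHERE part (it should not; I tested a similar query with Postgres and it raised no objection).
We could easily turn the correlated condition into a join, but in the meantime I realized that some simplification is possible.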
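SELECT user_id, COUNT(DISTINCT WEEK(ts))
FROM activity
WHERE ts BETWEEN @ts_start AND @ts_end
GROUP BY user_id  -- grouping by week as well would make every count equal to 1
HAVING COUNT(DISTINCT WEEK(ts)) > CEILING(DATEDIFF(@ts_end,@ts_start)/7)

The query above has a problem with ranges longer than one year. You would have to change the count in the HAVING part to something like COUNT(DISTINCT YEAR(ts)*100+WEEK(ts)), but I left it as it is because, with a simple expression, it might be able to use an index to count the distinct values. Ranges that cross the end of a year should also be checked: the WEEK function can produce a shorter or longer week around the new year; see the documentation for details.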
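The effect of a year-qualified week key can be seen in a small Python sketch. Note the assumptions: week_key is a made-up helper, and Python's ISO week numbering is not the same as MySQL's default WEEK() mode, so this illustrates the idea rather than reproducing MySQL's values.

```python
from datetime import date


def week_key(day):
    """Year-qualified week number, analogous to YEAR(ts)*100 + WEEK(ts)."""
    iso_year, iso_week, _ = day.isocalendar()
    return iso_year * 100 + iso_week


same_week_a = date(2010, 3, 1)
same_week_b = date(2011, 3, 1)

# The bare week number collides across years...
bare_a = same_week_a.isocalendar()[1]
bare_b = same_week_b.isocalendar()[1]

# ...while the year-qualified key keeps the two weeks apart.
key_a = week_key(same_week_a)
key_b = week_key(same_week_b)

# Around the new year the week can belong to the previous ISO year:
# 2011-01-01 falls in the last ISO week of 2010.
boundary_key = week_key(date(2011, 1, 1))
```

Here both dates fall in week 9, so a count of distinct bare week numbers would treat them as one week, while the keys 201009 and 201109 stay distinct.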
Now that I have rewritten it once more, the code below should look cleaner, and I believe it will be fast if there is an index on (user_id, ts).
SELECT user_id, COUNT(DISTINCT DATEDIFF(ts,@ts_start) DIV 7)
FROM activity
WHERE ts BETWEEN @ts_start AND @ts_end
GROUP BY user_id
-- count the same per-row bucket expression as in the SELECT list
HAVING COUNT(DISTINCT DATEDIFF(ts,@ts_start) DIV 7) =
(DATEDIFF(@ts_end,@ts_start) DIV 7)

https://stackoverflow.com/questions/4333644