我有一个这样的数据集,我想使用pandas系列来求解,并得到user_id的输出、上次登录日期、丢失的登录位置、最频繁的登录位置和最频繁的第二次登录位置
event_name event_date user_id user_city user_state
exit_click 06-09-2021 10795552 Kayamkulam Kerala
exit_click 06-09-2021 11129909 Tiruppur Tamil Nadu
exit_click 06-09-2021 11028532 Thrissur Kerala
exit_click 06-09-2021 5701734 Thrissur Kerala
exit_click 06-09-2021 13178561 Navi Mumbai Maharashtra
exit_click 06-09-2021 4631431 Madurai Tamil Nadu
exit_click 06-09-2021 4243309 Thane Maharashtra
exit_click 06-09-2021 12934603 SriperumbudurTamil Nadu
exit_click 06-09-2021 12757354 Ghaziabad Uttar Pradesh
exit_click 06-09-2021 12504154 Kalyan Maharashtra
exit_click 06-09-2021 12530698 Ulhasnagar Maharashtra
exit_click 06-09-2021 12360310 Beed Maharashtra
exit_click 06-09-2021 11431196 Udaipur Rajasthan
exit_click 06-09-2021 13166134 VisakhapatnamAndhra Pradesh
exit_click 06-09-2021 2136420 Rudrapur Uttarakhand
exit_click 06-09-2021 9494724 Payyanur Kerala
exit_click 06-09-2021 12396316 Baddi Himachal Pradesh
exit_click 06-09-2021 10249015 Bhopal Madhya Pradesh
exit_click 06-09-2021 10724140 Pathanamthitta Kerala
exit_click 06-09-2021 9986130 Puducherry Puducherry
exit_click 06-09-2021 7229405 Patiala Punjab
exit_click 06-09-2021 7006354 Guntur Andhra Pradesh
exit_click 06-09-2021 8061789 Mehsana Gujarat
exit_click 06-09-2021 9341808 Bhopal Madhya Pradesh
exit_click 06-09-2021 9379141 Navi Mumbai Maharashtra
exit_click 06-09-2021 6157171 Rohtak Haryana
exit_click 06-09-2021 13124731 Khammam Telangana
exit_click 06-09-2021 13172076 Amravati Maharashtra输出显示如下所示
user_id | Last_date | most Location | Location of | Location on |
| of_login | of Latest Login | Max Logins | Second Most Logins |
| | | | |
3 |06=09-2021 |Gurgaon | Thane | Gurgaon |所以我尝试了这个逻辑
select bq.user_id as user_id,
bq.event_date as Date_of_Last_Login,
bq.user_city as Location_of_Latest_Login,
max(user_city) as Location_of_Max_Logins from bq
group by user_id
order by event_date DESC ;我得到的结果是
Date_of_Last_Login,
Location_of_Latest_Login,
Location_of_Max_Logins
但是我找不到Location_of_2nd_Max_Logins的逻辑
发布于 2021-09-15 10:38:23
您不能在单个聚合中执行此操作。您可以做的是按城市和用户使用聚合并枚举结果--一次按计数,一次按近期。然后在用户级别再次聚合:
select user_id, max(login_date) as max_login_date,
max(case when seqnum_ed = 1 then user_city end) as max_login_date_city,
max(case when seqnum_cnt = 1 then user_city end) as max_login_city,
max(case when seqnum_cnt = 2 then user_city end) as max_2_login_city
from (select bq.user_id, bq.user_city,
count(*) as num_logins,
max(event_date) as max_login_date,
row_number() over (partition by bq.user_id order by count(*) desc) as seqnum_cnt,
row_number() over (partition by bq.user_id order by max(event_date) desc) as seqnum_ed
from bq
group by user_id
) u
group by user_id;https://stackoverflow.com/questions/69190824
复制相似问题