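I want to count how often each category occurs in my data.
To do this, I need to count the categories in each row and multiply that count by the sum in column 5.
(Column c4 is not needed for my analysis.)
The preferred output would be:
Analytics = 131
Ads = 253
Identification = ..
My data looks like this: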
tracker_category <- data.frame = c("Tracker1", "Tracker2", "Tracker3", "Tracker4","Tracker5","Tracker6"),
c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
c4 = c("url1.com","ur2.com","url3.com","url4.com","url5.com","url6.com"),
sum_tracker = c(1,20,100,0,5,76))
Posted on 2021-06-02 15:47:08
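The following should get you what you are after.
You can convert the data frame to "long" format and then sum the events (column 5).
Data
Note: to support reproducibility, I corrected your data frame definition.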
tracker_category <- data.frame(
id = c("Tracker1", "Tracker2", "Tracker3", "Tracker4","Tracker5","Tracker6"),
c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
c4 = c("url1.com","ur2.com","url3.com","url4.com","url5.com","url6.com"),
sum_tracker = c(1,20,100,0,5,76)
)
Reshape to long format
{tidyr} provides the pivot_longer() function for this.
library(dplyr)
library(tidyr)
tracker_category %>%
select(-c4) %>% # remove c4
pivot_longer( cols = c(c1:c3) # which cols to use
, names_to = "action" # where to store the names
, values_to = "categories") # and values
This yields:
# A tibble: 18 x 4
   id       sum_tracker action categories
   <chr>          <dbl> <chr>  <chr>
 1 Tracker1           1 c1     Analytics
 2 Tracker1           1 c2     Ads
 3 Tracker1           1 c3     Identification
 4 Tracker2          20 c1     Crash
 5 Tracker2          20 c2     Analytics
 6 Tracker2          20 c3     Analytics
 7 Tracker3         100 c1     Location
 8 Tracker3         100 c2     Location
 9 Tracker3         100 c3     Ads
10 Tracker4           0 c1     Identification
11 Tracker4           0 c2     Analytics
12 Tracker4           0 c3     Ads
13 Tracker5           5 c1     Analytics
14 Tracker5           5 c2     Identification
15 Tracker5           5 c3     Analytics
16 Tracker6          76 c1     Ads
17 Tracker6          76 c2     Ads
18 Tracker6          76 c3     Location
With the data in this format, you can group by category and use {dplyr}'s summarise() on the groups.
tracker_category %>%
select(-c4) %>%
pivot_longer(cols = c(c1:c3), names_to = "action", values_to = "categories") %>%
#------------- group by your categories
group_by(categories) %>%
#------------- and sum over your tracked results; in long format every occurrence is its own row, so use sum() instead of count-times-value multiplication
summarise(total = sum(sum_tracker))
This yields:
# A tibble: 5 x 2
  categories     total
  <chr>          <dbl>
1 Ads              253
2 Analytics         51
3 Crash             20
4 Identification     6
5 Location         276
Please check whether the value of 131 for Analytics in your example is actually correct.
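As a cross-check, here is a minimal base R sketch of the calculation the question describes literally: count each category per row, multiply by sum_tracker, then add up. It is not part of the original answer. The helper names cat_cols, categories, and totals are made up for illustration, and stringsAsFactors = FALSE is set explicitly on the assumption that the code may run on an R version older than 4.0.

```r
# Data frame as corrected in the answer above
tracker_category <- data.frame(
  id = c("Tracker1", "Tracker2", "Tracker3", "Tracker4", "Tracker5", "Tracker6"),
  c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
  c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
  c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"),
  sum_tracker = c(1, 20, 100, 0, 5, 76),
  stringsAsFactors = FALSE
)

# Hypothetical helper names, introduced only for this sketch
cat_cols   <- c("c1", "c2", "c3")   # c4 is ignored, as in the question
categories <- sort(unique(unlist(tracker_category[cat_cols])))

# For each category: occurrences per row, times sum_tracker, summed over rows
totals <- sapply(categories, function(category) {
  per_row_count <- rowSums(tracker_category[cat_cols] == category)
  sum(per_row_count * tracker_category$sum_tracker)
})

print(totals)
```

The named vector totals should match the summarise() table above, including 51 rather than 131 for Analytics. If you prefer to stay in {dplyr}, count(categories, wt = sum_tracker) after the pivot_longer() step should give the same totals in a column named n.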
https://stackoverflow.com/questions/67807611