I have the following example dataframe:
countries = c("Australia", "Australia", "Chile", "Chile", "Brazil", "Brazil", "Brazil")
techs = c("AI", "AI", "AI", "Bio", "AI", "Bio", "computers")
value = c(404, 402, 2313, 424, 1424, 2141, 214)
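year = c(2018, 2019, 2018, 2018, 2018, 2018, 2018)
df = data.frame(countries, techs, value, year)
I have a function that computes the total value of each technology in each country (essentially summing over the years for each technology/country pair):
library(dplyr)  # needed for %>%, select(), filter(), summarise()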
country_tech = function(data, tech, country){
result = data %>%
select(countries, techs, value) %>%
filter(countries == country) %>%
filter(techs == tech) %>%
summarise(Total = sum(value, na.rm = TRUE))
}
I created a new dataframe that groups by country/technology and drops the years, so that I can append new data to it:
df2 = select(df, countries, techs) %>% group_by(countries, techs) %>% distinct()
Then I created a new column in this dataframe, using the function to sum each technology's value per country:
df2 = df2 %>% mutate(value = country_tech(df, techs, countries))
Everything works fine. However, because I never ungrouped df2 when creating it, I run into problems when spreading the data.
If I add an ungroup(), for example:
df2 = select(df, countries, techs) %>% group_by(countries, techs) %>% distinct() %>% ungroup()
then my function no longer works and I get the following error:
Error: Problem with `mutate()` input `value`.
x Problem with `filter()` input `..1`.
x Input `..1` must be of size 4 or 1, not size 6.
i Input `..1` is `techs == tech`.
i Input `value` is `country_tech(df, techs, countries)`.
Does anyone know where I went wrong?
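A minimal sketch that isolates the cause, assuming dplyr >= 1.0: inside `mutate()`, a column name refers to the current group's slice of that column. On the ungrouped `df2` that slice is all 6 rows, so `country_tech()` receives length-6 vectors for `tech` and `country`, and `filter(techs == tech)` ends up comparing 6 values against the 4 rows left after the first filter (hence "must be of size 4 or 1, not size 6"). Grouped by `countries, techs`, every group is a single row, so both arguments have length 1.

```r
library(dplyr)

# Same country/tech pairs as df2 in the question (one row per pair)
df2 <- data.frame(
  countries = c("Australia", "Chile", "Chile", "Brazil", "Brazil", "Brazil"),
  techs     = c("AI", "AI", "Bio", "AI", "Bio", "computers")
)

# Ungrouped: inside mutate(), `techs` is the whole column (length 6)
ungrouped_len <- df2 %>%
  mutate(len = length(techs)) %>%
  pull(len)

# Grouped by country and tech: each group is one row, so `techs` has length 1
grouped_len <- df2 %>%
  group_by(countries, techs) %>%
  mutate(len = length(techs)) %>%
  pull(len)

print(ungrouped_len)  # 6 6 6 6 6 6
print(grouped_len)    # 1 1 1 1 1 1
```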
Posted on 2021-01-22 13:42:30
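Further update: you have named the column twice, and that is what causes the problem.
Use it like this and it will work (don't name the column, because your custom function already names it `Total`):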
df2 %>% group_by(techs, countries) %>% mutate(country_tech(df, techs, countries)) %>% ungroup() %>%
spread(techs, Total)
# A tibble: 3 x 4
countries AI Bio computers
<chr> <dbl> <dbl> <dbl>
1 Australia 806 NA NA
2 Brazil 1424 2141 214
3 Chile 2313 424 NA
Update
Actually, the column name produced by the function approach is the problem. See whether doing it like this works:
#ungrouping as you desire
df2 = select(df, countries, techs) %>% group_by(countries, techs) %>% distinct() %>% ungroup()
#mutating with custom function
df2 %>% group_by(techs, countries) %>% mutate(value = country_tech(df, techs, countries)) %>% ungroup()
# A tibble: 6 x 3
countries techs value$Total
<chr> <chr> <dbl>
1 Australia AI 806
2 Chile AI 2313
3 Chile Bio 424
4 Brazil AI 1424
5 Brazil Bio 2141
6 Brazil computers 214
Note the column name (`value$Total`) in the result above.
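Because `country_tech()` returns a one-column tibble rather than a number, `mutate(value = ...)` stores a data-frame column, which is why the header reads `value$Total`. A minimal sketch of one way around this, assuming dplyr and purrr are installed: a hypothetical variant, `country_tech_total()`, pulls the single number out of the summary and is called once per row with `purrr::map2_dbl()`. Since each call receives one country and one tech, it also works on the ungrouped `df2`, and `value` becomes an ordinary numeric column.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  countries = c("Australia", "Australia", "Chile", "Chile", "Brazil", "Brazil", "Brazil"),
  techs     = c("AI", "AI", "AI", "Bio", "AI", "Bio", "computers"),
  value     = c(404, 402, 2313, 424, 1424, 2141, 214),
  year      = c(2018, 2019, 2018, 2018, 2018, 2018, 2018)
)

# Hypothetical variant of country_tech(): returns a single number, not a tibble
country_tech_total <- function(data, tech, country) {
  data %>%
    filter(countries == country, techs == tech) %>%
    summarise(Total = sum(value, na.rm = TRUE)) %>%
    pull(Total)
}

# Ungrouped table with one row per country/tech pair
df2 <- df %>% distinct(countries, techs)

# map2_dbl() calls the function row by row with length-1 arguments
df2 <- df2 %>%
  mutate(value = map2_dbl(techs, countries, ~ country_tech_total(df, .x, .y)))

df2
```

From there, `tidyr::pivot_wider(df2, names_from = techs, values_from = value)` should give plain `AI`, `Bio` and `computers` headers instead of `AI$Total` and so on.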
# using pivot_wider instead of spread
df2 %>% group_by(techs, countries) %>% mutate(value = country_tech(df, techs, countries)) %>% ungroup() %>%
pivot_wider(names_from = techs, values_from = value)
# A tibble: 3 x 4
countries AI$Total Bio$Total computers$Total
<chr> <dbl> <dbl> <dbl>
1 Australia 806 NA NA
2 Chile 2313 424 NA
3 Brazil 1424 2141 214
Old answer
I wonder why you don't just use this to get your final output:
df %>% group_by(countries, techs) %>% summarise(value_total = sum(value)) %>% ungroup()
# A tibble: 6 x 3
countries techs value_total
<chr> <chr> <dbl>
1 Australia AI 806
2 Brazil AI 1424
3 Brazil Bio 2141
4 Brazil computers 214
5 Chile AI 2313
6 Chile Bio 424
The ungroup() is also redundant in this example.
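If you want the totals ungrouped without a separate `ungroup()` call, `summarise()` accepts `.groups = "drop"` (dplyr >= 1.0.0), and `count()` with `wt` is a shorthand for the same weighted sum. A minimal sketch, assuming dplyr and tidyr (>= 1.0.0, for `pivot_wider()`) are installed, that ends with one column per technology:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  countries = c("Australia", "Australia", "Chile", "Chile", "Brazil", "Brazil", "Brazil"),
  techs     = c("AI", "AI", "AI", "Bio", "AI", "Bio", "computers"),
  value     = c(404, 402, 2313, 424, 1424, 2141, 214),
  year      = c(2018, 2019, 2018, 2018, 2018, 2018, 2018)
)

# Sum over years per country/tech; .groups = "drop" returns an ungrouped tibble
totals <- df %>%
  group_by(countries, techs) %>%
  summarise(value_total = sum(value, na.rm = TRUE), .groups = "drop")

# Equivalent shorthand: count() with wt sums `value` within each pair
totals_count <- df %>%
  count(countries, techs, wt = value, name = "value_total")

# One column per technology; pairs that don't occur become NA
wide <- totals %>%
  pivot_wider(names_from = techs, values_from = value_total)

wide
```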
Edit: if you want to use the custom function, try the following:
df2 = select(df, countries, techs) %>% group_by(countries, techs) %>% slice_head()
Posted on 2021-01-22 14:45:04
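Having a function that takes the full original dataset and filters it on every call is inefficient. Instead, split the dataset by the variables you need and then apply the function to each piece. If you need to do "several things", I assume you want your function to return a data frame with several values (add them to the summarise() call). You can do this on nested data: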
library(dplyr)
library(tidyr)  # nest(), unnest()
library(purrr)  # map()
country_tech = function(data_subset){
data_subset %>%
summarise(Total = sum(value, na.rm = TRUE))
}
df %>%
group_by(countries, techs) %>%
nest() %>%
mutate(data = map(data, country_tech)) %>%
unnest(data)
Output:
# A tibble: 6 x 3
# Groups: countries, techs [9]
countries techs Total
<fct> <fct> <dbl>
1 Australia AI 806
2 Chile AI 2313
3 Chile Bio 424
4 Brazil AI 1424
5 Brazil Bio 2141
6 Brazil computers 214
https://stackoverflow.com/questions/65845960