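I'm trying to count the number of goals, primary assists, and secondary assists for each player. My problem is that I can't work out the logic, because the thing I want to summarise by (player name) is spread across three variables (goal, primary assist, and secondary assist).
Here is my reproducible data (it comes from dput(), sorry for the mess).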
mydata <- structure(list(primary_assist = c("Dmitry Gilyazitdinov", "Evgeny Orlov",
"Anton Burdasov", "Sergei Kalinin", "Stanislav Solovyov", "Vasily Streltsov",
NA, "Bogdan Potekhin", "Bogdan Potekhin", "Vasily Streltsov",
"Vasily Streltsov", "Viktor Postnikov", "Danil Kaskov", NA, NA,
"Artemy Panarin"), secondary_assist = c("Andrei Badrutdinov",
NA, NA, NA, "Danil Gubarev", "Nikita Manukhov", NA, "Evgeny Grigorenko",
"Daniil Apalkov", "Ivan Boiko", NA, "Viktor Antipin", "Vitaly Sychov",
NA, NA, "Stanislav Levin"), goal = c("Vitaly Kropachyov", "Dmitry Kozlov",
"Stanislav Solovyov", "Kirill Polyansky", "Anton Burdasov", "Ilya Solodov",
"Alexander Antropov", "Daniil Apalkov", "Evgeny Grigorenko",
"Alexander Antropov", "Alexander Antropov", "Evgeny Grigorenko",
"Denis Belonogov", "Vitaly Sychov", "Alexander Streltsov", "Pyotr Kopyttsov"
), team = c("Belye Medvedi", "Omskie Yastreby", "Belye Medvedi",
"Omskie Yastreby", "Belye Medvedi", "Avto", "Avto", "Stalnye Lisy",
"Stalnye Lisy", "Avto", "Avto", "Stalnye Lisy", "Avto", "Avto",
"Avto", "Russkie Vityazi"), game_strength = c("PP", "EV", "EV",
"EV", "EV", "PP", "SO", "EV", "PP", "PP", "EV", "PP", "PP", "EV",
"PP", "EV"), season = c("2009-10", "2009-10", "2009-10", "2009-10",
"2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10",
"2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-16L), .Names = c("primary_assist", "secondary_assist", "goal",
"team", "game_strength", "season"))
mydata
#> # A tibble: 16 x 6
#> primary_assist secondary_assist goal team game_strength season
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Dmitry Gilyazitdinov Andrei Badrutdin~ Vita~ Bely~ PP 2009-~
#> 2 Evgeny Orlov <NA> Dmit~ Omsk~ EV 2009-~
#> 3 Anton Burdasov <NA> Stan~ Bely~ EV 2009-~
#> 4 Sergei Kalinin <NA> Kiri~ Omsk~ EV 2009-~
#> 5 Stanislav Solovyov Danil Gubarev Anto~ Bely~ EV 2009-~
#> 6 Vasily Streltsov Nikita Manukhov Ilya~ Avto PP 2009-~
#> 7 <NA> <NA> Alex~ Avto SO 2009-~
#> 8 Bogdan Potekhin Evgeny Grigorenko Dani~ Stal~ EV 2009-~
#> 9 Bogdan Potekhin Daniil Apalkov Evge~ Stal~ PP 2009-~
#> 10 Vasily Streltsov Ivan Boiko Alex~ Avto PP 2009-~
#> 11 Vasily Streltsov <NA> Alex~ Avto EV 2009-~
#> 12 Viktor Postnikov Viktor Antipin Evge~ Stal~ PP 2009-~
#> 13 Danil Kaskov Vitaly Sychov Deni~ Avto PP 2009-~
#> 14 <NA> <NA> Vita~ Avto EV 2009-~
#> 15 <NA> <NA> Alex~ Avto PP 2009-~
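#> 16 Artemy Panarin Stanislav Levin Pyot~ Russ~ EV 2009-~
So, I want to count the number of goals, primary assists, and secondary assists for each player, ending up with one row per player. Suppose the name "Artemy Panarin" were listed 1 time under goal, 0 times under primary assist, and 2 times under secondary assist; my output would then look like this: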
tibble::tibble(name = c("Artemy Panarin", "Stanislav Levin", "Danil Kaskov"), team = c("Russkie Vityazi", "Russkie Vityazi", "Avto"), goals = c(1, 1, 0), primary_assists = c(0, 0, 1), secondary_assists = c(2, 0, 0))
#> # A tibble: 3 x 5
#> name team goals primary_assists secondary_assists
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Artemy Panarin Russkie Vityazi 1.00 0 2.00
#> 2 Stanislav Levin Russkie Vityazi 1.00 0 0
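#> 3 Danil Kaskov Avto 0 1.00 0
Does this make sense? Any ideas? A tidyverse solution is preferred. Thanks!
Answered 2018-07-18 23:29:29
We can gather the data from wide to long format, group by the 'name', 'team', and 'key' columns (the last two created by gather), get the counts with summarise, and spread back to wide format.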
library(tidyverse)
gather(mydata, key, name, primary_assist:goal) %>%
group_by(name, team, key) %>%
summarise(n = n()) %>%
spread(key, n, fill = 0)
# A tibble: 30 x 5
# Groups: name, team [30]
# name team goal primary_assist secondary_assist
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 Alexander Antropov Avto 3 0 0
# 2 Alexander Streltsov Avto 1 0 0
# 3 Andrei Badrutdinov Belye Medvedi 0 0 1
# 4 Anton Burdasov Belye Medvedi 1 1 0
# 5 Artemy Panarin Russkie Vityazi 0 1 0
# 6 Bogdan Potekhin Stalnye Lisy 0 2 0
# 7 Daniil Apalkov Stalnye Lisy 1 0 1
# 8 Danil Gubarev Belye Medvedi 0 0 1
# 9 Danil Kaskov Avto 0 1 0
#10 Denis Belonogov Avto 1 0 0
# ... with 20 more rows
Answered 2018-07-18 23:28:29
One way to get the result is to reshape your data with gather()/spread() in addition to your summarising strategy.
library(tidyverse)
scoring_summary <- mydata %>%
select(primary_assist:team) %>%
gather("key", "player", -team) %>%
group_by(player) %>%
count(key) %>%
spread(key, n)
# convert NAs to 0
scoring_summary[is.na(scoring_summary)] <- 0
scoring_summary
# A tibble: 28 x 4
# Groups: player [28]
player goal primary_assist secondary_assist
<chr> <dbl> <dbl> <dbl>
1 Alexander Antropov 3 0 0
2 Alexander Streltsov 1 0 0
3 Andrei Badrutdinov 0 0 1
4 Anton Burdasov 1 1 0
5 Artemy Panarin 0 1 0
6 Bogdan Potekhin 0 2 0
7 Daniil Apalkov 1 0 1
8 Danil Gubarev 0 0 1
9 Danil Kaskov 0 1 0
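10 Denis Belonogov 1 0 0
count() is doing the same thing you were originally attempting with summarise(count(goals)).
Answered 2018-07-18 23:41:12
You can use gather and spread. First gather the goal and assist columns into a single "key" column, then group by key and player. You can convert the NAs to 0 afterwards.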
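For readers on tidyr 1.0.0 or later, where gather() and spread() are superseded by pivot_longer() and pivot_wider(), the same reshape, count, reshape idea might be sketched as below. This is only an illustration: the `toy` tibble and the name `toy_summary` are made up for the example and merely mimic the column layout of `mydata`; unlike the code above, it also keeps `team` in the result.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column layout as mydata (not the real data)
toy <- tibble(
  primary_assist   = c("A", "B", NA),
  secondary_assist = c("B", NA, NA),
  goal             = c("C", "A", "A"),
  team             = c("X", "X", "X")
)

toy_summary <- toy %>%
  # stack the three name columns into one 'player' column, dropping empty slots
  pivot_longer(primary_assist:goal, names_to = "key", values_to = "player",
               values_drop_na = TRUE) %>%
  # count() is shorthand for group_by() followed by summarise(n = n())
  count(player, team, key) %>%
  # one column per event type; combinations that never occurred become 0
  pivot_wider(names_from = key, values_from = n, values_fill = list(n = 0L))

toy_summary
```

Passing `values_drop_na = TRUE` keeps the missing assists from turning into an `NA` "player" row, and `values_fill` plays the role of the separate NA-to-0 step.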
library(tidyverse)
mydata_tidy <- mydata %>%
gather(key = "key", value = "player", primary_assist, secondary_assist, goal) %>%
na.omit()
mydata_tidy %>%
group_by(key, player) %>%
summarize(count = n()) %>%
spread(key, count) %>%
filter(player %in% c("Artemy Panarin", "Stanislav Levin", "Danil Kaskov"))
#> # A tibble: 3 x 4
#> player goal primary_assist secondary_assist
#> <chr> <int> <int> <int>
#> 1 Artemy Panarin NA 1 NA
#> 2 Danil Kaskov NA 1 NA
#> 3 Stanislav Levin NA NA 1
Created on 2018-07-18 by the reprex package (v0.2.0).
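The remaining step of turning the NAs into 0 could look roughly like the sketch below. The `counts` tibble simply retypes the three rows printed above so that the snippet runs on its own, and `counts_filled` is a name chosen for the example.

```r
library(dplyr)
library(tidyr)

# The three rows from the output above, retyped by hand for illustration
counts <- tibble(
  player           = c("Artemy Panarin", "Danil Kaskov", "Stanislav Levin"),
  goal             = c(NA_integer_, NA_integer_, NA_integer_),
  primary_assist   = c(1L, 1L, NA),
  secondary_assist = c(NA, NA, 1L)
)

# replace_na() takes a named list: one replacement value per column
counts_filled <- counts %>%
  replace_na(list(goal = 0L, primary_assist = 0L, secondary_assist = 0L))

counts_filled
```

An alternative is to pass `fill = 0` to spread(), as the first answer does, which fills the gaps during the reshape itself.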
https://stackoverflow.com/questions/51405462