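I have a data frame, df, in which each row represents an individual and each column a characteristic of that individual. One of the columns is TeamName, the name of the team the individual belongs to. Several individuals can belong to the same team.
I would like a function in R that creates a new column holding the number of members in each individual's team.
So, for example, I have: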
df
Name   Surname  TeamName
John   Smith    Champions
Mary   Osborne  Socceroos
Mark   Johnson  Champions
Rory   Bradon   Champions
Jane   Bryant   Socceroos
Bruce  Harper
and I want:
df1
Name   Surname  TeamName   TeamNo
John   Smith    Champions  3
Mary   Osborne  Socceroos  2
Mark   Johnson  Champions  3
Rory   Bradon   Champions  3
Jane   Bryant   Socceroos  2
Bruce  Harper              0
So, as you can see, the count includes the person themselves, and anyone without a team name (Bruce Harper, for example) gets 0.
How can I do this? Thanks!
Posted on 2015-07-31 06:30:51
Here is a solution based on data.table. It may be more than you need, but here it is:
library(data.table)
dt <- as.data.table(df)
# First, convert TeamName from factor to character
dt[, TeamName := as.character(TeamName)]
# Now count the members of each team
dt[, TeamNo := .N, by = TeamName]
# Set the special cases (missing or empty team name) to NA
dt[is.na(TeamName), TeamNo := NA_integer_]
dt[TeamName == "", TeamNo := NA_integer_]
This is clearly not the best solution, but I hope it helps.
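The question asked for 0, rather than NA, for people without a team. A minimal sketch of that variant with data.table follows. The example df is rebuilt from the question, and it assumes a missing team is stored as an empty string or NA; adjust the condition if your data marks it differently.

```r
library(data.table)

# Example data rebuilt from the question; Bruce Harper has an empty TeamName
# (assumption: missing teams are stored as "" or NA)
df <- data.frame(
  Name     = c("John", "Mary", "Mark", "Rory", "Jane", "Bruce"),
  Surname  = c("Smith", "Osborne", "Johnson", "Bradon", "Bryant", "Harper"),
  TeamName = c("Champions", "Socceroos", "Champions", "Champions", "Socceroos", ""),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)
# Count the rows in each TeamName group (the empty-name group is counted too)
dt[, TeamNo := .N, by = TeamName]
# Overwrite the count with 0 wherever there is no team name
dt[is.na(TeamName) | TeamName == "", TeamNo := 0L]
print(dt)
```

Because .N is an integer, the replacement is written as 0L so that the TeamNo column keeps its integer type.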
Posted on 2015-07-31 06:46:56
If you need the number of members in each TeamName based on the unique combinations of the first two columns, one option is n_distinct from dplyr:
library(dplyr)
library(tidyr)
df %>%
  unite(Var, Name, Surname) %>%               # paste the columns together
  group_by(TeamName) %>%                      # group by TeamName
  mutate(TeamNo = n_distinct(Var)) %>%        # create the TeamNo column
  separate(Var, into = c('Name', 'Surname'))  # split the 'Var' column
Or, if you only need the number of rows per 'TeamName', we can group by 'TeamName', get the number of rows in each group with n(), and create the 'TeamNo' column from it with mutate. If needed, an ifelse condition can return NA for rows whose 'TeamName' is '' or NA.
df %>%
  group_by(TeamName) %>%
  mutate(TeamNo = ifelse(is.na(TeamName) | TeamName == '', NA_integer_, n()))
#   Name   Surname  TeamName   TeamNo
#1  John   Smith    Champions  3
#2  Mary   Osborne  Socceroos  2
#3  Mark   Johnson  Champions  3
#4  Rory   Bradon   Champions  3
#5  Jane   Bryant   Socceroos  2
#6  Bruce  Harper              NA
Or you can use ave from base R. Assuming the column contains both '' and NA, I first convert '' to NA and then use ave to get the length of each group of that column. It returns NA for the NA values. For example:
v1 <- c(as.character(df$TeamName), NA)  # append an NA to the example to show that case
is.na(v1) <- v1 == ''                   # convert '' to NA
as.numeric(ave(v1, v1, FUN = length))
#[1] 3 2 3 3 2 NA NA
Posted on 2015-07-31 08:20:47
Using sqldf:
library(sqldf)
sqldf("SELECT Name, Surname, TeamName, n
       FROM df
       LEFT JOIN
         (SELECT TeamName, COUNT(Name) AS n
          FROM df
          WHERE NOT TeamName IS ''
          GROUP BY TeamName)
       USING (TeamName)")
Output:
   Name   Surname  TeamName   n
1  John   Smith    Champions  3
2  Mary   Osborne  Socceroos  2
3  Mark   Johnson  Champions  3
4  Rory   Bradon   Champions  3
5  Jane   Bryant   Socceroos  2
6  Bruce  Harper              NA
https://stackoverflow.com/questions/31739063