I have a dataset, and a sample produced with the dput command is shown below. The problem I'm running into is splitting columns on a delimiter.
> dput(head(team_data))
structure(list(X1 = 2:6,
names2 = c("Andre Callender Seton Hall Preparatory School (West Orange, NJ)", "Gosder Cherilus Somerville (Somerville, MA)", "Justin Bell Mount Vernon (Alexandria, VA)", "Tom Anevski Elder (Cincinnati, OH)", "Brad Mueller Mars Area (Mars, PA)"),
pos2 = c("RB 5-10 185", "OT 6-7 270", "TE 6-3 250", "OT 6-5 265", "CB 6-0 170"), rating2 = c("0.8667 194 18 8", "0.8667 262 20 1", "0.8333 306 14 7", "0.8333 377 25 13", "0.8333 496 36 16"),
status2 = c("Enrolled 6/30/2003", "Enrolled 6/30/2003", "Enrolled 6/30/2003", "Enrolled 6/30/2003", "Enrolled 6/30/2003"), team = c("Boston-College", "Boston-College", "Boston-College", "Boston-College", "Boston-College"), year = c(2003L, 2003L, 2003L, 2003L, 2003L)),
.Names = c("X1", "names2", "pos2", "rating2", "status2", "team", "year"), row.names = c(NA, -5L), class = c("tbl_df",
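"tbl", "data.frame"))

Below is the code I'm trying to run on the dataset above. As far as I can tell, the following two calls work fine and give the expected result.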
library(rvest)
library(stringr)
library(tidyr)
library(readxl)
df2<-separate(data=team_data,col=pos2,into= c("Position","Height","Weight"),sep=" ")
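df3<-separate(data=df2,col=rating2,into= c("Rating","National","Position","State Rank"),sep=" ")

However, I'm having a lot of trouble separating the data frame's columns any further. I have tried various approaches (examples below), but every one of them produces the same error: "Error: Data source must be a dictionary".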
df4<-separate(data=df3,col=names2,into= c("Name","Geo"),sep="(")
df4<-separate(data=df3,col=names2,into= c("Name","Geo"),sep='\\(|\\)')
df4<-separate(data=df3,col=status2,into= c("Date_Enrollment","Enroll_Status"),sep=" ")
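df4<-separate(data=df3,col=status2,into= c("Date_Enrollment","Enroll_Status"),sep=" ")

The end goal is to split the "names2" column at the "(" and at the ",", and to drop the ")", so that I end up with 3 columns of data. For the other column ("status2"), the goal is to separate "Enrolled" from the enrollment date.
From what I've read about the error, it indicates that I'm duplicating column names, but I can't see where that is happening.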
Answered 2018-01-29 22:26:00
You are using Position twice: once in df2 (created from pos2) and again in df3 (created from rating2). This works for me:
team_data %>%
separate(col=pos2, into= c("Position","Height","Weight"), sep=" ") %>%
separate(col=rating2,into= c("Rating","National","Position2","State Rank"),sep=" ")%>%
separate(col=names2,into= c("Name","Geo"),sep="\\(") %>%
separate(col=status2,into= c("Date_Enrollment","Enroll_Status"),sep=" ")
https://stackoverflow.com/questions/48503240
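The question also asks for three columns out of names2 (the text before the "(", then the two parts around the ","), with the ")" removed. A minimal sketch of one way to get there is below, using two rows copied from the dput output. The column names City and State and the object names sample_rows, step1, step2 and result are illustrative assumptions, not part of the original post. It also swaps the two status2 names so that "Enrolled" lands in Enroll_Status, which the original code does not do.

```r
library(tidyr)

# Two rows copied from the question's dput output (sample_rows is a hypothetical name).
sample_rows <- data.frame(
  names2 = c("Andre Callender Seton Hall Preparatory School (West Orange, NJ)",
             "Brad Mueller Mars Area (Mars, PA)"),
  status2 = c("Enrolled 6/30/2003", "Enrolled 6/30/2003"),
  stringsAsFactors = FALSE
)

# Split at " (": the parenthesis is a regex metacharacter, so it must be escaped.
step1 <- separate(sample_rows, col = names2, into = c("Name", "Geo"), sep = " \\(")

# Drop the trailing ")" with base R before splitting again.
step1$Geo <- sub("\\)$", "", step1$Geo)

# Split the remainder at ", " (City and State are assumed column names).
step2 <- separate(step1, col = Geo, into = c("City", "State"), sep = ", ")

# Split status2 at the space; the first piece is the status, the second the date.
result <- separate(step2, col = status2,
                   into = c("Enroll_Status", "Date_Enrollment"), sep = " ")

# Every column name must be unique, which is what the original error was about.
stopifnot(anyDuplicated(names(result)) == 0)

print(result)
```

Note that Name still holds both the player and the school, since the source text has no delimiter between them.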