So, I have data similar to the following:
ID TEXT ReferenceTEXT TextID
1 Yo NA NA
2 Cool Yup 5
3 Nice NA NA
4 Phat Yup 5
5 Yup Phat 4
6 Boss NA NA
7 Yay Phat 4

Using match:

dataframe$TextID <- match(dataframe$ReferenceTEXT, dataframe$TEXT, incomparables = NA)

I extracted the TextID that each ReferenceTEXT points to. Now I want the sequence/rank of each TextID in a new column called SequenceID, as shown below:
ID TEXT ReferenceTEXT TextID SequenceID
1 Yo NA NA NA
2 Cool Yup 5 5-1
3 Nice NA NA NA
4 Phat Yup 5 5-2
5 Yup Phat 4 4-1
6 Boss NA NA NA
7 Yay Phat 4 4-2

But how do I do this? What is the most practical way to accomplish it? The solution needs to work on a data frame with 160,000+ observations.
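To make the example reproducible, here is a minimal sketch that rebuilds the sample data from the first table and runs the match step. Storing TEXT and ReferenceTEXT as character columns (rather than factors) is an assumption. Note that match() returns row positions, which coincide with ID here only because ID runs from 1 to 7 in row order.

```r
# Sample data copied from the first table; text columns kept as character (assumption)
df <- data.frame(
  ID = 1:7,
  TEXT = c("Yo", "Cool", "Nice", "Phat", "Yup", "Boss", "Yay"),
  ReferenceTEXT = c(NA, "Yup", NA, "Yup", "Phat", NA, "Phat"),
  stringsAsFactors = FALSE
)

# Row position of the TEXT equal to each ReferenceTEXT; NA references stay NA
df$TextID <- match(df$ReferenceTEXT, df$TEXT, incomparables = NA)
print(df$TextID)
# [1] NA  5 NA  5  4 NA  4
```

If ID ever differs from the row position, wrapping the lookup as df$ID[match(...)] would return the referenced ID instead of the row index.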
Answered 2015-10-19 22:01:01
In base R:
df$SequenceID <- paste(df$TextID, ave(df$TextID, df$TextID, FUN=seq_along), sep="-")
is.na(df$SequenceID) <- is.na(df$TextID)
df
# ID TEXT ReferenceTEXT TextID SequenceID
# 1 1 Yo <NA> NA <NA>
# 2 2 Cool Yup 5 5-1
# 3 3 Nice <NA> NA <NA>
# 4 4 Phat Yup 5 5-2
# 5 5 Yup Phat 4 4-1
# 6 6 Boss <NA> NA <NA>
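# 7 7 Yay Phat 4 4-2

Use ave to create a running sequence within each TextID group and paste it together with the TextID. Then set the correct NA values, since rows without a TextID would otherwise hold the string "NA-NA".
Update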
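The two steps described above can be made explicit by keeping the ave() result in its own variable. This is a sketch on the sample data rebuilt with character columns (an assumption; the original data may hold factors), and the variable name rank_in_group is purely illustrative.

```r
# Sample data from the question, with TextID computed via match()
df <- data.frame(
  ID = 1:7,
  TEXT = c("Yo", "Cool", "Nice", "Phat", "Yup", "Boss", "Yay"),
  ReferenceTEXT = c(NA, "Yup", NA, "Yup", "Phat", NA, "Phat"),
  stringsAsFactors = FALSE
)
df$TextID <- match(df$ReferenceTEXT, df$TEXT, incomparables = NA)

# Step 1: running count within each TextID group.
# ave() leaves rows whose group is NA untouched, so they keep their NA value.
rank_in_group <- ave(df$TextID, df$TextID, FUN = seq_along)
print(rank_in_group)
# [1] NA  1 NA  2  1 NA  2

# Step 2: paste id and count, then turn the "NA-NA" strings back into real NA
df$SequenceID <- paste(df$TextID, rank_in_group, sep = "-")
is.na(df$SequenceID) <- is.na(df$TextID)
# df$SequenceID now holds NA, "5-1", NA, "5-2", "4-1", NA, "4-2"
```

Because ave() splits by group once and applies seq_along to each piece, the work grows roughly linearly with the number of rows rather than requiring a per-row lookup.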
For a cleaner look, you can use transform to create the new column and assign it in one line, then replace the "NA-NA" strings with NA as needed:
newdf <- transform(df, SequenceID = paste(TextID, ave(TextID, TextID, FUN=seq_along), sep="-"))
is.na(newdf$SequenceID) <- is.na(df$TextID)

Answered 2015-10-19 22:00:07
Try this:
library(dplyr)
dataframe %>%
group_by(ReferenceTEXT) %>%
mutate(SequenceID = ifelse(is.na(TextID), NA_character_, paste(TextID, seq_len(n()), sep="-")))
# Source: local data frame [7 x 5]
# Groups: ReferenceTEXT [3]
#
# ID TEXT ReferenceTEXT TextID SequenceID
# (int) (fctr) (fctr) (int) (chr)
# 1 1 Yo NA NA NA
# 2 2 Cool Yup 5 5-1
# 3 3 Nice NA NA NA
# 4 4 Phat Yup 5 5-2
# 5 5 Yup Phat 4 4-1
# 6 6 Boss NA NA NA
# 7 7 Yay Phat 4 4-2

https://stackoverflow.com/questions/33224478