Given this data as a lookup table:
dfcheck <- data.frame(status = c("open/close", "close", "open"), stock = c("company energy","goods and books","other"), name = c("amazon1;google1","google3;yahoo1","yahoo2;amazon2;google2"))
and input data like this:
dfdata <- data.frame(id = c("id1", "id2", "id3"), title1 = c("amazon1","google1","yahoo1"), title2 = c("yahoo2",NA,"amazon2"))
how can I produce a data frame with extra columns based on the lookup table?
Expected output:
dfdata <- data.frame(id = c("id1", "id2", "id3"), title1 = c("amazon1", "google1", "yahoo1"), title2 = c("yahoo2", NA, "amazon2"), status1 = c("open/close", "open/close", "close"), stock1 = c("company energy", "company energy", "goods and books"), status2 = c("open", NA, "open"), stock2 = c("other", NA, "other"))
   id  title1  title2    status1          stock1 status2 stock2
1 id1 amazon1  yahoo2 open/close  company energy    open  other
2 id2 google1    <NA> open/close  company energy    <NA>   <NA>
3 id3  yahoo1 amazon2      close goods and books    open  other
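The idea is to check every column of dfdata except the first id column: if a value exists in the dfcheck data frame, two new columns are created with the matching status and stock from dfcheck. In dfcheck, the name column can hold more than one value, separated by ";".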
Answered 2022-11-25 15:39:48
Libraries:
library(dplyr)
library(stringr)
library(tidyr)
First, you need to tidy your dfcheck data.frame:
dfcheck_tidy <- dfcheck %>%
  mutate(name = str_split(name, ";")) %>%
  unnest(name)
(I don't use tidyr::separate for this because, judging by your example, an entry can hold a variable number of names separated by ";".)
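To make the result of the tidying step concrete, here is a minimal sketch that builds the same kind of long table with tidyr::separate_rows(), which for this data should give the same rows as the str_split() plus unnest() combination. The name dfcheck_long is only illustrative and is not part of the original answer, and stringsAsFactors = FALSE is added as an assumption so the sketch also behaves on R versions before 4.0.

```r
library(dplyr)
library(tidyr)

# Lookup table copied from the question; stringsAsFactors = FALSE keeps
# the columns as character vectors on older R versions (assumption).
dfcheck <- data.frame(
  status = c("open/close", "close", "open"),
  stock  = c("company energy", "goods and books", "other"),
  name   = c("amazon1;google1", "google3;yahoo1", "yahoo2;amazon2;google2"),
  stringsAsFactors = FALSE
)

# Split name on ";" and emit one row per individual name.
# status and stock are repeated for every name that came from the same row.
dfcheck_long <- dfcheck %>%
  separate_rows(name, sep = ";")

# The three input rows held 2, 2 and 3 names, so seven rows come out.
nrow(dfcheck_long)
dfcheck_long$name
```

Because every name now sits on its own row, an exact-match join on name can find each title without any string parsing at join time.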
Now you can perform the two joins:
dfdata %>%
  left_join(dfcheck_tidy,
            by = c("title1" = "name")) %>%
  left_join(dfcheck_tidy,
            by = c("title2" = "name"),
            suffix = c("1", "2"))
#    id  title1  title2    status1          stock1 status2 stock2
# 1 id1 amazon1  yahoo2 open/close  company energy    open  other
# 2 id2 google1    <NA> open/close  company energy    <NA>   <NA>
# 3 id3  yahoo1 amazon2      close goods and books    open  other
Answered 2022-11-25 15:53:09
Here is one approach using regex_join() from the fuzzyjoin package.
library(dplyr)
library(fuzzyjoin)
regex_right_join(dfcheck, dfdata, by = c(name = "title1")) %>%
regex_right_join(dfcheck, ., by = c(name = "title2")) %>%
select(!contains("name")) %>%
relocate(id, title1, title2)
#    id  title1  title2 status.x stock.x   status.y         stock.y
# 1 id1 amazon1  yahoo2     open   other open/close  company energy
# 2 id2 google1    <NA>     <NA>    <NA> open/close  company energy
# 3 id3  yahoo1 amazon2     open   other      close goods and books
# Note: the .x columns come from the title2 match and the .y columns from the title1 match.
https://stackoverflow.com/questions/74574699