我有一些数据要填写。
看起来是这样的:
CO_1_Name Ticker2LP
1: Sonic Corp SONC
2: Sonic Corp <NA>
3: Baxter International Inc <NA>
4: Baxter International Inc BAX
5: Lockheed Martin Corporation LMT有一些缺失值。我想通过在其他行中找到的Ticker2LP ID来填充这些缺少的值。预期输出为:
CO_1_Name Ticker2LP
1: Sonic Corp SONC
2: Sonic Corp SONC # since the observation above has "SONC"
3: Baxter International Inc BAX # since the observation below has "BAX"
4: Baxter International Inc BAX
5: Lockheed Martin Corporation LMT所以我想填写NAs inTicker2LPwhen the names match from the columnCO_1_Name`."SONC“和"BAX”被填写,因为他们的名字在其他行中被发现。
数据:
structure(list(CO_1_Name = c("Sonic Corp", "Sonic Corp", "Baxter International Inc",
"Baxter International Inc", "Lockheed Martin Corporation", "Lockheed Martin Corporation",
"Lockheed Martin Corporation", "Baxter International Inc", "Baxter International Inc",
"Lockheed Martin Corporation", "Sonic Corp", "Sonic Corp", "Baxter International Inc",
"Baxter International Inc", "Baxter International Inc", "Lockheed Martin Corporation",
"Lockheed Martin Corporation", "Lockheed Martin Corporation",
"Baxter International Inc", "Sonic Corp", "Lockheed Martin Corporation",
"Lockheed Martin Corporation", "Lockheed Martin Corporation",
"Lockheed Martin Corporation", "Baxter International Inc", "Sonic Corp",
"Baxter International Inc", "Lockheed Martin Corporation", "Sonic Corp",
"Lockheed Martin Corporation", "Lockheed Martin Corporation",
"Baxter International Inc", "Baxter International Inc", "Baxter International Inc",
"Lockheed Martin Corporation", "Baxter International Inc", "Baxter International Inc",
"Baxter International Inc", "Baxter International Inc", "Baxter International Inc",
"Baxter International Inc", "Baxter International Inc", "Baxter International Inc",
"Baxter International Inc", "Baxter International Inc", "Lockheed Martin Corporation",
"Sonic Corp", "Lockheed Martin Corporation", "Lockheed Martin Corporation",
"Lockheed Martin Corporation", "Lockheed Martin Corporation",
"Lockheed Martin Corporation", "Baxter International Inc", "Baxter International Inc",
"Lockheed Martin Corporation", "Baxter International Inc", "Lockheed Martin Corporation",
"Baxter International Inc", "Baxter International Inc", "Baxter International Inc"
), Ticker2LP = c("SONC", NA, NA, "BAX", "LMT", "LMT", "MLM",
"AP", "BAX", "LMT", "SONC", "SONC", "BAX", "BAX", NA, "LMT",
"LMT", "LMT", NA, NA, "LMT", NA, "NOC", NA, "BAX", NA, NA, "LMT",
"SONC", NA, "LMT", NA, NA, "BAX", "NOC", "BAX", NA, NA, NA, "BAX",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA)), .internal.selfref = <pointer: 0x55603dbefe00>, row.names = c(NA,
-60L), class = c("data.table", "data.frame"))发布于 2020-04-29 23:43:09
正如您可能知道的,您的示例数据包含CO_1_Name和Ticker2LP的非惟一组合。
library(data.table)
unique(data[!is.na(Ticker2LP),])
CO_1_Name Ticker2LP
1: Sonic Corp SONC
2: Baxter International Inc BAX
3: Lockheed Martin Corporation LMT
4: Lockheed Martin Corporation MLM
5: Baxter International Inc AP
6: Lockheed Martin Corporation NOC这听起来像是你想实现某种形式的“填充附近的值”。一种方法是设置一个ID列并执行滚动联接。
setDT(data)[,id := 1:.N]
data[!is.na(Ticker2LP),][data , on = c("CO_1_Name","id"), roll = "nearest"][,.(CO_1_Name,Ticker2LP)]
CO_1_Name Ticker2LP
1: Sonic Corp SONC
2: Sonic Corp SONC
3: Baxter International Inc BAX
4: Baxter International Inc BAX
5: Lockheed Martin Corporation LMT
6: Lockheed Martin Corporation LMT
7: Lockheed Martin Corporation MLM
8: Baxter International Inc AP
9: Baxter International Inc BAX
10: Lockheed Martin Corporation LMT发布于 2020-04-30 02:38:38
使用filter
library(dplyr)
data %>%
filter(complete.cases(Ticker2LP)) %>%
distinct
# CO_1_Name Ticker2LP
#1: Sonic Corp SONC
#2: Baxter International Inc BAX
#3: Lockheed Martin Corporation LMT
#4: Lockheed Martin Corporation MLM
#5: Baxter International Inc AP
#6: Lockheed Martin Corporation NOChttps://stackoverflow.com/questions/61505011
复制相似问题