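I have the following data frame (DATA), which contains region names (120 regions), IPC classes (patents), and the number of patents per IPC class in each region: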
(DATA)
REGION IPC Count
AT11 B29C 15
AT11 B32B 22
AT11 C02F 17
AT11 C09K 26
.........
FI19 A01C 67
FI19 G09G 13
FI19 H01F 32
I also have a data frame containing all 594 IPC classes:
(ALLIPC)
A01B
A01C
A01D
A01F
...
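H05K
I want to create a data frame DATA2 that contains, for every region, all 594 IPC classes from ALLIPC, even when the region has no count for a class. When a region has no entry for an IPC class in DATA, Count should be 0; when it does have an entry, the original count should be kept.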
(DATA2)
REGION IPC Count
AT11 A01B 0
AT11 A01C 0
AT11 A01D 0
...
AT11 B29C 15
AT11 B32B 22
AT11 C02F 17
AT11 C09K 26
.........
FI19 A01B 0
FI19 A01C 67
FI19 A01D 0
....
FI19 G09G 13
FI19 H01F 32
Thanks a lot!
Answered 2019-12-05 18:03:00
Using data.table:
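library(data.table)
setDT(DATA)
setDT(ALLIPC)
# Cross join every IPC class with every region, then join DATA onto that grid
DATA <- DATA[CJ(IPC = ALLIPC$IPC, REGION = REGION, unique = TRUE),
on = .(IPC, REGION)
][, Count := fifelse(is.na(Count), 0L, Count)  # combinations missing from DATA become 0
][order(REGION)]
DATA[]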
REGION IPC Count
1: AT11 A01B 0
2: AT11 A01C 0
3: AT11 A01D 0
4: AT11 A01F 0
5: AT11 B29C 15
...
21: FI19 H01F 32
22: FI19 H05K 0
REGION IPC Count
Reproducible input data:
DATA <- data.frame(
REGION = c("AT11", "AT11", "AT11", "AT11", "FI19", "FI19", "FI19"),
IPC = c("B29C", "B32B", "C02F", "C09K", "A01C", "G09G", "H01F"),
Count = c(15L, 22L, 17L, 26L, 67L, 13L, 32L),
stringsAsFactors = FALSE
)
ALLIPC = data.frame(
IPC = c(
"A01B", "A01C", "A01D", "A01F", "H05K", "B29C", "B32B", "C02F", "C09K", "G09G", "H01F"
),
stringsAsFactors = FALSE
)
Answered 2019-12-05 18:22:49
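Base R solution:
# Build every REGION x IPC pair first; merging df2 with df alone leaves REGION as NA for unmatched classes
grid <- expand.grid(REGION = unique(df$REGION), IPC = df2$IPC, stringsAsFactors = FALSE)
df3 <- merge(grid, df, by = c("REGION", "IPC"), all.x = TRUE)
df3$Count <- replace(df3$Count, is.na(df3$Count), 0L)
Data: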
df <- data.frame(
REGION = c("AT11", "AT11", "AT11", "AT11", "FI19", "FI19", "FI19"),
IPC = c("B29C", "B32B", "C02F", "C09K", "A01C", "G09G", "H01F"),
Count = c(15L, 22L, 17L, 26L, 67L, 13L, 32L),
stringsAsFactors = FALSE
)
df2 = data.frame(
IPC = c(
"A01B", "A01C", "A01D", "A01F", "H05K", "B29C", "B32B", "C02F", "C09K", "G09G", "H01F"
),
stringsAsFactors = FALSE)
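As a further base R option, the same padding can be sketched with xtabs(): if IPC is turned into a factor whose levels are the full class list, the cross-tabulation keeps a cell for every region and class, and empty cells come out as 0. This is only a sketch on the small sample data above. The names sample_counts, all_ipc and padded are made up for illustration, and it assumes each REGION/IPC pair appears at most once in the input, since xtabs() would otherwise sum duplicates.

```r
# Illustrative sample; same values as the reproducible data above
sample_counts <- data.frame(
  REGION = c("AT11", "AT11", "AT11", "AT11", "FI19", "FI19", "FI19"),
  IPC = c("B29C", "B32B", "C02F", "C09K", "A01C", "G09G", "H01F"),
  Count = c(15L, 22L, 17L, 26L, 67L, 13L, 32L),
  stringsAsFactors = FALSE
)
all_ipc <- c("A01B", "A01C", "A01D", "A01F", "H05K", "B29C",
             "B32B", "C02F", "C09K", "G09G", "H01F")

# Declare the full set of classes as factor levels so unused ones are kept
sample_counts$IPC <- factor(sample_counts$IPC, levels = sort(all_ipc))

# Cross-tabulate; cells with no observation are filled with 0
tab <- xtabs(Count ~ REGION + IPC, data = sample_counts)

# Back to long format, one row per REGION/IPC combination
padded <- as.data.frame(tab, responseName = "Count", stringsAsFactors = FALSE)
padded <- padded[order(padded$REGION, padded$IPC), ]
rownames(padded) <- NULL
```

A quick sanity check for any of the approaches is that the result has (number of regions) x (number of IPC classes) rows and that the total of Count is unchanged from the input.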
https://stackoverflow.com/questions/59192264