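I have a loop that I'm trying to improve as much as possible, but sadly I don't know how to make it any better.
Do you have any ideas for improving it?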
# partial is a data frame that looks like this
partial = data.frame(
partial.regex = c("european construction industry federation",
" zentralverband des deutschen baugewerbes",
"hauptverband der deutschen bauindustrie",
"1 1 drillisch ag 439568220616 04"))
> summary(partial)
 partial.name          regex            full.name
 Length:13202       Length:13202       Length:13202
 Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character
# full is also a data frame
full = data.frame(
  full.name = c("International Lead Association (ILA)", "Airborne Wind Europe", "Sazka Group a.s"),
  regex = c("international lead association (ila)", "airborne wind europe", "sazka group a.s."))
> summary(full)
  full.name            regex
 Length:9779        Length:9779
 Class :character   Class :character
 Mode  :character   Mode  :character

Then comes the loop. Sorry if this is a stupid question, I'm a real beginner!
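library(stringi)  # provides stri_extract_all_regex()

partial$full <- NA  # best-matching full name for each row (NA if no word is shared)
for(y in 1:dim(partial)[1]){
  a = 0  # highest number of shared words found so far for row y
  for(i in 1:dim(full)[1]){
    vec = c(partial$regex[y], full$regex[i])
    # number of distinct words the two strings have in common
    n_shared = length(Reduce(`intersect`, stri_extract_all_regex(vec, "\\w+")))
    if(n_shared > a){
      a = n_shared
      partial$full[y] = full$full.name[i]
    }
  }
}

Thanks in advance for all your help!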
Kind regards,
PS: partial.csv = https://github.com/JMcrocs/MeetingMEPs/blob/main/partial.csv
full.csv = https://github.com/JMcrocs/MeetingMEPs/blob/main/full.csv
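The loop in the question re-tokenises both strings on every inner iteration, and there are 13,202 × 9,779 inner iterations for the data summarised above. A minimal sketch of the same search with each column tokenised only once, assuming the column names the loop uses (`partial$regex`, `full$regex`, `full$full.name`). `best_match` is a hypothetical helper, the data are a toy example, and base R's `gregexpr()` stands in for `stri_extract_all_regex()`, so word boundaries could differ for some non-ASCII text:

```r
# Hypothetical helper: index of the full name sharing the most words with one
# tokenised partial name, or NA if it shares no word with any of them.
best_match <- function(words, full_words) {
  counts <- vapply(full_words, function(fw) length(intersect(words, fw)),
                   integer(1))
  if (max(counts) == 0L) NA_integer_ else which.max(counts)
}

# Toy data (illustrative only), with the same column names as in the loop
partial <- data.frame(regex = c("airborne wind", "zentralverband des baugewerbes"),
                      stringsAsFactors = FALSE)
full <- data.frame(
  full.name = c("International Lead Association (ILA)", "Airborne Wind Europe",
                "Sazka Group a.s"),
  regex = c("international lead association (ila)", "airborne wind europe",
            "sazka group a.s."),
  stringsAsFactors = FALSE
)

# Tokenise every string exactly once instead of inside the nested loop
partial_words <- regmatches(partial$regex, gregexpr("\\w+", partial$regex))
full_words    <- regmatches(full$regex, gregexpr("\\w+", full$regex))

# One best-match index per partial name, then look up the full names
idx <- vapply(partial_words, best_match, integer(1), full_words = full_words)
partial$full <- full$full.name[idx]
partial$full
```

`which.max()` returns the first maximum, which mirrors the loop keeping the first full name on ties because of its strict `>`. Rows that share no word with any full name get `NA`.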
Posted on 2021-05-10 06:22:53
Here is a base R option using `outer` + `max.col`:
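partial$full <- full$full.name[
  max.col(
    # shared-word counts: one row per partial name, one column per full name
    lengths(outer(
      regmatches(partial$partial.regex, gregexpr("\\w+", partial$partial.regex)),
      regmatches(full$regex, gregexpr("\\w+", full$regex)),
      FUN = Vectorize(intersect)
    )),
    # column of the best full name in each row; the first one wins on ties,
    # so a row sharing no word at all still gets full$full.name[1]
    ties.method = "first"
  )
]
https://stackoverflow.com/questions/67462258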
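To see what the `outer()` + `max.col()` answer computes, here is a sketch on a toy version of the data: the three `full` rows from the question plus two made-up partial names. It follows the same idea but counts the shared words inside the vectorised function (`count_shared` is a hypothetical name). With `Vectorize(intersect)`, `mapply()` could in principle simplify the result into a matrix if every pair happened to share the same number (more than one) of words, and `outer()` would then fail; returning one integer per pair avoids that.

```r
# The three full names from the question; the partial names are made up
full <- data.frame(
  full.name = c("International Lead Association (ILA)", "Airborne Wind Europe",
                "Sazka Group a.s"),
  regex = c("international lead association (ila)", "airborne wind europe",
            "sazka group a.s."),
  stringsAsFactors = FALSE
)
partial <- data.frame(partial.regex = c("airborne wind", "sazka group"),
                      stringsAsFactors = FALSE)

# Tokenise each column once: lists of character vectors of words
p_words <- regmatches(partial$partial.regex, gregexpr("\\w+", partial$partial.regex))
f_words <- regmatches(full$regex, gregexpr("\\w+", full$regex))

# Shared-word counts: one row per partial name, one column per full name
count_shared <- Vectorize(function(a, b) length(intersect(a, b)))
overlap <- outer(p_words, f_words, FUN = count_shared)
# row 1: 0 2 0; row 2: 0 0 2

# For each row, the column with the most shared words (first one on ties)
partial$full <- full$full.name[max.col(overlap, ties.method = "first")]
partial$full
```

Either way, the count matrix has one cell per pair of names, about 129 million for the 13,202 × 9,779 rows summarised in the question, so memory may become the limiting factor on the full data.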