In R, I have a function that counts the words two strings have in common:
containedin <- function(t1,t2){
return length(Reduce(intersect, strsplit(c(t1,t2), "\\s+")))
}

I want to apply this function to a data frame with two string columns, data.selected[c('keywords','title')]:
    keywords               title
 1  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi +$50 Visa Gift Card
 2  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full HD Smart LED TV -Bundle- (See Below for Contents)
 3  Samsung UN48H6350 48"  Samsung UN48H6350 48" Class Full HD Smart LED TV -BUNDLE- See below Details
 4  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full HD Smart LED TV With BD-H5100 Blu-ray Disc Player
 5  Samsung UN48H6350 48"  Samsung UN48H6350 48" Smart 1080p Clear Motion Rate 240 LED HDTV
 6  Samsung UN48H6350 48"  Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi
 7  Samsung UN48H6350 48"  Samsung 6350 Series UN48H6350 48" 1080p HD LED LCD Internet TV NEW
 8  Samsung UN48H6350 48"  Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)
 9  Samsung UN48H6350 48"  Samsung UN48H6350 - 48" HD 1080p Smart HDTV 120Hz Bundle
10  Samsung UN48H6350 48"  Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi, (R#416)

How can I use an apply function on these two columns to return a new column with the results?
Answer posted on 2015-01-14 09:33:36
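First, your return statement will give you an error, because in R return is a function and must be called with parentheses. Since a function returns the value of its last expression anyway, you probably meant: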
containedin <- function(t1,t2){
length(Reduce(intersect, strsplit(c(t1,t2), "\\s+")))
}

Either way, you can solve the problem with mapply:
mapply(containedin,
as.character(data.selected[, 'keywords']),
as.character(data.selected[, 'title']))

The as.character calls are only necessary if the columns are factors rather than character vectors, since strsplit() only accepts character input.
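Since the question asks for a new column, here is a minimal sketch of the whole workflow. It assumes a small data frame rebuilt from three of the question's rows (rows 1, 6 and 8); the column name common and the variable via_apply are hypothetical. It stores the mapply result as a new column and, for comparison, also shows the row-wise apply() variant the question mentions.

```r
# Corrected helper from the answer: counts the distinct whitespace-separated
# words that appear in both strings (case-sensitive, exact token match).
containedin <- function(t1, t2) {
  length(Reduce(intersect, strsplit(c(t1, t2), "\\s+")))
}

# Three rows (1, 6 and 8) copied from the question's data frame.
# stringsAsFactors = FALSE keeps the columns as character vectors
# (the default was TRUE before R 4.0.0).
data.selected <- data.frame(
  keywords = rep('Samsung UN48H6350 48"', 3),
  title = c(
    'Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi +$50 Visa Gift Card',
    'Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi',
    'Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)'
  ),
  stringsAsFactors = FALSE
)

# mapply walks both columns in parallel and returns one count per row.
# USE.NAMES = FALSE stops mapply from naming the result after the keywords.
data.selected$common <- mapply(containedin,
                               as.character(data.selected$keywords),
                               as.character(data.selected$title),
                               USE.NAMES = FALSE)

# Row-wise apply() also works: it first converts the selected columns to a
# character matrix, so each row arrives as a named character vector.
via_apply <- apply(data.selected[c("keywords", "title")], 1,
                   function(row) containedin(row[["keywords"]], row[["title"]]))

print(data.selected$common)
```

With these rows the new column holds 3, 2 and 1. Row 8 shares only "Samsung", because the comparison is case-sensitive and token-exact ("Un48h6350af" does not match "UN48H6350"). USE.NAMES = FALSE is optional; without it, mapply names each count after its keywords string.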
https://stackoverflow.com/questions/27939461