Converting a sentence into a list of words and then finding the index of the root string can be done like this:
sentence = "lack of association between the promoter polymorphism of the mtnr1a gene and adolescent idiopathic scoliosis"
root = "mtnr1a"
try:
    words = sentence.split()
    n = words.index(root)
    cutoff = ' '.join(words[n-4:n+5])
except ValueError:
    cutoff = None
print(cutoff)

Result:

promoter polymorphism of the mtnr1a gene and adolescent idiopathic

How can I implement this on a pandas DataFrame?
I tried:

sentence = data['sentence']
root = data['rootword']
def cutOff(sentence,root):
    try:
        words = sentence.str.split()
        n = words.index(root)
        cutoff = ' '.join(words[n-4:n+5])
    except ValueError:
        cutoff = None
    return cutoff
data.apply(cutOff(sentence,root),axis=1)

but it doesn't work.
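Edit:
How can I cut the sentence 4 words away from the root when the root is in the first position of the sentence, or when it is in the last position? For example: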
sentence = "mtnr1a lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
out if root in first position:
"mtnr1a lack of association between"
out if root in last position:
"lack of association between the promoter polymorphism of the gene and adolescent idiopathic scoliosis"
"adolescent idiopathic scoliosis mtnr1a"

Posted on 2018-04-23 23:02:19
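Two small tweaks to your code should solve your problem:
First, calling apply() on a DataFrame with axis=1 applies the function to each row of that DataFrame.
You don't have to pass the columns to the function as input, and calling sentence.str.split() doesn't make sense. Inside the cutOff() function, sentence is just a regular string (not a column).
Change the function to: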
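def cutOff(sentence,root):
    try:
        words = sentence.split() # this is the line that was changed
        n = words.index(root)
        cutoff = ' '.join(words[n-4:n+5])
    except ValueError:
        cutoff = None
    return cutoff

Next, you just need to specify which columns are passed to the function as input. You can do this with a lambda (df here is the DataFrame called data in the question):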
df.apply(lambda x: cutOff(x["sentence"], x["rootword"]), axis=1)
#0 promoter polymorphism of the mtnr1a gene and a...
#dtype: object

https://stackoverflow.com/questions/49983773
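The edge cases raised in the question's edit are not covered by the answer above. When the root is among the first four words, n-4 is negative, so the slice start wraps around to the end of the list and the result is usually empty. The following is a minimal sketch of one possible fix that clamps the start index at 0. The name cut_off_clamped, the width parameter, and the short last-position sample sentence are assumptions for illustration and are not part of the original answer.

```python
def cut_off_clamped(sentence, root, width=4):
    """Return up to `width` words on each side of `root`, or None if root is absent."""
    words = sentence.split()
    try:
        n = words.index(root)
    except ValueError:
        return None
    # Clamp the start at 0: a negative start would count from the end of the list.
    start = max(n - width, 0)
    # A stop index past the end of the list is harmless, so no clamping is needed there.
    return ' '.join(words[start:n + width + 1])


# Root in first position (sentence taken from the question's edit).
first = ("mtnr1a lack of association between the promoter polymorphism "
         "of the gene and adolescent idiopathic scoliosis")
print(cut_off_clamped(first, "mtnr1a"))  # mtnr1a lack of association between

# Root in last position (hypothetical short sample).
last = "adolescent idiopathic scoliosis mtnr1a"
print(cut_off_clamped(last, "mtnr1a"))   # adolescent idiopathic scoliosis mtnr1a
```

Assuming the same column names as above, it should drop into the same apply call: df.apply(lambda x: cut_off_clamped(x["sentence"], x["rootword"]), axis=1).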