I need to split a pandas DataFrame based on the results of spaCy rule-based matching. Here is what I tried.
import pandas as pd
import numpy as np
import spacy
from spacy.matcher import Matcher
df = pd.DataFrame([['Eight people believed injured in serious SH1 crash involving truck and three cars at Hunterville',
'Fire and emergency responding to incident at Mataura, Southland ouvea premix site',
'Civil Defence Minister Peeni Henare heartbroken over Northland flooding',
'Far North flooding: New photos reveal damage to roads']]).T
df.columns = ['col1']
nlp = spacy.load("en_core_web_sm")
flood_pattern = [{'LOWER': 'flooding'}]
matcher = Matcher(nlp.vocab, validate=True)
matcher.add("FLOOD_DIS", None, flood_pattern)
titles = (_ for _ in df['col1'])
g = (d for d in nlp.pipe(titles) if matcher(d))
x = list(g)
df2 = df[df['col1'].isin(x)]
df2
This produces an empty DataFrame. However, it should extract the two rows of df whose titles mention flooding.
Posted on 2020-07-20 22:41:43
You can do the following.
titles = (_ for _ in df['col1'])
A = []
for i in range(len(df)):
    doc = nlp(next(titles))
    # keep the title as a str (not a Doc) whenever the matcher finds at least one match
    if len(matcher(doc)) > 0:
        A.append(str(doc))
df2 = df[df['col1'].isin(A)]
Posted on 2020-07-25 10:31:06
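Try this:
# spaCy v2 signature; in spaCy v3 use matcher.add("FLOOD_DIS", [flood_pattern])
matcher.add("FLOOD_DIS", None, flood_pattern)
# one boolean per row: True when the matcher finds at least one match
matches = [bool(matcher(doc)) for doc in nlp.pipe(df['col1'])]
df2 = df[matches][['col1']]
https://stackoverflow.com/questions/62993303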
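The empty result in the question most likely comes from x holding spaCy Doc objects, while df['col1'] holds plain strings, so isin never finds an equal value. Below is a minimal sketch of the two fixes suggested in the answers. It makes a few assumptions that are not in the original post: it uses spacy.blank("en") (tokenizer only, no model download) instead of en_core_web_sm, the spaCy v3 form of matcher.add, and a shortened three-row DataFrame.

```python
import pandas as pd
import spacy
from spacy.matcher import Matcher

# Shortened sample data (assumption: three of the titles from the question).
df = pd.DataFrame({"col1": [
    "Civil Defence Minister Peeni Henare heartbroken over Northland flooding",
    "Far North flooding: New photos reveal damage to roads",
    "Fire and emergency responding to incident at Mataura, Southland ouvea premix site",
]})

# Assumption: a blank English pipeline is enough, since a LOWER pattern
# only needs tokenization.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab, validate=True)
# spaCy v3 signature: a list of patterns as the second argument.
matcher.add("FLOOD_DIS", [[{"LOWER": "flooding"}]])

# Option 1: build one boolean per row and index the DataFrame with it.
mask = [bool(matcher(doc)) for doc in nlp.pipe(df["col1"])]
df2 = df[mask]

# Option 2: keep isin, but compare against the text of each matching Doc.
texts = [doc.text for doc in nlp.pipe(df["col1"]) if matcher(doc)]
df3 = df[df["col1"].isin(texts)]

print(df2)
```

The boolean mask avoids comparing strings at all, so it also behaves sensibly when the column contains duplicate titles.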