I am trying to extract quotations and their attributions (i.e., the speaker) from text, but I am getting errors. Here is the setup:
import textacy
import pandas as pd
import spacy
data = [
("\"Hello, nice to meet you,\" said world 1"),
("\"Hello, nice to meet you,\" said world 2"),
]
df = pd.DataFrame(data, columns=['text'])
nlp = spacy.load('en_core_web_sm')
doc = df['text'].apply(nlp)
Here is the desired output:
[DQTriple(speaker=world 1, cue=said, content="Hello, nice to meet you,")] [DQTriple(speaker=world 2, cue=said, content="Hello, nice to meet you,")]
Here is my first extraction attempt:
print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))
It gives the following output:
<generator object direct_quotations at 0x7f82edf58ac0>, <generator object direct_quotations at 0x7f82edf58190>
Here is my second extraction attempt:
print(list(textacy.extract.triples.direct_quotations(doc)))
This produces the following error:
AttributeError: 'Series' object has no attribute 'lang_'
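Answered 2022-06-17 09:32:42
In your first attempt, the loop runs once per row but passes the whole `doc` Series to `direct_quotations` each time, and the generators it returns are never consumed, so only generator objects are printed. In the second attempt, `list()` does consume the generator, and the function then fails because it receives a pandas Series rather than a single spaCy `Doc`.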
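The generator output from the first attempt can be reproduced without spaCy or textacy. The snippet below is only an illustrative sketch: `fake_quotations` is a hypothetical stand-in for a generator-based extractor, not part of any library, and the strings in `docs` are made-up inputs.

```python
def fake_quotations(doc):
    """Hypothetical stand-in for a generator-based extractor."""
    for part in doc.split("|"):
        yield part


docs = ["a|b", "c"]

# Calling a generator function only creates a generator object; no work is done yet.
lazy = [fake_quotations(d) for d in docs]
print(lazy)  # a list of generator objects, not extracted values

# Wrapping each call in list() consumes the generator and materializes the results.
results = [list(fake_quotations(d)) for d in docs]
print(results)
```

The same reasoning suggests that each call to the extractor needs to be wrapped in `list()` and given one document at a time.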
Here is an example of what you can do:
import textacy
import spacy
text = """ "Hello, nice to meet you," said world 1"""
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]
https://stackoverflow.com/questions/72643007
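To carry the single-document example over to several texts, one option is to build one document per text and consume the extractor once per document. The helper below is a minimal sketch, not the answerer's code: `extract_per_text`, `toy_make_doc`, and `toy_extract` are hypothetical names, and the toy functions exist only so the snippet runs without spaCy installed.

```python
from typing import Any, Callable, Iterable, List


def extract_per_text(
    texts: Iterable[str],
    make_doc: Callable[[str], Any],
    extract: Callable[[Any], Iterable[Any]],
) -> List[List[Any]]:
    """Build one document per text and consume the extractor for each one."""
    return [list(extract(make_doc(text))) for text in texts]


# Toy stand-ins (assumptions for illustration only).
def toy_make_doc(text: str) -> List[str]:
    return text.split()


def toy_extract(doc: List[str]) -> Iterable[str]:
    # Yield capitalized tokens, mimicking a lazy, generator-based extractor.
    for token in doc:
        if token.istitle():
            yield token


print(extract_per_text(["Hello world", "nice to Meet you"], toy_make_doc, toy_extract))

# With the names from the question, the intended call would be:
# extract_per_text(df["text"], nlp, textacy.extract.triples.direct_quotations)
```

Passing the document builder and the extractor as arguments keeps the per-row pattern separate from the libraries, so it can be checked with toy inputs before running the real pipeline.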