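I am writing some code to perform named entity recognition (NER), and it works well for English text. However, I would like to be able to apply it to any language. To do that, I want to 1) identify the language of a text, and then 2) apply NER for the identified language. For step 2, I am unsure whether to A) translate the text to English and then apply NER (in English), or B) apply NER directly in the identified language.
Below is the code I have so far. What I would like is for NER to work on text2, or on text in any other language, once that language has been identified: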
import spacy
from spacy_langdetect import LanguageDetector
from langdetect import DetectorFactory
text = 'In 1793, Alexander Hamilton recruited Webster to move to New York City and become an editor for a Federalist Party newspaper.'
text2 = 'Em 1793, Alexander Hamilton recrutou Webster para se mudar para a cidade de Nova York e se tornar editor de um jornal do Partido Federalista.'
# Step 1: Identify the language of a text
DetectorFactory.seed = 0
nlp = spacy.load('en_core_web_sm')
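# Note: passing a component instance is the spaCy v2 calling style;
# spaCy v3 expects the string name of a registered component instead.
nlp.add_pipe(LanguageDetector(), name='language_detector', last=True)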
doc = nlp(text)
print(doc._.language)
# Step 2: NER
Entities = [(str(x), x.label_) for x in nlp(str(text)).ents]
print(Entities)

Does anyone have experience with this? Thanks a lot!
Posted on 2021-03-31 15:27:03
spaCy needs to load the correct language model for each language.
See https://spacy.io/usage/models for the available models.
import spacy
from langdetect import detect

# Load one spaCy pipeline per language up front, keyed by language code.
nlp = {}
for lang in ["en", "es", "pt", "ru"]:  # Fill in the languages you want; each one needs a spaCy model.
    if lang == "en":
        nlp[lang] = spacy.load(lang + '_core_web_lg')
    else:
        nlp[lang] = spacy.load(lang + '_core_news_lg')

def entites(text):
    # Detect the language, then run the pipeline that matches it.
    lang = detect(text)
    try:
        nlp2 = nlp[lang]
    except KeyError:
        # Raise (rather than return) so callers cannot mistake the error for a result.
        raise Exception(lang + " model is not loaded")
    return [(str(x), x.label_) for x in nlp2(str(text)).ents]

Then you can run both steps together.
ents = entites(text)
print(ents)

https://stackoverflow.com/questions/66888668
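
The answer above loads every model at start-up, which can be slow and memory-hungry with the `_lg` packages. As a rough sketch of one alternative, the snippet below loads a pipeline only the first time its language is requested and then reuses it. `MODEL_NAMES`, `ModelCache`, and `entities` are hypothetical names introduced here for illustration, not part of spaCy or langdetect, and the smaller `_sm` packages are an assumption chosen to keep downloads light; check the models page for what is actually available for your languages.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Assumed mapping from language codes to spaCy package names (hypothetical
# table; the packages must be installed separately before they can be loaded).
MODEL_NAMES: Dict[str, str] = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "pt": "pt_core_news_sm",
    "ru": "ru_core_news_sm",
}


class ModelCache:
    """Load each language's pipeline on first use and keep it for later calls."""

    def __init__(self, loader: Optional[Callable[[str], Any]] = None) -> None:
        # `loader` turns a package name into a pipeline; None means spacy.load.
        self._loader = loader
        self._models: Dict[str, Any] = {}

    def get(self, lang: str) -> Any:
        if lang not in MODEL_NAMES:
            raise KeyError(f"no spaCy model configured for language {lang!r}")
        if lang not in self._models:
            loader = self._loader
            if loader is None:
                import spacy  # imported lazily so the cache can be tested without spaCy
                loader = spacy.load
            self._models[lang] = loader(MODEL_NAMES[lang])
        return self._models[lang]


def entities(text: str, lang: str, cache: ModelCache) -> List[Tuple[str, str]]:
    """Run the pipeline for `lang` on `text` and return (entity text, label) pairs."""
    doc = cache.get(lang)(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With spaCy, langdetect, and the relevant model packages installed, this could be called as `entities(text2, detect(text2), ModelCache())`, where `detect` comes from `langdetect` as in the answer. Passing the loader in as a parameter is a design choice that makes the caching logic easy to exercise without downloading any models.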