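I have the following sentences and need to extract each company's name and its symbol.
So far I have tried ([A-Z][a-z]*)(\s)([A-Z]{1,5}), but it fails to match when the name consists of several capitalized words (British Defence Industry Directory and Goldman Sachs) and when the first word of the company name is entirely uppercase (BDEC Limited).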
Posted on 2022-08-24 16:48:03
I think you need the name that sits between "company" and "sells", or the one that follows "with". This should do it.
import re
s='''
Company British Defence Industry Directory BDEC sells stuff.
Company BDEC Limited BDEC sells stuff.
The company BDEC Limited BDEC sells stuff.
The company BDEC BDEC sells stuff.
The tech company Apple AAPL sells stuff.
The payments company Visa V sells stuff.
Customers are not happy with Goldman Sachs GS.
'''
pattern=r'(?i)company(.*?)sells|with(.*?)\.'
print(["".join(x) for x in re.findall(pattern,s)])
Output:
[' British Defence Industry Directory BDEC ', ' BDEC Limited BDEC ', ' BDEC Limited BDEC ', ' BDEC BDEC ', ' Apple AAPL ', ' Visa V ', ' Goldman Sachs GS']
Posted on 2022-08-24 18:39:43
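The keyword-based pattern above returns the name and the symbol together in one string. As a minimal sketch, assuming the symbol is always the last whitespace-separated token of the capture, the two can be separated with rsplit. The helper name split_name_symbol is made up for this illustration and is not part of the original answer.

```python
import re

# Three of the sample sentences used in the answer above.
s = (
    "Company British Defence Industry Directory BDEC sells stuff.\n"
    "The payments company Visa V sells stuff.\n"
    "Customers are not happy with Goldman Sachs GS.\n"
)

pattern = r'(?i)company(.*?)sells|with(.*?)\.'


def split_name_symbol(captured):
    """Split 'Name Words SYMBOL' on its last run of whitespace (hypothetical helper)."""
    # Assumption: the capture holds at least two words; a one-word
    # capture would raise ValueError on unpacking.
    name, symbol = captured.strip().rsplit(None, 1)
    return name, symbol


# re.findall yields one tuple per match; only one of the two groups is non-empty,
# so joining the tuple gives the captured text.
pairs = [split_name_symbol("".join(groups)) for groups in re.findall(pattern, s)]
for name, symbol in pairs:
    print(f"{name!r} -> {symbol!r}")
```

This still depends on the keywords "company", "sells" and "with" appearing in every sentence, so it is only as general as the pattern it builds on.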
import re
text = '''Company British Defence Industry Directory BDEC sells stuff.
Company BDEC Limited BDEC sells stuff.
The company BDEC Limited BDEC sells stuff.
The company BDEC BDEC sells stuff.
The tech company Apple AAPL sells stuff.
The payments company Visa V sells stuff.
Customers are not happy with Goldman Sachs GS.
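'''
First remove the capitalized word at the start of each sentence, then drop every word that does not start with an uppercase letter.
for l in text.splitlines():
    print([w for w in re.sub(r'^\w+', r'', l).split() if w[0].isupper()])
Output: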
['British', 'Defence', 'Industry', 'Directory', 'BDEC']
['BDEC', 'Limited', 'BDEC']
['BDEC', 'Limited', 'BDEC']
['BDEC', 'BDEC']
['Apple', 'AAPL']
['Visa', 'V']
['Goldman', 'Sachs', 'GS.']
Posted on 2022-08-24 19:44:29
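Note that the word-filtering output above keeps the sentence's final period on the last token ('GS.'). A possible refinement, sketched here under the assumption that the last capitalized word is always the symbol, strips surrounding punctuation and returns the name and the symbol separately. The function name name_and_symbol is hypothetical.

```python
import re
import string

text = (
    "Company BDEC Limited BDEC sells stuff.\n"
    "The tech company Apple AAPL sells stuff.\n"
    "Customers are not happy with Goldman Sachs GS.\n"
)


def name_and_symbol(line):
    """Return (name, symbol) from the capitalized words of a line (hypothetical helper)."""
    # Drop the first word of the sentence, which is capitalized only
    # because it starts the sentence.
    rest = re.sub(r'^\w+', '', line)
    # Keep words starting with an uppercase letter and strip punctuation
    # such as the trailing period in 'GS.'.
    words = [w.strip(string.punctuation) for w in rest.split() if w[0].isupper()]
    # Assumption: at least one capitalized word remains and the last one is the symbol.
    return " ".join(words[:-1]), words[-1]


for line in text.splitlines():
    print(name_and_symbol(line))
```

Like the snippet it extends, this would misfire on sentences that contain other capitalized words after the first one.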
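Perhaps it is enough to capture the first uppercase character, then optionally match the following words that start with an uppercase character, and finally require that the captured character is also the first character of the last part, which consists of uppercase characters only.
\b([A-Z])\w*(?:\s[A-Z]\w*)*\s\1[A-Z]*\b
import re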
pattern = r"\b([A-Z])\w*(?:\s[A-Z]\w*)*\s\1[A-Z]*\b"
s = ("Company British Defence Industry Directory BDEC sells stuff.\n"
"Company BDEC Limited BDEC sells stuff.\n"
"The company BDEC Limited BDEC sells stuff.\n"
"The company BDEC BDEC sells stuff.\n"
"The tech company Apple AAPL sells stuff.\n"
"The payments company Visa V sells stuff.\n"
"Customers are not happy with Goldman Sachs GS.\n\n")
matches = re.finditer(pattern, s)
for m in matches:
    print(m.group(0))
Output:
British Defence Industry Directory BDEC
BDEC Limited BDEC
BDEC Limited BDEC
BDEC BDEC
Apple AAPL
Visa V
Goldman Sachs GS
https://stackoverflow.com/questions/73476301
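The backreference pattern from the last answer matches the name and the symbol as one string. A minimal sketch of the same idea with named groups, so the two parts can be read separately, might look like this. The group names name, first and symbol are chosen for this illustration; the matching logic is otherwise unchanged.

```python
import re

# Same structure as the answer's pattern, with named groups added:
#   first  = the first uppercase letter of the name
#   name   = the run of capitalized words
#   symbol = an all-uppercase token starting with the same letter
pattern = re.compile(
    r"\b(?P<name>(?P<first>[A-Z])\w*(?:\s[A-Z]\w*)*)"
    r"\s(?P<symbol>(?P=first)[A-Z]*)\b"
)

s = (
    "Company British Defence Industry Directory BDEC sells stuff.\n"
    "The company BDEC BDEC sells stuff.\n"
    "The payments company Visa V sells stuff.\n"
    "Customers are not happy with Goldman Sachs GS.\n"
)

results = [(m.group("name"), m.group("symbol")) for m in pattern.finditer(s)]
for name, symbol in results:
    print(f"{name} | {symbol}")
```

As in the original pattern, this assumes the symbol begins with the same letter as the company name, which holds for the sample sentences but not for every real ticker.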