I have the following strings and want to extract the volumes from them (matching only ML, not MG/ML).
test = [
"10ML", # 10
"10 ML", # 10
"10.5ML", # 10.5
"1MG/1ML", # [] not match
"1MG/10ML", # [] not match
"10MG/0.5ML", # [] not match
" 10ML and 15ML ", # 10, 15
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
"NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]
This is my current pattern and its results.
import re
pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")
for i, s in enumerate(test):
    print(test[i], '>>', pattern.findall(s))
10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> ['0'] # Wrong []
10MG/0.5ML >> ['.5'] # Wrong []
10ML and 15ML >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10', '30'] # Wrong ['10']
As you can see, I get wrong results for ["1MG/10ML", "10MG/0.5ML", "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]. They should be [[], [], ['10']].
I have tried to fix my pattern but still cannot figure it out. Please help me correct it. Thanks!
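The expected results written in the comments above can be turned into an automated check. What follows is a minimal sketch rather than part of the original post: the `expected` mapping and the `mismatches` helper are names introduced here for illustration, and the pattern is the one from the question.

```python
import re

# Expected volumes per input string, copied from the comments in the question.
expected = {
    "10ML": ["10"],
    "10 ML": ["10"],
    "10.5ML": ["10.5"],
    "1MG/1ML": [],
    "1MG/10ML": [],
    "10MG/0.5ML": [],
    " 10ML and 15ML ": ["10", "15"],
    "LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION": [],
    "NSS.0.9% 1000 ML (PLASTIC BAG)": ["1000"],
    "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML": ["10"],
}

# The pattern from the question, written as a raw string.
pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")


def mismatches(pattern, expected):
    """Return {text: (got, wanted)} for every string whose result differs."""
    return {
        text: (pattern.findall(text), wanted)
        for text, wanted in expected.items()
        if pattern.findall(text) != wanted
    }


for text, (got, wanted) in mismatches(pattern, expected).items():
    print(f"{text!r}: got {got}, expected {wanted}")
```

Any candidate pattern can be passed to `mismatches`; an empty result means every string produced the expected list.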
Posted on 2021-05-17 13:03:53
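You can use
(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)
Details
(?<![/\d]) - no / or digit is allowed immediately to the left of the current position
(?<!\d[.-]) - no digit followed by . or - is allowed immediately to the left of the current position
(\d+(?:\.\d+)?) - Group 1: one or more digits, then an optional sequence of a . and one or more digits
\s* - zero or more whitespace characters
ML\b - ML followed by a word boundary, so MLM is not matched
(?!/) - no / is allowed immediately to the right of the current position
import re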
pattern = re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A)
test = ["10ML", "10 ML", "10.5ML", "1MG/1ML", "1MG/10ML", "10MG/0.5ML", " 10ML and 15ML ",
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", "NSS.0.9% 1000 ML (PLASTIC BAG)",
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]
for i, s in enumerate(test):
    print(test[i], '>>', pattern.findall(s))
Output:
10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> []
10MG/0.5ML >> []
10ML and 15ML >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10']
Posted on 2021-05-17 13:05:30
See this RegExr link for details on the components of the following regular expression.
import re
test = [
"10ML", # 10
"10 ML", # 10
"10.5ML", # 10.5
"1MG/1ML", # [] not match
"1MG/10ML", # [] not match
"10MG/0.5ML", # [] not match
" 10ML and 15ML ", # 10, 15
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
"NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]
for s in test:
    print(re.findall(r'(?<![\-\/])(\d+(?:\.?\d+)) *ML\b', s))
Output
['10']
['10']
['10.5']
[]
[]
[]
['10', '15']
[]
['1000']
['10']
Posted on 2021-05-17 14:10:14
Another one, perhaps easier to read:
(?<![/\d-])(\d+\.*\d+)\s*ML\b
https://stackoverflow.com/questions/67570152
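One difference between the three patterns above may be worth checking before choosing one: `\d+(?:\.?\d+)` and `\d+\.*\d+` both require at least two digits, so a single-digit volume is only found by the first pattern. The sketch below illustrates this; the two sample strings are hypothetical inputs added here and are not part of the original test list.

```python
import re

# The three patterns from the answers above, unchanged.
patterns = {
    "answer 1": re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A),
    "answer 2": re.compile(r'(?<![\-\/])(\d+(?:\.?\d+)) *ML\b'),
    "answer 3": re.compile(r'(?<![/\d-])(\d+\.*\d+)\s*ML\b'),
}

# Hypothetical single-digit inputs, not taken from the question.
samples = ["5 ML", "SYRUP 5ML BOTTLE"]

# Run every pattern over every sample and collect the findall results.
results = {
    name: [p.findall(s) for s in samples]
    for name, p in patterns.items()
}

for name, found in results.items():
    print(name, '>>', found)
```

If single-digit volumes can occur in the data, the first pattern handles them because its fractional part, `(?:\.\d+)?`, is optional as a whole.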