Python: Extract volume (mL) from a string

Stack Overflow user
Asked 2021-05-17 12:57:46
3 answers, 483 views, 0 followers, 2 votes

I have the following strings and want to extract the volumes from them (matching only mL, not mg/mL).

test = [
"10ML", # 10
"10 ML", # 10
"10.5ML", # 10.5
"1MG/1ML", # [] not match
"1MG/10ML", # [] not match
"10MG/0.5ML", # [] not match
"   10ML and 15ML  ", # 10, 15
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
"NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]

Here are my current pattern and its results.

import re

# Raw string avoids invalid-escape warnings; the pattern itself is unchanged.
pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")

for s in test:
    print(s, '>>', pattern.findall(s))

10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> ['0'] # Wrong []
10MG/0.5ML >> ['.5'] # Wrong []
   10ML and 15ML   >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10', '30'] # Wrong ['10']

As you can see, I get wrong results for ["1MG/10ML", "10MG/0.5ML", "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]. The expected results are [], [], and ['10'] respectively.

I have tried to fix my pattern but still can't figure it out. Please help me correct it. Thanks!
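
One way to see where the wrong results come from is to print where each match starts. The sketch below reuses the pattern from the question unchanged and only adds `finditer` to show the character just before each captured number. The lookbehind `(?<!\/)` checks only that single character, so the regex engine can skip the digit right after the `/` and start the match one position later.

```python
import re

# Pattern copied from the question (written as a raw string).
pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")

def show_matches(s):
    """Return (captured text, character before it) for every match in s."""
    found = []
    for m in pattern.finditer(s):
        start = m.start(1)
        before = s[start - 1] if start > 0 else ""
        found.append((m.group(1), before))
    return found

for s in ["1MG/10ML", "10MG/0.5ML"]:
    print(s, '>>', show_matches(s))
```

In both strings the captured text starts after a digit rather than after the `/`, which is why the lookbehind does not reject it.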

3 Answers

Stack Overflow user

Accepted answer

Answered 2021-05-17 13:03:53

You can use

(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)

See the Python regex demo.

Details:

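  • (?<![/\d]) - no / or digit is allowed immediately to the left of the current position
  • (?<!\d[.-]) - no digit followed by . or - is allowed immediately to the left of the current position
  • (\d+(?:\.\d+)?) - Group 1: one or more digits, then an optional sequence of a . and one or more digits
  • \s* - zero or more whitespace characters
  • ML\b - ML ending at a word boundary (so MLM does not match)
  • (?!/) - no / is allowed immediately to the right of the current position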
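
The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming you want the volumes as floats: it writes the same pattern with `re.VERBOSE` so each component carries its own comment. The names `VOLUME_ML` and `extract_ml` are made up for this illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
VOLUME_ML = re.compile(
    r"""
    (?<![/\d])        # not preceded by a slash or a digit
    (?<!\d[.-])       # not preceded by a digit plus a dot or hyphen
    (\d+(?:\.\d+)?)   # group 1: integer or decimal number
    \s*               # optional whitespace
    ML\b              # the unit, ending at a word boundary
    (?!/)             # not followed by a slash
    """,
    re.VERBOSE | re.ASCII,
)

def extract_ml(text):
    """Return the standalone mL volumes in text as floats (hypothetical helper)."""
    return [float(number) for number in VOLUME_ML.findall(text)]

print(extract_ml("   10ML and 15ML  "))
print(extract_ml("10MG/0.5ML"))
```

In verbose mode, whitespace and `#` comments inside the pattern are ignored, so the regex behaves the same as the one-line version.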

Python demo:

import re
pattern = re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A)
test = ["10ML", "10 ML", "10.5ML", "1MG/1ML", "1MG/10ML", "10MG/0.5ML", "   10ML and 15ML  ",
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", "NSS.0.9% 1000 ML (PLASTIC BAG)", 
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]
for s in test:
    print(s, '>>', pattern.findall(s))

Output:

10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> []
10MG/0.5ML >> []
   10ML and 15ML   >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10']
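
To keep the pattern from regressing when it is edited later, the expected values from the question's comments can be turned into a small table-driven check. This is only a sketch: the `expected` dictionary and the `mismatches` name are introduced here, and the `re.I` variant at the end assumes the real data may contain lowercase or mixed-case units such as "mL", which the original test list does not show.

```python
import re

PATTERN = r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)'
pattern = re.compile(PATTERN, re.A)

# Expected results, taken from the comments in the question.
expected = {
    "10ML": ['10'],
    "10 ML": ['10'],
    "10.5ML": ['10.5'],
    "1MG/1ML": [],
    "1MG/10ML": [],
    "10MG/0.5ML": [],
    "   10ML and 15ML  ": ['10', '15'],
    "LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION": [],
    "NSS.0.9% 1000 ML (PLASTIC BAG)": ['1000'],
    "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML": ['10'],
}

# Collect every input whose result differs from the expectation.
mismatches = {s: pattern.findall(s)
              for s, want in expected.items()
              if pattern.findall(s) != want}
print(mismatches)  # an empty dict means every case agrees

# Assumption: units may also appear as "ml" or "mL" in real data.
pattern_ci = re.compile(PATTERN, re.A | re.I)
print(pattern_ci.findall("Take 2.5 mL twice daily"))
```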
Votes: 2

Stack Overflow user

Answered 2021-05-17 13:05:30

See this RegExr link for details on the components of the regular expression below.

import re

test = [
    "10ML", # 10
    "10 ML", # 10
    "10.5ML", # 10.5
    "1MG/1ML", # [] not match
    "1MG/10ML", # [] not match
    "10MG/0.5ML", # [] not match
    "   10ML and 15ML  ", # 10, 15
    "LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
    "NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
    "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]

for s in test:
    # Print each result; without print() the loop produces no output.
    print(re.findall(r'(?<![\-\/])(\d+(?:\.?\d+)) *ML\b', s))

Output:

['10']
['10']
['10.5']
[]
[]
[]
['10', '15']
[]
['1000']
['10']
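
This pattern handles the ten test strings, but `\d+(?:\.?\d+)` needs at least two digits, which matters for inputs outside that list. The sketch below compares it with the accepted answer's pattern on two made-up inputs; the strings in `extra` and the variable names are illustrative and do not come from the question.

```python
import re

# Pattern from this answer.
second = re.compile(r'(?<![\-\/])(\d+(?:\.?\d+)) *ML\b')
# Pattern from the accepted answer, for comparison.
accepted = re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A)

# Hypothetical inputs, not part of the original test list.
extra = ["5ML", "1MG/100ML"]

for s in extra:
    print(s, '>>', second.findall(s), 'vs', accepted.findall(s))
```

On these inputs the pattern from this answer misses the single-digit volume and picks up the trailing "00" of the concentration, while the accepted pattern returns ['5'] and [] respectively.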
Votes: 3

Stack Overflow user

Answered 2021-05-17 14:10:14

Another option, perhaps easier to read:

(?<![/\d-])(\d+\.*\d+)\s*ML\b
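
Since this answer gives only the pattern, here is a minimal sketch that runs it over the test list from the question. The variable name `third` and the two extra inputs at the end are assumptions added for illustration.

```python
import re

third = re.compile(r'(?<![/\d-])(\d+\.*\d+)\s*ML\b')

test = [
    "10ML", "10 ML", "10.5ML", "1MG/1ML", "1MG/10ML", "10MG/0.5ML",
    "   10ML and 15ML  ",
    "LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION",
    "NSS.0.9% 1000 ML (PLASTIC BAG)",
    "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML",
]

results = [third.findall(s) for s in test]
for s, found in zip(test, results):
    print(s, '>>', found)

# Hypothetical inputs outside the original list.
print(third.findall("5ML"))         # \d+\.*\d+ needs at least two digits
print(third.findall("1MG/0.25ML"))  # the lookbehind does not exclude '.'
```

The pattern gives the expected results for all ten strings from the question. The last two calls show its limits: a single-digit volume is not matched, and a two-digit fraction after a concentration is still captured.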
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/67570152
