
Regex: a phrase followed by a positive or negative integer/decimal with a unit

Stack Overflow user
Asked on 2021-11-01 21:44:48

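I'm trying to extract a number from a string. However, the number must follow a very specific pattern in the sentence. Let's pretend I'm looking for the maximum weight of a parcel. For example: "The 3rd box marked 4 cannot exceed 7.34kg but if its exact 8.42kg its okay too". In this case, I want to match the number 7.34. My pattern in English is: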

<Starts with a phrase which is in ('must not exceed', 'cannot exceed', 'limited by')><Can be any characters of any length (0 length to infinite) as long as they are not alphabet characters><a positive or negative int or decimal which may or may not be comma separated (i.e. 1,032.43kg)><Can be any characters of any length (0 length to infinite) as long as they are not alphabet characters><ends with characters which are in ('kg', 'k.g', 'k/g')>

What I have so far is:

(must not exceed|cannot exceed|limited by).*?[0-9]+ ?(kg|k\.g|k\/g)

However, the main thing I can't manage is matching the <Can be any characters of any length (0 length to infinite) as long as they are not alphabet characters> part.

Some examples:

The 3 boxes with red on them must not exceed -23.4435kg and don't pick them up.
Parcels that can be sent are limited by 1,402kg and its okay to send
The 2 boxes on the shelf must not exceed: 
102 kg
Do not pick up 18 boxes at a time and make sure they cannot exceed,: 92302 k.g
Do not pick up boxes that weight 56.23 kg
Boxes cannot exceed -23 k/g

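I know I could probably do a double match: first match the sentence fragment (i.e. must not exceed -23.4435kg), then run a second regex to pull out the number, which is what I currently do in code. My question is essentially how to match the correct part of the string.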
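As a rough illustration of where the attempt above falls short, the sketch below runs the question's pattern against three inputs. The second sentence is a made-up example, not one from the question, and the variable names are only illustrative.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r'(must not exceed|cannot exceed|limited by).*?[0-9]+ ?(kg|k\.g|k\/g)')

# 1) `.` does not match a newline by default, so the multi-line sample is missed.
m_newline = attempt.search('The 2 boxes on the shelf must not exceed: \n102 kg')
print(m_newline)  # None

# 2) `.*?` happily crosses letters, so unrelated words between the phrase
#    and the number are accepted (hypothetical sentence).
m_letters = attempt.search('Boxes cannot exceed what we agreed, about 5kg')
print(m_letters.group(0))  # cannot exceed what we agreed, about 5kg

# 3) No group captures the number, so the sign, decimals and commas are
#    swallowed by `.*?` and cannot be read back from the match.
m_number = attempt.search('Boxes must not exceed -23.4435kg')
print(m_number.groups())  # ('must not exceed', 'kg')
```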


1 Answer

Stack Overflow user

Accepted answer

Answered on 2021-11-01 21:55:19

I suggest using

\b((?:must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(?:,\d+)*(?:\.\d+)?)\s*(kg|k\.g|k/g)\b

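See the regex demo. Details:

  • \b - a word boundary
  • ((?:must\s+|can)not\s+exceed|limited\s+by) - Group 1: must not exceed, cannot exceed, or limited by, with one or more whitespace characters between the words
  • \W*? - zero or more non-word characters, as few as possible
  • (-?\d+(?:,\d+)*(?:\.\d+)?) - Group 2: the number pattern: an optional -, then one or more digits, then zero or more sequences of a comma and one or more digits, then an optional . followed by one or more digits
  • \s* - zero or more whitespace characters
  • (kg|k\.g|k/g) - Group 3: kg, k.g, or k/g
  • \b - a word boundary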
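The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming Python's `re.VERBOSE` flag and named groups; the names `WEIGHT_LIMIT`, `phrase`, `number`, and `unit` are illustrative choices, not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode unescaped whitespace is ignored and `#` starts a comment;
# the pattern only uses \s for whitespace, so its meaning is unchanged.
WEIGHT_LIMIT = re.compile(r"""
    \b                                                      # word boundary
    (?P<phrase>(?:must\s+|can)not\s+exceed|limited\s+by)    # trigger phrase
    \W*?                                                    # non-word chars, as few as possible
    (?P<number>-?\d+(?:,\d+)*(?:\.\d+)?)                    # signed int/decimal, optional commas
    \s*                                                     # optional whitespace
    (?P<unit>kg|k\.g|k/g)                                   # unit
    \b                                                      # word boundary
""", re.VERBOSE)

m = WEIGHT_LIMIT.search(
    'Do not pick up 18 boxes at a time and make sure they cannot exceed,: 92302 k.g'
)
fields = m.groupdict()
print(fields)  # {'phrase': 'cannot exceed', 'number': '92302', 'unit': 'k.g'}
```

Named groups make the call sites independent of group numbering, which helps if the pattern is later edited.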

Python demo:

import re
texts = ['The 3rd box marked 4 cannot exceed 7.34kg but if its exact 8.42kg its okay too',
    'The 3 boxes with red on them must not exceed -23.4435kg and don\'t pick them up.',
    'Parcels that can be sent are limited by 1,402kg and its okay to send',
    'The 2 boxes on the shelf must not exceed: \n102 kg',
    'Do not pick up 18 boxes at a time and make sure they cannot exceed,: 92302 k.g',
    'Do not pick up boxes that weight 56.23 kg',
    'Boxes cannot exceed -23 k/g',
    'must not exceed -23.4435kg']

rx = re.compile(r'\b((?:must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(?:,\d+)*(?:\.\d+)?)\s*(kg|k\.g|k/g)\b')
for text in texts:
    print("----", text,"----")
    m = rx.search(text)
    if m:
        print(f"Phrase: {m.group(1)}")
        print(f"Number: {m.group(2)}")
        print(f"UOM: {m.group(3)}")
    else:
        print("Not matched!")

Output:

---- The 3rd box marked 4 cannot exceed 7.34kg but if its exact 8.42kg its okay too ----
Phrase: cannot exceed
Number: 7.34
UOM: kg
---- The 3 boxes with red on them must not exceed -23.4435kg and don't pick them up. ----
Phrase: must not exceed
Number: -23.4435
UOM: kg
---- Parcels that can be sent are limited by 1,402kg and its okay to send ----
Phrase: limited by
Number: 1,402
UOM: kg
---- The 2 boxes on the shelf must not exceed: 
102 kg ----
Phrase: must not exceed
Number: 102
UOM: kg
---- Do not pick up 18 boxes at a time and make sure they cannot exceed,: 92302 k.g ----
Phrase: cannot exceed
Number: 92302
UOM: k.g
---- Do not pick up boxes that weight 56.23 kg ----
Not matched!
---- Boxes cannot exceed -23 k/g ----
Phrase: cannot exceed
Number: -23
UOM: k/g
---- must not exceed -23.4435kg ----
Phrase: must not exceed
Number: -23.4435
UOM: kg
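Group 2 is still a string such as `1,402`. If a numeric value is needed, one possible follow-up step is sketched below. It assumes that commas are thousands separators and that every unit spelling means kilograms; `extract_limits` is a hypothetical helper name, and `finditer` is used so that a text containing several limits yields all of them.

```python
import re
from decimal import Decimal

rx = re.compile(r'\b((?:must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(?:,\d+)*(?:\.\d+)?)\s*(kg|k\.g|k/g)\b')

def extract_limits(text):
    """Return every matched limit in `text` as a Decimal (hypothetical helper)."""
    # Assumption: commas are thousands separators, so they are simply removed.
    return [Decimal(m.group(2).replace(',', '')) for m in rx.finditer(text)]

text = ('Parcels that can be sent are limited by 1,402kg and '
        'boxes must not exceed -23.4435kg')
limits = extract_limits(text)
print(limits)  # [Decimal('1402'), Decimal('-23.4435')]
```

`Decimal` is used here rather than `float` so that values like `-23.4435` keep their exact digits.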

In Oracle, you need to drop the word boundaries and replace all non-capturing groups with capturing ones:

 REGEXP_SUBSTR(
   col,
   '((must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(,\d+)*(\.\d+)?)\s*(kg|k\.g|k/g)',
   1,1, NULL, 3)
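Turning the non-capturing groups into capturing ones shifts the group numbers, which is why the last argument above is 3 rather than 2. The sketch below checks that numbering using Python's `re` module only; it does not emulate Oracle's regex engine, and the variable name `oracle_style` is illustrative.

```python
import re

# The Oracle-style pattern: no \b, and every group is capturing.
oracle_style = re.compile(
    r'((must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(,\d+)*(\.\d+)?)\s*(kg|k\.g|k/g)'
)

m = oracle_style.search('Boxes cannot exceed -23 k/g')
# Groups are numbered by their opening parenthesis:
# 1 = whole phrase, 2 = "must " / "can", 3 = number, 4 = ",ddd", 5 = ".ddd", 6 = unit
print(m.group(1))  # cannot exceed
print(m.group(3))  # -23
print(m.group(6))  # k/g
```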
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/69803107
