How can we write a regular expression (regex) that recognizes quantities with units, such as "54.20 grams"?

Stack Overflow user

Asked 2021-01-09 20:03:57

Answers 2 · Views 388 · Followers 0 · Votes 4

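Below are some examples of test input.

The test inputs are ASCII-encoded strings.

Test case input

arrhar = Array(100)
arrhar1 = "Low Carb Orzo Low Carb Rice, High Protein, Great Low Carb Bread Company, Low Carb Pasta Rice, 7 g per pack"
arrhar2 = "Helios Certified Organic Greek Orzo Pasta, 500 g"
arrhar3 = "Barilla 15.73 oz."
arrhar4 = "Pasta Granoro Il Primo Orzo 6 oz per bag"
arrhar5 = "Authentic Italian Orzo -- 6 oz per bag"
arrhar6 = "ORZO PASA 4U! 1 bag IZ 4.39-GRM"
arrhar.trim()

Test case output

out1 = "7g"
out2 = "500 grm"
out3 = "15.73 oz"
out4 = "6 oz"
out5 = "6 oz"
out6 = "4.1-grm"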

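English description of the regex

Suppose we express the string-matching pattern as a bulleted list.

Bullet (1) describes the leftmost part of the string.

Bullet (2) describes the second substring from the left.

Bullet (3) describes the third part of the string.

And so on...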

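Numeric quantity
1. Zero or more digits (0, 1, 2, ..., 9)
2. Zero or one decimal point or comma
3. Zero or more digits (0, 1, 2, ..., 9)
Optional delimiter
1. Zero or more characters, each being any character except those in the classes [A-Z], [a-z], and \d
Unit
1. Grams
    1. Any case-insensitive subsequence of "grams", for example:
        a. "g"
        b. "GRMS"
        c. "gs"
        d. "Gms"
        e. and so on.

2. Ounces
    1. Z-ounces ... any case-insensitive substring of `OUNCEZ`
    2. S-ounces ... any case-insensitive substring of `OUNCES`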

Regex pieces

A suitable regex for the left part (the integer part) of the numeric quantity might be any of:

\d*
\d{0,}
[0-9]{0,}
[0123456789]*
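
As a quick sanity check, the four spellings above can be compared in Python. This is only a sketch: the helper name `leading_digits` and the sample strings are illustrative assumptions, not part of the question. One caveat that is specific to Python: for `str` patterns, `\d` also matches non-ASCII decimal digits unless `re.ASCII` is set, so the flag is used here to keep `\d` identical to `[0-9]`.

```python
import re

# Four spellings of "zero or more ASCII digits" taken from the question.
PATTERNS = [r"\d*", r"\d{0,}", r"[0-9]{0,}", r"[0123456789]*"]


def leading_digits(pattern: str, text: str) -> str:
    """Return the (possibly empty) run of digits matched at the start of text."""
    # Every pattern here can match the empty string, so re.match never
    # returns None. re.ASCII restricts \d to the characters 0-9.
    return re.match(pattern, text, flags=re.ASCII).group(0)


# Illustrative inputs (assumed for this sketch).
samples = ["500 g", "15.73 oz", "grams", ""]
results = {s: [leading_digits(p, s) for p in PATTERNS] for s in samples}

for sample, matched in results.items():
    print(repr(sample), matched)
```

For each sample, all four patterns return the same string, which is what one would expect if they are interchangeable on ASCII input.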

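The regex for zero or one decimal point or comma is [\.,]?.

A decimal number is \d*[\.,]\d

There may or may not be a delimiter between the number and the unit specification.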

56.1gr
56.1 gr
56.1-grams

A suitable regexp for the delimiter might be [^a-zA-Z0-9]*.

Suppose we write a regex for the number and the delimiter, but not for the unit (e.g. "ounces"). We might have:

\d*[\.,]?\d[^a-zA-Z0-9]*?

I would expect the above to match "4.91...." or "4.91 ".
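
That expectation can be checked directly. The Python sketch below runs the number-plus-delimiter pattern exactly as written in the question. Because the pattern allows only a single `\d` after the separator and the trailing delimiter quantifier is lazy (`*?`) with nothing after it, the search appears to stop after "4.9". The `ADJUSTED` pattern is one possible fix assumed for illustration (`\d+` and a greedy delimiter); it is not taken from the question or the answers, and the helper `first_match` is likewise hypothetical.

```python
import re

# Pattern exactly as written in the question.
NUMBER_AND_DELIMITER = re.compile(r"\d*[\.,]?\d[^a-zA-Z0-9]*?")

# Assumed adjustment for illustration: one or more digits after the
# separator, and a greedy delimiter so trailing punctuation is consumed.
ADJUSTED = re.compile(r"\d*[.,]?\d+[^a-zA-Z0-9]*")


def first_match(pattern: re.Pattern, text: str):
    """Return the first substring matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


for text in ["4.91....", "4.91 "]:
    print(repr(text),
          "question:", repr(first_match(NUMBER_AND_DELIMITER, text)),
          "adjusted:", repr(first_match(ADJUSTED, text)))
```

With the question's pattern both inputs yield "4.9"; the adjusted pattern returns the whole of "4.91...." and "4.91 ".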

A regex for subsequences of "grams" might be: [Gg]?[Rr]?[Aa]?[Mm]?[Ss]?.

A regex that captures something like "4.1-grm" would then look like this:

\d*[\.,]?\d[^a-zA-Z0-9]*?[Gg]?[Rr]?[Aa]?[Mm]?[Ss]?

How can we capture both grams and ounces?

regex

parsing

2 Answers

Stack Overflow user

Answered 2021-01-09 20:07:43

Using ? makes every part of [Gg]?[Rr]?[Aa]?[Mm]?[Ss]? optional, so the pattern can also match RM or the empty string.
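
A small Python sketch makes the problem concrete. The helper name `accepts` and the sample unit strings are assumptions chosen for illustration.

```python
import re

# The all-optional "grams" pattern from the question.
ALL_OPTIONAL_GRAMS = re.compile(r"[Gg]?[Rr]?[Aa]?[Mm]?[Ss]?")


def accepts(unit: str) -> bool:
    """True if the whole string is accepted by the all-optional pattern."""
    return ALL_OPTIONAL_GRAMS.fullmatch(unit) is not None


# Intended spellings are accepted, but so are "rm", "as" and the empty
# string, because no letter is required.
for unit in ["g", "grams", "GRMS", "rm", "as", "", "oz"]:
    print(repr(unit), accepts(unit))

# With search(), the pattern matches the empty string at position 0 of
# any input, even one that contains no unit at all.
empty_hit = ALL_OPTIONAL_GRAMS.search("no units here").group(0)
```

Since the unit part can match nothing, a number followed by any text (or by nothing) still counts as a match, which is why the answer moves to explicit alternatives.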

You can make the match more specific by listing the possible alternatives with an alternation | and matching case-insensitively.

\b\d+(?:[.,]\d+)?\s*(?:gr?|oz|ounces?|-grm|grams?)\b

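\b A word boundary
\d+ Match 1+ digits
(?:[.,]\d+)? Optionally match . or , followed by 1+ digits
\s* Match 0+ whitespace characters
(?:gr?|oz|ounces?|-grm|grams?) Match one of the alternatives
\b A word boundary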
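
As a rough check, the pattern can be tried in Python with `re.IGNORECASE`. The sample strings below are illustrative inputs modelled on the question's test cases (their exact wording is an assumption), and `find_quantity` is a hypothetical helper name.

```python
import re

# Pattern from this answer, compiled case-insensitively.
QUANTITY = re.compile(
    r"\b\d+(?:[.,]\d+)?\s*(?:gr?|oz|ounces?|-grm|grams?)\b",
    re.IGNORECASE,
)


def find_quantity(text: str):
    """Return the first quantity-with-unit found in text, or None."""
    m = QUANTITY.search(text)
    return m.group(0) if m else None


# Illustrative inputs (assumed wording).
samples = [
    "Low Carb Pasta Rice, 7g per pack",
    "Helios Certified Organic Greek Orzo Pasta, 500 g",
    "Barilla 15.73 oz.",
    "Pasta Granoro Il Primo Orzo 6 ounces per bag",
    "ORZO PASA 4U! 1 bag IZ 4.39-GRM",
    "56.1-grams",
]
for s in samples:
    print(find_quantity(s))
```

Note that the hyphen is only accepted as part of the literal alternative `-grm`, so an input such as "56.1-grams" is not matched by this version.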

Regex demo

Another option is, for example, to nest non-capturing groups so that the parts are optional but must appear in a fixed order:

\b\d+(?:[.,]\d+)?\s*-?(?:g(?:r(?:a?ms?)?)?|oz|ounces?)\b

Regex demo
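
The nested version can be exercised the same way. In this Python sketch the helper `find_quantity_nested` and the sample strings are assumptions for illustration; the pattern itself is the one given in this answer.

```python
import re

# Nested-group pattern from this answer, compiled case-insensitively.
NESTED = re.compile(
    r"\b\d+(?:[.,]\d+)?\s*-?(?:g(?:r(?:a?ms?)?)?|oz|ounces?)\b",
    re.IGNORECASE,
)


def find_quantity_nested(text: str):
    """Return the first quantity-with-unit found in text, or None."""
    m = NESTED.search(text)
    return m.group(0) if m else None


# The optional hyphen now applies to every unit, and the gram spellings
# must follow the order g, r, a, m, s with "g" always present.
for s in ["56.1-grams", "4.39-GRM", "7g", "6 oz", "5 rm", "1gms"]:
    print(repr(s), find_quantity_nested(s))
```

Compared with the all-optional pattern, "rm" is rejected because the leading "g" is required, and "gms" is rejected because "m" is only reachable after "r".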

Votes 3

Stack Overflow user

Answered 2021-01-10 22:05:00

Use

/\d[.,\d]*\W*(?:gr?a?m?s?|ou?n?c?e?[zs]?)/i

See proof.

Explanation

--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  [.,\d]*                  any character of: '.', ',', digits (0-9)
                           (0 or more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \W*                      non-word characters (all but a-z, A-Z, 0-
                           9, _) (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    g                        'g'
--------------------------------------------------------------------------------
    r?                       'r' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    a?                       'a' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    m?                       'm' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    s?                       's' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    o                        'o'
--------------------------------------------------------------------------------
    u?                       'u' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    n?                       'n' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    c?                       'c' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    e?                       'e' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    [zs]?                    any character of: 'z', 's' (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of grouping
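
For comparison, here is a hedged Python sketch of the same pattern, with the `/i` flag expressed as `re.IGNORECASE`. The helper `find_loose` and the sample strings are assumptions for illustration. Because this pattern has no word boundaries and only the first letter of each unit is required, it also seems to match the start of unrelated words that begin with "g" or "o".

```python
import re

# Pattern from this answer; the /i flag becomes re.IGNORECASE.
LOOSE = re.compile(
    r"\d[.,\d]*\W*(?:gr?a?m?s?|ou?n?c?e?[zs]?)",
    re.IGNORECASE,
)


def find_loose(text: str):
    """Return the first substring matched by the loose pattern, or None."""
    m = LOOSE.search(text)
    return m.group(0) if m else None


# Illustrative inputs (assumed wording).
for s in [
    "ORZO PASA 4U! 1 bag IZ 4.39-GRM",
    "Barilla 15.73 oz.",
    "6 ounces per bag",
    "7g per pack",
    "2 gloves",    # not a weight, but "2 g" is matched
    "3 oranges",   # not a weight, but "3 o" is matched
]:
    print(repr(s), find_loose(s))
```

Whether that permissiveness is acceptable depends on the input data; appending `\b` after the group would be one way to tighten it, at the cost of rejecting spellings that run into following letters.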

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/65647077