I am trying to extract a number from a string. However, the number must follow a very specific pattern within the sentence. Let's pretend I am looking for the maximum weight of a parcel. For example: The 3rd box marked 4 cannot exceed 7.34kg but if its exact 8.42kg its okay too. In this case, I want to match the number 7.34. My pattern, in English, is:
<Starts with phrase which is in ('must not exceed', 'cannot exceed', 'limited by')>
<Can be any characters of any length (0 length to infinite) as long as its not any alphabet characters>
<a positive or negative int or decimal which may or may not be comma separated (ie. 1,032.43kg)>
<Can be any characters of any length (0 length to infinite) as long as its not any alphabet characters>
<ends with the characters which are in ('kg', 'k.g', 'k/g')>
What I have so far is:
(must not exceed|cannot exceed|limited by).*?[0-9]+ ?(kg|k\.g|k\/g)
However, the main thing I have not managed to do is match the <Can be any characters of any length (0 length to infinite) as long as its not any alphabet characters> part.
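To make the problem with the pattern above concrete, here is a minimal sketch in Python. The second sample sentence is hypothetical and only serves to show that `.*?` is free to cross alphabet characters, and that the number itself is never captured in a group of its own.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r'(must not exceed|cannot exceed|limited by).*?[0-9]+ ?(kg|k\.g|k\/g)')

# Sample sentence from the question: the overall match looks right, but the
# number is swallowed partly by .*? and partly by [0-9]+, so no group holds it.
m1 = attempt.search('The 3rd box marked 4 cannot exceed 7.34kg but if its exact 8.42kg its okay too')
print(m1.group(0))  # cannot exceed 7.34kg
print(m1.groups())  # ('cannot exceed', 'kg')

# Hypothetical sentence: letters sit between the phrase and the number,
# yet .*? happily skips over them and the pattern still matches.
m2 = attempt.search('Boxes cannot exceed the shelf limit, crates weigh 12kg')
print(m2.group(0))  # cannot exceed the shelf limit, crates weigh 12kg
```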
Some examples:
The 3 boxes with red on them must not exceed -23.4435kg and don't pick them up.
Parcels that can be sent are limited by 1,402kg and its okay to send
The 2 boxes on the shelf must not exceed:
102 kg
Do not pick up 18 boxes at a time and make sure they cannot exceed,: 92302 k.g
Do not pick up boxes that weight 56.23 kg
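Boxes cannot exceed -23 k/g
I know I may need to match twice. That is, I first match the relevant part of the sentence (i.e. must not exceed -23.4435kg) and then run a second regex on it to extract the number, which is what I currently do in my code. My question is essentially how to match the correct part of the string.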
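The two-pass approach described above could be sketched roughly as follows. This is not the asker's actual code: the names `SPAN`, `NUMBER` and `max_weight` are hypothetical, and the negated character class `[^A-Za-z]` is one assumed way to express "any characters except alphabet characters".

```python
import re

# Pass 1 (hypothetical): trigger phrase, then only non-letter characters, then the unit.
SPAN = re.compile(r'(?:must not exceed|cannot exceed|limited by)[^A-Za-z]*?(?:kg|k\.g|k/g)')
# Pass 2 (hypothetical): signed integer or decimal with optional comma separators.
NUMBER = re.compile(r'-?\d+(?:,\d+)*(?:\.\d+)?')

def max_weight(text):
    """Return the weight as a string, or None when the sentence does not qualify."""
    span = SPAN.search(text)
    if span is None:
        return None
    number = NUMBER.search(span.group(0))
    return number.group(0) if number else None

print(max_weight("The 3 boxes with red on them must not exceed -23.4435kg and don't pick them up."))
print(max_weight('Do not pick up boxes that weight 56.23 kg'))
```

With the sample sentences from the question, the first call yields the string -23.4435 and the second yields None, because no trigger phrase precedes the weight.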
Posted on 2021-11-01 21:55:19
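I suggest using
\b((?:must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(?:,\d+)*(?:\.\d+)?)\s*(kg|k\.g|k/g)\b
See the regex demo. Details:
\b - a word boundary
((?:must\s+|can)not\s+exceed|limited\s+by) - Group 1: must not exceed, cannot exceed or limited by, with any whitespace between the words
\W*? - zero or more non-word characters, as few as possible
(-?\d+(?:,\d+)*(?:\.\d+)?) - Group 2: the number pattern, i.e. an optional -, then one or more digits, then zero or more sequences of a comma followed by one or more digits, then an optional . followed by one or more digits
\s* - zero or more whitespace characters
(kg|k\.g|k/g) - Group 3: kg, k.g or k/g
\b - a word boundary
import re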
texts = ['The 3rd box marked 4 cannot exceed 7.34kg but if its exact 8.42kg its okay too',
'The 3 boxes with red on them must not exceed -23.4435kg and don\'t pick them up.',
'Parcels that can be sent are limited by 1,402kg and its okay to send',
'The 2 boxes on the shelf must not exceed: \n102 kg',
'Do not pick up 18 boxes at a time and make sure they cannot exceed,: 92302 k.g',
'Do not pick up boxes that weight 56.23 kg',
'Boxes cannot exceed -23 k/g',
'must not exceed -23.4435kg']
rx = re.compile(r'\b((?:must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(?:,\d+)*(?:\.\d+)?)\s*(kg|k\.g|k/g)\b')
for text in texts:
    print("----", text, "----")
    m = rx.search(text)
    if m:
        # Group 1 is the trigger phrase, group 2 the number, group 3 the unit of measure
        print(f"Phrase: {m.group(1)}")
        print(f"Number: {m.group(2)}")
        print(f"UOM: {m.group(3)}")
    else:
        print("Not matched!")
Output:
---- The 3rd box marked 4 cannot exceed 7.34kg but if its exact 8.42kg its okay too ----
Phrase: cannot exceed
Number: 7.34
UOM: kg
---- The 3 boxes with red on them must not exceed -23.4435kg and don't pick them up. ----
Phrase: must not exceed
Number: -23.4435
UOM: kg
---- Parcels that can be sent are limited by 1,402kg and its okay to send ----
Phrase: limited by
Number: 1,402
UOM: kg
---- The 2 boxes on the shelf must not exceed:
102 kg ----
Phrase: must not exceed
Number: 102
UOM: kg
---- Do not pick up 18 boxes at a time and make sure they cannot exceed,: 92302 k.g ----
Phrase: cannot exceed
Number: 92302
UOM: k.g
---- Do not pick up boxes that weight 56.23 kg ----
Not matched!
---- Boxes cannot exceed -23 k/g ----
Phrase: cannot exceed
Number: -23
UOM: k/g
---- must not exceed -23.4435kg ----
Phrase: must not exceed
Number: -23.4435
UOM: kg
In Oracle, you need to drop the word boundaries and replace all non-capturing groups with capturing ones:
REGEXP_SUBSTR(
col,
'((must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(,\d+)*(\.\d+)?)\s*(kg|k\.g|k/g)',
1,1, NULL, 3)
https://stackoverflow.com/questions/69803107
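The final argument 3 in the Oracle call above selects the third capturing subexpression. Why the number moves from group 2 to group 3 can be illustrated with a small Python sketch. It uses Python's `re` module only to show how groups are numbered. Oracle's regex engine may differ in other respects. The word boundaries are left out of both patterns so the two are directly comparable.

```python
import re

text = 'Parcels that can be sent are limited by 1,402kg and its okay to send'

# Non-capturing inner groups (as in the Python pattern): the number is group 2.
non_capturing = re.compile(
    r'((?:must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(?:,\d+)*(?:\.\d+)?)\s*(kg|k\.g|k/g)')

# All groups capturing (as in the Oracle pattern): groups are numbered by the
# position of their opening parenthesis, so (must\s+|can) takes number 2 and
# the number pattern is pushed to group 3.
all_capturing = re.compile(
    r'((must\s+|can)not\s+exceed|limited\s+by)\W*?(-?\d+(,\d+)*(\.\d+)?)\s*(kg|k\.g|k/g)')

number_a = non_capturing.search(text).group(2)
number_b = all_capturing.search(text).group(3)
print(number_a, number_b)  # 1,402 1,402
```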