
Regex to recognise certain words in a sentence by matching only the first two words

Stack Overflow user
Asked 2022-07-08 07:15:06
Answers: 1 · Views: 41 · Followers: 0 · Votes: 1

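I have a problem. I want to use a regex to identify certain text modules in a text, for example beach vibe some. The problem is that some text modules are three words long (or even longer), but most people only write the first two words, and they may abbreviate the second one.

Is there a way to make the regex hit when only the first two words are present, and to have it look only at the first three letters of the second word?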

   customerId                          text          element  code
0           1    please use beach vibe some  beach vibe some     0
1           1     you should use beach vibe  beach vibe some     0
2           1           right use beach vib  beach vibe some     0
3           3              use floating pow   floating power     1
4           3  use floating stuff right now   floating stuff     2
# Reproducible example: build the sample DataFrame shown above.
import pandas as pd
import re
d = {
    "customerId": [1, 1, 1, 3, 3],
    "text": ["please use beach vibe some",
             "you should use beach vibe",
             "right use beach vib",
             'use floating pow',
             'use floating stuff right now'],
     "element": ['beach vibe some', 'beach vibe some', 'beach vibe some', 'floating power', 'floating stuff']
}
df = pd.DataFrame(data=d)
df['code'] = df['element'].astype('category').cat.codes
print(df)

def f(x):
    match = 999  # sentinel: no element matched
    for element in df['element'].unique():
        words = element.split()
        # Try, in order: the full element, its first two words, and the
        # first word plus the first three letters of the second word.
        candidates = [element,
                      ' '.join(words[:2]),
                      ' '.join(words[:1] + [w[:3] for w in words[1:2]])]
        # re.escape keeps any regex metacharacters in an element literal.
        if any(re.search(re.escape(c), x['text'], re.IGNORECASE)
               for c in candidates):
            match = df['code'].loc[df['element'] == element].iloc[0]
            break

    x['test'] = match
    return x

df['test'] = None
df = df.apply(f, axis=1)
print(df)
Output:
   customerId                          text          element  code  test
0           1    please use beach vibe some  beach vibe some     0     0
1           1     you should use beach vibe  beach vibe some     0     0
2           1           right use beach vib  beach vibe some     0     0
3           3              use floating pow   floating power     1     1
4           3  use floating stuff right now   floating stuff     2     2
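The three checks in f all succeed whenever the shortest one does, so the rule from the question can be collapsed into one pattern per element. The following is only a sketch under that assumption: build_pattern and find_code are hypothetical helper names, plain lists stand in for the DataFrame, and the integer codes are assigned by position to mirror the category codes above. A leading \b is added so the first word must start at a word boundary, which is slightly stricter than the original substring-style search.

```python
import re

def build_pattern(element):
    """Pattern for the first word plus the first three letters of the second."""
    words = element.split()
    if len(words) < 2:
        # Single-word element: require the whole word.
        return re.compile(rf"\b{re.escape(element)}\b", re.IGNORECASE)
    first, prefix = re.escape(words[0]), re.escape(words[1][:3])
    # Anything after the three-letter prefix is ignored by search().
    return re.compile(rf"\b{first}\s+{prefix}", re.IGNORECASE)

def find_code(text, patterns, default=999):
    """Return the code of the first pattern found in text, else default."""
    for code, pattern in patterns.items():
        if pattern.search(text):
            return code
    return default

elements = ["beach vibe some", "floating power", "floating stuff"]
patterns = {code: build_pattern(e) for code, e in enumerate(elements)}

texts = ["please use beach vibe some",
         "you should use beach vibe",
         "right use beach vib",
         "use floating pow",
         "use floating stuff right now"]
codes = [find_code(t, patterns) for t in texts]
print(codes)  # [0, 0, 0, 1, 2]
```

If two elements share the first word and the first three letters of the second word, the first one in the dictionary wins, so such elements would need a longer prefix.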

1 Answer

Stack Overflow user

Answered 2022-07-08 07:22:12

Why use a regex at all? A plain substring check on the lowercased text is enough:

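# Inside the loop over elements: the key is the first word plus the
# first three letters of the second word, both lowercased.
element_parts = element.lower().split()
lookup_key = element_parts[0] + " " + element_parts[1][:3]
if lookup_key in x["text"].lower():
    # here we go ...
    pass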
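To show how that fragment might fit into a complete function, here is a minimal sketch. The names lookup_key and match_element are hypothetical, and it assumes the elements are non-empty strings and that words in the text are separated by single spaces.

```python
def lookup_key(element):
    """First word plus the first three letters of the second word, lowercased."""
    parts = element.lower().split()
    if len(parts) < 2:
        # Single-word (or empty) element: nothing to abbreviate.
        return " ".join(parts)
    return f"{parts[0]} {parts[1][:3]}"

def match_element(text, elements):
    """Return the first element whose lookup key occurs in text, else None."""
    lowered = text.lower()
    for element in elements:
        key = lookup_key(element)
        # Skip empty keys, which would otherwise match every text.
        if key and key in lowered:
            return element
    return None

elements = ["beach vibe some", "floating power", "floating stuff"]
print(match_element("right use beach vib", elements))
print(match_element("use floating pow", elements))
print(match_element("no match here", elements))
```

Because this is a plain substring test, it has no notion of word boundaries: a text such as "sunbeach vib" would also match. A regex with \b is one way to rule that out if it matters.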
Votes: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/72907945
