Extract text between a number and a word

Stack Overflow user
Asked 2018-09-04 07:02:06
Answers 1 | Views 67 | Followers 0 | Votes 1

I have a file with the following content:

01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900  
01009800  Motorola  Motorola T194 EOTD  GSM 1900  

01009900  Option International  
,GSM 900  
01009901  Option International  

,GSM 1900,GSM 900 01009902 Option International ,GSM 1900,GSM 900 01009903 Option International ,GSM 1900,GSM 900 01009904 Option International ,GSM 1900,GSM 900 01009905 Option International ,GSM 1900,GSM 900 01009906 Option International ,GSM 1900,GSM 900 01009907 Option International ,GSM 1900,GSM 900 01009908 Option International ,GSM 1900,GSM 900 01009909 Option International ,GSM 1900,GSM 900 01009910 Option International ,GSM 1900,GSM 900 01009911 Option International ,GSM 1900,GSM 900 01009912 Option International ,GSM 1900,GSM 900 01009913 Option International ,GSM 1900,GSM 900 01009914 Option International ,GSM 1900,GSM 900 01009915 Option International ,GSM 1900,GSM 900 01009916 Option International ,GSM 1900,GSM 900 01009917 Option International ,GSM 1900,GSM 900 01009918 Option International ,GSM 1900,GSM 900 01009919 Option International ,GSM 1900,GSM 900 
Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 
01010000  Sierra Wireless Sierra Wireless Aircard 710  GSM 1900  
01010100  Sierra Wireless Sierra Wireless Aircard 750  GSM 1800,GSM 190  
0,GSM 900 

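Using a regular expression, I am trying to extract everything from each 8-digit number up to, but not including, the first occurrence of GSM, for example: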

01009700  Samsung  Samsung SGH-N625
01009800  Motorola  Motorola T194 EOTD
01009900  Option International
01009902  Option International
01009919  Option International
01010000  Sierra Wireless Sierra Wireless Aircard
01010100  Sierra Wireless Sierra Wireless Aircard

I tried \d{8}.+(GSM)?, but it doesn't seem to work.

What would the correct regex be?


1 Answer

Stack Overflow user

Accepted answer

Answered 2018-09-04 07:05:22

You can use

re.findall(r'\b(\d{8}.*?)\W*GSM', s)

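See the regex demo.

Details

  • \b - a word boundary
  • (\d{8}.*?) - Group 1: 8 digits, then any zero or more characters other than line breaks, as few as possible
  • \W* - zero or more non-word characters (spaces, commas, and line breaks included)
  • GSM - the literal substring GSM.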
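
To see how these pieces interact, here is a minimal sketch on a three-line excerpt of the data. The variable names `sample`, `attempt` and `fixed` are illustrative only. It also shows one likely reason the pattern from the question fails: the greedy `.+` runs to the end of each line, and because `(GSM)?` is optional it is free to match nothing.

```python
import re

# A short excerpt of the input; the second record spans two lines.
sample = (
    "01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900  \n"
    "01009900  Option International  \n"
    ",GSM 900  \n"
)

# Pattern from the question: greedy .+ consumes the rest of the line,
# then the optional (GSM)? matches the empty string.
attempt = [m.group(0) for m in re.finditer(r"\d{8}.+(GSM)?", sample)]

# Pattern from the answer: lazy .*? stops as soon as \W*GSM can match,
# and \W* is allowed to cross the line break before ",GSM".
fixed = re.findall(r"\b(\d{8}.*?)\W*GSM", sample)

print(attempt)  # whole lines, GSM bands included
print(fixed)    # only the part before the first GSM
```

Since `.` does not match a line break by default, the lazy group never leaves the line on which the 8-digit number starts; only `\W*` bridges to a `GSM` on a following line.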

Python demo

import re
s="""01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900  
01009800  Motorola  Motorola T194 EOTD  GSM 1900  

01009900  Option International  
,GSM 900  
01009901  Option International  

,GSM 1900,GSM 900 01009902 Option International ,GSM 1900,GSM 900 01009903 Option International ,GSM 1900,GSM 900 01009904 Option International ,GSM 1900,GSM 900 01009905 Option International ,GSM 1900,GSM 900 01009906 Option International ,GSM 1900,GSM 900 01009907 Option International ,GSM 1900,GSM 900 01009908 Option International ,GSM 1900,GSM 900 01009909 Option International ,GSM 1900,GSM 900 01009910 Option International ,GSM 1900,GSM 900 01009911 Option International ,GSM 1900,GSM 900 01009912 Option International ,GSM 1900,GSM 900 01009913 Option International ,GSM 1900,GSM 900 01009914 Option International ,GSM 1900,GSM 900 01009915 Option International ,GSM 1900,GSM 900 01009916 Option International ,GSM 1900,GSM 900 01009917 Option International ,GSM 1900,GSM 900 01009918 Option International ,GSM 1900,GSM 900 01009919 Option International ,GSM 1900,GSM 900 
Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 
01010000  Sierra Wireless Sierra Wireless Aircard 710  GSM 1900  
01010100  Sierra Wireless Sierra Wireless Aircard 750  GSM 1800,GSM 190  
0,GSM 900 """
print(re.findall(r"\b(\d{8}.*?)\W*GSM", s))

Output:

['01009700  Samsung  Samsung SGH-N625', '01009800  Motorola  Motorola T194 EOTD', '01009900  Option International', '01009901  Option International', '01009902 Option International', '01009903 Option International', '01009904 Option International', '01009905 Option International', '01009906 Option International', '01009907 Option International', '01009908 Option International', '01009909 Option International', '01009910 Option International', '01009911 Option International', '01009912 Option International', '01009913 Option International', '01009914 Option International', '01009915 Option International', '01009916 Option International', '01009917 Option International', '01009918 Option International', '01009919 Option International', '01010000  Sierra Wireless Sierra Wireless Aircard 710', '01010100  Sierra Wireless Sierra Wireless Aircard 750']
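
The matches above keep the irregular spacing of the input. If the 8-digit code and the description are needed separately, one possible variation (an assumption, not part of the original answer) is to capture them in two groups and collapse runs of whitespace. `PATTERN` and `parse_records` are hypothetical names introduced here for illustration.

```python
import re

# Variation on the answer's pattern: group 1 is the 8-digit code,
# group 2 is the description up to (not including) the first GSM.
PATTERN = re.compile(r"\b(\d{8})\s+(.*?)\W*GSM")

def parse_records(text):
    """Return (code, description) pairs with whitespace normalized."""
    return [
        (m.group(1), " ".join(m.group(2).split()))
        for m in PATTERN.finditer(text)
    ]

excerpt = (
    "01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900  \n"
    "01009900  Option International  \n"
    ",GSM 900  \n"
    "01009902 Option International ,GSM 1900,GSM 900\n"
)

for code, description in parse_records(excerpt):
    print(code, description)
```

This sketch assumes every record has some description text on the same line as its code, as in the sample data shown in the question.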
Votes: 4
Original content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/52160568
