How do I find the positions of a list of substrings in a string?

Stack Overflow user

Asked on 2017-05-04 04:34:35

3 answers, 1.3K views, 0 followers, 5 votes

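Given a string:

The plane, bound for St Petersburg, crashed in Egypt's Sinai desert just 23 minutes after take-off from Sharm el-Sheikh on Saturday.

and a list of substrings:

['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']

Desired output: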

>>> s = "The plane, bound for St Petersburg, crashed in Egypt's Sinai desert just 23 minutes after take-off from Sharm el-Sheikh on Saturday."
>>> tokens = ['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']
>>> find_offsets(tokens, s)
[(0, 3), (4, 9), (9, 10), (11, 16), (17, 20), (21, 23), (24, 34),
        (34, 35), (36, 43), (44, 46), (47, 52), (52, 54), (55, 60), (61, 67),
        (68, 72), (73, 75), (76, 83), (84, 89), (90, 98), (99, 103), (104, 109),
        (110, 119), (120, 122), (123, 131), (131, 132)]

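To explain the output: each (start, end) pair indexes into the string s, so the first substring "The" is s[0:3].

So if we loop over all the integer tuples in the desired output (called out here), we get the list of substrings back: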

>>> [s[start:end] for start, end in out]
['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']

I've tried:

def find_offset(tokens, s):
    index = 0
    offsets = []
    for token in tokens:
        start = s[index:].index(token) + index
        index = start + len(token)
        offsets.append((start, index))
    return offsets

Is there another way to find the positions of a list of substrings in a string?

indexing

substring

offset

python

string

3 Answers

Stack Overflow user

Accepted answer

Posted on 2017-05-04 04:52:08

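If we know nothing about the substrings in advance, there is no way around re-scanning the text for each one of them.

If, as the data suggests, these are sequential fragments of the text, given in text order, then it is easy to scan only the rest of the text after each match. There is no need to slice the text each time, though.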

def spans(text, fragments):
    result = []
    point = 0  # Where we are in the text.
    for fragment in fragments:
        found_start = text.index(fragment, point)
        found_end = found_start + len(fragment)
        result.append((found_start, found_end))
        point = found_end
    return result

Test:

>>> spans('foo in bar', ['foo', 'in', 'bar'])
[(0, 3), (4, 6), (7, 10)]

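This assumes that every fragment is present in the text, in the right place. Your output format gives no example of how a mismatch should be reported. Using .find instead of .index would help with that, though only partially.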
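The partial help from .find can be sketched as follows. This is a minimal illustration rather than part of the original answer: the name spans_or_none and the choice to record None for a missing fragment are assumptions. str.find returns -1 instead of raising ValueError, so the loop can carry on. The help is only partial because a fragment that appears out of order is also reported as missing.

```python
def spans_or_none(text, fragments):
    """Like spans(), but record None for a fragment that is not found.

    Hypothetical helper: the name and the None convention are assumptions.
    """
    result = []
    point = 0  # Where we are in the text.
    for fragment in fragments:
        found_start = text.find(fragment, point)  # -1 when not found
        if found_start == -1:
            result.append(None)
            continue  # Leave `point` unchanged and try the next fragment.
        found_end = found_start + len(fragment)
        result.append((found_start, found_end))
        point = found_end
    return result


print(spans_or_none('foo in bar', ['foo', 'xx', 'bar']))
# [(0, 3), None, (7, 10)]
```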

Votes: 1

Stack Overflow user

Posted on 2017-05-04 04:46:55

First solution:

# Use a list comprehension and str.index (t is the list of tokens).
[tuple((s.index(e),s.index(e)+len(e))) for e in t]
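One caveat with this one-liner, shown as a small sketch (the helper name first_occurrence_offsets is made up for illustration): str.index always returns the first occurrence, so a token that appears more than once gets the offsets of its first appearance every time.

```python
def first_occurrence_offsets(tokens, s):
    # Same logic as the one-liner above, wrapped in a function for testing.
    return [(s.index(e), s.index(e) + len(e)) for e in tokens]


s = 'The plane, plane'
t = ['The', 'plane', ',', 'plane']
print(first_occurrence_offsets(t, s))
# [(0, 3), (4, 9), (9, 10), (4, 9)]  <- the last pair should be (11, 16)
```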

Second solution, which fixes the problem in the first one (repeated tokens always resolving to their first occurrence):

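def find_offsets(tokens, s):
    tid = [list(e) for e in tokens]
    i = 0
    for id_token, token in enumerate(tid):
        # Advance until the token's first character matches.
        # Only the first character is compared, so this relies on the
        # tokens appearing in order with only separators between them.
        while token[0] != s[i]:
            i += 1
        tid[id_token] = (i, i + len(token))
        i += len(token)

    return tid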


find_offsets(tokens, s)
Out[201]: 
[(0, 3),
 (4, 9),
 (9, 10),
 (11, 16),
 (17, 20),
 (21, 23),
 (24, 34),
 (34, 35),
 (36, 43),
 (44, 46),
 (47, 52),
 (52, 54),
 (55, 60),
 (61, 67),
 (68, 72),
 (73, 75),
 (76, 83),
 (84, 89),
 (90, 98),
 (99, 103),
 (104, 109),
 (110, 119),
 (120, 122),
 (123, 131),
 (131, 132)]   

#another test
s = 'The plane, plane'
t = ['The', 'plane', ',', 'plane']
find_offsets(t,s)
Out[212]: [(0, 3), (4, 9), (9, 10), (11, 16)]

Votes: 5

Stack Overflow user

Posted on 2017-05-04 17:56:43

import re

s = "The plane, bound for St Petersburg, crashed in Egypt's Sinai desert just 23 minutes after take-off from Sharm el-Sheikh on Saturday."
tokens = ['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']


for token in tokens:
  pattern = re.compile(re.escape(token))
  print(pattern.search(s).span())

Result:

(0, 3)
(4, 9)
(9, 10)
(11, 16)
(17, 20)
(21, 23)
(24, 34)
(9, 10)
(36, 43)
(44, 46)
(47, 52)
(52, 54)
(55, 60)
(61, 67)
(68, 72)
(73, 75)
(76, 83)
(84, 89)
(90, 98)
(99, 103)
(104, 109)
(110, 119)
(120, 122)
(123, 131)
(131, 132)
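In this output the second ',' is reported as (9, 10) rather than the desired (34, 35), because every search starts again from the beginning of s. A hedged sketch of one way to avoid that, assuming the tokens are listed in text order: a compiled pattern's search method accepts a start position as its second argument, so each search can resume where the previous match ended. The function name regex_offsets is made up for illustration.

```python
import re


def regex_offsets(tokens, s):
    offsets = []
    pos = 0  # Resume each search after the previous match.
    for token in tokens:
        match = re.compile(re.escape(token)).search(s, pos)
        if match is None:
            raise ValueError(f"token {token!r} not found after position {pos}")
        offsets.append(match.span())
        pos = match.end()
    return offsets


print(regex_offsets(['The', 'plane', ',', 'plane'], 'The plane, plane'))
# [(0, 3), (4, 9), (9, 10), (11, 16)]
```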

Votes: 3

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/43773962
