Replace all occurrences of a string with a unique ID in Python (given their start and end indices)

Stack Overflow user
Asked 2017-06-05 13:46:00
1 answer, 67 views, 0 followers, 0 votes

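I am working with a dataset that has to be preprocessed. I want to replace all occurrences of certain entities (given by their start and end indices) with unique IDs.

Given a text string such as: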

s = "The hypotensive effect of 100 mg/kg alpha-methyldopa was also partially reversed by naloxone. Naloxone alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously hypertensive rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM)."

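and a dictionary of lists such as:

{
'D006973': [{'length': '12', 'offset': '199', 'text': ['hypertensive'], 'type': 'Disease'}],
'D008750': [{'length': '16', 'offset': '36', 'text': ['alpha-methyldopa'], 'type': 'Chemical'}],
'D007022': [{'length': '11', 'offset': '4', 'text': ['hypotensive'], 'type': 'Disease'}],
'D009270': [{'length': '8', 'offset': '84', 'text': ['naloxone'], 'type': 'Chemical'}, {'length': '8', 'offset': '94', 'text': ['Naloxone'], 'type': 'Chemical'}, {'length': '13', 'offset': '293', 'text': ["[3H]-naloxone"], 'type': 'Chemical'}]
}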

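I want to replace every occurrence given by an offset with its respective ID. So for the last entry, I want every span in its list replaced with 'D009270'.

Example 1: for the first entry, with key 'D006973', I want to replace 'hypertensive', which is at index 199 and has length 12, with 'D006973'.

Example 2: for the last entry, with key 'D009270', I want to replace the substrings at the following indices (given as tuples):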

[(84, 92), (94, 102), (293, 306)]
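For reference, these (start, end) tuples follow directly from the 'offset' and 'length' fields: the end index is the offset plus the length. A minimal sketch, using only the three 'D009270' entries from above (the names `entries` and `spans` are illustrative, not part of the original question):

```python
# The three 'D009270' entries, reduced to the fields needed here.
entries = [
    {'length': '8', 'offset': '84'},
    {'length': '8', 'offset': '94'},
    {'length': '13', 'offset': '293'},
]

# Both fields are stored as strings, so convert before adding.
spans = [(int(e['offset']), int(e['offset']) + int(e['length'])) for e in entries]
print(spans)  # [(84, 92), (94, 102), (293, 306)]
```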
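  1. In the last sentence, "naloxone-suppressible" contains "naloxone", but I do not want to replace it. So I cannot simply use str.replace().
  2. I tried replacing the substring from its start index to its end index with the unique ID (e.g. 199 to 211 for "hypertensive"), but this shifts the offsets of the other entities that have not been replaced yet. When the replacement text ("D006973") is shorter than the original string ("hypertensive"), I can use padding. But when the replacement text is longer, this fails.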
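The offset problem in point 2 can be reproduced on a small case. The sketch below uses a shortened, made-up sentence with two annotated spans (the sentence and the variable names are assumptions for illustration only). After the first span is replaced, the stored indices of the second span no longer point at the intended text unless they are shifted by the change in length.

```python
# Shortened, hypothetical sentence with two annotated spans (start, end, ID).
s = "The hypotensive effect was reversed by naloxone."
first = (4, 15, "D007022")
second = (39, 47, "D009270")

# Both spans point at the expected text in the original string.
assert s[first[0]:first[1]] == "hypotensive"
assert s[second[0]:second[1]] == "naloxone"

# Replace the first span in place.
start, end, ident = first
replaced = s[:start] + ident + s[end:]

# The string changed length, so everything after the first span has moved.
shift = len(ident) - (end - start)

stale = replaced[second[0]:second[1]]                     # old indices, wrong text
adjusted = replaced[second[0] + shift:second[1] + shift]  # shifted indices

print(shift, repr(stale), repr(adjusted))
```

Here the ID is four characters shorter than the text it replaces, so later spans move left by four. A longer ID would move them right instead, which is why padding alone cannot cover both cases.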

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-06-05 17:31:00

You can use string formatting with placeholder characters:

from operator import itemgetter

s = "The hypotensive effect of 100 mg/kg alpha-methyldopa was also partially reversed by naloxone. Naloxone alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously hypertensive rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM)."

dictionary={
'D006973': [{'length': '12', 'offset': '199', 'text': ['hypertensive'], 'type': 'Disease'}],
'D008750': [{'length': '16', 'offset': '36', 'text': ['alpha-methyldopa'], 'type': 'Chemical'}],
'D007022': [{'length': '11', 'offset': '4', 'text': ['hypotensive'], 'type': 'Disease'}],
'D009270': [{'length': '8', 'offset': '84', 'text': ['naloxone'], 'type': 'Chemical'}, {'length': '8', 'offset': '94', 'text': ['Naloxone'], 'type': 'Chemical'}, {'length': '13', 'offset': '293', 'text': ["[3H]-naloxone"], 'type': 'Chemical'}]
}

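# Collect (start, end, key) for every annotated span.
index_list = []
for key in dictionary:
    for dic in dictionary[key]:
        o = int(dic['offset'])
        index_list.append((o, o + int(dic['length']), key))

# Sort by start offset so the IDs line up with the "{}" placeholders.
index_list.sort(key=itemgetter(0))

format_list = []
lt = list(s)
for si, ei, key in index_list:
    # Overwrite the span with "{}" plus "@" filler of the same total length,
    # so the offsets of the spans that follow stay valid.
    # Assumes every span has at least 2 characters and that s itself
    # contains no "@", "{" or "}".
    lt[si:ei] = list("{}") + ["@"] * ((ei - si) - 2)
    format_list.append(key)

text = "".join(lt)
text = text.replace("@", "")      # drop the filler characters
text = text.format(*format_list)  # fill each "{}" with its ID, in order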

Result: 'The D007022 effect of 100 mg/kg D008750 was also partially reversed by D009270. D009270 alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously D006973 rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of D009270 (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM).'
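The placeholder approach relies on the text containing no "@", "{" or "}" of its own. If that cannot be guaranteed, one possible alternative (a sketch, not the method used in the answer above) is to walk the sorted spans once and join the untouched pieces of text with the IDs. The function name `replace_spans` and the sample sentence are made up for illustration, and the spans are assumed not to overlap.

```python
def replace_spans(text, annotations):
    """Return text with each annotated span replaced by its ID.

    annotations maps an ID to a list of dicts with string 'offset' and
    'length' fields, as in the question. Spans are assumed not to overlap.
    """
    spans = sorted(
        (int(d["offset"]), int(d["offset"]) + int(d["length"]), key)
        for key, entries in annotations.items()
        for d in entries
    )
    pieces = []
    cursor = 0  # end of the last span copied so far
    for start, end, key in spans:
        if start < cursor:
            raise ValueError(f"overlapping span at offset {start}")
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(key)                 # the ID in place of the span
        cursor = end
    pieces.append(text[cursor:])           # untouched tail
    return "".join(pieces)


# Hypothetical sample that contains braces and "@" on purpose.
sample = "Use {naloxone} @ 5 mg, not naloxone-like drugs."
ids = {"D009270": [{"offset": "5", "length": "8"}]}
print(replace_spans(sample, ids))
# Use {D009270} @ 5 mg, not naloxone-like drugs.
```

Because all slices are taken from the original string, no offset ever needs adjusting, and the length of the ID relative to the replaced text does not matter.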

0 votes
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/44370422
