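I have read the contents of a file into Python and want to remove all citations, which follow the same format:
(Author et al., .............. \nGoogle Scholar) # there could be many '\nGoogle Scholar's within the brackets
Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses (Gutierrez etal., 2017Gutierrez G.D. Gromada J. Sussel L. Heterogeneity of the pancreatic beta cell.Front. Genet. 2017; 8: 22Crossref\nPubMed\nScopus (11)\nGoogle Scholar, Roscioni etal., 2016Roscioni S.S. Migliorini A. Gegg M. Lickert H. Impact of islet architecture on -cell heterogeneity, plasticity and function.Nat. Rev. Endocrinol. 2016; 12: 695-709Crossref\nPubMed\nScopus (36)\nGoogle Scholar). Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes (Cui etal., 2018Cui Y. Hu D. Markillie L.M. Chrisler W.B. Gaffrey M.J. Ansong C. Sussel L. Orr G. Fluctuation localization imaging-based fluorescence insitu hybridization (fliFISH) for accurate detection and counting of RNA copies in single cells.Nucleic Acids Res. 2018; 46: e7Crossref\nPubMed\nScopus (2)\nGoogle Scholar). We have optimized the standard tissue smFISH protocol (Lyubimova etal., 2013Lyubimova A. Itzkovitz S. Junker J.P. Fan Z.P. Wu X. van Oudenaarden A. Single-molecule mRNA detection and counting in mammalian tissue.Nat. Protoc. 2013; 8: 1743-1758Crossref\nPubMed\nScopus (62)\nGoogle Scholar) by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.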
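Desired output
Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses . Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes . We have optimized the standard tissue smFISH protocol by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.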
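I could not find a single regular expression that skips all the citations in one pass, so I had to do it in two parts.
My attempt is as follows: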
import re
def remove(test_str):
    regex = re.compile(r'\nGoogle Scholar\)')
    starts = []
    ends = []
    ret = ''
    for m in regex.finditer(test_str):  # find all 'Google Scholar)'
        ends.append(m.end())
    for e in ends:  # find all starting brackets
        i = e
        while True:
            # walk left until '(' followed by a non-digit is found
            if bool(re.match(r'\(\D+', test_str[i-2:i])):
                starts.append(i-2)
                break
            else:
                i -= 1
    start = test_str[:starts[0]]  # omit all characters in between
    starts = starts[1:]
    end = test_str[ends[-1]:]
    ends = ends[:-1]
    for i, j in zip(starts, ends):
        ret = ret + test_str[j:i]
    return start + ret + end
However, this strategy fails because the regex I use to find each opening bracket (\(\D+) is not precise enough: a citation often contains closed pairs of parentheses of its own.
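(Cui etal., 2018Cui Y. Hu D. Markillie L.M. Chrisler W.B. Gaffrey M.J. Ansong C. Sussel L. Orr G. Fluctuation localization imaging-based fluorescence insitu hybridization (fliFISH) for accurate detection and counting of RNA copies in single cells.Nucleic Acids Res. 2018; 46: e7Crossref\nPubMed\nScopus (2)\nGoogle Scholar)
So in this case, the search for the correct opening bracket stops too early...
Can anyone recommend a good way to reliably remove all the citations?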
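The early stop can be reproduced on a much shorter string. The sketch below is only an illustration: `sample` is a made-up abbreviation of the Cui et al. citation, and `find_start_backwards` is a hypothetical helper that isolates the backward search from the attempt above.

```python
import re

# Hypothetical, shortened stand-in for the Cui et al. citation.
sample = ("dyes (Cui etal., 2018 hybridization (fliFISH) for RNA."
          "\nScopus (2)\nGoogle Scholar). Next")

def find_start_backwards(text, end):
    """Walk left from `end` until '(' followed by a non-digit is seen."""
    i = end
    while i >= 2:
        if re.match(r'\(\D+', text[i - 2:i]):
            return i - 2
        i -= 1
    return -1  # no candidate bracket found

end = re.search(r'\nGoogle Scholar\)', sample).end()
start = find_start_backwards(sample, end)
found = sample[start:start + 9]
print(found)  # (fliFISH)
```

The search skips "(2" because of the digit, but then accepts "(fliFISH)" instead of continuing to "(Cui", which is the premature stop described above.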
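Posted on 2019-01-08 16:41:55
Based on the pattern you describe, you can use this regex,
(?s)\(.*?Google Scholar\) ?
and replace each match with an empty string. Here, (?s) is needed so that . also matches newlines.
Here is a Python demo,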
import re
s = 'Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses (Gutierrez etal., 2017Gutierrez G.D. Gromada J. Sussel L. Heterogeneity of the pancreatic beta cell.Front. Genet. 2017; 8: 22Crossref\nPubMed\nScopus (11)\nGoogle Scholar, Roscioni etal., 2016Roscioni S.S. Migliorini A. Gegg M. Lickert H. Impact of islet architecture on -cell heterogeneity, plasticity and function.Nat. Rev. Endocrinol. 2016; 12: 695-709Crossref\nPubMed\nScopus (36)\nGoogle Scholar). Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes (Cui etal., 2018Cui Y. Hu D. Markillie L.M. Chrisler W.B. Gaffrey M.J. Ansong C. Sussel L. Orr G. Fluctuation localization imaging-based fluorescence insitu hybridization (fliFISH) for accurate detection and counting of RNA copies in single cells.Nucleic Acids Res. 2018; 46: e7Crossref\nPubMed\nScopus (2)\nGoogle Scholar). We have optimized the standard tissue smFISH protocol (Lyubimova etal., 2013Lyubimova A. Itzkovitz S. Junker J.P. Fan Z.P. Wu X. van Oudenaarden A. Single-molecule mRNA detection and counting in mammalian tissue.Nat. Protoc. 2013; 8: 1743-1758Crossref\nPubMed\nScopus (62)\nGoogle Scholar) by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.'
replacedStr = re.sub(r'(?s)\(.*?Google Scholar\) ?','',s)
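print(replacedStr)
This prints the following, as you described in your post.
Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses . Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes . We have optimized the standard tissue smFISH protocol by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.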
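One caveat may be worth checking before relying on this pattern: because the lazy match begins at the first opening parenthesis it meets, an ordinary parenthetical that appears before a citation would presumably be removed along with it. The sentence below is invented purely to illustrate that behaviour; it is not taken from the question's text.

```python
import re

pattern = re.compile(r'(?s)\(.*?Google Scholar\) ?')

# Hypothetical sentence: a normal parenthetical precedes the citation.
text = ("Cells (see Methods) vary (Smith etal., 2017 Smith J. Title."
        "\nGoogle Scholar). Next.")

result = pattern.sub('', text)
print(repr(result))  # 'Cells . Next.'
```

In the sample text from the question every opening parenthesis outside a citation is absent, so the issue does not show up there.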
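Posted on 2019-01-08 16:51:33
I would solve it the following way, which matches what you want and copes with parentheses in the text that are not citations:
\( opens the citation. It is followed by one or more repetitions of [^()]+(?:\([^()]+\))?, that is, one or more characters that are not parentheses, followed by an optional ( ) pair enclosing one or more non-parenthesis characters. \nGoogle Scholar\) closes the citation.
Code: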
import re
text = 'Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses (Gutierrez etal., 2017Gutierrez G.D. Gromada J. Sussel L. Heterogeneity of the pancreatic beta cell.Front. Genet. 2017; 8: 22Crossref\nPubMed\nScopus (11)\nGoogle Scholar, Roscioni etal., 2016Roscioni S.S. Migliorini A. Gegg M. Lickert H. Impact of islet architecture on -cell heterogeneity, plasticity and function.Nat. Rev. Endocrinol. 2016; 12: 695-709Crossref\nPubMed\nScopus (36)\nGoogle Scholar). Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes (Cui etal., 2018Cui Y. Hu D. Markillie L.M. Chrisler W.B. Gaffrey M.J. Ansong C. Sussel L. Orr G. Fluctuation localization imaging-based fluorescence insitu hybridization (fliFISH) for accurate detection and counting of RNA copies in single cells.Nucleic Acids Res. 2018; 46: e7Crossref\nPubMed\nScopus (2)\nGoogle Scholar). We have optimized the standard tissue smFISH protocol (Lyubimova etal., 2013Lyubimova A. Itzkovitz S. Junker J.P. Fan Z.P. Wu X. van Oudenaarden A. Single-molecule mRNA detection and counting in mammalian tissue.Nat. Protoc. 2013; 8: 1743-1758Crossref\nPubMed\nScopus (62)\nGoogle Scholar) by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.'
fixed_text = ' '.join(re.sub(r'\((?:[^()]+(?:\([^()]+\))?)+\nGoogle Scholar\)', '', text).split())
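print(fixed_text)
Output:
Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses . Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes . We have optimized the standard tissue smFISH protocol by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.
It can be improved by changing the code as follows, which also removes the space before the opening \(, although the result then no longer matches your desired output (which is flawed):
fixed_text = re.sub(r' ?\((?:[^()]+(?:\([^()]+\))?)+\nGoogle Scholar\)', '', text)
Output:
Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses. Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes. We have optimized the standard tissue smFISH protocol by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.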
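The pattern above allows one level of parentheses inside a citation. If deeper nesting is possible, one alternative is a plain scanner that tracks open parentheses on a stack. This is not part of the answer, only a minimal sketch: `strip_citations` and its `marker` parameter are hypothetical names, and the sample sentence is invented.

```python
def strip_citations(text, marker="\nGoogle Scholar)"):
    """Remove every balanced (...) group whose content ends with `marker`."""
    out = []    # characters kept so far
    stack = []  # indices in `out` where each still-open '(' sits
    for ch in text:
        out.append(ch)
        if ch == '(':
            stack.append(len(out) - 1)
        elif ch == ')' and stack:
            start = stack.pop()
            if ''.join(out[start:]).endswith(marker):
                del out[start:]           # drop the whole citation
                if out and out[-1] == ' ':
                    out.pop()             # and the space before it
    return ''.join(out)

# Hypothetical input with two levels of nesting inside the citation.
sample = ("dyes (Cui etal., 2018 in situ (fliFISH (v2)) counting."
          "\nScopus (2)\nGoogle Scholar). We (the authors) agree.")
cleaned = strip_citations(sample)
print(cleaned)  # dyes. We (the authors) agree.
```

Parentheticals that do not end with the marker, such as "(the authors)", are left in place, and an unmatched closing parenthesis is simply copied through.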
Posted on 2019-01-08 16:55:37
import re
if __name__ == '__main__':
    source = """Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses (Gutierrez etal., 2017Gutierrez G.D. Gromada J. Sussel L. Heterogeneity of the pancreatic beta cell.Front. Genet. 2017; 8: 22Crossref\nPubMed\nScopus (11)\nGoogle Scholar, Roscioni etal., 2016Roscioni S.S. Migliorini A. Gegg M. Lickert H. Impact of islet architecture on -cell heterogeneity, plasticity and function.Nat. Rev. Endocrinol. 2016; 12: 695-709Crossref\nPubMed\nScopus (36)\nGoogle Scholar). Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes (Cui etal., 2018Cui Y. Hu D. Markillie L.M. Chrisler W.B. Gaffrey M.J. Ansong C. Sussel L. Orr G. Fluctuation localization imaging-based fluorescence insitu hybridization (fliFISH) for accurate detection and counting of RNA copies in single cells.Nucleic Acids Res. 2018; 46: e7Crossref\nPubMed\nScopus (2)\nGoogle Scholar). We have optimized the standard tissue smFISH protocol (Lyubimova etal., 2013Lyubimova A. Itzkovitz S. Junker J.P. Fan Z.P. Wu X. van Oudenaarden A. Single-molecule mRNA detection and counting in mammalian tissue.Nat. Protoc. 2013; 8: 1743-1758Crossref\nPubMed\nScopus (62)\nGoogle Scholar) by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr."""
    # Match from " (" through the author marker " etal., " up to the closing "\nGoogle Scholar)"
    output = re.sub(r' \(.*? etal\., .*?\nGoogle Scholar\)', '', source, flags=re.DOTALL)
    print(output)
Output
Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses. Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes. We have optimized the standard tissue smFISH protocol by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.
https://stackoverflow.com/questions/54095734