Regex - Get text between two strings

Stack Overflow user
Asked 2015-11-27 05:50:18
2 answers, 161 views, 0 followers, 0 votes

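I have a large text file containing many abstracts (about 7k of them). I want to separate them. They have the following properties:

Each one starts with a number followed by a period:

123.

Each one always ends with:

PubMed - indexed for MEDLINE

It would be even better if I could also get the title and the abstract out of each separated string. I am fine with splitting the articles first and then splitting the text within each one.

In the example, the title is on the third line: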

Effects of propofol and isoflurane on haemodynamics and the inflammatory response in cardiopulmonary bypass surgery.

The abstract starts on line 8:

Cardiopulmonary bypass (CPB) causes reperfusion injury...

I tried the following regex on this text.

Regex:

[0-9\.]*\s*(((?![0-9\.]*|MEDLINE).)+)\s*MEDLINE

Text:

1. Br J Biomed Sci. 2015;72(3):93-101.

Effects of propofol and isoflurane on haemodynamics and the inflammatory response
in cardiopulmonary bypass surgery.

Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.

Cardiopulmonary bypass (CPB) causes reperfusion injury that when most severe is
clinically manifested as a systemic inflammatory response syndrome. The
anaesthetic propofol may have anti-inflammatory properties that may reduce such a
response. We hypothesised differing effects of propofol and isoflurane on
inflammatory markers in patients having CBR Forty patients undergoing elective
CPB were randomised to receive either propofol or isoflurane for maintenance of
anaesthesia. CRP, IL-6, IL-8, HIF-1α (ELISA), CD11 and CD18 expression (flow
cytometry), and haemoxygenase (HO-1) promoter polymorphisms (PCR/electrophoresis)
were measured before anaesthetic induction, 4 hours post-CPB, and 24 hours later.
There were no differences in the 4 hours changes in CRP, IL-6, IL-8 or CD18
between the two groups, but those in the propofol group had higher HIF-1α (P =
0.016) and lower CD11 expression (P = 0.026). After 24 hours, compared to the
isoflurane group, the propofol group had significantly lower levels of CRP (P <
0.001), IL-6 (P < 0.001) and IL-8 (P < 0.001), with higher levels CD11 (P =
0.009) and CD18 (P = 0.002) expression. After 24 hours, patients on propofol had 
increased expression of shorter HO-1 GT(n) repeats than patients on isoflurane (P
= 0.001). Use of propofol in CPB is associated with a less adverse inflammatory
profile than is isofluorane, and an increased up-regulation of HO-1. This
supports the hypothesis that propofol has anti-inflammatory activity.

PMID: 26510263  [PubMed - indexed for MEDLINE]

2 Answers

Stack Overflow user

Accepted answer

Posted on 2015-11-27 09:41:37

Mariano and stribizhev proposed two useful solutions.

Mariano's solution: use a split method with the typical ending as the delimiter

(?m)\[PubMed - indexed for MEDLINE\]$

Demo: http://ideone.com/Qw5ss2
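The split approach can be sketched in Python as follows. This is only an illustration, not the code behind the demo link: the two-record `text` sample and the names `TERMINATOR` and `records` are made up here, and the real file is assumed to follow the layout shown in the question.

```python
import re

# Hypothetical two-record sample; real records are assumed to follow
# the layout shown in the question.
text = (
    "1. Journal A. 2015;72(3):93-101.\n\n"
    "First title.\n\n"
    "Author A, Author B.\n\n"
    "First abstract text.\n\n"
    "PMID: 111  [PubMed - indexed for MEDLINE]\n\n\n"
    "2. Journal B. 2014;1(1):1-2.\n\n"
    "Second title.\n\n"
    "Author C.\n\n"
    "Second abstract text.\n\n"
    "PMID: 222  [PubMed - indexed for MEDLINE]\n"
)

# Same pattern as above; re.MULTILINE plays the role of the inline (?m) flag,
# so "$" matches at the end of every line, not only at the end of the text.
TERMINATOR = re.compile(r"\[PubMed - indexed for MEDLINE\]$", re.MULTILINE)

# Split on the terminator, trim whitespace, and drop the empty trailing chunk.
records = [chunk.strip() for chunk in TERMINATOR.split(text) if chunk.strip()]

for record in records:
    print(record.splitlines()[0])
```

Note that the terminator itself is consumed by the split, so each record ends with its `PMID:` line.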

4+

stribizhev's solution: extract the data directly from the text

(?m)^\s*\d+\..*\R{2}                 # Get to the title
(?<title>[^\n]*(?:\n(?!\n)[^\n]*)*)  # Get title
\R{2}                                # Get to the authors
[^\n]*(?:\n(?!\R)[^\n]*)*            # Consume authors
(?<abstract>[^\[]*(?:\[(?!PubMed[ ]-[ ]indexed[ ]for[ ]MEDLINE\])[^\[]*)*) #Grab abstract

Demo: https://regex101.com/r/sG2yQ2/2
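A rough Python adaptation of the extraction pattern is sketched below. It is not a drop-in copy: Python's `re` module writes named groups as `(?P<name>...)` and has no `\R`, so `\n` is used instead, which assumes Unix line endings. The helper `parse_records`, the added `\s*` before the abstract, the removal of the trailing `PMID:` line, and the shortened sample abstract are all choices made for this illustration.

```python
import re

RECORD = re.compile(
    r"""
    ^\s*\d+\..*\n\n                        # citation line, then a blank line
    (?P<title>[^\n]*(?:\n(?!\n)[^\n]*)*)   # title: lines up to the next blank line
    \n\n                                   # blank line before the authors
    [^\n]*(?:\n(?!\n)[^\n]*)*              # authors: consumed, not captured
    \s*                                    # blank line before the abstract
    (?P<abstract>[^\[]*(?:\[(?!PubMed[ ]-[ ]indexed[ ]for[ ]MEDLINE\])[^\[]*)*)
    """,
    re.MULTILINE | re.VERBOSE,
)


def parse_records(text):
    """Return a list of {'title', 'abstract'} dicts, one per matched record."""
    results = []
    for m in RECORD.finditer(text):
        # The abstract group runs up to the terminator, so it still ends with
        # the "PMID: ..." line; strip that line here (an assumption of this sketch).
        raw = re.sub(r"\s*PMID:[^\n]*\s*\Z", "", m.group("abstract"))
        results.append({
            "title": " ".join(m.group("title").split()),  # rejoin wrapped lines
            "abstract": " ".join(raw.split()),
        })
    return results


# Abbreviated version of the record from the question.
sample = (
    "1. Br J Biomed Sci. 2015;72(3):93-101.\n"
    "\n"
    "Effects of propofol and isoflurane on haemodynamics and the inflammatory response\n"
    "in cardiopulmonary bypass surgery.\n"
    "\n"
    "Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.\n"
    "\n"
    "Cardiopulmonary bypass (CPB) causes reperfusion injury that when most severe is\n"
    "clinically manifested as a systemic inflammatory response syndrome.\n"
    "\n"
    "PMID: 26510263  [PubMed - indexed for MEDLINE]\n"
)

parsed = parse_records(sample)
print(parsed[0]["title"])
```

Because `^` can match at any line start in multiline mode, a line inside an abstract that begins with digits and a period could in principle start a spurious match; iterating with `finditer` from the top of the file reduces that risk, since each match consumes a whole record.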

8+

Votes: 1

Stack Overflow user

Posted on 2015-11-27 08:04:08

Try this:

"^[0-9]+\..*\s+(.*)\s+.*\s+((?:\s|.)*?)\[PubMed - indexed for MEDLINE\]"

The first group is the title. The second is the abstract.
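As a quick check of how the two groups behave, here is a small Python sketch. The samples and variable names are invented for illustration, and `re.MULTILINE` is an assumption added so that `^` can match at the start of each record in a larger file.

```python
import re

PATTERN = re.compile(
    r"^[0-9]+\..*\s+(.*)\s+.*\s+((?:\s|.)*?)\[PubMed - indexed for MEDLINE\]",
    re.MULTILINE,
)

# Hypothetical record with a single-line title.
single = (
    "1. Journal A. 2015;72(3):93-101.\n\n"
    "A single-line title.\n\n"
    "Author A, Author B.\n\n"
    "Abstract text that spans\ntwo lines.\n\n"
    "PMID: 111  [PubMed - indexed for MEDLINE]\n"
)
m = PATTERN.search(single)
title = m.group(1)   # the title line
body = m.group(2)    # the abstract, followed by the "PMID: ..." line

# Hypothetical record whose title wraps onto a second line.
wrapped = (
    "1. J.\n\n"
    "Title line one\ntitle line two.\n\n"
    "Author A.\n\n"
    "Abstract.\n\n"
    "PMID: 1  [PubMed - indexed for MEDLINE]\n"
)
w = PATTERN.search(wrapped)
wrapped_title = w.group(1)  # only the first title line
wrapped_body = w.group(2)   # starts at the authors line, not the abstract

print(title)
print(wrapped_title)
```

In this sketch the pattern lines up with the record only when the title fits on one line. With a wrapped title, as in the question's sample, group 1 holds just the first title line and group 2 begins at the authors. In both cases group 2 still ends with the `PMID:` line.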

Votes: 1

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/33951001
