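I have a large text file containing many abstracts (about 7k of them), and I want to split them apart. They have the following properties:
Each one starts with a number followed by a period, e.g.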
123。
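It always ends with:
[PubMed - indexed for MEDLINE]
It would be even better if I could get the title and the abstract out of the separated strings. If I have to split the articles first and then split the text, that is fine too.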
In the example, the title is on the third line:
Effects of propofol and isoflurane on haemodynamics and the inflammatory response in cardiopulmonary bypass surgery.
The abstract is on line 8:
Cardiopulmonary bypass (CPB) causes reperfusion injury...
I tried the following on this text:
Regex:
[0-9\.]*\s*(((?![0-9\.]*|MEDLINE).)+)\s*MEDLINE
Text:
1. Br J Biomed Sci. 2015;72(3):93-101.

Effects of propofol and isoflurane on haemodynamics and the inflammatory response
in cardiopulmonary bypass surgery.

Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.

Cardiopulmonary bypass (CPB) causes reperfusion injury that when most severe is
clinically manifested as a systemic inflammatory response syndrome. The
anaesthetic propofol may have anti-inflammatory properties that may reduce such a
response. We hypothesised differing effects of propofol and isoflurane on
inflammatory markers in patients having CBR Forty patients undergoing elective
CPB were randomised to receive either propofol or isoflurane for maintenance of
anaesthesia. CRP, IL-6, IL-8, HIF-1α (ELISA), CD11 and CD18 expression (flow
cytometry), and haemoxygenase (HO-1) promoter polymorphisms (PCR/electrophoresis)
were measured before anaesthetic induction, 4 hours post-CPB, and 24 hours later.
There were no differences in the 4 hours changes in CRP, IL-6, IL-8 or CD18
between the two groups, but those in the propofol group had higher HIF-1α (P =
0.016) and lower CD11 expression (P = 0.026). After 24 hours, compared to the
isoflurane group, the propofol group had significantly lower levels of CRP (P <
0.001), IL-6 (P < 0.001) and IL-8 (P < 0.001), with higher levels CD11 (P =
0.009) and CD18 (P = 0.002) expression. After 24 hours, patients on propofol had
increased expression of shorter HO-1 GT(n) repeats than patients on isoflurane (P
= 0.001). Use of propofol in CPB is associated with a less adverse inflammatory
profile than is isofluorane, and an increased up-regulation of HO-1. This
supports the hypothesis that propofol has anti-inflammatory activity.
PMID: 26510263 [PubMed - indexed for MEDLINE]

Posted on 2015-11-27 09:41:37
Mariano and stribizhev offered two useful solutions:
Mariano's solution: split on the ending that every record has
(?m)\[PubMed - indexed for MEDLINE\]$
Demo: http://ideone.com/Qw5ss2
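A minimal Python sketch of Mariano's idea, assuming the whole file has been read into one string (the linked demo may use a different language). The sample is abridged from the question and the second record is a made-up placeholder. Note that split() removes the tag itself, so each chunk ends with its PMID line.

```python
import re

# Abridged sample in the question's layout; record 2 is a hypothetical placeholder.
SAMPLE = """\
1. Br J Biomed Sci. 2015;72(3):93-101.

Effects of propofol and isoflurane on haemodynamics and the inflammatory response
in cardiopulmonary bypass surgery.

Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.

Cardiopulmonary bypass (CPB) causes reperfusion injury that when most severe is
clinically manifested as a systemic inflammatory response syndrome.

PMID: 26510263 [PubMed - indexed for MEDLINE]

2. Example J. 2015;1(1):1-2.

Placeholder title.

Doe J.

Placeholder abstract.

PMID: 0 [PubMed - indexed for MEDLINE]
"""

# Mariano's pattern: the closing tag at the end of a line ends a record.
END_TAG = re.compile(r"(?m)\[PubMed - indexed for MEDLINE\]$")

def split_records(text):
    """Split on the closing tag and drop empty leftovers such as the final newline."""
    return [chunk.strip() for chunk in END_TAG.split(text) if chunk.strip()]

records = split_records(SAMPLE)
print(len(records))                 # 2
print(records[1].splitlines()[0])   # 2. Example J. 2015;1(1):1-2.
```

Splitting on the closing tag rather than on the leading number is the safer choice for this data: in the question's sample, wrapped abstract lines such as "0.016) and lower CD11 expression" also begin with digits and a period.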
stribizhev's solution: extract the fields directly from the text
(?mx)^\s*\d+\..*\R{2} # Get to the title
(?<title>[^\n]*(?:\n(?!\n)[^\n]*)*) # Get title
\R{2} # Get to the authors
[^\n]*(?:\n(?!\R)[^\n]*)* # Consume authors
(?<abstract>[^\[]*(?:\[(?!PubMed[ ]-[ ]indexed[ ]for[ ]MEDLINE\])[^\[]*)*) # Grab abstract
Demo: https://regex101.com/r/sG2yQ2/2
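A sketch of stribizhev's pattern ported to Python's re, which has no \R (so \n is used instead) and spells named groups (?P<name>...). Like the original, it assumes a blank line after the citation line, after the title and after the authors, which matches "title on line 3, abstract on line 8" in the question. Joining the wrapped title onto one line and stripping the trailing "PMID: ..." line from the abstract group are additions, not part of the answer. The sample is abridged and its second record is a placeholder.

```python
import re

# Abridged sample; record 2 is a hypothetical placeholder.
SAMPLE = """\
1. Br J Biomed Sci. 2015;72(3):93-101.

Effects of propofol and isoflurane on haemodynamics and the inflammatory response
in cardiopulmonary bypass surgery.

Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.

Cardiopulmonary bypass (CPB) causes reperfusion injury that when most severe is
clinically manifested as a systemic inflammatory response syndrome.

PMID: 26510263 [PubMed - indexed for MEDLINE]

2. Example J. 2015;1(1):1-2.

Placeholder title.

Doe J.

Placeholder abstract.

PMID: 0 [PubMed - indexed for MEDLINE]
"""

# VERBOSE ignores bare spaces, hence the [ ] classes for literal spaces.
RECORD = re.compile(r"""
    ^\s*\d+\..*\n{2}                       # citation line, then a blank line
    (?P<title>[^\n]*(?:\n(?!\n)[^\n]*)*)   # title: every line up to the next blank line
    \n{2}                                  # blank line before the authors
    [^\n]*(?:\n(?!\n)[^\n]*)*              # consume the author line(s)
    (?P<abstract>[^\[]*(?:\[(?!PubMed[ ]-[ ]indexed[ ]for[ ]MEDLINE\])[^\[]*)*)
""", re.MULTILINE | re.VERBOSE)

def extract(text):
    """Yield (title, abstract) pairs with whitespace tidied up."""
    for m in RECORD.finditer(text):
        # Join the wrapped title lines into one line (addition).
        title = " ".join(m.group("title").split())
        # The abstract group runs up to the closing tag, so it also holds the
        # "PMID: ..." line; strip it off (addition).
        abstract = re.sub(r"\s*PMID:\s*\d+\s*$", "", m.group("abstract").strip())
        yield title, abstract

for title, abstract in extract(SAMPLE):
    print(title)
    print(abstract[:40])
```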
Posted on 2015-11-27 08:04:08
Try this:
"^[0-9]+\..*\s+(.*)\s+.*\s+((?:\s|.)*?)\[PubMed - indexed for MEDLINE\]"
The first group is the title. The second is the abstract.
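A quick Python check of this pattern against one abridged record from the question (re.M is an assumption, added so that ^ can match at the start of every record in a larger file). With the sample's wrapped title, group 1 holds only the first title line, and group 2 starts at the author line and runs through the PMID, so the groups likely need some post-processing on data laid out like this.

```python
import re

# One abridged record from the question.
SAMPLE = """\
1. Br J Biomed Sci. 2015;72(3):93-101.

Effects of propofol and isoflurane on haemodynamics and the inflammatory response
in cardiopulmonary bypass surgery.

Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.

Cardiopulmonary bypass (CPB) causes reperfusion injury that when most severe is
clinically manifested as a systemic inflammatory response syndrome.

PMID: 26510263 [PubMed - indexed for MEDLINE]
"""

# The pattern from this answer; (?:\s|.) matches any character, newlines included.
PATTERN = re.compile(
    r"^[0-9]+\..*\s+(.*)\s+.*\s+((?:\s|.)*?)\[PubMed - indexed for MEDLINE\]",
    re.M,
)

m = PATTERN.search(SAMPLE)
print(m.group(1))                   # only the first line of the wrapped title
print(m.group(2).splitlines()[0])   # group 2 begins with the author line
```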
https://stackoverflow.com/questions/33951001