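I am trying to create a regex pattern to extract part of a string. The file contains certain headers, and all of the headers share the same format. I am currently using Python and would like to keep it that way.
Here is a sample of the files I am dealing with: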
TI TEST TEST TEST TEST TEST TEST TEST TEST AJSAOISJAO SOAI
ASASPAOS
SO EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA
AB Purpose
To examine the evidence supporting the use of simulation-based assessments as surrogates for patient-related outcomes assessed in the workplace.
Method
The authors systematically searched MEDLINE, EMBASE, Scopus, and key journals through February 26, 2013. They included original studies that assessed health professionals and trainees using simulation and then linked those scores with patient-related outcomes assessed in the workplace. Two reviewers independently extracted information on participants, tasks, validity evidence, study quality, patent-related and simulation-based outcomes, and magnitude of correlation. All correlations were pooled using random-effects meta-analysis.
Results
Of 11,628 potentially relevant articles, the 33 included studies enrolled 1,203 participants, including postgraduate physicians (n = 24 studies), practicing physicians (n = 8), medical students (n = 6), dentists (n = 2), and nurses (n = 1). The pooled correlation for provider behaviors was 0.51 (95% confidence interval [Cl], 0.38 to 0.62; n = 27 studies); for time behaviors, 0.44 (95% Cl, 0.15 to 0.66; n = 7); and for patient outcomes, 0.24(95% Cl, 0.02 to 0.47; n = 5). Most reported validity evidence was favorable, though studies often included only correlational evidence. Validity evidence of internal structure (n = 13 studies), content (n = 12), response process (n = 2), and consequences (n = 1) were reported less often. Three tools showed large pooled correlations and favorable (albeit incomplete) validity evidence.
Conclusions
Simulation-based assessments often correlate positively with patient-related outcomes. Although these surrogates are imperfect, tools with established validity evidence may replace workplace-based assessments for evaluating select procedural skills.
OI MANEIRAO MANEIRAOMANEIRAOMANEIRAO MANEIRAO
SN 6516516516
EI 849819981981
PD FEB
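PY 2015
My current goal is to capture the whole text of the 'AB' header. Note that the length and format of the AB content does not vary much: the important part is always either several paragraphs or a single line of text, running until the next header.
I have tried many different regex patterns; the one that got me closest to what I want is:
\nAB ((.*?\n)+)(\n[A-Z]{2}\s)?
However, it consumes every header it finds afterwards, all the way to the end of the file. I want the pattern to stop matching as soon as it reaches the next header after AB, whatever that header is.
A header always follows the pattern of a line break followed by two uppercase letters and a space, that is:
\n[A-Z]{2}\s
Thanks to anyone for any help.
My question is different from the general greedy-quantifier question, because the match is not terminated by a single non-greedy character but by a whole "stop" group.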
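One possible direction, offered only as a sketch: in the pattern above the outer `+` is greedy and the trailing header group is optional, so the repetition is free to swallow every remaining line. A lookahead can act as the "stop" group without consuming the next header. The names `AB_PATTERN`, `extract_ab`, and the shortened `sample` text below are made up for illustration, and the sketch assumes no line inside the AB body itself begins with two uppercase letters followed by whitespace.

```python
import re

# Lazily capture everything after "AB " and stop just before the next line
# that starts with two uppercase letters plus whitespace (the next header),
# or at the end of the input if AB happens to be the last header.
AB_PATTERN = re.compile(
    r"^AB (.*?)(?=^[A-Z]{2}\s|\Z)",
    re.MULTILINE | re.DOTALL,
)


def extract_ab(text):
    """Return the body of the AB header, or None if there is no AB header."""
    match = AB_PATTERN.search(text)
    return match.group(1).strip() if match else None


# Shortened, hypothetical stand-in for the sample file.
sample = (
    "TI Some title\n"
    "AB Purpose\n"
    "To examine the evidence.\n"
    "Method\n"
    "The authors searched MEDLINE.\n"
    "OI 0000\n"
    "SN 6516516516\n"
)

print(extract_ab(sample))
```

With `re.MULTILINE`, `^` matches at the start of every line, and with `re.DOTALL`, `.` also matches newlines, so the lazy group can span several paragraphs while the lookahead checks each line start for a header.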
Posted on 2019-05-01 02:16:02
https://stackoverflow.com/questions/55930777