I'm using the following regular expression to match a paragraph that starts with "Summary":
([^\']*(?=Summary)[^\']*)
but it matches the entire text: regex101a
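A minimal reproduction, assuming Python's re module (the question does not name a regex flavor) and a shortened copy of the sample record. It shows why this happens: [^\'] means "any character except an apostrophe", newlines included. The first [^\']* therefore runs from the start of the text up to "Summary", and the second runs on to the end. The record contains no apostrophes, so the match is the whole text.

```python
import re

# Shortened version of the record from the question (it contains no apostrophes).
text = (
    "COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff.\n"
    "Summary: Adaptor protein complex 3 (AP-3 complex) is a\n"
    "heterotrimeric protein complex. [provided by RefSeq, Feb\n"
    "2017].\n"
    "PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP\n"
)

# First attempt: [^\'] matches every character except an apostrophe, including newlines.
first_attempt = re.compile(r"([^\']*(?=Summary)[^\']*)")

m = first_attempt.search(text)
print(m.span() == (0, len(text)))  # True: the match covers the entire text
```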
I also tried
(?<=Summary).*?(?=]\.)
which doesn't match anything: regex101b
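For the second attempt, note that by default . does not match a newline. The closing "]." sits on a later line than "Summary", so the lazy .*? can never reach it. The following sketch again assumes Python's re. There, re.DOTALL is the spelling of the "s" (single line) flag that lets . match newlines as well.

```python
import re

# Same shortened sample record as above.
text = (
    "COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff.\n"
    "Summary: Adaptor protein complex 3 (AP-3 complex) is a\n"
    "heterotrimeric protein complex. [provided by RefSeq, Feb\n"
    "2017].\n"
    "PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP\n"
)

second_attempt = r"(?<=Summary).*?(?=]\.)"

# Without DOTALL, '.' stops at the first newline, so "]." is never reached.
print(re.search(second_attempt, text))  # None

# With DOTALL, '.' also matches newlines and the lazy .*? stops at the first "].".
# The lookbehind and lookahead keep "Summary" and "]." themselves out of the match.
m = re.search(second_attempt, text, re.DOTALL)
print(repr(m.group(0)))
```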
I think this has to do with the formatting of the text file.
Here is a sample:
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AC105339.9 and FJ695193.1.
This sequence is a reference standard in the RefSeqGene project.
Summary: Adaptor protein complex 3 (AP-3 complex) is a
heterotrimeric protein complex involved in the formation of
clathrin-coated synaptic vesicles. The protein encoded by this gene
represents the beta subunit of the neuron-specific AP-3 complex and
was first identified as the target antigen in human paraneoplastic
neurologic disorders. The encoded subunit binds clathrin and is
phosphorylated by a casein kinase-like protein, which mediates
synaptic vesicle coat assembly. Defects in this gene are a cause of
early-onset epileptic encephalopathy. [provided by RefSeq, Feb
2017].
PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP
1-35060 AC105339.9 88079-123138
35061-35259 FJ695193.1 1-199 c
35260-57628 AC105339.9 123337-145705

This is what I want to match:
Summary: Adaptor protein complex 3 (AP-3 complex) is a
heterotrimeric protein complex involved in the formation of
clathrin-coated synaptic vesicles. The protein encoded by this gene
represents the beta subunit of the neuron-specific AP-3 complex and
was first identified as the target antigen in human paraneoplastic
neurologic disorders. The encoded subunit binds clathrin and is
phosphorylated by a casein kinase-like protein, which mediates
synaptic vesicle coat assembly. Defects in this gene are a cause of
early-onset epileptic encephalopathy. [provided by RefSeq, Feb
2017].

Posted on 2017-08-31 05:02:24
I think this is a robust pattern that matches your paragraph (with the Multiline flag):
^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+
Working example: https://regex101.com/r/P6KlBa/2
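- ^\s+$\n matches the whitespace-only line that precedes the paragraph.
- ^([ \t]+) captures the indentation (spaces or tabs) at the start of the Summary line. Some flavors have \h for horizontal whitespace.
- Summary.* matches the first line, which starts with "Summary".
- (?:\n\1[ \t]*\S.*)+ matches the following non-empty lines, each of which must start with at least the same indentation (\1).

Source: https://stackoverflow.com/questions/45973353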
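The following minimal Python sketch applies the answer's pattern. It assumes something the pasted sample does not show: that continuation lines are indented and that a whitespace-only line precedes "Summary:". The ^\s+$\n and \1 parts of the pattern depend on both. With the sample exactly as pasted in the question, with no indentation and no blank line, the pattern would not match. The helper name extract_summary is hypothetical.

```python
import re
from typing import Optional

# The answer's pattern; re.MULTILINE makes ^ and $ match at every line boundary.
SUMMARY_RE = re.compile(r"^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+", re.MULTILINE)


def extract_summary(text: str) -> Optional[str]:
    """Hypothetical helper: return the Summary paragraph, or None if not found."""
    m = SUMMARY_RE.search(text)
    # The match begins with the whitespace-only line, so strip it off.
    return m.group(0).strip() if m else None


# Assumed layout (not confirmed by the question): 12-space indented
# continuation lines and a whitespace-only line before "Summary:".
record = (
    "COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff.\n"
    "            This sequence is a reference standard in the RefSeqGene project.\n"
    "            \n"
    "            Summary: Adaptor protein complex 3 (AP-3 complex) is a\n"
    "            heterotrimeric protein complex involved in the formation of\n"
    "            clathrin-coated synaptic vesicles. [provided by RefSeq, Feb\n"
    "            2017].\n"
    "PRIMARY     REFSEQ_SPAN         PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP\n"
)

print(extract_summary(record))
```

The match stops at the PRIMARY line because that line does not begin with the captured indentation.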