
Regex to select a whole paragraph by matching a word on its first line

Stack Overflow user
Asked 2017-08-31 04:40:53
Answers: 1 · Views: 2.7K · Followers: 0 · Votes: 1

I am using the following regular expression to match a paragraph that begins with "Summary":

([^\']*(?=Summary)[^\']*)

But it matches the entire text: regex101a

I also tried:

(?<=Summary).*?(?=]\.)

This matches nothing: regex101b

I think this has something to do with the formatting of the text file.

Here is a sample:

COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
            reference sequence was derived from AC105339.9 and FJ695193.1.
            This sequence is a reference standard in the RefSeqGene project.

        Summary: Adaptor protein complex 3 (AP-3 complex) is a
        heterotrimeric protein complex involved in the formation of
        clathrin-coated synaptic vesicles. The protein encoded by this gene
        represents the beta subunit of the neuron-specific AP-3 complex and
        was first identified as the target antigen in human paraneoplastic
        neurologic disorders. The encoded subunit binds clathrin and is
        phosphorylated by a casein kinase-like protein, which mediates
        synaptic vesicle coat assembly. Defects in this gene are a cause of
        early-onset epileptic encephalopathy. [provided by RefSeq, Feb
        2017].
PRIMARY     REFSEQ_SPAN         PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-35060             AC105339.9         88079-123138
            35061-35259         FJ695193.1         1-199               c
            35260-57628         AC105339.9         123337-145705

This is what I am aiming for:

    Summary: Adaptor protein complex 3 (AP-3 complex) is a
    heterotrimeric protein complex involved in the formation of
    clathrin-coated synaptic vesicles. The protein encoded by this gene
    represents the beta subunit of the neuron-specific AP-3 complex and
    was first identified as the target antigen in human paraneoplastic
    neurologic disorders. The encoded subunit binds clathrin and is
    phosphorylated by a casein kinase-like protein, which mediates
    synaptic vesicle coat assembly. Defects in this gene are a cause of
    early-onset epileptic encephalopathy. [provided by RefSeq, Feb
    2017].

1 Answer

Stack Overflow user

Accepted answer

Posted 2017-08-31 05:02:24

I think this is a robust pattern that matches your paragraph (using the Multiline flag):

^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+

Working example: https://regex101.com/r/P6KlBa/2

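  • ^\s+$\n - "Summary" can appear as the first word on a line. We start by matching a blank line to make sure "Summary" is at the start of a paragraph.
  • ([ \t]+) - captures the whitespace (the indentation) at the start of the line. Some flavors have \h for horizontal whitespace.
  • Summary.* - the first line starts with "Summary".
  • (?:\n\1[ \t]*\S.*)+ - matches the following non-blank lines that begin with the same indentation.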
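To try the pattern outside regex101, here is a minimal Python sketch. It is only an illustration: the names PARAGRAPH, SAMPLE and extract_summary are made up for this example, the sample text is a shortened version of the record in the question, and the "blank" line is assumed to contain spaces, because ^\s+$ needs at least one whitespace character on that line.

```python
import re

# The pattern from the answer, compiled with the Multiline flag so that
# ^ and $ match at the start and end of every line.
PARAGRAPH = re.compile(
    r"^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+",
    re.MULTILINE,
)

# Shortened sample. The second line holds four spaces (an assumption about
# the real file); the pattern relies on that line being whitespace-only.
SAMPLE = (
    "COMMENT     REVIEWED REFSEQ: This record has been curated.\n"
    "    \n"
    "    Summary: Adaptor protein complex 3 (AP-3 complex) is a\n"
    "    heterotrimeric protein complex. [provided by RefSeq, Feb\n"
    "    2017].\n"
    "PRIMARY     REFSEQ_SPAN         PRIMARY_IDENTIFIER\n"
)


def extract_summary(text):
    """Return the Summary paragraph without the leading blank line, or None."""
    match = PARAGRAPH.search(text)
    if match is None:
        return None
    # The match begins with the blank line; drop it and keep the paragraph.
    return match.group(0).split("\n", 1)[1]


summary = extract_summary(SAMPLE)
print(summary)
```

The match stops before the PRIMARY line because that line does not begin with the indentation captured in group 1. Note that a completely empty line (no spaces at all) before "Summary" does not satisfy ^\s+$\n. If your file has such lines, starting the pattern with ^[ \t]*$\n instead may be worth trying, though that variant is my own suggestion and has not been checked against the original file.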
Votes: 3
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/45973353
