Q: Regex to select a whole paragraph by matching a word on its first line

Stack Overflow user

Asked 2017-08-31 04:40:53

1 answer, 2.7K views, 0 followers, 1 vote

I am using the following regex to match a paragraph that starts with "Summary",

([^\']*(?=Summary)[^\']*)

but it matches the entire text: regex101a

I also tried

(?<=Summary).*?(?=]\.)

which matches nothing: regex101b

I think this has to do with the formatting of the text file.

Here is a sample:

COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
            reference sequence was derived from AC105339.9 and FJ695193.1.
            This sequence is a reference standard in the RefSeqGene project.

        Summary: Adaptor protein complex 3 (AP-3 complex) is a
        heterotrimeric protein complex involved in the formation of
        clathrin-coated synaptic vesicles. The protein encoded by this gene
        represents the beta subunit of the neuron-specific AP-3 complex and
        was first identified as the target antigen in human paraneoplastic
        neurologic disorders. The encoded subunit binds clathrin and is
        phosphorylated by a casein kinase-like protein, which mediates
        synaptic vesicle coat assembly. Defects in this gene are a cause of
        early-onset epileptic encephalopathy. [provided by RefSeq, Feb
        2017].
PRIMARY     REFSEQ_SPAN         PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-35060             AC105339.9         88079-123138
            35061-35259         FJ695193.1         1-199               c
            35260-57628         AC105339.9         123337-145705

This is what I am aiming for:

    Summary: Adaptor protein complex 3 (AP-3 complex) is a
    heterotrimeric protein complex involved in the formation of
    clathrin-coated synaptic vesicles. The protein encoded by this gene
    represents the beta subunit of the neuron-specific AP-3 complex and
    was first identified as the target antigen in human paraneoplastic
    neurologic disorders. The encoded subunit binds clathrin and is
    phosphorylated by a casein kinase-like protein, which mediates
    synaptic vesicle coat assembly. Defects in this gene are a cause of
    early-onset epileptic encephalopathy. [provided by RefSeq, Feb
    2017].
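The behaviour of the two attempts can be reproduced outside regex101. The following is a minimal Python sketch, not part of the original question: TEXT is a shortened, hypothetical stand-in for the file, and the variable names are made up for illustration. It suggests that the first pattern spans everything because the text contains no apostrophe, and that the second finds nothing because "." does not cross line breaks by default.

```python
import re

# Shortened stand-in for the file; hypothetical text with the same layout.
TEXT = (
    "COMMENT     This record has been curated by NCBI staff.\n"
    "\n"
    "        Summary: Adaptor protein complex 3 is a\n"
    "        heterotrimeric protein complex. [provided by RefSeq, Feb\n"
    "        2017].\n"
    "PRIMARY     REFSEQ_SPAN\n"
)

# Attempt 1: [^\']* accepts every character except an apostrophe, line
# breaks included, so with no apostrophe in the text it spans everything.
first = re.search(r"([^\']*(?=Summary)[^\']*)", TEXT)

# Attempt 2: by default "." stops at a line break, while "]." sits two
# lines below "Summary", so the lazy .*? can never reach it.
second = re.search(r"(?<=Summary).*?(?=]\.)", TEXT)

# With DOTALL the same pattern crosses lines, but the lookarounds leave
# "Summary" and the closing "]." out of the match.
second_dotall = re.search(r"(?<=Summary).*?(?=]\.)", TEXT, re.DOTALL)

print(first.group(0) == TEXT)   # whole text matched
print(second)                   # no match
print(repr(second_dotall.group(0)))
```

Even with the single-line (DOTALL) flag, the second pattern would still drop the word "Summary" and the trailing "]." from the result, so it does not produce the target shown above on its own.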

regex

1 Answer

Stack Overflow user

Accepted answer

Answered 2017-08-31 05:02:24

I think this is a robust pattern for matching your paragraph (with the Multiline flag):

^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+

Working example: https://regex101.com/r/P6KlBa/2

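"Summary" can appear as the first word of any line, so we start by matching a blank line to make sure "Summary" is at the start of a paragraph.
([ \t]+) captures the whitespace at the start of the line, so that later lines can be required to share it. Some flavors have \h for horizontal whitespace.
Summary.* - the first line starts with "Summary".
(?:\n\1[ \t]*\S.*)+ - matches the following non-blank lines that begin with the captured indentation.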
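As a rough illustration of the pattern above, here is a minimal Python sketch. The names PATTERN, LENIENT, SAMPLE and extract_summary are hypothetical, and SAMPLE is a shortened version of the record. It assumes the separator line contains at least one space, because ^\s+$ needs a whitespace character before the line break. LENIENT is an adaptation that is not part of the answer: it swaps \s+ for [ \t]* so that a completely empty separator line is accepted as well.

```python
import re

# The answer's pattern, unchanged; re.MULTILINE is the "Multiline flag".
PATTERN = re.compile(
    r"^\s+$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+", re.MULTILINE
)

# Assumed variant (not from the answer): also accepts an empty separator.
LENIENT = re.compile(
    r"^[ \t]*$\n^([ \t]+)Summary.*(?:\n\1[ \t]*\S.*)+", re.MULTILINE
)

# Shortened sample. Assumption: the separator line holds four spaces.
SAMPLE = (
    "COMMENT     REVIEWED REFSEQ: This record has been curated.\n"
    "            This sequence is a reference standard.\n"
    "    \n"
    "        Summary: Adaptor protein complex 3 (AP-3 complex) is a\n"
    "        heterotrimeric protein complex. [provided by RefSeq, Feb\n"
    "        2017].\n"
    "PRIMARY     REFSEQ_SPAN         PRIMARY_IDENTIFIER\n"
)


def extract_summary(text, pattern=PATTERN):
    """Return the Summary paragraph without the separator line, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    # Group 1 is the captured indentation, so its start marks the
    # beginning of the "Summary" line; the separator line is skipped.
    return text[match.start(1):match.end()]


if __name__ == "__main__":
    print(extract_summary(SAMPLE))
```

If the separator line in the real file is completely empty, the unchanged pattern finds nothing in this sketch, while extract_summary(text, LENIENT) still returns the paragraph.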

Votes: 3

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/45973353
