Q: Regex to match content between curly braces

Stack Overflow user

Asked 2022-09-18 17:12:43

Answers 1 · Views 65 · Followers 0 · Votes 1

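I have the following text file and need to extract the text of the ELEMENT and CLUSTER matches. I am using the following regular expression: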

CLUSTER\[at....\][\d\D]+?{[\d\D]+?items[\d\D]+?}|(ELEMENT\[at....\][\d\D]+?}[\d\D]+?})

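This works well in general, but for a text file like the one below, where some elements have more than one DV match, it does not extract the whole "value matches" part of an element, only the first DV match.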

For example, ELEMENT[at0030] leaves out the DV_TEXT and DV_PROPORTION matches, whereas ELEMENT[at0028] matches everything I need.

I need my regex to capture everything inside the "value matches" braces of each ELEMENT, not just the first value in it. Any help would be appreciated.
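The behaviour described above can be reproduced with a short sketch. The input here is a shortened, made-up version of the ELEMENT[at0030] block, and the variable names are only illustrative. The ELEMENT alternative of the regex stops lazily at the second closing brace it meets, which is the one that closes defining_code, so the later DV lines are never reached.

```javascript
// Shortened, hypothetical version of the ELEMENT[at0030] block from the file.
const snippet = [
  "ELEMENT[at0030] occurrences matches {0..1} matches {",
  "    value matches {",
  "        DV_CODED_TEXT matches {",
  "            defining_code matches {",
  "                [local::at0031]",
  "            }",
  "        }",
  "        DV_TEXT matches {*}",
  "        DV_PROPORTION matches {*}",
  "    }",
  "}",
].join("\n");

// The regex from the question, unchanged.
const original = /CLUSTER\[at....\][\d\D]+?{[\d\D]+?items[\d\D]+?}|(ELEMENT\[at....\][\d\D]+?}[\d\D]+?})/g;

const found = snippet.match(original) ?? [];

// The first "}" is the one in {0..1}; the second is the one closing
// defining_code, so the match ends there.
console.log(found[0]);
console.log(found[0].includes("DV_TEXT matches")); // the DV_TEXT line is not part of the match
```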

Here is an example of a text file I am working with:

definition
    CLUSTER[at0000] matches {    -- Examination of a cleavage-stage embryo
        items cardinality matches {1..*; unordered} matches {
            ELEMENT[at0028] occurrences matches {0..1} matches {    -- Number of cells
                value matches {
                    DV_COUNT matches {*}
                }
            }
            ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0031,    -- None
                            at0032,    -- Mild fragmentation
                            at0033,    -- Moderate fragmentation
                            at0034]    -- Severe fragmentation
                        }
                    }
                    DV_TEXT matches {*}
                    DV_PROPORTION matches {*}
                }
            }
            ELEMENT[at0035] occurrences matches {0..1} matches {    -- Blastomere size
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0036,    -- Equal, stage specific
                            at0037,    -- Unequal, stage specific
                            at0053,    -- Equal, non-stage specific
                            at0054]    -- Unequal, non-stage specific
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0038] occurrences matches {0..1} matches {    -- Nucleation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0039,    -- No visible nuclei
                            at0040,    -- Mononucleation
                            at0041,    -- Binucleation
                            at0051,    -- Multinucleation
                            at0052]    -- Broad multinucleation
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0042] occurrences matches {0..1} matches {    -- Cytoplasmic morphology
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0043] occurrences matches {0..1} matches {    -- Spatial distribution of cells
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0044] occurrences matches {0..1} matches {    -- Compaction
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0045,    -- None
                            at0046,    -- Minimal
                            at0047,    -- Moderate
                            at0048]    -- Complete
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0049] occurrences matches {0..*} matches {    -- Other morphological features
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0055] occurrences matches {0..1} matches {    -- Morphology grade
                value matches {
                    DV_TEXT matches {*}
                }
            }
        }
    }


ontology
    term_definitions = <
        ["en"] = <
            items = <
                ["at0000"] = <
                    text = <"Examination of a cleavage-stage embryo">
                    description = <"Morphological findings obtained by microscopy of the human cleavage-stage embryo.">
                >
                ["at0028"] = <
                    text = <"Number of cells">
                    description = <"Number of cells in a cleavage-stage embryo.">
                >
                ["at0030"] = <
                    text = <"Fragmentation">
                    description = <"Cytoplasmic fragmentation in a cleavage-stage embryo.">
                    comment = <"The proportion data type can be used to record a more precise assessment.">
                >
                ["at0031"] = <
                    text = <"None">
                    description = <"Absence of cytoplasmic fragments.">
                >
                ["at0032"] = <
                    text = <"Mild fragmentation">
                    description = <"Cytoplasmic fragments cover < 10% of the total cytoplasmic volume.">
                >
                ["at0033"] = <
                    text = <"Moderate fragmentation">
                    description = <"Cytoplasmic fragments cover 10 - 25% of the total cytoplasmic volume.">
                >

javascript

regex

1 Answer

Stack Overflow user

Answered 2022-09-18 18:46:26

For example:

const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;

console.log(text.match(rx));

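The basic idea is to avoid having to count opening and closing braces. The pattern captures the indentation in front of "value matches" (the spaces between the preceding newline and "value matches"), then matches everything up to a newline followed by the same number of spaces and a closing brace: \n\1\}. In the regex the closing brace sits in a lookahead, so it is not part of the match.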
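As a minimal sketch of that idea, here is the same regex applied to a small made-up block (the sample input is an assumption for illustration, not taken from the file). The inner closing brace is indented more deeply than "value matches", so the backreference \1 skips it and the match ends only at the brace that has the captured indentation.

```javascript
// Hypothetical one-element input with a nested block inside "value matches".
const sample = [
  "ELEMENT[at0001] occurrences matches {0..1} matches {",
  "    value matches {",
  "        DV_CODED_TEXT matches {",
  "            defining_code matches {*}",
  "        }",
  "        DV_TEXT matches {*}",
  "    }",
  "}",
].join("\n");

// ( +) captures the indentation before "value matches";
// \n\1(?=\}) ends the match at the first "}" that has exactly that indentation.
const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;

const blocks = sample.match(rx) ?? [];

console.log(blocks.length);
console.log(blocks[0]);
```

Note that this relies on consistent indentation: if the closing brace of "value matches" were indented differently from its opening line, the backreference would not find it.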

Does that make sense?

"Proof" that it actually works:

const text = `
definition
    CLUSTER[at0000] matches {    -- Examination of a cleavage-stage embryo
        items cardinality matches {1..*; unordered} matches {
            ELEMENT[at0028] occurrences matches {0..1} matches {    -- Number of cells
                value matches {
                    DV_COUNT matches {*}
                }
            }
            ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0031,    -- None
                            at0032,    -- Mild fragmentation
                            at0033,    -- Moderate fragmentation
                            at0034]    -- Severe fragmentation
                        }
                    }
                    DV_TEXT matches {*}
                    DV_PROPORTION matches {*}
                }
            }
            ELEMENT[at0035] occurrences matches {0..1} matches {    -- Blastomere size
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0036,    -- Equal, stage specific
                            at0037,    -- Unequal, stage specific
                            at0053,    -- Equal, non-stage specific
                            at0054]    -- Unequal, non-stage specific
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0038] occurrences matches {0..1} matches {    -- Nucleation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0039,    -- No visible nuclei
                            at0040,    -- Mononucleation
                            at0041,    -- Binucleation
                            at0051,    -- Multinucleation
                            at0052]    -- Broad multinucleation
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0042] occurrences matches {0..1} matches {    -- Cytoplasmic morphology
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0043] occurrences matches {0..1} matches {    -- Spatial distribution of cells
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0044] occurrences matches {0..1} matches {    -- Compaction
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0045,    -- None
                            at0046,    -- Minimal
                            at0047,    -- Moderate
                            at0048]    -- Complete
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0049] occurrences matches {0..*} matches {    -- Other morphological features
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0055] occurrences matches {0..1} matches {    -- Morphology grade
                value matches {
                    DV_TEXT matches {*}
                }
            }
        }
    }


ontology
    term_definitions = <
        ["en"] = <
            items = <
                ["at0000"] = <
                    text = <"Examination of a cleavage-stage embryo">
                    description = <"Morphological findings obtained by microscopy of the human cleavage-stage embryo.">
                >
                ["at0028"] = <
                    text = <"Number of cells">
                    description = <"Number of cells in a cleavage-stage embryo.">
                >
                ["at0030"] = <
                    text = <"Fragmentation">
                    description = <"Cytoplasmic fragmentation in a cleavage-stage embryo.">
                    comment = <"The proportion data type can be used to record a more precise assessment.">
                >
                ["at0031"] = <
                    text = <"None">
                    description = <"Absence of cytoplasmic fragments.">
                >
                ["at0032"] = <
                    text = <"Mild fragmentation">
                    description = <"Cytoplasmic fragments cover < 10% of the total cytoplasmic volume.">
                >
                ["at0033"] = <
                    text = <"Moderate fragmentation">
                    description = <"Cytoplasmic fragments cover 10 - 25% of the total cytoplasmic volume.">
                >
`;

const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;

const match = text.match(rx);

if (match != null) {
  match.forEach((m, i) => console.log(`${i}: ${m}`));
}

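The above matches everything inside the "value matches" braces of each ELEMENT. If you do not want any leading or trailing whitespace, adjust the regex accordingly, for example: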

const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{\s*)\S+[\s\S]*?(?=\s*\n\1\})/g;
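A possible way to use the trimmed variant is sketched below. It assumes a shortened, made-up two-element input, and it adds named groups (id, indent) to the pattern so that each value block can be keyed by its ELEMENT code; the named groups and the valuesById object are additions for this sketch, not part of the regex above.

```javascript
// Hypothetical two-element input, shaped like the file in the question.
const adl = [
  "    ELEMENT[at0028] occurrences matches {0..1} matches {    -- Number of cells",
  "        value matches {",
  "            DV_COUNT matches {*}",
  "        }",
  "    }",
  "    ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation",
  "        value matches {",
  "            DV_CODED_TEXT matches {",
  "                defining_code matches {*}",
  "            }",
  "            DV_TEXT matches {*}",
  "            DV_PROPORTION matches {*}",
  "        }",
  "    }",
].join("\n");

// Trimmed variant with named groups: "id" is the at-code, "indent" is the
// indentation of "value matches" that the closing brace must repeat.
const rxNamed = /(?<=ELEMENT\[(?<id>at\d{4})\] occurrences[^\n]+\n(?<indent> +)value matches \{\s*)\S+[\s\S]*?(?=\s*\n\k<indent>\})/g;

// Collect each trimmed value block under its ELEMENT code.
const valuesById = {};
for (const m of adl.matchAll(rxNamed)) {
  valuesById[m.groups.id] = m[0];
}

console.log(valuesById);
```

Using matchAll rather than match keeps the capture groups available for every match, which match with the g flag does not return.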

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/73764864
