Regex to match the content between curly braces

Asked by a Stack Overflow user on 2022-09-18 17:12:43

I have the following text file and need to extract the text of the ELEMENT and CLUSTER matches. I am using the following regex:

CLUSTER\[at....\][\d\D]+?{[\d\D]+?items[\d\D]+?}|(ELEMENT\[at....\][\d\D]+?}[\d\D]+?})

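This works fine in general, but for certain text files like the one below, where some elements have multiple DV matches, it does not extract the entire "value matches" section, only the first DV match inside it.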

For example, for ELEMENT[at0030] it leaves out the DV_TEXT and DV_PROPORTION matches, whereas for ELEMENT[at0028] it matches everything I need.
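The behavior described above can be reproduced on a reduced sample. The sketch below is not part of the original question: the `sample` string is a shortened, hypothetical stand-in for the real file, and only the ELEMENT branch of the pattern is used (with the closing braces escaped, which does not change its behavior). The two lazy `[\d\D]+?\}` runs each stop at the first `}` they reach, so the match ends at the second closing brace after the ELEMENT header.

```javascript
// Reduced, hypothetical sample: one ELEMENT whose value block holds three DV_* constraints.
const sample = [
  'ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation',
  '    value matches {',
  '        DV_CODED_TEXT matches {',
  '            defining_code matches {',
  '                [local::at0031]',
  '            }',
  '        }',
  '        DV_TEXT matches {*}',
  '        DV_PROPORTION matches {*}',
  '    }',
  '}',
].join('\n');

// ELEMENT branch of the pattern from the question.
const original = /ELEMENT\[at....\][\d\D]+?\}[\d\D]+?\}/;

const [hit] = sample.match(original);

// The first lazy run ends at the "}" of {0..1}; the second ends at the "}"
// that closes defining_code, so the rest of the value block is cut off.
console.log(hit);
console.log(hit.includes('DV_CODED_TEXT')); // → true
console.log(hit.includes('DV_TEXT'));       // → false
```

Because the pattern only ever consumes two closing braces, any value block nested more deeply than that is truncated.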

I need my regex to capture everything inside the "value matches" curly braces for each ELEMENT, not just the first DV match it reaches. Any help?

Here is a sample of a text file I am working with:

definition
    CLUSTER[at0000] matches {    -- Examination of a cleavage-stage embryo
        items cardinality matches {1..*; unordered} matches {
            ELEMENT[at0028] occurrences matches {0..1} matches {    -- Number of cells
                value matches {
                    DV_COUNT matches {*}
                }
            }
            ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0031,    -- None
                            at0032,    -- Mild fragmentation
                            at0033,    -- Moderate fragmentation
                            at0034]    -- Severe fragmentation
                        }
                    }
                    DV_TEXT matches {*}
                    DV_PROPORTION matches {*}
                }
            }
            ELEMENT[at0035] occurrences matches {0..1} matches {    -- Blastomere size
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0036,    -- Equal, stage specific
                            at0037,    -- Unequal, stage specific
                            at0053,    -- Equal, non-stage specific
                            at0054]    -- Unequal, non-stage specific
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0038] occurrences matches {0..1} matches {    -- Nucleation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0039,    -- No visible nuclei
                            at0040,    -- Mononucleation
                            at0041,    -- Binucleation
                            at0051,    -- Multinucleation
                            at0052]    -- Broad multinucleation
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0042] occurrences matches {0..1} matches {    -- Cytoplasmic morphology
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0043] occurrences matches {0..1} matches {    -- Spatial distribution of cells
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0044] occurrences matches {0..1} matches {    -- Compaction
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0045,    -- None
                            at0046,    -- Minimal
                            at0047,    -- Moderate
                            at0048]    -- Complete
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0049] occurrences matches {0..*} matches {    -- Other morphological features
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0055] occurrences matches {0..1} matches {    -- Morphology grade
                value matches {
                    DV_TEXT matches {*}
                }
            }
        }
    }


ontology
    term_definitions = <
        ["en"] = <
            items = <
                ["at0000"] = <
                    text = <"Examination of a cleavage-stage embryo">
                    description = <"Morphological findings obtained by microscopy of the human cleavage-stage embryo.">
                >
                ["at0028"] = <
                    text = <"Number of cells">
                    description = <"Number of cells in a cleavage-stage embryo.">
                >
                ["at0030"] = <
                    text = <"Fragmentation">
                    description = <"Cytoplasmic fragmentation in a cleavage-stage embryo.">
                    comment = <"The proportion data type can be used to record a more precise assessment.">
                >
                ["at0031"] = <
                    text = <"None">
                    description = <"Absence of cytoplasmic fragments.">
                >
                ["at0032"] = <
                    text = <"Mild fragmentation">
                    description = <"Cytoplasmic fragments cover < 10% of the total cytoplasmic volume.">
                >
                ["at0033"] = <
                    text = <"Moderate fragmentation">
                    description = <"Cytoplasmic fragments cover 10 - 25% of the total cytoplasmic volume.">
                >

1 Answer

Answered by a Stack Overflow user on 2022-09-18 18:46:26

For example:

const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;

console.log(text.match(rx));

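The basic idea is to avoid having to count opening and closing braces. The pattern captures the run of spaces between the newline and "value matches", then matches everything up to a newline that is followed by that same run of spaces and a closing brace: \n\1\}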

Does that make sense?

"Proof" that it actually works:

const text = `
definition
    CLUSTER[at0000] matches {    -- Examination of a cleavage-stage embryo
        items cardinality matches {1..*; unordered} matches {
            ELEMENT[at0028] occurrences matches {0..1} matches {    -- Number of cells
                value matches {
                    DV_COUNT matches {*}
                }
            }
            ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0031,    -- None
                            at0032,    -- Mild fragmentation
                            at0033,    -- Moderate fragmentation
                            at0034]    -- Severe fragmentation
                        }
                    }
                    DV_TEXT matches {*}
                    DV_PROPORTION matches {*}
                }
            }
            ELEMENT[at0035] occurrences matches {0..1} matches {    -- Blastomere size
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0036,    -- Equal, stage specific
                            at0037,    -- Unequal, stage specific
                            at0053,    -- Equal, non-stage specific
                            at0054]    -- Unequal, non-stage specific
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0038] occurrences matches {0..1} matches {    -- Nucleation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0039,    -- No visible nuclei
                            at0040,    -- Mononucleation
                            at0041,    -- Binucleation
                            at0051,    -- Multinucleation
                            at0052]    -- Broad multinucleation
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0042] occurrences matches {0..1} matches {    -- Cytoplasmic morphology
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0043] occurrences matches {0..1} matches {    -- Spatial distribution of cells
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0044] occurrences matches {0..1} matches {    -- Compaction
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0045,    -- None
                            at0046,    -- Minimal
                            at0047,    -- Moderate
                            at0048]    -- Complete
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0049] occurrences matches {0..*} matches {    -- Other morphological features
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0055] occurrences matches {0..1} matches {    -- Morphology grade
                value matches {
                    DV_TEXT matches {*}
                }
            }
        }
    }


ontology
    term_definitions = <
        ["en"] = <
            items = <
                ["at0000"] = <
                    text = <"Examination of a cleavage-stage embryo">
                    description = <"Morphological findings obtained by microscopy of the human cleavage-stage embryo.">
                >
                ["at0028"] = <
                    text = <"Number of cells">
                    description = <"Number of cells in a cleavage-stage embryo.">
                >
                ["at0030"] = <
                    text = <"Fragmentation">
                    description = <"Cytoplasmic fragmentation in a cleavage-stage embryo.">
                    comment = <"The proportion data type can be used to record a more precise assessment.">
                >
                ["at0031"] = <
                    text = <"None">
                    description = <"Absence of cytoplasmic fragments.">
                >
                ["at0032"] = <
                    text = <"Mild fragmentation">
                    description = <"Cytoplasmic fragments cover < 10% of the total cytoplasmic volume.">
                >
                ["at0033"] = <
                    text = <"Moderate fragmentation">
                    description = <"Cytoplasmic fragments cover 10 - 25% of the total cytoplasmic volume.">
                >
`;

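// Lookbehind: the ELEMENT header line, then the indentation (group 1) in front of "value matches {".
// Match: everything up to a newline plus that same indentation; the lookahead checks for the "}".
const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;

const match = text.match(rx);

// match() returns null when nothing matches, so guard before iterating.
if (match != null) {
  match.forEach((m, i) => console.log(`${i}: ${m}`));
}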

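The pattern above matches everything inside the "value matches" curly braces of each ELEMENT. If you do not want any leading or trailing whitespace, you can adjust the regex accordingly, for example: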

const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{\s*)\S+[\s\S]*?(?=\s*\n\1\})/g;
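
If the element ID is needed alongside each value block, one possible variation (not from the original answer) is to use named capture groups with `matchAll` in place of the lookbehind. The sketch below assumes the same layout as the sample file: the ELEMENT header on one line and "value matches {" on the next, indented with spaces. The `sample` string and the group names `id`, `indent`, and `body` are made up for illustration.

```javascript
// Reduced, hypothetical sample with two ELEMENTs, indented like the real file.
const sample = [
  '    ELEMENT[at0028] occurrences matches {0..1} matches {    -- Number of cells',
  '        value matches {',
  '            DV_COUNT matches {*}',
  '        }',
  '    }',
  '    ELEMENT[at0030] occurrences matches {0..1} matches {    -- Fragmentation',
  '        value matches {',
  '            DV_CODED_TEXT matches {',
  '                defining_code matches {',
  '                    [local::at0031]',
  '                }',
  '            }',
  '            DV_TEXT matches {*}',
  '            DV_PROPORTION matches {*}',
  '        }',
  '    }',
].join('\n');

// Same indentation trick as the answer, but with named groups:
//   id     - the atNNNN code of the ELEMENT
//   indent - the spaces in front of "value matches {"
//   body   - everything up to the "}" that sits at that same indentation
const rx = /ELEMENT\[(?<id>at\d{4})\] occurrences[^\n]+\n(?<indent> +)value matches \{\s*(?<body>\S[\s\S]*?)\s*\n\k<indent>\}/g;

const valuesById = {};
for (const m of sample.matchAll(rx)) {
  // Trim each line so the result does not depend on the nesting depth.
  valuesById[m.groups.id] = m.groups.body.split('\n').map((line) => line.trim());
}

console.log(valuesById.at0028); // → [ 'DV_COUNT matches {*}' ]
console.log(valuesById.at0030.length); // → 7
```

Like the answer's pattern, this relies on consistent indentation rather than real brace matching, so it would need adjusting for files indented with tabs or formatted differently.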
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73764864
