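I have the following text file and need to extract the text matched by ELEMENT and CLUSTER entries. I am using this regular expression:
CLUSTER\[at....\][\d\D]+?{[\d\D]+?items[\d\D]+?}|(ELEMENT\[at....\][\d\D]+?}[\d\D]+?})
This works well in general, but for a file like the one below, where some elements have several DV_ entries, it extracts only the first entry instead of the whole "value matches" section.
For example, for ELEMENT[at0030] it leaves out the DV_TEXT matches and DV_PROPORTION matches entries, while for ELEMENT[at0028] it matches everything I need.
I need my regex to capture everything inside each ELEMENT's "value matches" braces, not just the first value. Any help?
Here is a sample of a text file I am working with: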
definition
    CLUSTER[at0000] matches { -- Examination of a cleavage-stage embryo
        items cardinality matches {1..*; unordered} matches {
            ELEMENT[at0028] occurrences matches {0..1} matches { -- Number of cells
                value matches {
                    DV_COUNT matches {*}
                }
            }
            ELEMENT[at0030] occurrences matches {0..1} matches { -- Fragmentation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0031, -- None
                            at0032, -- Mild fragmentation
                            at0033, -- Moderate fragmentation
                            at0034] -- Severe fragmentation
                        }
                    }
                    DV_TEXT matches {*}
                    DV_PROPORTION matches {*}
                }
            }
            ELEMENT[at0035] occurrences matches {0..1} matches { -- Blastomere size
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0036, -- Equal, stage specific
                            at0037, -- Unequal, stage specific
                            at0053, -- Equal, non-stage specific
                            at0054] -- Unequal, non-stage specific
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0038] occurrences matches {0..1} matches { -- Nucleation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0039, -- No visible nuclei
                            at0040, -- Mononucleation
                            at0041, -- Binucleation
                            at0051, -- Multinucleation
                            at0052] -- Broad multinucleation
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0042] occurrences matches {0..1} matches { -- Cytoplasmic morphology
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0043] occurrences matches {0..1} matches { -- Spatial distribution of cells
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0044] occurrences matches {0..1} matches { -- Compaction
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0045, -- None
                            at0046, -- Minimal
                            at0047, -- Moderate
                            at0048] -- Complete
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0049] occurrences matches {0..*} matches { -- Other morphological features
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0055] occurrences matches {0..1} matches { -- Morphology grade
                value matches {
                    DV_TEXT matches {*}
                }
            }
        }
    }
ontology
    term_definitions = <
        ["en"] = <
            items = <
                ["at0000"] = <
                    text = <"Examination of a cleavage-stage embryo">
                    description = <"Morphological findings obtained by microscopy of the human cleavage-stage embryo.">
                >
                ["at0028"] = <
                    text = <"Number of cells">
                    description = <"Number of cells in a cleavage-stage embryo.">
                >
                ["at0030"] = <
                    text = <"Fragmentation">
                    description = <"Cytoplasmic fragmentation in a cleavage-stage embryo.">
                    comment = <"The proportion data type can be used to record a more precise assessment.">
                >
                ["at0031"] = <
                    text = <"None">
                    description = <"Absence of cytoplasmic fragments.">
                >
                ["at0032"] = <
                    text = <"Mild fragmentation">
                    description = <"Cytoplasmic fragments cover < 10% of the total cytoplasmic volume.">
                >
                ["at0033"] = <
                    text = <"Moderate fragmentation">
                    description = <"Cytoplasmic fragments cover 10 - 25% of the total cytoplasmic volume.">
                >
Posted on 2022-09-18 18:46:26
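For example:
const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;
console.log(text.match(rx));
The basic idea is to avoid having to count opening and closing braces. The pattern captures the run of spaces between the newline and "value matches", then matches everything up to a newline followed by that same number of spaces and a closing brace: \n\1\}. This relies on the file being consistently indented with spaces.
Is that clear?
A "proof" that it works: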
const text = `
definition
    CLUSTER[at0000] matches { -- Examination of a cleavage-stage embryo
        items cardinality matches {1..*; unordered} matches {
            ELEMENT[at0028] occurrences matches {0..1} matches { -- Number of cells
                value matches {
                    DV_COUNT matches {*}
                }
            }
            ELEMENT[at0030] occurrences matches {0..1} matches { -- Fragmentation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0031, -- None
                            at0032, -- Mild fragmentation
                            at0033, -- Moderate fragmentation
                            at0034] -- Severe fragmentation
                        }
                    }
                    DV_TEXT matches {*}
                    DV_PROPORTION matches {*}
                }
            }
            ELEMENT[at0035] occurrences matches {0..1} matches { -- Blastomere size
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0036, -- Equal, stage specific
                            at0037, -- Unequal, stage specific
                            at0053, -- Equal, non-stage specific
                            at0054] -- Unequal, non-stage specific
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0038] occurrences matches {0..1} matches { -- Nucleation
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0039, -- No visible nuclei
                            at0040, -- Mononucleation
                            at0041, -- Binucleation
                            at0051, -- Multinucleation
                            at0052] -- Broad multinucleation
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0042] occurrences matches {0..1} matches { -- Cytoplasmic morphology
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0043] occurrences matches {0..1} matches { -- Spatial distribution of cells
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0044] occurrences matches {0..1} matches { -- Compaction
                value matches {
                    DV_CODED_TEXT matches {
                        defining_code matches {
                            [local::
                            at0045, -- None
                            at0046, -- Minimal
                            at0047, -- Moderate
                            at0048] -- Complete
                        }
                    }
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0049] occurrences matches {0..*} matches { -- Other morphological features
                value matches {
                    DV_TEXT matches {*}
                }
            }
            ELEMENT[at0055] occurrences matches {0..1} matches { -- Morphology grade
                value matches {
                    DV_TEXT matches {*}
                }
            }
        }
    }
ontology
    term_definitions = <
        ["en"] = <
            items = <
                ["at0000"] = <
                    text = <"Examination of a cleavage-stage embryo">
                    description = <"Morphological findings obtained by microscopy of the human cleavage-stage embryo.">
                >
                ["at0028"] = <
                    text = <"Number of cells">
                    description = <"Number of cells in a cleavage-stage embryo.">
                >
                ["at0030"] = <
                    text = <"Fragmentation">
                    description = <"Cytoplasmic fragmentation in a cleavage-stage embryo.">
                    comment = <"The proportion data type can be used to record a more precise assessment.">
                >
                ["at0031"] = <
                    text = <"None">
                    description = <"Absence of cytoplasmic fragments.">
                >
                ["at0032"] = <
                    text = <"Mild fragmentation">
                    description = <"Cytoplasmic fragments cover < 10% of the total cytoplasmic volume.">
                >
                ["at0033"] = <
                    text = <"Moderate fragmentation">
                    description = <"Cytoplasmic fragments cover 10 - 25% of the total cytoplasmic volume.">
                >
`;
// The lookbehind anchors on the ELEMENT line and captures the indentation of "value matches".
const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{)[\d\D]+?\n\1(?=\})/g;
const match = text.match(rx);
if (match != null) {
    // Print each captured "value matches" body with its index.
    match.forEach((m, i) => console.log(`${i}: ${m}`));
}
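If you also need to know which ELEMENT each block belongs to, the same indentation trick can be written with ordinary capture groups and matchAll instead of a lookbehind. The following is only a sketch: the function name extractElementValues and the shortened sample are made up for illustration, and it assumes every DV_ type name is written in upper case with underscores.

```javascript
// Hypothetical helper (not part of the original answer): maps each ELEMENT id
// to the DV_ type names found inside its "value matches" block.
function extractElementValues(adl) {
  // Group 1: element id, group 2: indentation of "value matches", group 3: block body.
  const rx = /ELEMENT\[(at\d{4})\] occurrences[^\n]+\n( +)value matches \{\n([\d\D]+?)\n\2\}/g;
  const result = {};
  for (const [, id, , body] of adl.matchAll(rx)) {
    // Assumption: DV_ type names consist of upper-case letters and underscores.
    result[id] = body.match(/\bDV_[A-Z_]+/g) || [];
  }
  return result;
}

// Shortened, indented sample in the same shape as the file above.
const sample = [
  "ELEMENT[at0028] occurrences matches {0..1} matches { -- Number of cells",
  "    value matches {",
  "        DV_COUNT matches {*}",
  "    }",
  "}",
  "ELEMENT[at0030] occurrences matches {0..1} matches { -- Fragmentation",
  "    value matches {",
  "        DV_CODED_TEXT matches {",
  "            defining_code matches {",
  "                [local::",
  "                at0031, -- None",
  "                at0032] -- Mild fragmentation",
  "            }",
  "        }",
  "        DV_TEXT matches {*}",
  "        DV_PROPORTION matches {*}",
  "    }",
  "}",
].join("\n");

const values = extractElementValues(sample);
console.log(values);
```

For the sample above, at0028 maps to a single DV_COUNT entry and at0030 maps to DV_CODED_TEXT, DV_TEXT and DV_PROPORTION, which is the case the original regex missed.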
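The pattern above matches everything inside each ELEMENT's "value matches" braces. If you don't want the leading or trailing whitespace, adjust the regex accordingly, for example:
const rx = /(?<=ELEMENT\[at\d{4}\] occurrences[^\n]+\n( +)value matches \{\s*)\S+[\s\S]*?(?=\s*\n\1\})/g;
https://stackoverflow.com/questions/73764864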
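Both patterns depend on consistent indentation. If the whitespace cannot be trusted (for example, it was stripped when the file was copied), a different technique is to use the regex only to find the start of each block and then count braces explicitly. This is a rough sketch of that alternative, not the method from the answer: readBalanced and extractValueBlocks are hypothetical names, and it assumes no stray "{" or "}" appears inside the "--" comments.

```javascript
// Hypothetical helper: returns the text between the "{" at openIndex and its
// matching "}", or null if the braces never balance.
function readBalanced(text, openIndex) {
  let depth = 0;
  for (let i = openIndex; i < text.length; i++) {
    if (text[i] === "{") {
      depth++;
    } else if (text[i] === "}") {
      depth--;
      if (depth === 0) return text.slice(openIndex + 1, i);
    }
  }
  return null;
}

// Hypothetical helper: maps each ELEMENT id to the trimmed body of its
// "value matches" block, regardless of indentation.
function extractValueBlocks(adl) {
  const rx = /ELEMENT\[(at\d{4})\][^\n]*\n\s*value matches \{/g;
  const blocks = {};
  for (const m of adl.matchAll(rx)) {
    // The match ends on the "{" that opens the value block.
    const open = m.index + m[0].length - 1;
    const body = readBalanced(adl, open);
    if (body !== null) blocks[m[1]] = body.trim();
  }
  return blocks;
}

// Sample with all indentation removed, where the backreference approach cannot work.
const flat = [
  "ELEMENT[at0030] occurrences matches {0..1} matches { -- Fragmentation",
  "value matches {",
  "DV_CODED_TEXT matches {",
  "defining_code matches {",
  "[local::",
  "at0031, -- None",
  "at0034] -- Severe fragmentation",
  "}",
  "}",
  "DV_TEXT matches {*}",
  "DV_PROPORTION matches {*}",
  "}",
  "}",
].join("\n");

const blocks = extractValueBlocks(flat);
console.log(blocks.at0030);
```

The trade-off is a few more lines of code in exchange for not depending on layout; the regex-only versions remain shorter when the indentation is reliable.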