For example, I have XML like this:
<managedObject class="New" distName="MB-85404/TB-85404/ST-4/a" version="xL20A_1911_002" operation="open">
<p name="a">320ms</p>
<p name="b">enabled</p>
<p name="c">640ms</p>
<p name="d">320ms</p>
<p name="e">640ms</p>
<p name="f">1280ms</p>
<p name="g">6</p>
</managedObject>
<managedObject class="new" distName="AL-76867/MB-85404/TB-85404/ST-4/b" version="xL20A_1911_002" operation="open">
<p name="h">320ms</p>
<p name="i">enabled</p>
<p name="j">640ms</p>
<p name="k">320ms</p>
<p name="l">640ms</p>
<p name="a">1280ms</p>
<p name="l">6</p>
</managedObject>
<managedObject class="New" distName="MB-85404/TB-85404/ST-4/c" version="xL20A_1911_002" operation="open">
<p name="a">320ms</p>
<p name="p">enabled</p>
<p name="q">640ms</p>
<p name="r">320ms</p>
<p name="s">640ms</p>
<p name="t">1280ms</p>
<p name="u">6</p>
</managedObject>

I want to extract distName from these elements when it matches a particular regular expression. For example:
pattern = re.compile('MB-\d/TB-\d/ST-\d/')
for i in Soup.find_all('managedObject',{'distName':pattern}):

If distName matches the pattern, extract the distName; otherwise leave it alone.
I have tried many things, but I cannot get it to work.
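Posted on 2020-09-12 03:56:10

Your regex appears to be missing a + after each \d and a ^ at the start. Also, html.parser converts tag and attribute names to lowercase, so with that parser you have to search for the lowercase names (txt is the XML fragment from the question):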
soup = BeautifulSoup(txt, 'html.parser')
pattern = re.compile(r'^MB-\d+/TB-\d+/ST-\d+/.*')
for i in soup.find_all('managedobject',{'distname':pattern}):
    print(i['distname'])

Prints:
MB-85404/TB-85404/ST-4/a
MB-85404/TB-85404/ST-4/c

https://stackoverflow.com/questions/63853459
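
To see why both the + and the ^ matter, here is a minimal sketch using only the re module on the three distName values from the question. It relies on Beautiful Soup testing a compiled pattern with its search() method, which is what its documentation describes. The helper matching() and the variable names are illustrative and not part of the original answer.

```python
import re

# distName values copied from the XML in the question
dist_names = [
    "MB-85404/TB-85404/ST-4/a",
    "AL-76867/MB-85404/TB-85404/ST-4/b",
    "MB-85404/TB-85404/ST-4/c",
]

# Pattern from the question: \d matches exactly one digit
original = re.compile(r'MB-\d/TB-\d/ST-\d/')
# "+" added but no "^": can match anywhere inside the string
unanchored = re.compile(r'MB-\d+/TB-\d+/ST-\d+/')
# Pattern from the answer: "+" added and anchored at the start
anchored = re.compile(r'^MB-\d+/TB-\d+/ST-\d+/.*')


def matching(pattern, names):
    """Return the names in which pattern.search() finds a match (hypothetical helper)."""
    return [name for name in names if pattern.search(name)]


original_hits = matching(original, dist_names)
unanchored_hits = matching(unanchored, dist_names)
anchored_hits = matching(anchored, dist_names)

print(original_hits)    # no matches: "MB-85404/" has more than one digit
print(unanchored_hits)  # all three, including the one that starts with "AL-76867/"
print(anchored_hits)    # only the two that start with "MB-"
```

Without the +, nothing matches because each \d consumes a single digit; without the ^, the value beginning with AL-76867/ is also picked up because search() is free to start matching in the middle of the string.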