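I am trying to parse an XML file and strip unnecessary markup from its lines. My loop gets stuck and never applies the second if statement (the elif) to the tags, and I don't know why. I have been staring at this for over an hour and have tried other approaches, but I keep getting the error ParseError: mismatched tag:. From debugging I can tell that it never even enters the second if statement, yet the logic looks right to me. I know I'm missing something small, but I can't figure out what. Any ideas?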
The loop:
with open('test.xml') as inXML, open(outputFilename, 'w') as outXML:
    outXML.write('<root>\n')
    for line in inXML.readlines():
        if (line.find("<sub>")):
            newline = line.replace("<sub>", "")
            newLine = newline.replace("</sub", "")
        elif (line.find("<sup>")):
            newline = line.replace("<sup>", "")
            newLine = newline.replace("</sup", "")
        outXML.write(re.sub('&[a-zA-Z]+;',anglicise,newLine))
    outXML.write('\n</root>')
Test XML:
<pub>
    <ID>5010</ID>
    <title>Model-Checking for L<sub>2</sub</title>
    <year>1997</year>
    <booktitle>Universität Trier, Mathematik/Informatik, Forschungsbericht</booktitle>
    <pages></pages>
    <authors>
        <author>Helmut Seidl</author>
    </authors>
</pub>
<pub>
    <ID>71035</ID>
    <title>S_2p \subseteq ZPP<sup>NP</sup</title>
    <year>2001</year>
    <booktitle>Electronic Colloquium on Computational Complexity (ECCC)</booktitle>
    <pages></pages>
    <authors>
        <author>Jin-yi Cai</author>
    </authors>
</pub>
Posted on 2017-04-26 19:18:34
Thanks @juanpa.arrivillaga & @BrenBarn. The solution was to chain the .replace() calls on a single line in each iteration, as follows:
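import re

# outputFilename and anglicise (the replacement callback passed to re.sub)
# are assumed to be defined elsewhere in the script.
with open('test.xml') as inXML, open(outputFilename, 'w') as outXML:
    outXML.write('<root>\n')
    for line in inXML.readlines():
        # Strip both kinds of markup unconditionally; no if/elif is needed.
        line = line.replace("<sub>", "").replace("</sub", "").replace("<sup>", "").replace("</sup", "")
        outXML.write(re.sub('&[a-zA-Z]+;', anglicise, line))
    outXML.write('\n</root>')
Source: https://stackoverflow.com/questions/43642021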
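For anyone wondering why the elif in the question never ran: the post does not spell it out, but a likely explanation is that str.find returns an index (or -1 when nothing is found) rather than a boolean. Since -1 is truthy, `if line.find("<sub>")` also succeeds on lines that contain no `<sub>` at all, so the `<sup>` branch is skipped. The sketch below illustrates this using the two title lines from the test XML; `strip_markup` is a hypothetical helper name introduced here for illustration, not something from the original script.

```python
def strip_markup(line):
    """Hypothetical helper: remove <sub>/<sup> markup unconditionally."""
    # Same tokens as the chained .replace() calls in the answer.
    for token in ("<sub>", "</sub", "<sup>", "</sup"):
        line = line.replace(token, "")
    return line


sub_line = "<title>Model-Checking for L<sub>2</sub</title>"
sup_line = "<title>S_2p \\subseteq ZPP<sup>NP</sup</title>"

# str.find returns the index of the match, or -1 if there is no match.
print(sup_line.find("<sub>"))        # -1: this line has no <sub>
print(bool(sup_line.find("<sub>")))  # True: -1 is truthy, so the `if` is taken

# A membership test expresses "does the line contain this tag?" directly.
print("<sub>" in sup_line)           # False
print("<sup>" in sup_line)           # True

# Unconditional replacement makes the branching unnecessary.
print(strip_markup(sub_line))
print(strip_markup(sup_line))
```

If the branching is kept, writing `if "<sub>" in line:` instead of `if line.find("<sub>"):` should give the intended behaviour, though chaining the replacements as in the answer also covers lines that contain both tags.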