My input file is:
<?xml version='1.0' encoding='UTF-8'?>
<try>
something somethingRNA and RNA in RNA.
</try>
My Python code:
import lxml.etree as ET
import openpyxl
import re

url = 'output_15012015_test.xml'
tree = ET.parse(url)
lncrna = "RNA"
abstract = tree.xpath('//try')
string = abstract[0].text
if abstract:
    # Replace each whole-word match with a string that merely looks like markup
    anotherString = re.sub(r'\b' + lncrna.lower() + r'\b', '<mark>' + lncrna + '</mark>', string.lower())
    abstract[0].text = anotherString
    print(abstract[0].text)
tree.write('FalseRoller.xml', encoding='UTF-8', pretty_print=True)

Output:
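I get the following replaced text instead of <mark>RNA</mark>:
&lt;mark&gt;RNA&lt;/mark&gt;
I think this has something to do with the tree.write() method. I am also new to Python and to this community. Please help me with this.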
Posted on 2015-01-20 06:50:01
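You are putting XML markup into the element's .text, so when the XML is written out it is treated as text rather than markup, and the special characters are escaped as &...; entities.
What you want to do is:
Split .text into three parts: the text before the new tag, the text inside the new tag, and the text after the new tag. See the code: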
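To make the difference concrete, here is a minimal sketch that contrasts the two approaches on an in-memory element. It is an illustration, not the original poster's code: it assumes the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages (lxml.etree offers the same .text, .tail and SubElement interface), and the sample sentence is made up.

```python
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree

# Case 1: markup assigned to .text is stored as literal characters,
# so the serializer escapes the angle brackets.
root = ET.fromstring("<try>some RNA here</try>")
root.text = "some <mark>RNA</mark> here"
as_text = ET.tostring(root, encoding="unicode")
print(as_text)

# Case 2: a real child element. The sentence is split across the parent's
# .text, the child's .text, and the child's .tail.
root = ET.fromstring("<try>some RNA here</try>")
root.text = "some "
mark = ET.SubElement(root, "mark")
mark.text = "RNA"
mark.tail = " here"
as_markup = ET.tostring(root, encoding="unicode")
print(as_markup)
```

Only the second case produces an actual mark element in the serialized XML.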
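import re
import lxml.etree as ET

url = 'output_15012015_test.xml'
tree = ET.parse(url)
lncrna = "RNA"
abstract = tree.xpath('//try')
# The capturing group makes re.split keep each match, at the odd indices
aList = re.split(r'(\b' + re.escape(lncrna) + r'\b)', abstract[0].text, flags=re.IGNORECASE)
abstract[0].text = aList[0]
for i in range(1, len(aList), 2):
    # SubElement already appends the new <mark> in document order,
    # so a separate insert() call is not needed
    anElement = ET.SubElement(abstract[0], 'mark')
    anElement.text = aList[i]      # the matched word, original casing kept
    anElement.tail = aList[i+1]    # the text up to the next match
print(abstract[0].text)
tree.write('FalseRoller.xml', encoding='UTF-8', pretty_print=True)

https://stackoverflow.com/questions/28039181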
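The split-and-rebuild approach from the answer can also be packaged as a small helper and tried on an in-memory document. This is a hedged sketch rather than the answerer's code: the function name wrap_matches and its tag parameter are hypothetical, the sample sentence is invented, and the standard library's xml.etree.ElementTree is assumed in place of lxml so the example runs without extra packages.

```python
import re
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree


def wrap_matches(element, word, tag="mark"):
    """Wrap each whole-word, case-insensitive match of `word` in element.text.

    Hypothetical helper for illustration. Returns the number of matches wrapped.
    """
    if not element.text:
        return 0
    # The capturing group makes re.split keep the matches at odd indices.
    parts = re.split(r"(\b" + re.escape(word) + r"\b)", element.text,
                     flags=re.IGNORECASE)
    element.text = parts[0]
    for position, i in enumerate(range(1, len(parts), 2)):
        child = ET.Element(tag)
        child.text = parts[i]      # matched text, original casing preserved
        child.tail = parts[i + 1]  # text up to the next match
        # Insert ahead of any pre-existing children, since the parent's
        # .text always precedes its first child.
        element.insert(position, child)
    return len(parts) // 2


root = ET.fromstring("<try>something somethingRNA and RNA in rna.</try>")
count = wrap_matches(root, "RNA")
result = ET.tostring(root, encoding="unicode")
print(count, result)
```

Because of the \b word boundaries, the RNA inside somethingRNA is left alone, while the standalone RNA and rna are both wrapped with their original casing.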