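Please help. I'm trying to parse a large XML file and export the data to a CSV file. I keep losing a lot of the data between the tags, and I can't figure out why.
Here is some of the XML: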
<testcase internalid="1256092" name="hls_vtt_single_default_diable_vtt">
<node_order><![CDATA[7]]></node_order>
<externalid><![CDATA[6121]]></externalid>
<version><![CDATA[2]]></version>
<summary><![CDATA[<p>condition: single subtitle track is available in stream and it is default set the vtt track to diable status before playing stream.</p>
<p> </p>
<div>play stream no subtitle is rendered along with A/V<span class="Apple-tab-span" style="white-space:pre"> </span></div>
<div> </div>]]></summary>
<preconditions><![CDATA[]]></preconditions>
<execution_type><![CDATA[1]]></execution_type>
<importance><![CDATA[2]]></importance>
</testcase>
Here is my Python code:
import xml.sax

class CaseHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.CurrentData = ""
        self.externalid = ""
        self.version = ""
        self.summary = ""

    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "testcase":
            name = attributes["name"]
            outfile.write("\n" + name + " ,")

    def endElement(self, tag):
        if self.CurrentData == "externalid":
            outfile.write("OTV52-" + self.externalid + ",")
        elif self.CurrentData == "version":
            outfile.write("Version: " + self.version + ",")
        elif self.CurrentData == "summary":
            outfile.write("Summary: " + self.summary + ",")

    def characters(self, content):
        if self.CurrentData == "externalid":
            self.externalid = content
        elif self.CurrentData == "version":
            self.version = content
        elif self.CurrentData == "summary":
            self.summary = content

if __name__ == "__main__":
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    Handler = CaseHandler()
    parser.setContentHandler(Handler)
    parser.parse("OTV52.xml")

The problem is that it doesn't return any of the information inside the summary tags. The externalid and version data come through fine, but all that comes back from the summary tags is the div tags.
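I need it to return:
condition: single subtitle track is available in stream and it is default set the vtt track to diable status before playing stream. play stream no subtitle is rendered along with A/V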
Answered 2016-05-20 02:01:18
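As shown in this answer, you should append to the parsed value with += content on each call to characters(), because the parser may deliver the text of a single element in several chunks, and plain assignment keeps only the last one. To also strip the markup embedded in the parsed CDATA, including line breaks and spaces, consider a regex replacement: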
import xml.sax
import re

class CaseHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.CurrentData = ""
        self.externalid = ""
        self.version = ""
        self.summary = ""

    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "testcase":
            # Reset the buffers so text does not carry over between test cases
            self.externalid = self.version = self.summary = ""
            name = attributes["name"]
            outfile.write("\r" + name + " ,")

    def endElement(self, tag):
        if self.CurrentData == "externalid":
            outfile.write("OTV52-" + self.externalid + ",")
        elif self.CurrentData == "version":
            outfile.write("Version: " + self.version + ",")
        elif self.CurrentData == "summary":
            # Strip the HTML tags embedded in the CDATA
            self.summary = re.sub(r"<[^>]+>", "", self.summary)
            # Strip line breaks and &nbsp; entities
            self.summary = re.sub(r"\n|&nbsp;|/\s\s/", "", self.summary).strip()
            outfile.write("Summary: " + self.summary + ",")

    def characters(self, content):
        # characters() may be called several times per element, so append
        if self.CurrentData == "externalid":
            self.externalid += content
        elif self.CurrentData == "version":
            self.version += content
        elif self.CurrentData == "summary":
            self.summary += content

Output (all on one line):
#
# hls_vtt_single_default_diable_vtt ,OTV52-6121,Version: 2,Summary: \
# condition: single subtitle track is available in stream and it is \
# default set the vtt track to diable status before playing \
# stream.play stream no subtitle is rendered along with A/V,

https://stackoverflow.com/questions/37334959
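
For files with many test cases, the same idea can be sketched a little more defensively. The version below is only an illustration under some assumptions: the names BufferingCaseHandler, clean_summary, convert and SAMPLE are hypothetical, the <testcases> root element is assumed, and &nbsp; is assumed to appear as literal text inside the CDATA. It buffers every chunk passed to characters(), starts a fresh buffer for each field, and hands the row to csv.writer so that commas inside a summary are quoted instead of breaking the columns.

```python
import csv
import io
import re
import xml.sax

TAG_RE = re.compile(r"<[^>]+>")
FIELDS = ("externalid", "version", "summary")


def clean_summary(text):
    """Replace embedded HTML tags and &nbsp; with spaces, then collapse whitespace."""
    text = TAG_RE.sub(" ", text).replace("&nbsp;", " ")
    return " ".join(text.split())


class BufferingCaseHandler(xml.sax.ContentHandler):
    """Hypothetical handler: writes one CSV row per <testcase> element."""

    def __init__(self, writer):
        super().__init__()
        self.writer = writer
        self.current = None  # field currently being read, or None
        self.chunks = []     # text pieces delivered by characters()
        self.row = {}

    def startElement(self, tag, attributes):
        if tag == "testcase":
            self.row = {"name": attributes["name"]}
        elif tag in FIELDS:
            self.current = tag
            self.chunks = []  # fresh buffer for every field

    def characters(self, content):
        # The parser may split one element's text into several calls.
        if self.current is not None:
            self.chunks.append(content)

    def endElement(self, tag):
        if tag == self.current:
            text = "".join(self.chunks)
            self.row[tag] = clean_summary(text) if tag == "summary" else text
            self.current = None
        elif tag == "testcase":
            self.writer.writerow(
                [self.row.get("name", "")] + [self.row.get(f, "") for f in FIELDS]
            )


# Made-up sample data in the same shape as the XML in the question.
SAMPLE = """<testcases>
<testcase internalid="1" name="case_a">
<externalid><![CDATA[6121]]></externalid>
<version><![CDATA[2]]></version>
<summary><![CDATA[<p>first line, with a comma</p>
<div>second line</div>]]></summary>
</testcase>
<testcase internalid="2" name="case_b">
<externalid><![CDATA[6122]]></externalid>
<version><![CDATA[1]]></version>
<summary><![CDATA[<p>only line</p>]]></summary>
</testcase>
</testcases>"""


def convert(xml_text):
    """Parse xml_text and return the resulting CSV as a string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    xml.sax.parseString(xml_text.encode("utf-8"), BufferingCaseHandler(writer))
    return out.getvalue()


if __name__ == "__main__":
    print(convert(SAMPLE), end="")
```

Replacing tags with a space rather than an empty string is a design choice here: it keeps words from adjacent paragraphs apart, where the regex in the answer above joins them ("stream.play").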