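I'm having a hard time finding a way to convert an XML file to CSV with Python. The file has multiple attributes, and I need those attributes in a DataFrame. Here is a sample of the XML file: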
<helpdesk-tickets type="array">
    <helpdesk-ticket>
        <account-id type="integer">123</account-id>
        <notes type="array">
            <helpdesk-note>
                <body>content 1 I need</body>
            </helpdesk-note>
            <helpdesk-note>
                <body>content 2 I need</body>
            </helpdesk-note>
        </notes>
    </helpdesk-ticket>
    <helpdesk-ticket>
        <account-id type="integer">456</account-id>
        <notes type="array">
            <helpdesk-note>
                <body>content 3 I need </body>
            </helpdesk-note>
            <helpdesk-note>
                <body>content 4 I need </body>
            </helpdesk-note>
        </notes>
    </helpdesk-ticket>
</helpdesk-tickets>

Here is my code:
import xml.etree.ElementTree as Xet
import pandas as pd
cols = ["account-id", "notes"]
rows = []
xmlparse = Xet.parse('E:\python\Tickets132.xml')
root = xmlparse.getroot()
for i in root:
    display_id = i.find("account-id").text
for att in root.findall('./helpdesk-ticket/notes/helpdesk-note'):
    notes2 = att.find("body").text
    rows.append({
        "account-id": display_id,
        "notes": notes2,
    })
df91 = pd.DataFrame(rows, columns=cols)
display(df91)
df91.to_csv('output21.csv')

What I get:
account-id notes
0 123 content 1 I need
1 123 content 2 I need
2 123 content 3 I need
3 123 content 4 I need

Expected output:
account-id notes
0 123 content 1 I need
1 123 content 2 I need
2 456 content 3 I need
3 456 content 4 I need

Thanks in advance!
Posted on 2022-06-23 07:25:23
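The problem is that you first loop over the whole file to read the account IDs, and only after that loop has finished do you loop (again over the whole file) to read the notes. By then `display_id` holds a single value, so every note gets the same account ID. You need nested loops: for each ticket, read its account ID and then iterate over the notes inside that same ticket.
This should work: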
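To make the difference concrete, here is a minimal stdlib-only sketch that runs both loop shapes against the sample XML embedded as a string. It assumes the question's loops were sequential rather than nested, which is how the explanation above reads; the `SAMPLE` string and the function names `pairs_sequential` and `pairs_nested` are illustrative and do not come from the original post.

```python
import xml.etree.ElementTree as ET

# Illustrative in-memory copy of the sample XML from the question.
SAMPLE = """<helpdesk-tickets type="array">
  <helpdesk-ticket>
    <account-id type="integer">123</account-id>
    <notes type="array">
      <helpdesk-note><body>content 1 I need</body></helpdesk-note>
      <helpdesk-note><body>content 2 I need</body></helpdesk-note>
    </notes>
  </helpdesk-ticket>
  <helpdesk-ticket>
    <account-id type="integer">456</account-id>
    <notes type="array">
      <helpdesk-note><body>content 3 I need </body></helpdesk-note>
      <helpdesk-note><body>content 4 I need </body></helpdesk-note>
    </notes>
  </helpdesk-ticket>
</helpdesk-tickets>"""


def pairs_sequential(root):
    """Buggy shape: the first loop finishes before the second one starts."""
    for ticket in root:
        account_id = ticket.find("account-id").text
    # account_id now holds only the value from the LAST ticket visited
    return [(account_id, note.find("body").text.strip())
            for note in root.findall("./helpdesk-ticket/notes/helpdesk-note")]


def pairs_nested(root):
    """Fixed shape: notes are searched inside the current ticket only."""
    pairs = []
    for ticket in root.findall("./helpdesk-ticket"):
        account_id = ticket.find("account-id").text
        for note in ticket.findall("./notes/helpdesk-note"):
            pairs.append((account_id, note.find("body").text.strip()))
    return pairs


root = ET.fromstring(SAMPLE)
print(pairs_sequential(root))
print(pairs_nested(root))
```

With the sequential shape every note is paired with whatever single value `account_id` held when the first loop ended; the nested shape pairs each note with the ticket that actually contains it.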
import xml.etree.ElementTree as Xet
import pandas as pd
cols = ["account-id", "notes"]
rows = []
# Raw string so the backslashes in the Windows path are not read as escape sequences
xmlparse = Xet.parse(r'E:\python\Tickets132.xml')
root = xmlparse.getroot()
for helpdesk_ticket in root.findall("./helpdesk-ticket"):  # iterate over every helpdesk-ticket
    display_id = helpdesk_ticket.find("account-id").text  # save this ticket's account-id
    for helpdesk_note in helpdesk_ticket.findall(".//helpdesk-note"):  # every helpdesk-note inside this ticket only
        notes2 = helpdesk_note.find("./body").text  # the note's body text
        rows.append({
            "account-id": display_id,
            "notes": notes2,
        })
df91 = pd.DataFrame(rows, columns=cols)
display(df91)  # display() exists in Jupyter/IPython; use print(df91) in a plain script
df91.to_csv('output21.csv')

https://stackoverflow.com/questions/72725615
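If pandas is only being used to write the CSV, the standard-library `csv` module can do the same job. This is a sketch of an alternative, not the answer's method: `tickets_to_rows` and `write_csv` are hypothetical helper names, the code uses `findtext()` so a missing `<account-id>` or `<body>` yields an empty string instead of raising `AttributeError`, and it strips surrounding whitespace from the note text (the sample's content 3 and content 4 bodies end with a space).

```python
import csv
import io
import xml.etree.ElementTree as ET


def tickets_to_rows(root):
    """Yield one dict per helpdesk-note, tagged with its own ticket's account-id."""
    for ticket in root.iterfind("helpdesk-ticket"):
        # findtext() returns the default when the element is missing
        account_id = ticket.findtext("account-id", default="")
        for note in ticket.iterfind("notes/helpdesk-note"):
            body = note.findtext("body", default="")
            yield {"account-id": account_id, "notes": body.strip()}


def write_csv(rows, out):
    """Write the rows (with a header) to a text file object."""
    writer = csv.DictWriter(out, fieldnames=["account-id", "notes"])
    writer.writeheader()
    writer.writerows(rows)


sample = """<helpdesk-tickets type="array">
  <helpdesk-ticket>
    <account-id type="integer">123</account-id>
    <notes type="array">
      <helpdesk-note><body>content 1 I need</body></helpdesk-note>
      <helpdesk-note><body>content 2 I need</body></helpdesk-note>
    </notes>
  </helpdesk-ticket>
  <helpdesk-ticket>
    <account-id type="integer">456</account-id>
    <notes type="array">
      <helpdesk-note><body>content 3 I need </body></helpdesk-note>
      <helpdesk-note><body>content 4 I need </body></helpdesk-note>
    </notes>
  </helpdesk-ticket>
</helpdesk-tickets>"""

# Write into an in-memory buffer so the example runs without any files
buf = io.StringIO()
write_csv(tickets_to_rows(ET.fromstring(sample)), buf)
print(buf.getvalue())
```

For the real file, open the output with `newline=''` as the `csv` documentation recommends, for example `with open('output21.csv', 'w', newline='', encoding='utf-8') as f: write_csv(tickets_to_rows(ET.parse(r'E:\python\Tickets132.xml').getroot()), f)`. Unlike `df91.to_csv('output21.csv')`, this writes no index column.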