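I have 4-10 XML files in a folder, all split out of one large XML file. Parsing them is easy because I can use the xmltodict package, so I can do whatever I need with a single XML file; I convert it to a pandas DataFrame for analysis. However, I need to combine the 4 XML files into a single pandas DataFrame. Assume there are no data/index issues: the files are definitely named correctly and in order as 00001.xml, 00002.xml, 00003.xml, 00004.xml.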
import xmltodict
import numpy as np
import pandas as pd
from collections import Counter

with open('00001.xml') as fd:
    doc = xmltodict.parse(fd.read())

def panda_maker(xml_dict):
    # Column names come from the comma-separated mnemonic list
    channel_list = xml_dict['logs']['log']['logData']['mnemonicList'].split(",")
    # Each <data> element is one comma-separated row
    logData_list = [i.split(",") for i in xml_dict['logs']['log']['logData']['data']]
    # The unit row goes first, above the data rows
    logData_list.insert(0, xml_dict['logs']['log']['logData']['unitList'].split(","))
    return pd.DataFrame(np.array(logData_list).reshape(len(logData_list), len(channel_list)), columns=channel_list)

logData_frame_01 = panda_maker(doc)
logData_frame_01.head()  # all good

How can I neatly combine logData_frame_01 + _02 + _03 + _04 into a single DataFrame? Any further abstraction tips for the program above are also very welcome.
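To make the expected input concrete, here is a minimal sketch that runs the same logic as `panda_maker` on a hand-built dictionary shaped like xmltodict's output. The channel names, units and values in `sample` are made up for illustration; the only assumption taken from the question is the `logs/log/logData` layout. It also guards against one xmltodict behaviour: when a file contains a single `<data>` element, xmltodict returns a plain string instead of a one-item list, so the list comprehension would iterate over characters rather than rows. The numpy reshape is dropped because `pd.DataFrame` accepts a list of equal-length rows directly.

```python
import pandas as pd

# Hypothetical sample, shaped like xmltodict.parse() output for one file
sample = {
    'logs': {'log': {'logData': {
        'mnemonicList': 'DEPTH,GR',
        'unitList': 'm,gAPI',
        'data': ['100.0,45.2', '100.5,47.9'],
    }}}
}


def panda_maker(xml_dict):
    log_data = xml_dict['logs']['log']['logData']
    channel_list = log_data['mnemonicList'].split(",")
    rows = log_data['data']
    # xmltodict yields a str (not a list) when there is only one <data> element
    if isinstance(rows, str):
        rows = [rows]
    data = [row.split(",") for row in rows]
    # As in the question, the unit row becomes the first row of the frame
    data.insert(0, log_data['unitList'].split(","))
    return pd.DataFrame(data, columns=channel_list)


frame = panda_maker(sample)
print(frame)
```

The result has one unit row followed by one row per `<data>` element, with every cell still a string.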
Posted on 2018-08-01 04:42:16
Try:
doc = []
for i in range(1, 5):
    # '{:05d}' zero-pads to five digits: 1 -> 00001.xml, 10 -> 00010.xml
    with open('{:05d}.xml'.format(i)) as fd:
        doc.append(xmltodict.parse(fd.read()))

def panda_maker(xml_dicts):
    logData_list = []
    for xmlval in xml_dicts:
        logData = xmlval['logs']['log']['logData']
        channel_list = logData['mnemonicList'].split(",")
        temp = [i.split(",") for i in logData['data']]
        # Keep the unit row only once, at the top of the combined data
        if not logData_list:
            temp.insert(0, logData['unitList'].split(","))
        logData_list.extend(temp)
    return pd.DataFrame(np.array(logData_list).reshape(len(logData_list), len(channel_list)), columns=channel_list)

logData_frame = panda_maker(doc)
logData_frame.head()  # all good
https://stackoverflow.com/questions/51621579
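An alternative that addresses the "further abstraction" part of the question is to build one DataFrame per file and join them with `pd.concat(..., ignore_index=True)`, which stacks the frames in order and renumbers the index 0..n-1. The sketch below assumes the same `logs/log/logData` layout as the question; the helper names `frame_from_dict`, `combine` and `load_folder` are hypothetical. As a design choice it leaves the unit row out of the data, so the columns can later be converted with `pd.to_numeric`; the units can still be read once from `unitList` of the first file if needed.

```python
import glob
import os

import pandas as pd


def frame_from_dict(xml_dict):
    """Turn one parsed file (xmltodict layout from the question) into a DataFrame."""
    log_data = xml_dict['logs']['log']['logData']
    columns = log_data['mnemonicList'].split(",")
    rows = log_data['data']
    if isinstance(rows, str):  # a single <data> element comes back as a str
        rows = [rows]
    return pd.DataFrame([row.split(",") for row in rows], columns=columns)


def combine(xml_dicts):
    """Stack the per-file frames in the given order with a fresh 0..n-1 index."""
    return pd.concat([frame_from_dict(d) for d in xml_dicts], ignore_index=True)


def load_folder(folder):
    """Parse every *.xml file in `folder` in name order and combine them."""
    import xmltodict  # third-party package used in the question

    # Zero-padded names (00001.xml ... 00010.xml) sort correctly as strings
    paths = sorted(glob.glob(os.path.join(folder, '*.xml')))
    docs = []
    for path in paths:
        with open(path) as fd:
            docs.append(xmltodict.parse(fd.read()))
    return combine(docs)
```

With this split, `load_folder('.')` handles any number of files from 00001.xml to 00010.xml without hard-coding the range, and `combine` can be tested on already-parsed dictionaries.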