如何将下面的数据结构转换为多列的数据框?
d =[{'header1':' Energy','Header2':'8.87'},
{'header1':' Energy','Header2':'8.87'},
{'header1':' Energy Equipment & Services','Header2':'6.83'},
{'header1':' Oil & Gas Equipment & Services','Header2':'6.83'},
{'header1':' ENERFLEX LTD','Header2':'1.9'},
{'header1':' Oil, Gas & Consumable Fuels','Header2':'8.9'},
{'header1':' Integrated Oil & Gas','Header2':'6.86'},
{'header1':' CENOVUS ENERGY INC','Header2':'12.97'},
{'header1':' SUNCOR ENERGY INC','Header2':'55'}]
df = pd.DataFrame(d)预期输出
Column1|Column2|Column3|Colum4|Column5|Amount
Energy|||||8.87
Energy| Energy||||8.87
Energy| Energy| Energy Equipment & Services|||6.83
Energy| Energy| Energy Equipment & Services| Oil & Gas Equipment & Services||6.83
Energy| Energy| Energy Equipment & Services| Oil & Gas Equipment & Services| ENERFLEX LTD|1.9
Energy| Energy| Oil, Gas & Consumable Fuels|||8.9
Energy| Energy| Oil, Gas & Consumable Fuels| Integrated Oil & Gas||6.86
Energy| Energy| Oil, Gas & Consumable Fuels| Integrated Oil & Gas| CENOVUS ENERGY INC|12.97
Energy| Energy| Oil, Gas & Consumable Fuels| Integrated Oil & Gas| SUNCOR ENERGY INC|55发布于 2020-08-10 16:09:05
您可以在选项卡上拆分header1中的值(在本例中为5个空格),以便以几乎正确的格式将其扩展为一个数据帧:str.split(" {5}", expand=True)。然后,您可以向前填充空值。在此之后,只需清理和重命名列。
new_df = (
df["header1"]
.str.split(" {5}", expand=True)
.replace("", method="ffill")
.replace("", np.nan)
.dropna(how="all", axis=1)
.fillna("")
)
new_df.columns = [f"Column{i + 1}" for i in range(new_df.shape[1])]
new_df["Amount"] = df["Header2"]发布于 2020-08-10 15:37:42
看看这是否有帮助:
import numpy as np
import pandas as pd
d =[{'header1':' Energy','Header2':'8.87'},
{'header1':' Energy','Header2':'8.87'},
{'header1':' Energy Equipment & Services','Header2':'6.83'},
{'header1':' Oil & Gas Equipment & Services','Header2':'6.83'},
{'header1':' ENERFLEX LTD','Header2':'1.9'},
{'header1':' Oil, Gas & Consumable Fuels','Header2':'8.9'},
{'header1':' Integrated Oil & Gas','Header2':'6.86'},
{'header1':' CENOVUS ENERGY INC','Header2':'12.97'},
{'header1':' SUNCOR ENERGY INC','Header2':'55'}]已将标头%1密钥提取到csv文件
with open ('listofD.csv', 'w') as f:
content=""
for dict in d:
for key, value in dict.items():
if key=='header1':
text = value+'\n'
content=content+text
f.writelines(text)基于缩进的解析层次结构
indentation = []
indentation.append(0)
depth = 0
f = open("listofD.csv", 'r')
contentz=[]
line_count=0
for line in f:
print(line)
line_count += 1
line = line[:-1]
contenty = line.strip()
indent = len(line) - len(contenty)
if indent > indentation[-1] or line_count==1:
depth += 1
indentation.append(indent)
elif indent < indentation[-1]:
while indent < indentation[-1]:
depth -= 1
indentation.pop()
if indent != indentation[-1]:
raise RuntimeError("Bad formatting")
contentz.append(("\t"*depth)+contenty)将其放入数据帧中
df=pd.DataFrame([sub.split("\t") for sub in contentz])填充了所有的“as NaNs”和“None as”
df=df.replace(r'^\s*$', np.nan, regex=True)
mask = df.applymap(lambda x: x is None)
cols = df.columns[(mask).any()]
for col in df[cols]:
df.loc[mask[col], col] = ' '
df删除了所有空列,并用以前的行值填充了空格
df=df.dropna(how='all', axis=1)
df=df.ffill(axis = 0) 已将报头合并到数据帧
df1=pd.DataFrame(d)
df.merge(df1['Header2'], left_index=True, right_index=True)

https://stackoverflow.com/questions/63333527
复制相似问题