首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Python pandas :基于较低级别的间距从单列到多列

Python pandas :基于较低级别的间距从单列到多列
EN

Stack Overflow用户
提问于 2020-08-10 10:56:04
回答 2查看 42关注 0票数 1

如何将下面的数据结构转换为多列的数据框?

代码语言:javascript
复制
d =[{'header1':'          Energy','Header2':'8.87'}, 
            {'header1':'               Energy','Header2':'8.87'}, 
            {'header1':'                    Energy Equipment & Services','Header2':'6.83'}, 
            {'header1':'                         Oil & Gas Equipment & Services','Header2':'6.83'}, 
            {'header1':'                              ENERFLEX LTD','Header2':'1.9'}, 
            {'header1':'                    Oil, Gas & Consumable Fuels','Header2':'8.9'}, 
            {'header1':'                         Integrated Oil & Gas','Header2':'6.86'}, 
            {'header1':'                              CENOVUS ENERGY INC','Header2':'12.97'}, 
            {'header1':'                              SUNCOR ENERGY INC','Header2':'55'}]
        df = pd.DataFrame(d)

预期输出

代码语言:javascript
复制
Column1|Column2|Column3|Colum4|Column5|Amount
      Energy|||||8.87
      Energy|               Energy||||8.87
      Energy|               Energy|                    Energy Equipment & Services|||6.83
      Energy|               Energy|                    Energy Equipment & Services|                         Oil & Gas Equipment & Services||6.83
      Energy|               Energy|                    Energy Equipment & Services|                         Oil & Gas Equipment & Services|                              ENERFLEX LTD|1.9
      Energy|               Energy|                    Oil, Gas & Consumable Fuels|||8.9
      Energy|               Energy|                    Oil, Gas & Consumable Fuels|                         Integrated Oil & Gas||6.86
      Energy|               Energy|                    Oil, Gas & Consumable Fuels|                         Integrated Oil & Gas|                              CENOVUS ENERGY INC|12.97
      Energy|               Energy|                    Oil, Gas & Consumable Fuels|                         Integrated Oil & Gas|                              SUNCOR ENERGY INC|55
EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2020-08-10 16:09:05

您可以在选项卡上拆分header1中的值(在本例中为5个空格),以便以几乎正确的格式将其扩展为一个数据帧:str.split(" {5}", expand=True)。然后,您可以向前填充空值。在此之后,只需清理和重命名列。

代码语言:javascript
复制
new_df = (
    df["header1"]
    .str.split(" {5}", expand=True)
    .replace("", method="ffill")
    .replace("", np.nan)
    .dropna(how="all", axis=1)
    .fillna("")
)
new_df.columns = [f"Column{i + 1}" for i in range(new_df.shape[1])]
new_df["Amount"] = df["Header2"]
票数 1
EN

Stack Overflow用户

发布于 2020-08-10 15:37:42

看看这是否有帮助:

代码语言:javascript
复制
import numpy as np
import pandas as pd
d =[{'header1':'          Energy','Header2':'8.87'}, 
            {'header1':'               Energy','Header2':'8.87'}, 
            {'header1':'                    Energy Equipment & Services','Header2':'6.83'}, 
            {'header1':'                         Oil & Gas Equipment & Services','Header2':'6.83'}, 
            {'header1':'                              ENERFLEX LTD','Header2':'1.9'}, 
            {'header1':'                    Oil, Gas & Consumable Fuels','Header2':'8.9'}, 
            {'header1':'                         Integrated Oil & Gas','Header2':'6.86'}, 
            {'header1':'                              CENOVUS ENERGY INC','Header2':'12.97'}, 
            {'header1':'                              SUNCOR ENERGY INC','Header2':'55'}]

已将标头%1密钥提取到csv文件

代码语言:javascript
复制
with open ('listofD.csv', 'w') as f:
    content=""
    for dict in d:
        for key, value in dict.items():
            if key=='header1':
                text = value+'\n'
                content=content+text
                f.writelines(text)

基于缩进的解析层次结构

代码语言:javascript
复制
indentation = []
indentation.append(0)
depth = 0

f = open("listofD.csv", 'r')
contentz=[]
line_count=0
for line in f:
    print(line)
    line_count += 1

    line = line[:-1]

    contenty = line.strip()
    indent = len(line) - len(contenty)
    if indent > indentation[-1]  or line_count==1:
        depth += 1
        indentation.append(indent)

    elif indent < indentation[-1]:
        while indent < indentation[-1]:
            depth -= 1
            indentation.pop()

        if indent != indentation[-1]:
            raise RuntimeError("Bad formatting")

    contentz.append(("\t"*depth)+contenty)

将其放入数据帧中

代码语言:javascript
复制
df=pd.DataFrame([sub.split("\t") for sub in contentz])

填充了所有的“as NaNs”和“None as”

代码语言:javascript
复制
df=df.replace(r'^\s*$', np.nan, regex=True)
mask = df.applymap(lambda x: x is None)
cols = df.columns[(mask).any()]
for col in df[cols]:
    df.loc[mask[col], col] = ' '
df

删除了所有空列,并用以前的行值填充了空格

代码语言:javascript
复制
df=df.dropna(how='all', axis=1)
df=df.ffill(axis = 0) 

已将报头合并到数据帧

代码语言:javascript
复制
df1=pd.DataFrame(d)
df.merge(df1['Header2'], left_index=True, right_index=True)

票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/63333527

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档