How do I combine DataFrames of different sizes?

Stack Overflow user
Asked 2022-03-04 11:03:57

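I'm trying to merge a list of projects into one master DataFrame, but I can't figure out how to combine them. The frames I generate differ in size, but most of the column names are the same, with only one or two differing.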

Basically, I receive a list of project stages like the one below. (Some projects have only 2 or 3 stages, while others have 8 or 9.) Example:

Stage 1 SUCCESS
stage 2 SUCCESS
stage 3 SUCCESS
stage 4 DELAYED
stage 5 PENDING

Then, in a Python loop, I generate DataFrames like the following.

df

       project_name    Stage 1    Stage 2     
0      project 1       SUCCESS    DELAYED

df

       project_name    Stage 1    Stage 2    Stage 3    Stage 4   Stage 5 
0      project-2       NaN        NaN        NaN        NaN       NaN

df

       project_name    Stage 1    Stage 2    Stage 3    Stage 4   Stage 5   Stage 6    Stage 7   Stage 8
0      project-3       NaN        NaN        STARTED    ABANDONED NaN       NaN        NaN       NaN

However, I can't work out how to build a master DataFrame that contains all the other frames.

# items passed in from other function...
project_data = [('Stage 1','SUCCESS'),('Stage 2','DELAYED')]
project_name = 'project-x' 
project_headers = ['Stage 1','Stage 2','Stage 3','Stage 4','Stage 5','Stage 6']
project_displayname = ''

# Create the pandas DataFrame
try:
    df
except NameError:
    print("Well, 'df' WASN'T defined after all!")
    df = pd.DataFrame( columns = project_headers, index=['0'])
else:
    df = df.reindex(list(range(0, 1))).reset_index(drop=True)
    df['project_name'] = project_name
    df.loc[df.project_name == project_name, "project"] = project_displayname


combined_frame = pd.DataFrame(columns = ['project_name']) # empty frame with one colum for merge
for details in project_data:
    (item, item_status) = details
    if item not in df:
        df[item] = np.nan
    df.loc[df.project_name == project_name, item] = item_status
    print('')
    print('')
    print(df)  
    print('')
# Which gives us a generated dataframe.... like so... 
#project_name    Stage 1    Stage 2    Stage 3    Stage 4   Stage 5   Stage 6    Stage 7   Stage 8
#project-3       NaN        NaN        STARTED    ABANDONED NaN       NaN        NaN       NaN
    #final_frame = combined_frame.merge(df, how='left')
    try:
        final_frame = pd.merge(df, combined_frame, how='outer', left_index=True, right_on=combined_frame.iloc[: , -1])
    except IndexError:
        final_frame = df.reindex_axis(df.columns.union(combined_frame.columns), axis=1)

print(final_frame)

When I run the code, I get the error: Empty DataFrame

Or I get...

Columns: [project, project_name, Stage 1, Stage 2, Stage 3, Stage 4, Stage 5, Stage 6, Stage 7, Stage 8, Stage 9]
Index: []

Or else I get...

Columns: [project, project_name, Stage 1, Stage 2, Stage 3, Stage 4, Stage 5, Stage 6, Stage 7, Stage 8, Stage 9, project_x, project_name_x, Stage 1_x, Stage 2_x, Stage 3_x, Stage 4_x]
Index: []

Can anyone point out my mistake? I'm clearly missing something.

I'm trying to get output like this:

   project_name    Stage 1    Stage 2    Stage 3    Stage 4   Stage 5   Stage 6    Stage 7   Stage 8
0  project-1       STARTED    NaN        NaN        NaN       NaN       NaN        NaN       NaN
1  project-2       STARTED    STARTED    STARTED    DELAYED   NaN       NaN        NaN       NaN
2  project-3       NaN        NaN        STARTED    ABANDONED NaN       NaN        NaN       NaN
3  project-4       NaN        NaN        STARTED    ABANDONED NaN       STARTED    NaN       NaN
4  project-5       CANCELED   NaN        NaN        NaN       NaN       NaN        NaN       NaN
5  project-6       DELAYED    DELAYED    STARTED    ABANDONED NaN       NaN        STARTED    NaN 

Thanks in advance,

E

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-03-04 13:55:47

You can easily build an individual one-row DataFrame from the input data:

import pandas as pd

# items passed in from other function...
project_data = [('Stage 1','SUCCESS'),('Stage 2','DELAYED')]
project_name = 'project-x' 
project_headers = ['Stage 1','Stage 2','Stage 3','Stage 4','Stage 5','Stage 6']
project_displayname = ''

# Build a one-row frame; columns absent from the dict are filled with NaN
df = pd.DataFrame([dict(project_data)], columns = ['project','project_name']
                  + project_headers)
# Fill in the two identifying columns for that single row
df.loc[:, ['project', 'project_name']] = [[project_name, project_displayname]]

This gives the following df:

     project project_name  Stage 1  Stage 2  Stage 3  Stage 4  Stage 5  Stage 6
0  project-x               SUCCESS  DELAYED      NaN      NaN      NaN      NaN
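
Since the question builds one frame per project inside a loop, the construction above could be wrapped in a small helper. This is only a minimal sketch: `make_project_frame` and its parameters are hypothetical names that do not appear in the original code, and it assumes a single `project_name` column rather than the two identifying columns used above.

```python
import pandas as pd

def make_project_frame(project_name, project_data, headers):
    """Build a one-row DataFrame for one project (hypothetical helper)."""
    # Merge the project name with the (stage, status) pairs into one record.
    row = {"project_name": project_name, **dict(project_data)}
    # Columns named in `columns` but missing from `row` are filled with NaN.
    return pd.DataFrame([row], columns=["project_name"] + list(headers))

headers = [f"Stage {i}" for i in range(1, 7)]
frame = make_project_frame(
    "project-x",
    [("Stage 1", "SUCCESS"), ("Stage 2", "DELAYED")],
    headers,
)
print(frame)
```

Calling the helper once per project yields frames that all share the same column layout, which keeps the later concatenation step simple.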

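You can then use pd.concat to concatenate all the individual DataFrames. The only constraint is that you must know all the column names in advance (or, here, the maximum number of stages).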
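
The concatenation step might look like the sketch below. The three sample frames are made up for illustration, loosely following the desired output in the question. Note that `pd.concat` uses `join="outer"` by default, so in this sketch frames with different columns are aligned by column name and missing cells become NaN; `ignore_index=True` renumbers the rows.

```python
import pandas as pd

# Illustrative one-row frames with differing sets of stage columns.
frames = [
    pd.DataFrame([{"project_name": "project-1", "Stage 1": "SUCCESS",
                   "Stage 2": "DELAYED"}]),
    pd.DataFrame([{"project_name": "project-2", "Stage 1": "STARTED",
                   "Stage 2": "STARTED", "Stage 3": "STARTED",
                   "Stage 4": "DELAYED"}]),
    pd.DataFrame([{"project_name": "project-3", "Stage 3": "STARTED",
                   "Stage 4": "ABANDONED"}]),
]

# Stack the rows; columns are matched by name and gaps are filled with NaN.
master = pd.concat(frames, ignore_index=True)
print(master)
```

Collecting the frames in a list and concatenating once after the loop is generally cheaper than merging into a growing frame on every iteration.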

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71350499