This is my first question after three days of teaching myself Python, so please go easy on me.
I have concatenated four DataFrames:
frames = [dfLocationID, dfDimensions, dfCategories, dfTags]
result = pd.concat(frames,
                   ignore_index=True,
                   sort=False)
to get the following:
Location ID Dimensions Categories Tags
0 1000.0 NaN NaN NaN
1 NaN 3,000 sq ft NaN NaN
2 NaN NaN * In the Zone NaN
3 NaN NaN Apartment NaN
4 NaN NaN Loft NaN
5 NaN NaN NaN Bohemian
6 NaN NaN NaN Colorful
7 NaN NaN NaN Eclectic Quirky
8 NaN NaN NaN Kitchen
9 NaN NaN NaN Living Room
10 NaN NaN NaN Piano
11 NaN NaN NaN Wood Floor
What I would like to end up with is this:
Location ID Dimensions Item Data
0 1000.0 3,000 sq ft Categories * In the Zone
1 1000.0 3,000 sq ft Categories Apartment
2 1000.0 3,000 sq ft Categories Loft
3 1000.0 3,000 sq ft Tags Bohemian
4 1000.0 3,000 sq ft Tags Colorful
5 1000.0 3,000 sq ft Tags Eclectic Quirky
6 1000.0 3,000 sq ft Tags Kitchen
7 1000.0 3,000 sq ft Tags Living Room
8 1000.0 3,000 sq ft Tags Piano
9 1000.0 3,000 sq ft Tags Wood Floor
So I tried this:
dfTemp = dfLocationID.join(dfDimensions, how='outer').join(dfCategories, how='outer').join(dfTags, how='outer')
which gave me the following:
Location ID Dimensions Categories Tags
0 1000.0 3,000 sq ft * In the Zone Bohemian
1 NaN NaN Apartment Colorful
2 NaN NaN Loft Eclectic Quirky
3 NaN NaN NaN Kitchen
4 NaN NaN NaN Living Room
5 NaN NaN NaN Piano
6 NaN NaN NaN Wood Floor
Next, I tried to turn the last two columns into rows:
dfFinal = dfTemp.melt(id_vars=["Location ID", "Dimensions"],
                      var_name="Item",
                      value_name="Data")
but what I get is:
Location ID Dimensions Item Data
0 1000.0 3,000 sq ft Categories * In the Zone
1 NaN NaN Categories Apartment
2 NaN NaN Categories Loft
3 NaN NaN Categories NaN
4 NaN NaN Categories NaN
5 NaN NaN Categories NaN
6 NaN NaN Categories NaN
7 1000.0 3,000 sq ft Tags Bohemian
8 NaN NaN Tags Colorful
9 NaN NaN Tags Eclectic Quirky
10 NaN NaN Tags Kitchen
11 NaN NaN Tags Living Room
12 NaN NaN Tags Piano
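13 NaN NaN Tags Wood Floor
Any suggestions on how to clean up the data? Also, I will have to iterate over different Location IDs, and the number of values in Categories and Tags will not be constant.
Thanks.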
Answered 2019-09-28 06:31:03
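First, I would convert the NaN values to None, because they are easier to work with:
# df is the concatenated frame (result in the question)
df = df.where(pd.notnull(df), None)
Then you want the whole first and second columns to hold the same value (I don't know where you get that assumption from):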
df['Location ID'] = df['Location ID'].iloc[0]
df['Dimensions'] = df['Dimensions'].iloc[1]
After that you can run your melt call as is. Now you only need to filter out every row that has None in the "Item" or "Data" column:
df = df[~(df["Item"].isnull() | df["Data"].isnull())]
The output is then what you want:
Location ID Dimensions Item Data
2 1000.0 3000 sq ft Categories * In the Zone
3 1000.0 3000 sq ft Categories Apartment
4 1000.0 3000 sq ft Categories Loft
17 1000.0 3000 sq ft Tags Bohemian
18 1000.0 3000 sq ft Tags Colorful
19 1000.0 3000 sq ft Tags Eclectic Quirky
20 1000.0 3000 sq ft Tags Kitchen
21 1000.0 3000 sq ft Tags Living Room
22 1000.0 3000 sq ft Tags Piano
23 1000.0 3000 sq ft Tags Wood Floor
If you need to do this for different locations, wrap this procedure in a function transform and use groupby:
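# transform() is the fill / melt / filter procedure above wrapped in a function
pieces = []
for name, group in df.groupby(['Location ID', 'Dimensions']):
    pieces.append(transform(group))
# pd.concat returns a new frame, so the result has to be assigned
df_new = pd.concat(pieces, axis=0)
https://stackoverflow.com/questions/58141356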
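For reference, here is one way the steps in this answer might be put together end to end. Treat it as a rough sketch under a few assumptions rather than the answerer's exact code: build_location is a hypothetical helper that only mimics the question's pd.concat, the second location (1001.0) is made-up sample data, and each location is assumed to be available as its own concatenated frame. It also uses first_valid_index() in place of the hard-coded .iloc[0] and .iloc[1], and dropna() in place of the None conversion plus isnull() filter, which should select the same rows.

```python
import pandas as pd


def transform(df):
    """Reshape one location's concatenated frame into Item/Data rows."""
    df = df.copy()
    # Broadcast the single non-null value of each id column to every row.
    # first_valid_index() avoids hard-coding the row positions 0 and 1.
    for col in ["Location ID", "Dimensions"]:
        df[col] = df.loc[df[col].first_valid_index(), col]
    long = df.melt(id_vars=["Location ID", "Dimensions"],
                   var_name="Item", value_name="Data")
    # Drop the padding rows that melt carries over from the NaN cells.
    return long.dropna(subset=["Data"]).reset_index(drop=True)


def build_location(location_id, dimensions, categories, tags):
    """Hypothetical helper: mimic the question's pd.concat for one location."""
    frames = [pd.DataFrame({"Location ID": [location_id]}),
              pd.DataFrame({"Dimensions": [dimensions]}),
              pd.DataFrame({"Categories": categories}),
              pd.DataFrame({"Tags": tags})]
    return pd.concat(frames, ignore_index=True, sort=False)


# The second location is invented to show that the counts may differ.
raw = [
    build_location(1000.0, "3,000 sq ft",
                   ["* In the Zone", "Apartment", "Loft"],
                   ["Bohemian", "Colorful", "Eclectic Quirky", "Kitchen",
                    "Living Room", "Piano", "Wood Floor"]),
    build_location(1001.0, "1,200 sq ft", ["Studio"], ["Garden", "Balcony"]),
]

# Transform each location separately, then stack the results once.
result = pd.concat([transform(df) for df in raw], ignore_index=True)
print(result)
```

Transforming each location before stacking sidesteps the problem that the raw concatenated frame has NaN in "Location ID" on most rows, which groupby would otherwise drop. Unlike the output shown above, the index here is renumbered from 0 because of reset_index and ignore_index.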