I have two DataFrames, df and df1, which both contain similar file paths.
df = pd.DataFrame({"X1": ['f','f','o','o','b','b'],
"X2": ['fb/FOO1/bar0.wav','fb/FOO1/bar1.wav','fb/FOO2/bar2.wav','fb/FOO2/bar3.wav','fb/FOO3/bar4.wav','fb/FOO3/bar5.wav']})
X1 X2
0 f fb/FOO1/bar0.wav
1 f fb/FOO1/bar1.wav
2 o fb/FOO2/bar2.wav
3 o fb/FOO2/bar3.wav
4 b fb/FOO3/bar4.wav
5 b fb/FOO3/bar5.wav
The second DataFrame is:
df1 = pd.DataFrame({"X1": ['b','o','b','f','o','f'],
"X2": ['fb1/FOO3/bar5.opus','fb1/FOO2/bar2.opus','fb1/FOO3/bar4.opus','fb1/FOO1/bar1.opus','fb1/FOO2/bar3.opus','fb1/FOO1/bar0.opus']})
X1 X2
0 b fb1/FOO3/bar5.opus
1 o fb1/FOO2/bar2.opus
2 b fb1/FOO3/bar4.opus
3 f fb1/FOO1/bar1.opus
4 o fb1/FOO2/bar3.opus
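5 f fb1/FOO1/bar0.opus
Now I want to sort the second DataFrame, df1, so that its X2 column (the file paths) follows the order of the files in the first DataFrame. The output should look like this: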
X1 X2
0 f fb1/FOO1/bar0.opus
1 f fb1/FOO1/bar1.opus
2 o fb1/FOO2/bar2.opus
3 o fb1/FOO2/bar3.opus
4 b fb1/FOO3/bar4.opus
5 b fb1/FOO3/bar5.opus
Posted 2020-10-07 16:47:13
You can build a sorter dictionary, which lets you sort the values by a custom key:
# Build the key from the name part of the file path (a regex would also work)
sorter_dict = dict(zip(df.X2.apply(lambda x: x.split('/')[-1].split('.')[0]), df.index))
# {'bar0': 0, 'bar1': 1, 'bar2': 2, 'bar3': 3, 'bar4': 4, 'bar5': 5}
# On df1, create a temporary column with the name part of the file path
df1['temp'] = df1.X2.apply(lambda x: x.split('/')[-1].split('.')[0])
# Map each name to its position in df using the sorter dict
df1['sorter'] = df1.temp.map(sorter_dict)
# Finally, sort by that position
df1 = df1.sort_values('sorter')
# Delete the helper columns, which are no longer needed
del df1['temp'], df1['sorter']
Output
| X1 | X2 |
|:-----|:-------------------|
| f | fb1/FOO1/bar0.opus |
| f | fb1/FOO1/bar1.opus |
| o | fb1/FOO2/bar2.opus |
| o | fb1/FOO2/bar3.opus |
| b | fb1/FOO3/bar4.opus |
| b | fb1/FOO3/bar5.opus |
Posted 2020-10-07 16:55:03
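If the file paths have a consistent length within each DataFrame (here the prefix and extension are always fb/ and .wav in df, and fb1/ and .opus in df1), you can do the following. Create a new column holding the part of the path to sort by, reorder on it, then drop the new column: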
df['X3'] = df['X2'].astype(str).str[3:-4]
df1['X3'] = df1['X2'].astype(str).str[4:-5]
df1 = df1.set_index('X3')
df1 = df1.reindex(index=df['X3'])
df1 = df1.reset_index()
df1 = df1.drop('X3', axis = 1)
df = df.drop('X3', axis = 1)
df1
https://stackoverflow.com/questions/64248350
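The two answers take the sort key either from the bare file name or from fixed character positions. As a rough sketch of a variant that does not depend on fixed lengths, the key can be taken as everything between the top-level folder and the extension. The helper `path_key` and the temporary column `_pos` are hypothetical names introduced only for this illustration, and the sketch assumes each key occurs exactly once in df.

```python
import pandas as pd

df = pd.DataFrame({
    "X1": ['f', 'f', 'o', 'o', 'b', 'b'],
    "X2": ['fb/FOO1/bar0.wav', 'fb/FOO1/bar1.wav', 'fb/FOO2/bar2.wav',
           'fb/FOO2/bar3.wav', 'fb/FOO3/bar4.wav', 'fb/FOO3/bar5.wav'],
})
df1 = pd.DataFrame({
    "X1": ['b', 'o', 'b', 'f', 'o', 'f'],
    "X2": ['fb1/FOO3/bar5.opus', 'fb1/FOO2/bar2.opus', 'fb1/FOO3/bar4.opus',
           'fb1/FOO1/bar1.opus', 'fb1/FOO2/bar3.opus', 'fb1/FOO1/bar0.opus'],
})


def path_key(paths: pd.Series) -> pd.Series:
    """Hypothetical helper: drop the top-level folder and the extension.

    'fb1/FOO3/bar5.opus' becomes 'FOO3/bar5'.
    """
    without_root = paths.str.split('/', n=1).str[1]
    return without_root.str.rsplit('.', n=1).str[0]


# Position of every key in df, i.e. the order df1 should follow.
# Assumption: keys are unique in df; duplicates would make the lookup ambiguous.
order = pd.Series(range(len(df)), index=path_key(df["X2"]))

sorted_df1 = (
    df1.assign(_pos=path_key(df1["X2"]).map(order))  # position of each df1 row in df
       .sort_values("_pos")                          # rows without a match sort last
       .drop(columns="_pos")
       .reset_index(drop=True)                       # renumber rows 0..n-1
)
print(sorted_df1)
```

Keeping the subfolder in the key (`FOO3/bar5` rather than `bar5`) means two files with the same name in different folders are not confused, and `reset_index(drop=True)` renumbers the rows from 0 as in the output the question asks for.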