Here is my data:
import pandas as pd
data = {'key_1':['drzerocraic', 'revealed', 'telegraph', 'landg', 'telegraph', 'subyroy'],
'key_2':['cilliandegascun', 'dailymailceleb', 'andrew', 'nhwunlocked', 'andrew', 'coronavirus'],
'key_3':['langan', 'york', 'harper', 'newhomesweek', 'harper', 'video'],
'key_4':['drbosheagp', 'attorney', 'case', 'helptobuy', 'workplace', 'breakout'],
'date':['7/21/2020 18:20', '7/27/2020 7:10', '7/27/2020 15:32', '7/23/2020 2:47', '7/21/2020 12:01',
'7/20/2020 8:26'],
}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
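print(df)
Is there a way to reshape this so that each keyword gets its own row, paired with the correct date? The final DataFrame would then have two columns, key and date, like this: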
data_final = {'key':['drzerocraic', 'revealed', 'telegraph', 'landg', 'telegraph', 'subyroy',
'cilliandegascun', 'dailymailceleb', 'andrew', 'nhwunlocked', 'andrew', 'coronavirus',
'langan', 'york', 'harper', 'newhomesweek', 'harper', 'video',
'drbosheagp', 'attorney', 'case', 'helptobuy', 'workplace', 'breakout'],
'date':['7/21/2020 18:20', '7/27/2020 7:10', '7/27/2020 15:32', '7/23/2020 2:47', '7/21/2020 12:01', '7/20/2020 8:26',
'7/21/2020 18:20', '7/27/2020 7:10', '7/27/2020 15:32', '7/23/2020 2:47', '7/21/2020 12:01', '7/20/2020 8:26',
'7/21/2020 18:20', '7/27/2020 7:10', '7/27/2020 15:32', '7/23/2020 2:47', '7/21/2020 12:01', '7/20/2020 8:26',
'7/21/2020 18:20', '7/27/2020 7:10', '7/27/2020 15:32', '7/23/2020 2:47', '7/21/2020 12:01','7/20/2020 8:26'
]}
# Create DataFrame
df_final = pd.DataFrame(data_final)
# Print the output.
print(df_final)

Posted on 2020-08-05 18:17:37
Use pandas.melt:
import pandas as pd
(
pd.melt(df, id_vars=['date'], value_vars=df.columns[:-1],
value_name='key').drop(columns='variable')
)
date key
0 7/21/2020 18:20 drzerocraic
1 7/27/2020 7:10 revealed
2 7/27/2020 15:32 telegraph
3 7/23/2020 2:47 landg
4 7/21/2020 12:01 telegraph
5 7/20/2020 8:26 subyroy
6 7/21/2020 18:20 cilliandegascun
7 7/27/2020 7:10 dailymailceleb
8 7/27/2020 15:32 andrew
9 7/23/2020 2:47 nhwunlocked
10 7/21/2020 12:01 andrew
11 7/20/2020 8:26 coronavirus
12 7/21/2020 18:20 langan
13 7/27/2020 7:10 york
14 7/27/2020 15:32 harper
15 7/23/2020 2:47 newhomesweek
16 7/21/2020 12:01 harper
17 7/20/2020 8:26 video
18 7/21/2020 18:20 drbosheagp
19 7/27/2020 7:10 attorney
20 7/27/2020 15:32 case
21 7/23/2020 2:47 helptobuy
22 7/21/2020 12:01 workplace
23 7/20/2020 8:26 breakout

Posted on 2020-08-05 18:44:13
Basically, what you want is stack. Try the following:
import pandas as pd
df_final = df.set_index('date').stack().reset_index()
# Delete the column that holds the names of the original columns (key_1 ... key_4)
del df_final['level_1']
df_final.columns = ['date', 'key']
print(df_final)
date key
0 7/21/2020 18:20 drzerocraic
1 7/21/2020 18:20 cilliandegascun
2 7/21/2020 18:20 langan
3 7/21/2020 18:20 drbosheagp
4 7/27/2020 7:10 revealed
5 7/27/2020 7:10 dailymailceleb
6 7/27/2020 7:10 york
7 7/27/2020 7:10 attorney
8 7/27/2020 15:32 telegraph
9 7/27/2020 15:32 andrew
10 7/27/2020 15:32 harper
11 7/27/2020 15:32 case
12 7/23/2020 2:47 landg
13 7/23/2020 2:47 nhwunlocked
14 7/23/2020 2:47 newhomesweek
15 7/23/2020 2:47 helptobuy
16 7/21/2020 12:01 telegraph
17 7/21/2020 12:01 andrew
18 7/21/2020 12:01 harper
19 7/21/2020 12:01 workplace
20 7/20/2020 8:26 subyroy
21 7/20/2020 8:26 coronavirus
22 7/20/2020 8:26 video
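23 7/20/2020 8:26 breakout
The main difference you can see is the row order: after setting the index to 'date', stack walks the original rows one at a time, so the four keys that share a row (and therefore a date) end up next to each other, whereas melt lists the keys column by column.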
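To see that the two approaches differ only in row order, here is a minimal sketch that runs both on the data from the question and compares the results after sorting. The names key_cols, melted, stacked, normalise and same_rows are made up for this illustration and do not come from either answer; the stack variant is also written slightly differently (dropping the index level instead of deleting 'level_1'), which should give the same frame.

```python
import pandas as pd

# Same data as in the question.
data = {
    'key_1': ['drzerocraic', 'revealed', 'telegraph', 'landg', 'telegraph', 'subyroy'],
    'key_2': ['cilliandegascun', 'dailymailceleb', 'andrew', 'nhwunlocked', 'andrew', 'coronavirus'],
    'key_3': ['langan', 'york', 'harper', 'newhomesweek', 'harper', 'video'],
    'key_4': ['drbosheagp', 'attorney', 'case', 'helptobuy', 'workplace', 'breakout'],
    'date': ['7/21/2020 18:20', '7/27/2020 7:10', '7/27/2020 15:32',
             '7/23/2020 2:47', '7/21/2020 12:01', '7/20/2020 8:26'],
}
df = pd.DataFrame(data)
key_cols = ['key_1', 'key_2', 'key_3', 'key_4']  # hypothetical helper name

# melt: goes through the key columns one after another (column by column).
melted = (
    pd.melt(df, id_vars=['date'], value_vars=key_cols, value_name='key')
    .drop(columns='variable')
)

# stack: goes through the original rows one after another (row by row).
stacked = (
    df.set_index('date')[key_cols]
    .stack()
    .reset_index(level=1, drop=True)  # drop the level holding key_1 ... key_4
    .rename('key')
    .reset_index()
)

def normalise(frame):
    # Sort both results identically so that only the content is compared.
    return frame.sort_values(['date', 'key']).reset_index(drop=True)

same_rows = normalise(melted).equals(normalise(stacked))
print(same_rows)
```

On this data the check should print True: both frames hold the same 24 (date, key) pairs, so the choice between melt and stack comes down to which row order you prefer.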
https://stackoverflow.com/questions/63271141