我最初是从以下数据框架开始的:
数据集与用户回答多个问题相关,这些问题具有多个答案选择,并且用户有能力回答多个答案。
movie_id, user_id, rated_value, question_id, answer_id, genre, user_gender, user_ethnicity
101, 345, 3.5, 1, 1, comedy, male, white
101, 345, 3.5, 1, 2, comedy, male, white
101, 345, 3.5, 2, 1, comedy, male, white
125, 345, 4.5, 1, 4, drama, male, white
101, 233, 4.0, 1, 3, comedy, female, black
101, 233, 4.0, 2, 2, comedy, female, black
125, 233, 3.0, 1, 1, drama, female, black
125, 233, 3.0, 2, 2, drama, female, black
125, 333, 3.0, 1, 1, comedy, male, asian
125, 333, 3.0, 2, 2, comedy, male, asian 我想用枢轴把这张桌子压平。我可以在不引入genre, user_gender, user_ethnicity的情况下成功实现,如下所示:
pivoted_df = df_to_pivot.assign(val=1).pivot_table(
index=['movie_id',
'user_id',
'rated_value'],
columns=['question_id',
'answer_id'],
values=['question_id', 'answer_id'],
fill_value=0)然后将问题和答案id组合在一起,这样列将反映为1_1, 1_2
pivoted_df.columns = pivoted_df.columns.droplevel()
pivoted_df.columns = ['{}_{}'.format(l1, l2).strip() for l1, l2 in pivoted_df.columns.values]
pivoted_df = pivoted_df.reset_index()movie_id user_id rating_value 1_1 1_2 1_3 1_4...
但是当尝试添加genre, user_gender, user_ethnicity时
pivoted_df = df_to_pivot.assign(val=1).pivot_table(
index=['movie_id',
'user_id',
'rated_value'],
columns=['question_id',
'answer_id', 'genre', 'user_gender','user_ethnicity'],
values=['question_id', 'answer_id', 'genre', 'user_gender','user_ethnicity'],
fill_value=0)它并不是真的起作用。
我的目标是像其他列一样旋转genre, user_gender, user_ethnicity,这样列就是movie_id user_id rated_value 1_1 1_2 1_3 1_4...comedy, drama...,male, female, black, white, asian的。
output:
movie_id, user_id, rated_value , 1_1, 1_2, 1_3, 1_4, comedy, drama, male, female, white, black, asian
101, 345, 3.5, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0目标是获得每行的movie_id和user_id对,其他所有内容都反映为1和0。
发布于 2018-06-29 04:03:19
将question_id和answer_id合并为一列,然后使用pd.get_dummies
df['QandA'] = df['question_id'].astype(str) + '_' + df['answer_id'].astype(str)
pd.get_dummies(df, columns=['QandA','genre','user_gender','user_ethnicity'])输出:
movie_id user_id rated_value question_id answer_id QandA_1_1 QandA_1_2 QandA_1_3 QandA_1_4 QandA_2_1 QandA_2_2 genre_comedy genre_drama user_gender_female \
0 101 345 3.5 1 1 1 0 0 0 0 0 1 0 0
1 101 345 3.5 1 2 0 1 0 0 0 0 1 0 0
2 101 345 3.5 2 1 0 0 0 0 1 0 1 0 0
3 125 345 4.5 1 4 0 0 0 1 0 0 0 1 0
4 101 233 4.0 1 3 0 0 1 0 0 0 1 0 1
5 101 233 4.0 2 2 0 0 0 0 0 1 1 0 1
6 125 233 3.0 1 1 1 0 0 0 0 0 0 1 1
7 125 233 3.0 2 2 0 0 0 0 0 1 0 1 1
8 125 333 3.0 1 1 1 0 0 0 0 0 1 0 0
9 125 333 3.0 2 2 0 0 0 0 0 1 1 0 0
user_gender_male user_ethnicity_asian user_ethnicity_black user_ethnicity_white
0 1 0 0 1
1 1 0 0 1
2 1 0 0 1
3 1 0 0 1
4 0 0 1 0
5 0 0 1 0
6 0 0 1 0
7 0 0 1 0
8 1 1 0 0
9 1 1 0 0 我觉得你需要pd.get_dummies
pd.get_dummies(df, columns=['genre','user_gender','user_ethnicity'])输出:
movie_id user_id rated_value question_id answer_id genre_comedy genre_drama user_gender_female user_gender_male user_ethnicity_asian user_ethnicity_black \
0 101 345 3.5 1 1 1 0 0 1 0 0
1 101 345 3.5 1 2 1 0 0 1 0 0
2 101 345 3.5 2 1 1 0 0 1 0 0
3 125 345 4.5 1 4 0 1 0 1 0 0
4 101 233 4.0 1 3 1 0 1 0 0 1
5 101 233 4.0 2 2 1 0 1 0 0 1
6 125 233 3.0 1 1 0 1 1 0 0 1
7 125 233 3.0 2 2 0 1 1 0 0 1
8 125 333 3.0 1 1 1 0 0 1 1 0
9 125 333 3.0 2 2 1 0 0 1 1 0
user_ethnicity_white
0 1
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 0
9 0 https://stackoverflow.com/questions/51090321
复制相似问题