I'm trying to plot the average rating of each movie genre for both genders in a single plot.
My dataset looks like this:
item_id title release_date video_release_date \
0 1 Toy Story (1995) 01-Jan-1995 NaN
1 4 Get Shorty (1995) 01-Jan-1995 NaN
... ... ... ... ...
99995 748 Saint, The (1997) 14-Mar-1997 NaN
99996 751 Tomorrow Never Dies (1997) 01-Jan-1997 NaN
imdb_url unknown Action \
0 http://us.imdb.com/M/title-exact?Toy%20Story%2... 0 0
1 http://us.imdb.com/M/title-exact?Get%20Shorty%... 0 1
... ... ... ...
99995 http://us.imdb.com/M/title-exact?Saint%2C%20Th... 0 1
99996 http://us.imdb.com/M/title-exact?imdb-title-12... 0 1
Adventure Animation Childrens ... War Western user_id rating \
0 0 1 1 ... 0 0 308 4
1 0 0 0 ... 0 0 308 5
Code:
labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']
male_values = all_male_users.iloc[:, 6:26]
female_values = all_female_users.iloc[:, 6:26]
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize=(15,7))
rects1 = ax.bar(x - width/2, male_values.rating.mean(), width, label='Male')
rects2 = ax.bar(x + width/2, female_values.rating.mean(), width, label='Female')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
fig.tight_layout()
plt.show()
So far it plots the overall average rating for each gender, but not the average rating per movie genre.

Posted on 2020-06-17 19:51:55
To reproduce your example, I had to create sample dataframes with random values (1,000 rows each for male and female users):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create sample data
cols = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western', 'rating']
male_values = pd.DataFrame(columns = cols)
female_values = pd.DataFrame(columns = cols)
# define parameters for randomly recreating the dataframe
arr_dummy_genre = np.zeros(18, dtype = int)
arr_dummy_genre[0] = 1
range_rating = range(1,6)
# generate 1,000 random values
for i in range(1000):
    # one random male observation: a one-hot genre vector plus a rating
    random_rating = float(np.random.choice(range_rating))
    random_genre = np.random.permutation(arr_dummy_genre)
    random_row = np.append(random_genre, random_rating)
    male_values.loc[len(male_values)] = random_row
    # one random female observation, built the same way
    random_rating = float(np.random.choice(range_rating))
    random_genre = np.random.permutation(arr_dummy_genre)
    random_row = np.append(random_genre, random_rating)
    female_values.loc[len(female_values)] = random_row
At this point, the female and male dataframes each contain just 1,000 observations of genre and rating. Your data is in a different form, but that is not a problem for this example.
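As an aside, the same kind of sample data can be built without a row-by-row loop. The following is only a sketch of an equivalent approach, not part of the original answer: `make_sample` is a hypothetical helper, and the seeds are arbitrary values chosen for reproducibility.

```python
import numpy as np
import pandas as pd

genres = ['Action', 'Adventure', 'Animation', 'Childrens', 'Comedy', 'Crime',
          'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical',
          'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

def make_sample(n_rows, seed):
    """Hypothetical helper: random ratings, each row with exactly one genre flag set."""
    rng = np.random.default_rng(seed)
    # pick one genre index per row, then turn it into a one-hot row
    genre_idx = rng.integers(0, len(genres), size=n_rows)
    one_hot = np.eye(len(genres), dtype=int)[genre_idx]
    df = pd.DataFrame(one_hot, columns=genres)
    # ratings are integers from 1 to 5 (upper bound 6 is exclusive), stored as floats
    df['rating'] = rng.integers(1, 6, size=n_rows).astype(float)
    return df

male_values = make_sample(1000, seed=0)
female_values = make_sample(1000, seed=1)
```

Building the whole array first and wrapping it in a dataframe once avoids growing the dataframe one row at a time, which gets slow as the number of rows increases.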
The next steps prepare the data so it can be presented the way you want: undo the dummy variables that encode the genre, then group by genre:
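# genre columns: everything in cols except 'rating'
labels = cols[:-1]
# reconstruct the genre from its dummy columns
# (this works because each sample row has exactly one genre flag set)
female_values['genre'] = pd.Series(female_values[labels].columns[np.where(female_values[labels]!=0)[1]])
male_values['genre'] = pd.Series(male_values[labels].columns[np.where(male_values[labels]!=0)[1]])
# group by genre; reindex so the order of the means matches the tick labels
gr_male_values = male_values.groupby('genre')['rating'].mean().reindex(labels)
gr_female_values = female_values.groupby('genre')['rating'].mean().reindex(labels)
Now, using the same plotting code but passing in the grouped data, you can plot it the way you want: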
labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize=(15,7))
rects1 = ax.bar(x - width/2, gr_male_values, width, label='Male')
rects2 = ax.bar(x + width/2, gr_female_values, width, label='Female')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
fig.tight_layout()
plt.show()
This produces a grouped bar chart of the per-genre means, which in my case is completely random.
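One caveat when moving from the sample data to the real ratings table: a movie there can carry several genre flags at once (Get Shorty has `Action` set alongside others), so a single `genre` column cannot be reconstructed. A minimal sketch of an alternative is to compute, for each genre, the mean rating over the rows where that genre's flag is 1. The tiny table below is made up for illustration, the `gender` column is an assumption about how the users are split, and `mean_rating_per_genre` is a hypothetical helper name.

```python
import pandas as pd

genres = ['Action', 'Comedy', 'Drama']  # shortened list, for illustration only

# Made-up stand-in for the real ratings table; rows may have several genres set
ratings = pd.DataFrame({
    'Action': [1, 1, 0, 0],
    'Comedy': [0, 1, 1, 0],
    'Drama':  [0, 0, 1, 1],
    'rating': [4, 2, 5, 3],
    'gender': ['M', 'F', 'M', 'F'],  # assumed column, not shown in the question
})

def mean_rating_per_genre(df, genres):
    """Mean rating over the rows flagged with each genre, indexed by genre."""
    return pd.Series({g: df.loc[df[g] == 1, 'rating'].mean() for g in genres})

male_means = mean_rating_per_genre(ratings[ratings['gender'] == 'M'], genres)
female_means = mean_rating_per_genre(ratings[ratings['gender'] == 'F'], genres)
```

With the dataframes from the question, `all_male_users` and `all_female_users` could be passed to the helper directly together with the full `labels` list, and the two resulting series used in `ax.bar` in place of `gr_male_values` and `gr_female_values`. A rating for a multi-genre movie counts toward every genre it is flagged with.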

https://stackoverflow.com/questions/62427211