
MovieLens rating distribution for each genre

Asked by a Stack Overflow user on 2020-06-17 18:48:02

I am trying to plot the average rating of each movie genre, for both genders, in a single plot.

My dataset looks like this:

      item_id                       title release_date  video_release_date  \
0            1            Toy Story (1995)  01-Jan-1995                 NaN   
1            4           Get Shorty (1995)  01-Jan-1995                 NaN   

...        ...                         ...          ...                 ...   
99995      748           Saint, The (1997)  14-Mar-1997                 NaN   
99996      751  Tomorrow Never Dies (1997)  01-Jan-1997                 NaN   

                                                imdb_url  unknown  Action  \
0      http://us.imdb.com/M/title-exact?Toy%20Story%2...        0       0   
1      http://us.imdb.com/M/title-exact?Get%20Shorty%...        0       1   

...                                                  ...      ...     ...   
99995  http://us.imdb.com/M/title-exact?Saint%2C%20Th...        0       1   
99996  http://us.imdb.com/M/title-exact?imdb-title-12...        0       1   

       Adventure  Animation  Childrens  ...  War  Western  user_id  rating  \
0              0          1          1  ...    0        0      308       4   
1              0          0          0  ...    0        0      308       5   

Code:

labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']
male_values = all_male_users.iloc[:, 6:26]
female_values = all_female_users.iloc[:, 6:26]

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots(figsize=(15,7))
rects1 = ax.bar(x - width/2, male_values.rating.mean(), width, label='Male')
rects2 = ax.bar(x + width/2, female_values.rating.mean(), width, label='Female')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

fig.tight_layout()
plt.show()

So far, it plots the overall average rating for each gender, but not the average rating for each movie genre.


1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-06-17 19:51:55

To reproduce your example, I had to create sample DataFrames with random values (1,000 rows each for male and female users):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# create sample data
cols = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western', 'rating']
male_values = pd.DataFrame(columns = cols)
female_values = pd.DataFrame(columns = cols)

# define the parameters used to randomly recreate the dataframe
arr_dummy_genre = np.zeros(18, dtype = int)
arr_dummy_genre[0] = 1
range_rating = range(1,6)

# generate 1,000 random values
for i in range(1000):
    # one random rating plus a random one-hot genre vector for a male row
    random_rating = float(np.random.choice(range_rating))
    random_genre = np.random.permutation(arr_dummy_genre)
    random_row = np.append(random_genre, random_rating)
    male_values.loc[len(male_values)] = random_row

    # same again for a female row
    random_rating = float(np.random.choice(range_rating))
    random_genre = np.random.permutation(arr_dummy_genre)
    random_row = np.append(random_genre, random_rating)
    female_values.loc[len(female_values)] = random_row

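At this point, the female and male DataFrames each contain only 1,000 observations of genre and rating. Your data has a different shape, but that does not matter for this example.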
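As an aside, the same kind of sample data can be built without a row-by-row loop. The following is only a sketch: the helper name `make_sample` and the fixed seeds are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

labels = ['Action', 'Adventure', 'Animation', 'Childrens', 'Comedy', 'Crime',
          'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical',
          'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

def make_sample(n_rows, seed):
    """Hypothetical helper: n_rows random rows of one-hot genre flags plus a rating from 1 to 5."""
    rng = np.random.default_rng(seed)
    # pick one random genre index per row
    genre_idx = rng.integers(0, len(labels), size=n_rows)
    # indexing the identity matrix gives one row per pick, with a single 1 in it
    one_hot = np.eye(len(labels), dtype=int)[genre_idx]
    sample = pd.DataFrame(one_hot, columns=labels)
    # the upper bound of integers() is exclusive, so this draws 1..5
    sample['rating'] = rng.integers(1, 6, size=n_rows).astype(float)
    return sample

male_values = make_sample(1000, seed=0)
female_values = make_sample(1000, seed=1)
```

Built this way, the columns get numeric dtypes from the start, and the result has the same layout as the loop version: 18 genre columns followed by `rating`.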

The next steps prepare the data so it can be presented the way you want: undo the dummy encoding of the genre, then group by genre:

# the 18 genre columns (every column except 'rating')
labels = cols[:-1]

# reconstruct the dummified genre of the movie
# (assumes exactly one genre flag is set per row, as in the sample data)
female_values['genre'] = pd.Series(female_values[labels].columns[np.where(female_values[labels]!=0)[1]])
male_values['genre'] = pd.Series(male_values[labels].columns[np.where(male_values[labels]!=0)[1]])

# group by genre
gr_male_values = male_values.groupby('genre')['rating'].mean()
gr_female_values = female_values.groupby('genre')['rating'].mean()
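One caveat: in the dataset shown in the question, a single movie can have several genre flags set (Toy Story has both Animation and Childrens). In that case the genre reconstruction above no longer lines up with the rows, because `np.where` returns one entry per flag rather than one per row. A possible alternative is to compute the per-genre mean directly from the 0/1 columns. This is a minimal sketch only: the function name `mean_rating_per_genre` and the toy data are made up for illustration.

```python
import pandas as pd

def mean_rating_per_genre(df, genre_cols, rating_col='rating'):
    """Hypothetical helper: mean rating per genre from 0/1 genre flag columns."""
    flags = df[genre_cols].astype(float)
    # multiply each row's flags by its rating, then sum per genre column
    rating_sums = flags.mul(df[rating_col].astype(float), axis=0).sum()
    # number of rated rows that carry each genre flag
    rating_counts = flags.sum()
    # a genre with no rows gives 0/0, which shows up as NaN
    return rating_sums / rating_counts

# made-up toy data: rows 0 and 2 carry two genres each
toy = pd.DataFrame({
    'Action':    [0, 1, 1],
    'Animation': [1, 0, 0],
    'Childrens': [1, 0, 1],
    'rating':    [4, 5, 2],
})
per_genre = mean_rating_per_genre(toy, ['Action', 'Animation', 'Childrens'])
print(per_genre)
```

A rating for a multi-genre movie counts once toward each of its genres, and the result is indexed in the order of `genre_cols`, so it can be passed to `ax.bar` in the same order as the tick labels.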

Now the same plotting code, changed only to use the grouped data, produces the plot you want:

labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots(figsize=(15,7))
rects1 = ax.bar(x - width/2, gr_male_values, width, label='Male')
rects2 = ax.bar(x + width/2, gr_female_values, width, label='Female')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

fig.tight_layout()
plt.show()

This produces the following plot, which in my case is completely random:
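A note on the plotting code above: `ax.bar` pairs heights with positions purely by order, so it relies on the grouped Series listing the genres in the same order as `labels`. `groupby` returns its keys sorted, which happens to match the alphabetical `labels` list here, but if one gender had no ratings for a genre the lengths would differ. The sketch below shows one way to align both Series explicitly; the shortened label list and the numbers are made up for illustration.

```python
import pandas as pd

labels = ['Action', 'Adventure', 'Animation']  # shortened for the example

# made-up grouped means; the male Series has no 'Adventure' entry
gr_male_values = pd.Series({'Action': 3.1, 'Animation': 3.4})
gr_female_values = pd.Series({'Adventure': 3.0, 'Action': 2.9, 'Animation': 3.6})

# put both Series side by side and force the row order to follow labels;
# genres missing for one gender become NaN instead of shifting the bars
aligned = pd.concat({'Male': gr_male_values, 'Female': gr_female_values}, axis=1).reindex(labels)
print(aligned)
```

`aligned['Male']` and `aligned['Female']` then have one entry per label, in label order. The NaN entries may need to be filled or dropped before plotting, depending on how gaps should be shown.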

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/62427211
