Q: pandas: make a DataFrame from lists of different sizes

Stack Overflow user

Asked 2017-03-31 06:47:19

Answers: 2, Views: 38, Followers: 0, Votes: 2

I have data like this:

genre_list
Out[7]: 
0                    [Action, Adventure, Fantasy, Sci-Fi]
1                            [Action, Adventure, Fantasy]
2                           [Action, Adventure, Thriller]
3                                      [Action, Thriller]
4                                           [Documentary]
5                             [Action, Adventure, Sci-Fi]
6                            [Action, Adventure, Romance]
7       [Adventure, Animation, Comedy, Family, Fantasy...
8                             [Action, Adventure, Sci-Fi]
9                   [Adventure, Family, Fantasy, Mystery]
10                            [Action, Adventure, Sci-Fi]
11                            [Action, Adventure, Sci-Fi]

I wrote this code to build a DataFrame from lists of different sizes:

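genre_df = pd.DataFrame()
for i in range(len(genre_list)):
    # Each list becomes a one-row frame that is appended to the result.
    # Note: DataFrame.append was removed in pandas 2.0; use pd.concat there.
    genre_df = genre_df.append(pd.DataFrame(genre_list[i]).T)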

Output:

genre_df.head()
Out[9]: 
             0          1         2       3    4    5    6    7
0       Action  Adventure   Fantasy  Sci-Fi  NaN  NaN  NaN  NaN
0       Action  Adventure   Fantasy     NaN  NaN  NaN  NaN  NaN
0       Action  Adventure  Thriller     NaN  NaN  NaN  NaN  NaN
0       Action   Thriller       NaN     NaN  NaN  NaN  NaN  NaN
0  Documentary        NaN       NaN     NaN  NaN  NaN  NaN  NaN

Is there a simpler way to get this result?

python

list

pandas

2 Answers

Stack Overflow user

Accepted answer

Answered 2017-03-31 06:50:13

You can use the DataFrame constructor: convert genre_list to a NumPy array with values, then to a list of lists with tolist().

df1 = pd.DataFrame(genre_list.values.tolist(), index=genre_list.index)
print (df1)

              0          1         2        3        4
0        Action  Adventure   Fantasy   Sci-Fi     None
1        Action  Adventure   Fantasy     None     None
2        Action  Adventure  Thriller     None     None
3        Action   Thriller      None     None     None
4   Documentary       None      None     None     None
5        Action  Adventure    Sci-Fi     None     None
6        Action  Adventure   Romance     None     None
7     Adventure  Animation    Comedy   Family  Fantasy
8        Action  Adventure    Sci-Fi     None     None
9     Adventure     Family   Fantasy  Mystery     None
10       Action  Adventure    Sci-Fi     None     None
11       Action  Adventure    Sci-Fi     None     None
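For anyone who wants to try the constructor approach end to end, here is a minimal self-contained sketch. The three-row sample Series is made up for illustration and is not the asker's full data. The check uses isna() so it does not depend on whether the padding is shown as None or NaN, which may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample standing in for the asker's genre_list
genre_list = pd.Series([
    ["Action", "Adventure", "Fantasy", "Sci-Fi"],
    ["Documentary"],
    ["Action", "Thriller"],
])

# One row per list; shorter lists are padded on the right with missing values
df1 = pd.DataFrame(genre_list.values.tolist(), index=genre_list.index)

print(df1.shape)                        # (3, 4): 3 rows, longest list has 4 items
print(df1.isna().sum(axis=1).tolist())  # [0, 3, 2]: padded cells per row
```

Passing index=genre_list.index keeps the original row labels, which matters if the Series does not have a default 0..n-1 index.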

If needed, replace None with NaN:

df1 = pd.DataFrame(genre_list.values.tolist(), index=genre_list.index).replace({None:np.nan})
print (df1)
              0          1         2        3        4
0        Action  Adventure   Fantasy   Sci-Fi      NaN
1        Action  Adventure   Fantasy      NaN      NaN
2        Action  Adventure  Thriller      NaN      NaN
3        Action   Thriller       NaN      NaN      NaN
4   Documentary        NaN       NaN      NaN      NaN
5        Action  Adventure    Sci-Fi      NaN      NaN
6        Action  Adventure   Romance      NaN      NaN
7     Adventure  Animation    Comedy   Family  Fantasy
8        Action  Adventure    Sci-Fi      NaN      NaN
9     Adventure     Family   Fantasy  Mystery      NaN
10       Action  Adventure    Sci-Fi      NaN      NaN
11       Action  Adventure    Sci-Fi      NaN      NaN

Another, slower solution is to apply pd.Series:

df1 = genre_list.apply(pd.Series)
              0          1         2        3        4
0        Action  Adventure   Fantasy   Sci-Fi      NaN
1        Action  Adventure   Fantasy      NaN      NaN
2        Action  Adventure  Thriller      NaN      NaN
3        Action   Thriller       NaN      NaN      NaN
4   Documentary        NaN       NaN      NaN      NaN
5        Action  Adventure    Sci-Fi      NaN      NaN
6        Action  Adventure   Romance      NaN      NaN
7     Adventure  Animation    Comedy   Family  Fantasy
8        Action  Adventure    Sci-Fi      NaN      NaN
9     Adventure     Family   Fantasy  Mystery      NaN
10       Action  Adventure    Sci-Fi      NaN      NaN
11       Action  Adventure    Sci-Fi      NaN      NaN

Timings:

#[12000 rows]
genre_list = pd.concat([genre_list]*1000).reset_index(drop=True)

In [115]: %timeit pd.DataFrame(genre_list.values.tolist(), index=genre_list.index).replace({None:np.nan})
100 loops, best of 3: 15.7 ms per loop

In [116]: %timeit df1 = genre_list.apply(pd.Series)
1 loop, best of 3: 1.96 s per loop
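The comparison can also be reproduced outside IPython. The sketch below is only an illustration under assumptions: the sample data and the helper names via_constructor and via_apply are hypothetical, and it uses the standard-library timeit module with a single run per method. It first checks that both methods produce the same layout, then prints the two timings. Absolute numbers will vary with machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical sample, repeated to get 600 rows
base = pd.Series([
    ["Action", "Adventure", "Fantasy", "Sci-Fi"],
    ["Documentary"],
    ["Action", "Thriller"],
])
genre_list = pd.concat([base] * 200).reset_index(drop=True)


def via_constructor(s):
    # Build the frame in one call from a list of lists
    return pd.DataFrame(s.values.tolist(), index=s.index)


def via_apply(s):
    # Build one Series per row and let pandas assemble them
    return s.apply(pd.Series)


fast = via_constructor(genre_list)
slow = via_apply(genre_list)

# Both results should agree on shape and on where the padding sits
same_shape = fast.shape == slow.shape
same_padding = bool((fast.isna().values == slow.isna().values).all())

# Single-run wall-clock times in seconds
t_fast = timeit.timeit(lambda: via_constructor(genre_list), number=1)
t_slow = timeit.timeit(lambda: via_apply(genre_list), number=1)
print(same_shape, same_padding, t_fast, t_slow)
```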

Votes: 1

Stack Overflow user

Answered 2017-03-31 08:28:28

A NumPy approach (here s is the Series of lists, i.e. genre_list):

lol = s.values.tolist()

lens = [len(l) for l in lol]

i = np.arange(len(lens)).repeat(lens)
j = np.concatenate([np.arange(l) for l in lens])
v = np.concatenate(lol)

pd.Series(v, [i, j]).unstack()

              0          1         2        3        4
0        Action  Adventure   Fantasy   Sci-Fi     None
1        Action  Adventure   Fantasy     None     None
2        Action  Adventure  Thriller     None     None
3        Action   Thriller      None     None     None
4   Documentary       None      None     None     None
5        Action  Adventure    Sci-Fi     None     None
6        Action  Adventure   Romance     None     None
7     Adventure  Animation    Comedy   Family  Fantasy
8        Action  Adventure    Sci-Fi     None     None
9     Adventure     Family   Fantasy  Mystery     None
10       Action  Adventure    Sci-Fi     None     None
11       Action  Adventure    Sci-Fi     None     None
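To make the index arithmetic easier to follow, here is the same idea on a small made-up Series with the intermediate arrays spelled out in comments. The sample data and the name wide are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; s plays the role of the Series of lists
s = pd.Series([
    ["Action", "Adventure", "Fantasy", "Sci-Fi"],
    ["Documentary"],
    ["Action", "Thriller"],
])

lol = s.values.tolist()           # list of lists
lens = [len(l) for l in lol]      # [4, 1, 2]

# Row number of every element: 0, 0, 0, 0, 1, 2, 2
i = np.arange(len(lens)).repeat(lens)
# Position of every element inside its own list: 0, 1, 2, 3, 0, 0, 1
j = np.concatenate([np.arange(n) for n in lens])
# All elements flattened into one 1-D array
v = np.concatenate(lol)

# A Series indexed by (row, position) pivots into the wide layout
wide = pd.Series(v, index=[i, j]).unstack()
print(wide.shape)                 # (3, 4)
```

One thing to watch for with this approach: a row whose list is empty contributes no (row, position) pairs, so it would not appear in the unstacked result unless it is reindexed afterwards.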

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/43134198
