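The dataset looks like this:
<link>, <type>
For example, the type can be "dofollow", "nofollow", or "javascript".
Since each link may appear multiple times in the dataset, I need to get the result in the following form: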
link, dofollow, nofollow, javascript
http://somelink.com, 10 (e.g. it appeared 10 times as dofollow), 0, 101
Posted on 2014-03-04 17:33:05
You can use groupby with size:
In [10]: import pandas as pd
In [11]: df = pd.DataFrame([['a_link', 'dofollow'], ['a_link', 'dofollow'], ['a_link', 'nofollow'], ['b_link', 'javascript']], columns=['link', 'type'])
In [12]: df
Out[12]:
     link        type
0  a_link    dofollow
1  a_link    dofollow
2  a_link    nofollow
3  b_link  javascript
In [13]: df.groupby(['link', 'type']).size()
Out[13]:
link    type
a_link  dofollow      2
        nofollow      1
b_link  javascript    1
dtype: int64
Now unstack the second level (type) so that its values become columns, then fill the missing entries:
In [14]: df.groupby(['link', 'type']).size().unstack(1)
Out[14]:
type    dofollow  javascript  nofollow
link
a_link         2         NaN         1
b_link       NaN           1       NaN
In [15]: df.groupby(['link', 'type']).size().unstack(1).fillna(0)
Out[15]:
type    dofollow  javascript  nofollow
link
a_link         2           0         1
b_link         0           1         0
https://stackoverflow.com/questions/22177533
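For convenience, the steps from the answer can be collected into one script. This is only a sketch under a few assumptions: it relies on the `fill_value` argument of `unstack` (available in more recent pandas versions than the one used in the session above) so the counts stay integers, and it reorders the columns to match the layout asked for in the question. The `pd.crosstab` call at the end is a swapped-in alternative, not part of the original answer. The variable names `counts` and `alt` are illustrative.

```python
import pandas as pd

# Same toy data as in the session above.
df = pd.DataFrame(
    [
        ["a_link", "dofollow"],
        ["a_link", "dofollow"],
        ["a_link", "nofollow"],
        ["b_link", "javascript"],
    ],
    columns=["link", "type"],
)

# Count rows per (link, type) pair, then pivot the "type" level into columns.
# fill_value=0 fills missing combinations directly, so no NaN/float detour.
counts = df.groupby(["link", "type"]).size().unstack("type", fill_value=0)

# Put the columns in the order requested in the question; a type that never
# occurs in the data would still get an all-zero column.
counts = counts.reindex(columns=["dofollow", "nofollow", "javascript"], fill_value=0)

# Alternative (assumption: not from the original answer): crosstab builds
# the same frequency table in one call.
alt = pd.crosstab(df["link"], df["type"])

print(counts)
```

If `link` is needed as an ordinary column rather than the index, `counts.reset_index()` should give that shape.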