a b c d
0 Apple Apple Apple Banana
1 Apple Apple Banana Apple
2 Apple Apple Banana Banana
3 Apple Banana Banana Banana
4 Apple Banana Banana Pear
5 Apple Banana Pear Apple
6 Apple Pear Apple Apple
7 Apple Pear Banana Apple
8 Apple Pear Banana Banana
9 Banana Banana Pear Banana
10 Banana Banana Pear Pear
11 Banana Pear Banana Apple
12 Banana Pear Banana Banana
13 Pear Apple Banana Banana
14 Pear Banana Banana Apple
15 Pear Banana Pear Pear
16 Pear Pear Apple Pear
17 Pear Pear Banana Apple
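18 Pear Pear Banana Banana
Hello,
I have the DataFrame df above. I want to create a new DataFrame made of groups: one group with the rows in which "Apple" appears twice, another with the rows in which "Banana" appears twice, another with the rows that contain Banana only once, and so on. I want a limited number of groups, say only 6, covering these different combinations. I'd like to use groupby, but I'm not sure how to use it when extracting the values. Any help? Thanks!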
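To make the answers below reproducible, here is a minimal sketch that rebuilds the sample `df` from the table above. It assumes the column names `a` to `d` and the default integer index 0 to 18 shown in the printout. The `data` string is simply the table retyped, one row per line.

```python
import pandas as pd

# The 19 sample rows from the question, one row per line (columns a-d).
data = """\
Apple Apple Apple Banana
Apple Apple Banana Apple
Apple Apple Banana Banana
Apple Banana Banana Banana
Apple Banana Banana Pear
Apple Banana Pear Apple
Apple Pear Apple Apple
Apple Pear Banana Apple
Apple Pear Banana Banana
Banana Banana Pear Banana
Banana Banana Pear Pear
Banana Pear Banana Apple
Banana Pear Banana Banana
Pear Apple Banana Banana
Pear Banana Banana Apple
Pear Banana Pear Pear
Pear Pear Apple Pear
Pear Pear Banana Apple
Pear Pear Banana Banana"""

# Split each line on whitespace; the default RangeIndex gives labels 0..18.
df = pd.DataFrame([line.split() for line in data.splitlines()], columns=list("abcd"))
print(df.shape)  # (19, 4)
```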
Posted on 2021-04-27 11:13:19
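You can first create a Series that holds, for each row, the number of times a given value appears in it. Then filter the rows by the counts you want.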
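A minimal vectorized sketch of that counting step, offered as an alternative to a row-wise `apply`: comparing the whole frame with a scalar gives a boolean frame, and summing it along `axis=1` counts the matches in each row. It assumes the sample `df` from the question, rebuilt inline so the block runs on its own, and keeps only the rows in which "Apple" appears exactly twice.

```python
import pandas as pd

# Sample data from the question (columns a-d, index 0..18).
data = """\
Apple Apple Apple Banana
Apple Apple Banana Apple
Apple Apple Banana Banana
Apple Banana Banana Banana
Apple Banana Banana Pear
Apple Banana Pear Apple
Apple Pear Apple Apple
Apple Pear Banana Apple
Apple Pear Banana Banana
Banana Banana Pear Banana
Banana Banana Pear Pear
Banana Pear Banana Apple
Banana Pear Banana Banana
Pear Apple Banana Banana
Pear Banana Banana Apple
Pear Banana Pear Pear
Pear Pear Apple Pear
Pear Pear Banana Apple
Pear Pear Banana Banana"""
df = pd.DataFrame([line.split() for line in data.splitlines()], columns=list("abcd"))

# Element-wise comparison -> boolean DataFrame; summing across the
# columns (axis=1) gives the number of "Apple" cells in each row.
apple_count = (df == "Apple").sum(axis=1)

# Boolean indexing keeps the rows whose count is exactly 2.
twice_apple = df[apple_count == 2]
print(twice_apple)
```

Swapping `"Apple"` for `"Banana"`, or `== 2` for another count, gives the other groups the question asks for.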
Below is an example that creates a new DataFrame containing the rows in which "Apple" appears twice:
# Count, for each row, how many cells equal 'Apple'
apple_count = df.apply(lambda row: row.isin(['Apple']).sum(axis=0), axis=1)
# Keep the rows whose Apple count is 2 (or 4)
df_ = df[apple_count.isin([2, 4])]
print(df_)
a b c d
2 Apple Apple Banana Banana
5 Apple Banana Pear Apple
7 Apple Pear Banana Apple

Posted on 2021-04-27 15:37:34

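Ideally I would like something like this (sorry, it should be group 1, 2, 3, etc.), but since my initial df is quite long, I want a limited number of groups, maybe 10, to plot afterwards. Group 1 would contain all rows in which a value appears as a pair, group 2 all rows in which a value appears 3 times, group 4 all rows in which each value appears only once across the columns of that row, and so on. In the end I would like to plot the frequency of these groups.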
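One possible reading of these groups, offered as an assumption rather than what the asker necessarily meant: label each row by how often each of its values repeats, sorted largest first, so "Apple Apple Banana Banana" becomes `2+2` and "Apple Banana Banana Pear" becomes `2+1+1`. Because a row has only four columns, there are at most five such labels, which keeps the number of groups small however long `df` is. `row_pattern` is a hypothetical helper name, and the sample data is the question's table rebuilt inline.

```python
import pandas as pd

# Sample data from the question (columns a-d, index 0..18).
data = """\
Apple Apple Apple Banana
Apple Apple Banana Apple
Apple Apple Banana Banana
Apple Banana Banana Banana
Apple Banana Banana Pear
Apple Banana Pear Apple
Apple Pear Apple Apple
Apple Pear Banana Apple
Apple Pear Banana Banana
Banana Banana Pear Banana
Banana Banana Pear Pear
Banana Pear Banana Apple
Banana Pear Banana Banana
Pear Apple Banana Banana
Pear Banana Banana Apple
Pear Banana Pear Pear
Pear Pear Apple Pear
Pear Pear Banana Apple
Pear Pear Banana Banana"""
df = pd.DataFrame([line.split() for line in data.splitlines()], columns=list("abcd"))


def row_pattern(row: pd.Series) -> str:
    """Hypothetical helper: multiplicities of the row's values, largest first."""
    counts = sorted(row.value_counts().tolist(), reverse=True)
    return "+".join(str(c) for c in counts)


# One pattern label per row, e.g. "3+1" for Apple Apple Apple Banana.
patterns = df.apply(row_pattern, axis=1)

# groupby accepts a Series aligned on df's index; each pattern becomes a group.
groups = {label: group for label, group in df.groupby(patterns)}

# Frequency of each pattern, sorted by label, ready for a bar chart.
pattern_counts = patterns.value_counts().sort_index()
print(pattern_counts)
```

With this sample, `groups["2+2"]` holds rows 2, 10 and 18, and `pattern_counts.plot(kind="bar")` would draw the frequency chart (plotting needs matplotlib installed). The `4` and `1+1+1+1` labels do not occur here; the latter cannot, since four columns drawn from three fruits always repeat at least one value.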
Posted on 2021-04-27 16:42:12
Creating the groups
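def count_words(values, word):
    # Number of entries in `values` that equal `word`
    return sum([value == word for value in values])

# The key function receives each index label and returns that row's Apple count,
# so rows with the same count end up in the same group
apple_groups = df.groupby(by=lambda index: count_words(df.loc[index], 'Apple'))
for word_count, group in apple_groups:
    print(group)

a b c d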
9 Banana Banana Pear Banana
10 Banana Banana Pear Pear
12 Banana Pear Banana Banana
15 Pear Banana Pear Pear
18 Pear Pear Banana Banana
a b c d
3 Apple Banana Banana Banana
4 Apple Banana Banana Pear
8 Apple Pear Banana Banana
11 Banana Pear Banana Apple
13 Pear Apple Banana Banana
14 Pear Banana Banana Apple
16 Pear Pear Apple Pear
17 Pear Pear Banana Apple
a b c d
2 Apple Apple Banana Banana
5 Apple Banana Pear Apple
7 Apple Pear Banana Apple
a b c d
0 Apple Apple Apple Banana
1 Apple Apple Banana Apple
6 Apple Pear Apple Apple

Counting the groups
import pandas as pd

def count_rows_with_exactly_n_words(df, word, n):
    # Number of rows in which `word` appears exactly n times
    return df.apply(lambda row: count_words(row.values, word) == n, axis=1).sum()

def count_word_groups(df, word, max_n):
    # One row per count 0..max_n, labelled e.g. 'Apple_2'
    result = pd.DataFrame(columns=['Count'])
    for n in range(max_n + 1):
        result.at[word + '_' + str(n), 'Count'] = count_rows_with_exactly_n_words(df, word, n)
    return result

print(count_word_groups(df, 'Apple', max_n=4))
print(count_word_groups(df, 'Banana', max_n=4))

Count
Apple_0 5
Apple_1 8
Apple_2 3
Apple_3 3
Apple_4 0
Count
Banana_0 2
Banana_1 6
Banana_2 8
Banana_3 3
Banana_4 0

https://stackoverflow.com/questions/67274414