Hello, I have a df such as:
Groups COL1
G1 Seq:1
G1 Seq:2
G1 Seq_1
G1 Seq:4
G2 Seq_2
G2 Seq_3
G2 Seq_4
G3 Seq:5
G3 Seq:6
G4 Seq:7
G4 Seq_5
I would like to count:
Does anyone have an idea? I guess I should use a re.sub and then compute the sum for each Groups in pandas?
Posted on 2020-11-10 09:37:31
You can count these with pd.Series.str.contains, followed by GroupBy.all and GroupBy.any.
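To try this on the question's data, the frame can be rebuilt by hand. The sketch below assumes the column names `Groups` and `COL1` exactly as shown in the question; `has_colon` and `flags` are names introduced here for illustration only. It prints the per-group `all`/`any` flags from which the counts are derived.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Groups": ["G1", "G1", "G1", "G1", "G2", "G2", "G2",
               "G3", "G3", "G4", "G4"],
    "COL1": ["Seq:1", "Seq:2", "Seq_1", "Seq:4", "Seq_2", "Seq_3", "Seq_4",
             "Seq:5", "Seq:6", "Seq:7", "Seq_5"],
})

# True where the value contains a colon.
has_colon = df["COL1"].str.contains(":")

# One row per group: does every value / at least one value contain a colon?
grouped = has_colon.groupby(df["Groups"])
flags = pd.DataFrame({"all": grouped.all(), "any": grouped.any()})
print(flags)
```

A group that is `True` under `any` but `False` under `all` mixes both separators, which is why the mixed count is obtained by subtraction.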
om = df['COL1'].str.contains(':')  # True where the value contains ':'
one = om.groupby(df['Groups']).all().sum()  # groups where every value contains ':' -> 1
# `any` also counts the groups where every value is True,
# so subtract `one` to keep only the mixed groups.
two = om.groupby(df['Groups']).any().sum() - one  # -> 2
three = (~om).groupby(df['Groups']).all().sum()  # groups where no value contains ':' -> 1
Posted on 2020-11-10 09:25:30
Use Series.str.contains to build a mask, then use numpy.setdiff1d to compare the Groups values selected with DataFrame.loc under the mask and under its inverse (~m).
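For readers unfamiliar with it, `np.setdiff1d` returns the sorted unique values of its first argument that do not appear in the second, so repeated group labels do not matter. A small illustration with hand-written labels (the two arrays are hypothetical stand-ins, not taken from the answer's code):

```python
import numpy as np

# Every group label, with repeats, as it would appear in the Groups column.
all_groups = np.array(["G1", "G1", "G2", "G3", "G3", "G4", "G4"])

# Labels of rows that contain an underscore value (stand-in for the ~m rows).
underscore_groups = np.array(["G1", "G2", "G2", "G4"])

# Groups that never appear among the underscore rows, i.e. colon-only groups.
only_colon = np.setdiff1d(all_groups, underscore_groups)
print(only_colon.tolist())  # ['G3']
```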
import numpy as np
m = df['COL1'].str.contains(':')
a = np.setdiff1d(df['Groups'], df.loc[~m, 'Groups']).tolist()
print (a)
['G3']
c = np.setdiff1d(df['Groups'], df.loc[m, 'Groups']).tolist()
print (c)
['G2']
b = np.setdiff1d(df.loc[~m, 'Groups'], c).tolist()
print (b)
['G1', 'G4']
And for the counts, get the lengths of the lists:
print (len(a))
print (len(b))
print (len(c))
https://stackoverflow.com/questions/64765989