Hi, I have a df, for example:
COL1 Col2
G1 SP1.3
G1 SP2.3
G1 SP6.2
G1 SP4_4
G1 SP4_2
G1 SP8_2
G2 SP3_2
G2 SP1_3
G2 SP2_4
G2 SP2.2

How can I keep only the groups (in COL1) whose Col2 values include all of SP1, SP2 and SP4?
Here I should get only the rows of group G1:
COL1 Col2
G1 SP1.3
G1 SP2.3
G1 SP6.2
G1 SP4_4
G1 SP4_2
G1 SP8_2

Posted on 2020-11-26 20:02:17
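To make the question reproducible, here is a minimal sketch that rebuilds the sample frame with pandas. The column names and values are copied from the question; the variable names df and expected are assumptions. The expected frame is hardcoded to G1 only to spell out the target result, not as a solution:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'COL1': ['G1'] * 6 + ['G2'] * 4,
    'Col2': ['SP1.3', 'SP2.3', 'SP6.2', 'SP4_4', 'SP4_2', 'SP8_2',
             'SP3_2', 'SP1_3', 'SP2_4', 'SP2.2'],
})

# Desired output: all of G1 (it has SP1, SP2 and SP4); G2 has no SP4 row
expected = df[df['COL1'] == 'G1']
print(expected)
```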
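Solution: take the value before . or _ with Series.str.split, then compare each group's set of values against the list with a custom function in GroupBy.transform:
a = ['SP1','SP2','SP4']
# True if the group's prefixes contain every value in a
f = lambda x: set(x) >= set(a)
# prefix before the first '.' or '_', then one boolean per COL1 group broadcast to its rows
m = df['Col2'].str.split(r'\.|_').str[0].groupby(df['COL1']).transform(f)
df = df[m]
print(df)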
COL1 Col2
0 G1 SP1.3
1 G1 SP2.3
2 G1 SP6.2
3 G1 SP4_4
4 G1 SP4_2
5 G1 SP8_2

EDIT: a Series.str.extract solution that pulls the values from the list:
a = ['SP1','SP2','SP4']
f = lambda x: set(x) >= set(a)
m = df['Col2'].str.extract(f'({"|".join(a)})',expand=False).groupby(df['COL1']).transform(f)
df = df[m]

Posted on 2020-11-26 20:14:43
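Both variants from the answer above can be checked end to end with the following sketch. It rebuilds the question's data, uses a raw string for the split pattern (a plain '\.' string triggers an invalid-escape warning in recent Python), and escapes the list items with re.escape before joining them into the extract pattern. The names required, group_has_all, mask_split and mask_extract are hypothetical, chosen for illustration:

```python
import re

import pandas as pd

df = pd.DataFrame({
    'COL1': ['G1'] * 6 + ['G2'] * 4,
    'Col2': ['SP1.3', 'SP2.3', 'SP6.2', 'SP4_4', 'SP4_2', 'SP8_2',
             'SP3_2', 'SP1_3', 'SP2_4', 'SP2.2'],
})
required = {'SP1', 'SP2', 'SP4'}

def group_has_all(values):
    # True when every required prefix occurs somewhere in the group
    return required.issubset(values)

# Variant 1: prefix before the first '.' or '_' (raw-string regex)
prefix = df['Col2'].str.split(r'[._]').str[0]
mask_split = prefix.groupby(df['COL1']).transform(group_has_all).astype(bool)

# Variant 2: extract the first required token; non-matching rows become NaN
pattern = '(' + '|'.join(map(re.escape, sorted(required))) + ')'
found = df['Col2'].str.extract(pattern, expand=False)
mask_extract = found.groupby(df['COL1']).transform(group_has_all).astype(bool)

result = df[mask_split]
print(result)
```

The split variant compares exact prefixes, while the extract variant matches anywhere in the string, so a value like SP10.1 would also count as SP1 there.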
I did the following for G1 (keeping only the SP1, SP2 and SP4 rows, so SP6 and SP8 are left out):
df.loc[(df['COL1']== 'G1') & (df['Col2'].str.contains('SP1|SP2|SP4'))]
df
COL1 Col2
0 G1 SP1.3
1 G1 SP2.3
3 G1 SP4_4
4 G1 SP4_2

https://stackoverflow.com/questions/65021631
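The .loc approach above hardcodes 'G1' and filters rows rather than groups. A hedged sketch that selects qualifying groups without naming them, swapping in GroupBy.filter instead of the transform mask used in the first answer; the second step reproduces the row-level restriction shown in the output above, in case that is what you want. The helper name has_all_required and the set required are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['G1'] * 6 + ['G2'] * 4,
    'Col2': ['SP1.3', 'SP2.3', 'SP6.2', 'SP4_4', 'SP4_2', 'SP8_2',
             'SP3_2', 'SP1_3', 'SP2_4', 'SP2.2'],
})
required = {'SP1', 'SP2', 'SP4'}

def has_all_required(group):
    # Hypothetical helper: prefixes of Col2 before the first '.' or '_'
    prefixes = set(group['Col2'].str.split(r'[._]').str[0])
    return required <= prefixes

# Keep whole groups whose prefixes cover every required value
groups = df.groupby('COL1').filter(has_all_required)

# Optional row-level step, matching the output of the .loc answer
rows = groups[groups['Col2'].str.contains('SP1|SP2|SP4')]
print(rows)
```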