I have a DataFrame like this:
id k1 k2 same
1 re_setup oo_setup true
2 oo_setup oo_setup true
3 alerting bounce false
4 bounce re_oversetup false
5 re_oversetup alerting false
6 alerting_s re_setup false
7 re_oversetup oo_setup true
8 alerting bounce false
So I need to classify each row by whether it contains the string "setup".
And the expected output would be:
id k1 k2 same
1 re_setup oo_setup true
2 oo_setup oo_setup true
3 alerting bounce false
4 bounce re_setup false
5 re_setup alerting false
6 alerting_s re_setup false
7 re_setup oo_setup true
8 alerting bounce false
I tried the following, but it fails once I extend it to select multiple columns:
data['same'] = data[data['k1', 'k2'].str.contains('setup')==True]
Posted on 2017-08-16 09:12:06
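To see why the attempt in the question fails, here is a minimal sketch that rebuilds only the k1/k2 columns from the expected output (an assumption; the other columns do not matter here). It shows the two separate problems: data['k1', 'k2'] looks up a single column labelled by the tuple ('k1', 'k2'), and even a correct two-column selection is a DataFrame, which has no .str accessor.

```python
import pandas as pd

# k1/k2 values taken from the expected output in the question (assumption)
data = pd.DataFrame({
    "k1": ["re_setup", "oo_setup", "alerting", "bounce",
           "re_setup", "alerting_s", "re_setup", "alerting"],
    "k2": ["oo_setup", "oo_setup", "bounce", "re_setup",
           "alerting", "re_setup", "oo_setup", "bounce"],
})

caught = {}  # name of the exception raised by each failing expression

# data['k1', 'k2'] asks for ONE column whose label is the tuple ('k1', 'k2')
try:
    data["k1", "k2"]
except KeyError as exc:
    caught["tuple_key"] = type(exc).__name__

# data[['k1', 'k2']] selects two columns, but the result is a DataFrame,
# and the .str accessor is only defined on Series
try:
    data[["k1", "k2"]].str
except AttributeError as exc:
    caught["dataframe_str"] = type(exc).__name__

# One column at a time works: Series.str.contains returns a boolean Series
k1_mask = data["k1"].str.contains("setup")

print(caught)
print(k1_mask.tolist())
```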
I think you need apply with str.contains, because str.contains only works on a Series (a single column):
print (data[['k1', 'k2']].apply(lambda x: x.str.contains('setup')))
k1 k2
0 True True
1 True True
2 False False
3 False True
4 True False
5 False True
6 True True
7 False False
Then add DataFrame.all to check whether all values in each row are True:
data['same'] = data[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).all(axis=1)
print (data)
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_setup False
4 5 re_setup alerting False
5 6 alerting_s re_setup False
6 7 re_setup oo_setup True
7 8 alerting bounce False
Or use DataFrame.any to check whether at least one value in each row is True:
data['same'] = data[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).any(axis=1)
print (data)
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_setup True
4 5 re_setup alerting True
5 6 alerting_s re_setup True
6 7 re_setup oo_setup True
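7 8 alerting bounce False
Another solution uses applymap for an element-wise check with the in operator (in pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map):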
data['same'] = data[['k1', 'k2']].applymap(lambda x: 'setup' in x).all(axis=1)
print (data)
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_setup False
4 5 re_setup alerting False
5 6 alerting_s re_setup False
6 7 re_setup oo_setup True
7 8 alerting bounce False
If there are only 2 columns, simply chain the conditions with & (like all) or | (like any):
data['same'] = data['k1'].str.contains('setup') & data['k2'].str.contains('setup')
print (data)
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_setup False
4 5 re_setup alerting False
5 6 alerting_s re_setup False
6 7 re_setup oo_setup True
7 8 alerting bounce False
Posted on 2017-08-16 09:31:11
Here is another, generic reduction that does not need apply (np is NumPy; df holds the sample data):
In [114]: np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
Out[114]: array([ True, True, False, True, True, True, True, False], dtype=bool)
Details
In [115]: df['same'] = np.logical_or.reduce(
[df[c].str.contains('setup') for c in ['k1', 'k2']])
In [116]: df
Out[116]:
id k1 k2 same
0 1 re_setup oo_setup True
1 2 oo_setup oo_setup True
2 3 alerting bounce False
3 4 bounce re_oversetup True
4 5 re_oversetup alerting True
5 6 alerting_s re_setup True
6 7 re_oversetup oo_setup True
7 8 alerting bounce False
Timings
Small
In [111]: df.shape
Out[111]: (8, 4)
In [108]: %timeit np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
1000 loops, best of 3: 421 µs per loop
In [109]: %timeit df[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).any(1)
1000 loops, best of 3: 2.01 ms per loop
Large
In [110]: df.shape
Out[110]: (40000, 4)
In [112]: %timeit np.logical_or.reduce([df[c].str.contains('setup') for c in ['k1', 'k2']])
10 loops, best of 3: 59.5 ms per loop
In [113]: %timeit df[['k1', 'k2']].apply(lambda x: x.str.contains('setup')).any(1)
10 loops, best of 3: 88.4 ms per loop
https://stackoverflow.com/questions/45709488
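The reduction in the second answer uses np.logical_or, i.e. "at least one column contains setup", which is why rows 3 to 5 come out True there. The expected output in the question marks a row True only when both k1 and k2 contain "setup", which corresponds to np.logical_and.reduce instead. A minimal sketch, using the sample data from the question; the helper rows_containing and the any_setup column are hypothetical names introduced for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question; the "same" column is recomputed below
df = pd.DataFrame({
    "id": range(1, 9),
    "k1": ["re_setup", "oo_setup", "alerting", "bounce",
           "re_oversetup", "alerting_s", "re_oversetup", "alerting"],
    "k2": ["oo_setup", "oo_setup", "bounce", "re_oversetup",
           "alerting", "re_setup", "oo_setup", "bounce"],
})


def rows_containing(frame, cols, pat, require_all=True):
    """Hypothetical helper: combine Series.str.contains over cols row by row.

    require_all=True  -> np.logical_and.reduce (every column matches)
    require_all=False -> np.logical_or.reduce  (at least one column matches)
    """
    ufunc = np.logical_and if require_all else np.logical_or
    # The list of boolean Series is stacked into a (len(cols), n_rows) array
    # and reduced along axis 0, leaving one boolean per row.
    return ufunc.reduce([frame[c].str.contains(pat) for c in cols])


df["same"] = rows_containing(df, ["k1", "k2"], "setup")
df["any_setup"] = rows_containing(df, ["k1", "k2"], "setup", require_all=False)
print(df)
```

With require_all=True the same column matches the true/false pattern of the question's expected output; with require_all=False it reproduces the second answer's result.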
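The first answer's snippets were written for 2017-era pandas. A sketch of the same variants for current pandas, assuming pandas 2.1 or later for DataFrame.map (falling back to applymap on older versions), spelling out axis=1, and rebuilding the sample frame from the expected output; the any_setup column is a hypothetical name:

```python
import pandas as pd

# Sample frame rebuilt from the expected output in the question
data = pd.DataFrame({
    "id": range(1, 9),
    "k1": ["re_setup", "oo_setup", "alerting", "bounce",
           "re_setup", "alerting_s", "re_setup", "alerting"],
    "k2": ["oo_setup", "oo_setup", "bounce", "re_setup",
           "alerting", "re_setup", "oo_setup", "bounce"],
})
cols = ["k1", "k2"]
sub = data[cols]

# Column-wise: Series.str.contains applied to each column -> boolean DataFrame
mask = sub.apply(lambda s: s.str.contains("setup"))

# Element-wise: DataFrame.map in pandas >= 2.1, applymap on older versions
elementwise = sub.map if hasattr(sub, "map") else sub.applymap
mask_elementwise = elementwise(lambda x: "setup" in x)

# Chained conditions: fine when the number of columns is small and fixed
both = data["k1"].str.contains("setup") & data["k2"].str.contains("setup")
either = data["k1"].str.contains("setup") | data["k2"].str.contains("setup")

data["same"] = mask.all(axis=1)       # every listed column contains "setup"
data["any_setup"] = mask.any(axis=1)  # hypothetical column: at least one does
print(data)
```

On this data the three formulations produce the same per-row results. Note that str.contains treats the pattern as a regular expression by default, while the element-wise in test is a plain substring check; for a literal word like "setup" the two agree.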