
How to ignore masked rows when using str.contains?

Stack Overflow user

Asked 2020-01-17 15:58:00

1 answer · 210 views · 0 followers · 1 vote

I have data containing store names that I need to standardize, e.g. McDonalds 1234 LA -> McDonalds. As you can see below, Popeyes and Wallmart have already been standardized:

   id              store  standard
0   1          McDonalds       NaN
1   2               Lidl       NaN
2   3  Lidl New York 123       NaN
3   4                KFC       NaN
4   5      Slidling Shop       NaN
5   6        Lidi Berlin       NaN
6   7         Popeyes NY   Popeyes
7   8  Wallmart LA 90210  Wallmart
8   9               Aldi       NaN
9  10        London Lidl       NaN

I use str.contains to find a store name and write the standardized name into the standard column. Here I am standardizing the Lidl stores:

df.loc[df.store.str.contains(r'\blidl\b', case=False), 'standard'] = 'Lidl'

print(df)

   id              store  standard
0   1          McDonalds       NaN
1   2               Lidl      Lidl
2   3  Lidl New York 123      Lidl
3   4                KFC       NaN
4   5      Slidling Shop       NaN
5   6        Lidi Berlin       NaN
6   7         Popeyes NY   Popeyes
7   8  Wallmart LA 90210  Wallmart
8   9               Aldi       NaN
9  10        London Lidl      Lidl
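The \b word-boundary anchors in the pattern are what keep Slidling Shop (whose lowercase form contains "lidl") out of the result above. A minimal sketch comparing the anchored and unanchored patterns on a few of the sample store names:

```python
import pandas as pd

stores = pd.Series(['Lidl', 'Lidl New York 123', 'Slidling Shop', 'Lidi Berlin', 'London Lidl'])

# \b requires a word boundary on each side, so 'lidl' embedded in 'Slidling' is not a hit
with_boundary = stores.str.contains(r'\blidl\b', case=False)

# Without the anchors, any substring 'lidl' matches, including inside 'Slidling'
without_boundary = stores.str.contains('lidl', case=False)

print(with_boundary.tolist())     # [True, True, False, False, True]
print(without_boundary.tolist())  # [True, True, True, False, True]
```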

The problem, however, is that str.contains also searches the rows that have already been standardized (Popeyes and Wallmart).

How can I run str.contains only on the rows where df['standard'] is NaN, and ignore the rows that are already standardized?

I've tried something quite messy, but it doesn't seem to work. I set up a mask and then applied it before running str.contains:

mask = df['standard'].isna()

df[mask].loc[df[mask].store.str.contains(aldi_regex,na=False), 'standard3'] = 'Aldi'

That doesn't work. I also tried something even messier, which didn't work either:

df.loc[mask].loc[df.loc[mask].store.str.contains(aldi_regex,na=False), 'standard3'] = 'Aldi'
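Both attempts use chained indexing: df[mask] and df.loc[mask] each build a new DataFrame from the boolean mask, so the .loc assignment that follows writes into that temporary object rather than into df (pandas typically warns with SettingWithCopyWarning, or a chained-assignment warning under Copy-on-Write). A small sketch reproducing this; aldi_regex is never defined in the question, so the pattern below is an assumption, and the column is written as 'standard' rather than 'standard3':

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'store': ['Aldi', 'Popeyes NY', 'London Lidl'],
    'standard': [np.nan, 'Popeyes', np.nan],
})
aldi_regex = r'\baldi\b'  # hypothetical pattern; the question does not show aldi_regex
mask = df['standard'].isna()

with warnings.catch_warnings():
    # Silence the chained-assignment warning so the sketch prints cleanly
    warnings.simplefilter('ignore')
    # df[mask] returns a new DataFrame, so this assignment is lost along with it
    df[mask].loc[df[mask].store.str.contains(aldi_regex, case=False, na=False), 'standard'] = 'Aldi'

print(df['standard'].tolist())  # the Aldi row is still NaN
```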

How can I ignore the standardized rows without resorting to a for loop?

My sample DataFrame:

import numpy as np
import pandas as pd
# np.nan replaces pd.np.nan (pd.np was deprecated in pandas 1.0 and removed in 2.0)
df = pd.DataFrame({
    'id': range(1, 11),
    'store': ['McDonalds', 'Lidl', 'Lidl New York 123', 'KFC', 'Slidling Shop', 'Lidi Berlin', 'Popeyes NY', 'Wallmart LA 90210', 'Aldi', 'London Lidl'],
    'standard': [np.nan] * 6 + ['Popeyes', 'Wallmart'] + [np.nan] * 2,
})

Tags: python, pandas

1 answer

Stack Overflow user

Accepted answer

Answered 2020-01-17 16:38:42

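How can I ignore the standardized rows without resorting to a for loop?

Filter on the null check as well: combine isnull() with the str.contains condition in one boolean mask and pass it to a single df.loc call: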

df.loc[df['standard'].isnull() & df.store.str.contains(r'\blidl\b', case=False), 'standard'] = 'Lidl'
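Since the question standardizes one brand at a time, the same filter can be applied once per brand. A sketch, assuming a hypothetical patterns dict that maps each standard name to its regex: the loop runs over brands, not rows, and the isnull() mask is recomputed on every pass so a row filled by an earlier pattern is never overwritten by a later one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': range(1, 11),
    'store': ['McDonalds', 'Lidl', 'Lidl New York 123', 'KFC', 'Slidling Shop',
              'Lidi Berlin', 'Popeyes NY', 'Wallmart LA 90210', 'Aldi', 'London Lidl'],
    'standard': [np.nan] * 6 + ['Popeyes', 'Wallmart'] + [np.nan] * 2,
})

# Hypothetical brand patterns; not part of the original question or answer
patterns = {
    'Lidl': r'\blidl\b',
    'Aldi': r'\baldi\b',
    'McDonalds': r'\bmcdonalds\b',
}

for name, regex in patterns.items():
    # Recompute on each pass so rows standardized earlier are skipped
    unset = df['standard'].isnull()
    # na=False treats a missing store name as "no match" instead of propagating NaN
    hits = df['store'].str.contains(regex, case=False, na=False)
    df.loc[unset & hits, 'standard'] = name

print(df)
```

The Popeyes and Wallmart rows keep their existing values, and unmatched stores such as KFC stay NaN.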

Votes: 1

Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/59790807
