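I'm trying to create a new column in a pandas DataFrame based on whether a string is contained in another column. I'm using np.select, based on this post. Below is an example DataFrame and an example function that creates the new column: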
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': ['one', 'ones', 'other', 'two', 'twos', 'others', 'three', 'threes']})

def add(df):
    conditions = [
        ('one' in df['column']),
        ('two' in df['column']),
        ('three' in df['column']),
        ('other' in df['column'])]
    choices = [1, 2, 3, 0]
    df['Int'] = np.select(conditions, choices, default=0)
    return df

new_df = add(df)

The output I get is:
   column  Int
0     one    0
1    ones    0
2   other    0
3     two    0
4    twos    0
5  others    0
6   three    0
7  threes    0

What I want is:
   column  Int
0     one    1
1    ones    1
2   other    0
3     two    2
4    twos    2
5  others    0
6   three    3
7  threes    3

What am I doing wrong?
Posted on 2019-04-25 10:51:59
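Why every row comes out as 0: in pandas, `x in series` tests the Series' index labels, not its values. Each condition in the question is therefore a single `False` rather than a per-row boolean mask, and `np.select` falls back to `default=0` for everything. The minimal sketch below reuses the question's example data to show this; the `.values` check is included only for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': ['one', 'ones', 'other', 'two',
                              'twos', 'others', 'three', 'threes']})

# `x in series` checks the index labels (here 0..7), not the values
print('one' in df['column'])         # False: 'one' is not an index label
print(0 in df['column'])             # True: 0 is an index label
print('one' in df['column'].values)  # True: exact match against the values

# Hence the original conditions are four plain scalars, not per-row masks
conditions = [('one' in df['column']), ('two' in df['column']),
              ('three' in df['column']), ('other' in df['column'])]
print(conditions)                    # [False, False, False, False]

# With no condition True, np.select returns the default as a 0-d array,
# which pandas broadcasts to every row when assigned to df['Int']
result = np.select(conditions, [1, 2, 3, 0], default=0)
print(result.shape, int(result))     # () 0
```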
If you need to test for substrings, use Series.str.contains:
conditions = [
    (df['column'].str.contains('one')),
    (df['column'].str.contains('two')),
    (df['column'].str.contains('three')),
    (df['column'].str.contains('other'))]

If you need exact matches instead, use Series.eq or ==:
conditions = [
    (df['column'].eq('one')),
    (df['column'].eq('two')),
    (df['column'].eq('three')),
    (df['column'].eq('other'))]

or:

conditions = [
    (df['column'] == 'one'),
    (df['column'] == 'two'),
    (df['column'] == 'three'),
    (df['column'] == 'other')]

With the str.contains conditions, the function produces the desired output:

new_df = add(df)
print(new_df)
   column  Int
0     one    1
1    ones    1
2   other    0
3     two    2
4    twos    2
5  others    0
6   three    3
7  threes    3

https://stackoverflow.com/questions/55847571
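Putting the pieces together, here is a self-contained sketch of the corrected function using the str.contains variant. One assumption of mine, not in the original answer: it passes `regex=False`, so each word is matched literally instead of as a regular expression. For these plain words the result is the same. The last lines show what the exact-match (`eq`) variant gives on the same data, for comparison.

```python
import numpy as np
import pandas as pd

def add(df):
    # One boolean Series per word; regex=False matches the word literally
    words = ['one', 'two', 'three', 'other']
    conditions = [df['column'].str.contains(w, regex=False) for w in words]
    choices = [1, 2, 3, 0]
    # For each row, np.select uses the choice of the first True condition
    df['Int'] = np.select(conditions, choices, default=0)
    return df

df = pd.DataFrame({'column': ['one', 'ones', 'other', 'two',
                              'twos', 'others', 'three', 'threes']})
new_df = add(df)
print(new_df['Int'].tolist())  # [1, 1, 0, 2, 2, 0, 3, 3]

# Exact matching instead: 'ones', 'twos' and 'threes' fall through to 0
exact = np.select([df['column'].eq(w) for w in ['one', 'two', 'three', 'other']],
                  [1, 2, 3, 0], default=0)
print(exact.tolist())          # [1, 0, 0, 2, 0, 0, 3, 0]
```

Because np.select takes the first True condition for each row, the order of the conditions matters whenever a value could contain more than one of the words.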