I wrote my own function in Python. The function is very simple; the data and the function are shown below:

import numpy as np
import pandas as pd

data_1 = {'id': ['1', '2', '3', '4', '5'],
          'name': ['Company1', 'Company1', 'Company3', 'Company4', 'Company5'],
          'employee': [10, 3, 5, 1, 0],
          'sales': [100, 30, 50, 200, 0],
          }
df = pd.DataFrame(data_1, columns=['id', 'name', 'employee', 'sales'])
threshold_1 = 40
threshold_2 = 50

The function looks like this:

def my_function(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1 & employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column'] = df.apply(lambda x: my_function(x.employee, x.sales), axis=1)
df

So this function works well and gives the expected results.

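Now I want to do the same thing with a vectorized operation across pandas Series. I need this because vectorized operations reduce execution time. For that reason I wrote the function below, but it does not work.
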
def my_function1(
    pandas_series: pd.Series
) -> pd.Series:
    """
    Vectorized operation across Pandas Series
    """
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1 & employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column_1'] = my_function1(data['employee', 'sales'])

My mistake is probably related to the function's input parameters. Can anyone help me fix this so that my_function1 works?

Answered 2022-02-22 09:05:10

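You need to change one condition slightly before you can pass Series. Change:

(sales >= threshold_1 & employee <= threshold_2)
# equivalent to
# sales >= (threshold_1 & employee) <= threshold_2

to:

(sales >= threshold_1) & (employee <= threshold_2)

because the operator precedence is wrong: & binds more tightly than the comparison operators, so without the parentheses Python evaluates a chained comparison, which needs the truth value of a Series and raises ValueError.
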
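To see the precedence issue in isolation, here is a minimal sketch. It assumes the same thresholds and column values as in the question. The helper name unparenthesized_raises is hypothetical and exists only for illustration.

```python
import pandas as pd

threshold_1 = 40
threshold_2 = 50
sales = pd.Series([100, 30, 50, 200, 0])
employee = pd.Series([10, 3, 5, 1, 0])


def unparenthesized_raises():
    """Hypothetical helper: report whether the unparenthesized form fails."""
    try:
        # Parsed as: sales >= (threshold_1 & employee) <= threshold_2,
        # a chained comparison that needs the truth value of a Series.
        sales >= threshold_1 & employee <= threshold_2
    except ValueError:
        return True
    return False


# With explicit parentheses, & combines two boolean Series element-wise.
mask = (sales >= threshold_1) & (employee <= threshold_2)

print(unparenthesized_raises())
print(mask.tolist())
```

The parenthesized form yields one boolean per row, which is the shape np.select expects for each condition.
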
With that fix, pass the two columns directly as Series:

def my_function(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column'] = my_function(df['employee'], df['sales'])

Output:

  id      name  employee  sales  new_column
0  1  Company1        10    100         400
1  2  Company1         3     30          60
2  3  Company3         5     50         200
3  4  Company4         1    200         800
4  5  Company5         0      0           0

You can also pass the whole DataFrame and select the columns inside the function:

def my_function(df):
    employee = df['employee']
    sales = df['sales']
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column'] = my_function(df)

Answered 2022-02-22 09:07:05

Pass the columns in a similar way. Also add () around each comparison: because of operator precedence, omitting them raises ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

def my_function1(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]  # <- here
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column_1'] = my_function1(df['employee'], df['sales'])
print(df)

  id      name  employee  sales  new_column_1
0  1  Company1        10    100           400
1  2  Company1         3     30            60
2  3  Company3         5     50           200
3  4  Company4         1    200           800
4  5  Company5         0      0             0

https://stackoverflow.com/questions/71218475
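
As a sanity check on the answers above, the following sketch compares a plain row-wise version against the vectorized np.select version. It also makes explicit what happens when no condition matches. The sixth row (employee 60, sales 100) is a hypothetical addition that is not in the question's data, and the function names estimate_row and estimate_vectorized are illustrative only.

```python
import numpy as np
import pandas as pd

threshold_1 = 40
threshold_2 = 50

# The question's five rows plus one hypothetical row (employee=60)
# that matches none of the three conditions.
df = pd.DataFrame({
    'employee': [10, 3, 5, 1, 0, 60],
    'sales': [100, 30, 50, 200, 0, 100],
})


def estimate_row(employee, sales):
    """Scalar reference version of the same rules, one row at a time."""
    if sales == 0:
        return 0
    if sales < threshold_1:
        return sales * 2
    if employee <= threshold_2:
        return sales * 4
    return 0  # mirrors np.select's default when nothing matches


def estimate_vectorized(employee, sales):
    """Vectorized version operating on whole Series."""
    conditions = [
        sales == 0,
        sales < threshold_1,
        (sales >= threshold_1) & (employee <= threshold_2),
    ]
    values = [0, sales * 2, sales * 4]
    # default=0 is np.select's default, spelled out here for clarity.
    return np.select(conditions, values, default=0)


row_wise = df.apply(lambda r: estimate_row(r.employee, r.sales), axis=1)
vectorized = estimate_vectorized(df['employee'], df['sales'])

print(row_wise.tolist())
print(vectorized.tolist())
```

Rows that satisfy none of the conditions silently receive 0, so it may be worth passing an explicit default if that case should be distinguishable from a genuine zero estimate.
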