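I would welcome any tips on making this code more concise. In this example we have some experiment results. For each planned experiment we predict a result at each of 4 temperatures, so we have 4 predicted results, one per temperature.
When the actual result for an experiment arrives, we need to look up that experiment's predictions, find which two temperature values the actual result falls between, and then use the predicted values that correspond to those two temperatures.
For example, if the actual result falls between the values of Temp_10 and Temp_15, we know we have to use Predicted_Result_10 and Predicted_Result_15.
I would also welcome suggestions on how to make this work for different labs. Most labs have at least 18 temperatures and 18 predicted results, and some have more. I will know how many each lab has: one lab has 25 temperatures and 25 predicted results, the UK lab has 20 of each, and the AUS lab has 18 of each.
Sorry for taking up your time.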

import pandas as pd

predicted_data = [['A',35,36,37,37,11.1955,11.8546,12.3809,12.8378],
                  ['B',38,36,38,37,9.2410,9.7486,10.1248,10.4282],
                  ['C',34,35,35,39,9.2686,9.7707,10.1330,10.4166]
                  ]
result_data = [['A',11.3],
               ['B',10.11],
               ['C',9.53]]
predicted_df = pd.DataFrame(predicted_data,columns=['Experiment','Predicted_Result_10','Predicted_Result_15',
                                                    'Predicted_Result_20','Predicted_Result_30',
                                                    'Temp_10','Temp_15','Temp_20','Temp_30'])
print (predicted_df)
result_df = pd.DataFrame(result_data,columns=['Experiment','Result'])
print (result_df)

def dummy_function (predicted_result1: float, predicted_result2: float):
    # actual function is more complex
    print ('calculation using ', predicted_result1, predicted_result2)
    return predicted_result1 + predicted_result2

Experiment_A_result = 0
Experiment_B_result = 0
Experiment_C_result = 0
for index,result in result_df.iterrows():
    # search predicted results for each experiment
    # print (predicted_df.loc[predicted_df['Experiment'] == result['Experiment']])
    # print ( ' actual result ',result['Result'])
    # print (type(result['Result']))
    # print (predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Temp_10'])
    # For an actual result, we want to find which 2 Temp columns from the predicted data does the actual
    # result fall between. Having found those 2 columns
    if (predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Temp_10'].iloc[0]) <= result['Result'] \
            < predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Temp_15'].iloc[0]:
        print (' do a calculation using the predicted results 10 and 15 as they match the range')
        my_result = dummy_function(predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Predicted_Result_10'].iloc[0],
                                   predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Predicted_Result_15'].iloc[0])
    elif (predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Temp_15'].iloc[0]) <= result['Result'] \
            < predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Temp_20'].iloc[0]:
        print (' do a calculation using the predicted results 15 and 20 as they match the range')
        my_result = dummy_function(
            predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Predicted_Result_15'].iloc[0],
            predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Predicted_Result_20'].iloc[0])
    elif (predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Temp_20'].iloc[0]) <= result['Result'] \
            < predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Temp_30'].iloc[0]:
        print (' do a calculation using the predicted results 20 and 30 as they match the range')
        my_result = dummy_function(predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Predicted_Result_20'].iloc[0],
                                   predicted_df.loc[predicted_df['Experiment'] == result['Experiment']]['Predicted_Result_30'].iloc[0])
    if result['Experiment'] == 'A':
        Experiment_A_result = Experiment_A_result + my_result
    elif result['Experiment'] == 'B':
        Experiment_B_result = Experiment_B_result + my_result
    else: Experiment_C_result = Experiment_C_result + my_result

Posted on 2022-06-24 19:12:37
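Your data are malformed. Instead of having a separate Temp_ column for each... what are these (10, 15, etc.)? Temperatures? There should be one index level for the experiment and one for the temperature, and the Result column should have 3*4 = 12 rows, because that is the Cartesian product of those index levels.
Don't write _data lists and don't define columns separately. Just pass a dict directly to the DataFrame constructor, with keys that correspond to the columns.
dummy_function needs to be deleted and replaced with the aggregation .sum().
Don't use iterrows except as a last resort. More importantly, and I believe this is because you have declined to show what your real code does, your my_result is basically meaningless because it overwrites itself on every iteration. I will assume that you only care about the last row, which is what this loop provides, so delete the loop entirely.
The chained if statements do not behave well at the edges: what happens if the experimental result is outside the range of the predicted results? Currently, you would not initialize my_result at all. All of these ifs should be replaced with a call to searchsorted.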
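To make the edge behaviour concrete, here is a minimal sketch of how a sorted search distinguishes "inside an interval" from "outside the predicted range". It uses numpy.searchsorted on experiment A's boundary values from the question. The helper name bracket and the choice of side="right" are assumptions made for illustration: side="right" mirrors the question's `lower <= value < upper` comparisons, while the default side="left" treats exact matches on a boundary differently.

```python
import numpy as np

# Sorted boundary values for one experiment (the Temp_* values of experiment A).
boundaries = np.array([11.1955, 11.8546, 12.3809, 12.8378])


def bracket(value):
    """Hypothetical helper: positions of the two boundaries around value, or None."""
    # side="right" puts an exact match on a boundary into the interval that
    # starts at that boundary, i.e. lower <= value < upper.
    pos = int(np.searchsorted(boundaries, value, side="right"))
    if 0 < pos < len(boundaries):
        return pos - 1, pos
    # pos == 0: below every boundary; pos == len(boundaries): at or above the last one.
    return None


print(bracket(11.3))   # inside the first interval
print(bracket(10.0))   # below the predicted range
print(bracket(13.0))   # above the predicted range
```

Returning None for the out-of-range cases forces the caller to decide what to do there, instead of silently reusing a stale my_result.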

import pandas as pd

predicted_df = pd.DataFrame(
    {
        'Temp': (11.1955, 11.8546, 12.3809, 12.8378, 9.2410, 9.7486, 10.1248, 10.4282, 9.2686, 9.7707, 10.1330, 10.4166),
        'Result': (35, 36, 37, 37, 38, 36, 38, 37, 34, 35, 35, 39),
    }, index=pd.MultiIndex.from_product(
        names=('Experiment', 'Base_Temp'),
        iterables=(
            ('A', 'B', 'C'),
            (10, 15, 20, 30),
        ),
    ),
)
result_df = pd.DataFrame(
    {'Result': (11.30, 10.11, 9.53)},
    index=predicted_df.index.unique('Experiment'),
)
print(predicted_df)
print(result_df)

result = result_df.iloc[-1:]
experiment = predicted_df.loc[result.index]
y, = experiment.Temp.searchsorted(result.Result)
if 0 < y < experiment.shape[0]:
    my_result = experiment.Result.iloc[y-1: y+1].sum()
    print(my_result)

Output:

                         Temp  Result
Experiment Base_Temp
A          10         11.1955      35
           15         11.8546      36
           20         12.3809      37
           30         12.8378      37
B          10          9.2410      38
           15          9.7486      36
           20         10.1248      38
           30         10.4282      37
C          10          9.2686      34
           15          9.7707      35
           20         10.1330      35
           30         10.4166      39
            Result
Experiment
A            11.30
B            10.11
C             9.53
69

https://codereview.stackexchange.com/questions/277577
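
The answer's code handles only the last experiment. If a value is wanted for every experiment, and for labs with a different number of temperatures, one possible sketch is to run the same search once per experiment on the long-format frame. The helper name bracket_sum and the totals dict are assumptions introduced for illustration, and .sum() stands in for the real calculation just as in the answer. It also assumes the Temp values within each experiment are sorted in ascending order.

```python
import pandas as pd

# Same long-format layout as in the answer: one row per (Experiment, Base_Temp).
predicted_df = pd.DataFrame(
    {
        'Temp': (11.1955, 11.8546, 12.3809, 12.8378,
                 9.2410, 9.7486, 10.1248, 10.4282,
                 9.2686, 9.7707, 10.1330, 10.4166),
        'Result': (35, 36, 37, 37, 38, 36, 38, 37, 34, 35, 35, 39),
    },
    index=pd.MultiIndex.from_product(
        (('A', 'B', 'C'), (10, 15, 20, 30)),
        names=('Experiment', 'Base_Temp'),
    ),
)
actual = pd.Series({'A': 11.30, 'B': 10.11, 'C': 9.53}, name='Result')


def bracket_sum(experiment, value):
    """Hypothetical helper: sum the two predictions whose Temp values bracket value."""
    rows = predicted_df.loc[experiment]          # one row per Base_Temp
    pos = int(rows['Temp'].searchsorted(value))  # assumes Temp is sorted ascending
    if 0 < pos < len(rows):
        return rows['Result'].iloc[pos - 1: pos + 1].sum()
    return None  # the actual result lies outside the predicted range


# One value per experiment; works for any number of rows per experiment.
totals = {name: bracket_sum(name, value) for name, value in actual.items()}
print(totals)
```

Because the number of temperatures is just the number of rows per experiment, nothing in the helper depends on there being exactly four of them.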