
Comparing values in DataFrames of different sizes

Stack Overflow user
Asked 2022-10-17 08:33:56
Answers: 1 · Views: 32 · Followers: 0 · Votes: 0

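I have a DataFrame in which I want to compare the speed of certain IDs across different conditions. Boundary conditions:

  1. An ID does not have to be present in every condition.
  2. An ID does not occur with the same frequency in every condition.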

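My goal is to label the speed of each ID in the other conditions as

  • faster (speed greater than the speed in CondA + 10 %),
  • slower (speed less than the speed in CondA - 10 %), or
  • equal (speed < speed in CondA + 10 % and speed > speed in CondA - 10 %)

depending on the condition.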
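The three rules can be sketched for a single pair of values as follows. The function name `categorize` and the `tol` parameter are illustrative assumptions, not part of the original code, and a value lying exactly on a limit is treated as "equal" here because the rules above leave that case open.

```python
def categorize(speed, ref_speed, tol=0.10):
    """Classify speed against a CondA reference speed with a +/- tol band."""
    upper = ref_speed * (1 + tol)  # CondA + 10 %
    lower = ref_speed * (1 - tol)  # CondA - 10 %
    if speed > upper:
        return "faster"
    if speed < lower:
        return "slower"
    # Inside the band (limits included, by assumption)
    return "equal"


print(categorize(1.15, 1.10))  # inside the band around 1.10
print(categorize(0.55, 1.30))  # below 1.30 - 10 %
print(categorize(1.80, 1.15))  # above 1.15 + 10 %
```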

Data

    import numpy as np
    import pandas as pd

    data1 = {
         'ID' : [1, 1, 1, 2, 3, 3, 4, 5],
         'Condition' : ['Cond_A', 'Cond_A', 'Cond_A', 'Cond_A', 'Cond_A', 'Cond_A','Cond_A','Cond_A', ],
         'Speed' : [1.2, 1.05, 1.2, 1.3, 1.0, 0.85, 1.1, 0.85],
         }
    df1 = pd.DataFrame(data1)
    
       
    data2 = {
         'ID' : [1, 2, 3, 4, 5, 6],
         'Condition' : ['Cond_B', 'Cond_B', 'Cond_B', 'Cond_B', 'Cond_B', 'Cond_B' ],
         'Speed' : [0.8, 0.55, 0.7, 1.15, 1.2, 1.4],
         }
    df2 = pd.DataFrame(data2)
    
    data3 = {
         'ID' : [1, 2, 3, 4, 6],
         'Condition' : ['Cond_C', 'Cond_C', 'Cond_C', 'Cond_C', 'Cond_C' ],
         'Speed' : [1.8, 0.99, 1.7, 131, 0.2, ],
         }
    df3 = pd.DataFrame(data3)
    lst_of_dfs = [df1,df2, df3]
     
    # creating a Dataframe object 
    data = pd.concat(lst_of_dfs)

My goal is to achieve a result like this:

   Condition  ID    Speed   Category
0     Cond_A   1    1.150   NaN
1     Cond_A   2    1.300   NaN
2     Cond_A   3    0.925   NaN
3     Cond_A   4    1.100   NaN
4     Cond_A   5    0.850   NaN
5     Cond_B   1    0.800   slower
6     Cond_B   2    0.550   slower
7     Cond_B   3    0.700   slower
8     Cond_B   4    1.150   equal
...

My attempt:

  1. Compute the mean speed of each ID in each condition.

    data = data.groupby(["Condition", "ID"]).mean()["Speed"].reset_index()

  2. Define the thresholds. Say I want a 10 % band around the CondA values.

    threshold_upper = data.loc[(data.Condition == 'CondA')]['Speed'] + (data.loc[(data.Condition == 'CondA')]['Speed']*10/100) 
    threshold_lower = data.loc[(data.Condition == 'CondA')]['Speed'] - (data.loc[(data.Condition == 'CondA')]['Speed']*10/100)

  3. Map the strings "faster", "equal" and "slower" to the rows according to the conditions, using numpy select.

conditions = [
    (data.loc[(data.Condition == 'CondB')]['Speed'] >  threshold_upper),  #check whether Speed of each ID in CondB is faster than Speed in CondA+10% 
    (data.loc[(data.Condition == 'CondC')]['Speed'] >  threshold_upper),    #check whether Speed of each ID in CondC is faster than Speed in CondA+10%
    ((data.loc[(data.Condition == 'CondB')]['Speed'] <  threshold_upper) &  (data.loc[(data.Condition == 'CondB')]['Speed'] >  threshold_lower)), #check whether Speed of each ID in CondB is slower than Speed in CondA+10% AND faster than Speed in CondA-10%
    ((data.loc[(data.Condition == 'CondC')]['Speed'] <  threshold_upper) &  (data.loc[(data.Condition == 'CondC')]['Speed'] >  threshold_lower)), #check whether Speed of each ID in CondC is slower than Speed in CondA+10% AND faster than Speed in CondA-10%
    (data.loc[(data.Condition == 'CondB')]['Speed'] <  threshold_upper),  #check whether Speed of each ID in CondB is slower than Speed in CondA-10%
    (data.loc[(data.Condition == 'CondC')]['Speed'] <  threshold_upper),    #check whether Speed of each ID in CondC is faster than Speed in CondA-10%
    ]
    
values = [
    'faster',
    'faster',
     'equal', 
     'equal',
     'slower',
     'slower'
      ]
    
    
data['Category'] = np.select(conditions, values)

This raises an error: …) mismatch

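Unfortunately, my DataFrames have different lengths (because not every ID completed all trials in every condition). I would appreciate any hints. Thanks a lot in advance.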


1 Answer

Stack Overflow user

Accepted answer

Posted on 2022-10-17 11:14:29
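The full error message is not shown in the question, so the following is only a plausible explanation, not a confirmed diagnosis. pandas compares two Series by their row labels, and slices taken from different conditions carry different labels and lengths. Separately, the attempt filters on 'CondA', 'CondB' and 'CondC' while the sample data uses 'Cond_A', 'Cond_B' and 'Cond_C', so those filters select no rows at all. A minimal sketch with made-up values (all variable names are illustrative):

```python
import pandas as pd

# Two slices of one frame keep their original row labels, which differ
speed_a = pd.Series([1.15, 1.30, 0.925], index=[0, 1, 2])  # Cond_A rows
speed_b = pd.Series([0.80, 0.55], index=[5, 6])            # Cond_B rows

try:
    speed_b > speed_a * 1.1
    comparison_failed = False
except ValueError as exc:
    # pandas refuses to compare Series whose labels are not identical
    comparison_failed = True
    print(f"ValueError: {exc}")

# Keying both sides by ID makes the comparison well defined
a_by_id = pd.Series([1.15, 1.30, 0.925], index=[1, 2, 3])
b_by_id = pd.Series([0.80, 0.55], index=[1, 2])
ref = a_by_id.reindex(b_by_id.index)  # Cond_A speed for each ID present in Cond_B
is_faster = b_by_id > ref * 1.1
print(is_faster)
```

This is why the approach below attaches the Cond_A limits to every row by ID before comparing.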

# Dataframe created

data
    ID  Condition   Speed
0   1   Cond_A  1.20
1   1   Cond_A  1.05
2   1   Cond_A  1.20
# Reset the index
data = data.reset_index(drop=True)

# Create a group number for each ID
data['group'] = data.groupby(['ID']).ngroup()

# Creating functions which returns the upper and lower limit of speed

def lowlimit(x):
    return x[x['Condition']=='Cond_A'].Speed.mean() * 0.9

def upperlimit(x):
    return x[x['Condition']=='Cond_A'].Speed.mean() * 1.1

# Calculate the upperlimit and lowerlimit for the groups
df = pd.DataFrame()
df['ul'] = data.groupby('group').apply(lambda x: upperlimit(x))
df['ll'] = data.groupby('group').apply(lambda x: lowlimit(x))

# reseting the index
# So that we can merge the values of 'group' column
df = df.reset_index() 

# Merging the data and df dataframe

data_new = pd.merge(data,df,on='group',how='left')

data_new
    ID  Condition   Speed   group   ul  ll
0   1   Cond_A  1.20    0   1.2650  1.0350
1   1   Cond_A  1.05    0   1.2650  1.0350
2   1   Cond_A  1.20    0   1.2650  1.0350
3   2   Cond_A  1.30    1   1.4300  1.1700

Now we have to apply the conditions:

data_new.loc[(data_new['Speed'] >= data_new['ul']) & (data_new['Condition'] != 'Cond_A'),'Category'] = 'larger'
data_new.loc[(data_new['Speed'] <= data_new['ll']) & (data_new['Condition'] != 'Cond_A'),'Category'] = 'smaller'
data_new.loc[(data_new['Speed'] < data_new['ul']) & (data_new['Speed'] > data_new['ll']) & (data_new['Condition'] != 'Cond_A'),'Category'] = 'Same'
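Since the question asked for numpy select and the labels "faster", "equal" and "slower", the same step could also be sketched as below. It assumes a frame that already has `ul` and `ll` columns; the rows are a made-up subset consistent with the limits shown above, and treating values exactly on a limit as "equal" is an assumption.

```python
import numpy as np
import pandas as pd

data_new = pd.DataFrame({
    "ID":        [1, 1, 2, 4, 6],
    "Condition": ["Cond_A", "Cond_B", "Cond_B", "Cond_B", "Cond_B"],
    "Speed":     [1.20, 1.80, 0.55, 1.15, 1.40],
    "ul":        [1.265, 1.265, 1.43, 1.21, np.nan],
    "ll":        [1.035, 1.035, 1.17, 0.99, np.nan],
})

# Only rows outside the reference condition get a label
not_ref = data_new["Condition"] != "Cond_A"
conditions = [
    not_ref & (data_new["Speed"] > data_new["ul"]),
    not_ref & (data_new["Speed"] < data_new["ll"]),
    not_ref & data_new["Speed"].between(data_new["ll"], data_new["ul"]),
]
choices = ["faster", "slower", "equal"]

# default=None leaves Cond_A rows and IDs without a Cond_A reference unlabelled
data_new["Category"] = np.select(conditions, choices, default=None)
print(data_new)
```

All conditions are built from the same frame, so they share one length and one index, which avoids the mismatch from the question.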

This is the output.

If you want, you can now drop the helper columns: data_new = data_new.drop(columns=['group','ul','ll'])

Votes: 0
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/74094538
