
Question: Pandas conditional check depending on rows and columns

Stack Overflow user

Asked on 2020-03-31 08:57:29

Answers: 3, Views: 468, Follows: 0, Votes: 2

I have the following thresholds:

for a_x: red=10 blue=5
for a_y: red=50 blue=15
for b_x: red=8  blue=4
for b_y: red=40 blue=10

This is the data I have:

type  x1   x2  x3  y1  y2  y3  z
a     1    3   5   11  13  9   qaz
a     2    7   9   23  67  35  qeq
a     7    9   13  36  24  8   rfc
b     10   3   5   51  19  10  qwe
b     5    4   2   21  12  11  erg
b     1    2   3   9   7   8   gbt

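Now, for rows with type=a, I want to use the a_x thresholds for any column whose name contains x, and the a_y thresholds for any column whose name contains y.

Similarly, for rows with type=b, I want to use the b_x thresholds for columns containing x and the b_y thresholds for columns containing y.

Finally, I want to create two new DataFrames, red and blue. red should contain every row in which any red threshold is breached (a threshold counts as breached when value >= threshold). blue should contain only the rows in which a blue threshold is breached but no red threshold is: if a red threshold is breached, there is no need to check the blue one, because the row belongs to red only, not to blue.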

In the end, we would get the following data:

Red:

type  x1   x2  x3  y1  y2  y3  z
a     2    7   9   23  67  35  qeq
a     7    9   13  36  24  8   rfc
b     10   3   5   51  19  10  qwe

Blue:

type  x1   x2  x3  y1  y2  y3  z
a     1    3   5   11  13  9   qaz
b     5    4   2   21  12  11  erg

Because:

type  x1   x2  x3  y1  y2  y3  z
a     1    3   5   11  13  9   qaz  -> x3 breaches blue
a     2    7   9   23  67  35  qeq  -> y2 breaches red
a     7    9   13  36  24  8   rfc  -> x3 breaches red
b     10   3   5   51  19  10  qwe  -> x1, y1 breach red
b     5    4   2   21  12  11  erg  -> x1, x2, y1, y2, y3 breach blue
b     1    2   3   9   7   8   gbt  -> no breach

Now, obviously, I could loop over all rows and columns and check whether any threshold is breached, but there must be a better way to do it!
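For comparison with the loop mentioned above, here is a loop-free sketch (not taken from any of the answers below). It builds a frame of per-cell thresholds by mapping each row's type to its limits, then compares all x/y columns in one step. The thresholds layout keyed by (type, prefix) and the breach_mask helper are hypothetical names chosen for this sketch.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'type': list('aaabbb'),
    'x1': [1, 2, 7, 10, 5, 1],
    'x2': [3, 7, 9, 3, 4, 2],
    'x3': [5, 9, 13, 5, 2, 3],
    'y1': [11, 23, 36, 51, 21, 9],
    'y2': [13, 67, 24, 19, 12, 7],
    'y3': [9, 35, 8, 10, 11, 8],
    'z': ['qaz', 'qeq', 'rfc', 'qwe', 'erg', 'gbt'],
})

# Hypothetical layout: colour -> {(type, column prefix): threshold}
thresholds = {
    'red':  {('a', 'x'): 10, ('a', 'y'): 50, ('b', 'x'): 8, ('b', 'y'): 40},
    'blue': {('a', 'x'): 5,  ('a', 'y'): 15, ('b', 'x'): 4, ('b', 'y'): 10},
}

# Only the measurement columns; 'type' and 'z' are left out
value_cols = [c for c in df.columns if c[0] in ('x', 'y')]


def breach_mask(frame, limits):
    """True for rows where any x/y value is >= the limit for that row's type."""
    # One threshold per cell, with the same index and columns as frame[value_cols]
    limit_frame = pd.DataFrame(
        {c: frame['type'].map(lambda t, p=c[0]: limits[(t, p)]) for c in value_cols}
    )
    return (frame[value_cols] >= limit_frame).any(axis=1)


red_mask = breach_mask(df, thresholds['red'])
# A row that breaches red is never reported as blue
blue_mask = breach_mask(df, thresholds['blue']) & ~red_mask

red = df[red_mask]
blue = df[blue_mask]
print(red)
print(blue)
```

Because the limits are looked up per row, supporting a new type only means adding entries to thresholds; a type with no entry raises a KeyError in this sketch.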

Tags: python, pandas, dataframe

Answers (3)

Stack Overflow user

Accepted answer

Posted on 2020-03-31 10:30:40

First, I create a nested dictionary of the conditions:

d = {'a_x': {'red':10, 'blue':5},
     'a_y': {'red':50, 'blue':15},
     'b_x': {'red':8, 'blue':4},
     'b_y': {'red':40, 'blue':10}}

A better layout moves the red and blue keys to the outer dictionary:

import pandas as pd
from collections import defaultdict

# Swap the nesting: colour -> {"<type>_<prefix>": threshold}
d1 = defaultdict(dict)
for k, v in d.items():
    for k1, v1 in v.items():
        d1[k1][k] = v1

print (d1)
defaultdict(<class 'dict'>, {'red': {'a_x': 10, 'a_y': 50, 'b_x': 8, 'b_y': 40}, 
                             'blue': {'a_x': 5, 'a_y': 15, 'b_x': 4, 'b_y': 10}})
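The same swap can also be written without defaultdict as a nested dict comprehension. A small sketch, assuming every inner dictionary has the same colour keys as the first one:

```python
# Nested thresholds as in the answer: "<type>_<prefix>" -> {colour: threshold}
d = {'a_x': {'red': 10, 'blue': 5},
     'a_y': {'red': 50, 'blue': 15},
     'b_x': {'red': 8, 'blue': 4},
     'b_y': {'red': 40, 'blue': 10}}

# Assumption: every inner dict uses the same colour keys as the first one
colours = list(next(iter(d.values())))
d1 = {colour: {k: v[colour] for k, v in d.items()} for colour in colours}
print(d1)
# {'red': {'a_x': 10, 'a_y': 50, 'b_x': 8, 'b_y': 40}, 'blue': {'a_x': 5, 'a_y': 15, 'b_x': 4, 'b_y': 10}}
```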

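Then loop over each key and threshold: split the key k on _ into the row type and the column prefix, select the matching block of rows and columns with boolean indexing, and compare it against the threshold: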

red = [df.loc[df['type'].eq(k.split('_')[0]), 
            df.columns.str.startswith(k.split('_')[1])] >= v for k, v in d1['red'].items()]
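To make one iteration of that list comprehension concrete, this sketch evaluates it for the key 'a_x' with its red threshold on the sample data (rebuilt here so the block runs on its own). Note that startswith matters: str.contains('y') would also match the type column.

```python
import pandas as pd

df = pd.DataFrame({
    'type': list('aaabbb'),
    'x1': [1, 2, 7, 10, 5, 1],
    'x2': [3, 7, 9, 3, 4, 2],
    'x3': [5, 9, 13, 5, 2, 3],
    'y1': [11, 23, 36, 51, 21, 9],
    'y2': [13, 67, 24, 19, 12, 7],
    'y3': [9, 35, 8, 10, 11, 8],
    'z': ['qaz', 'qeq', 'rfc', 'qwe', 'erg', 'gbt'],
})

# One iteration of the comprehension: key 'a_x' with its red threshold
k, v = 'a_x', 10
row_type, prefix = k.split('_')             # 'a', 'x'
rows = df['type'].eq(row_type)              # boolean Series selecting type-a rows
cols = df.columns.str.startswith(prefix)    # boolean array selecting x1, x2, x3
piece = df.loc[rows, cols] >= v             # 3x3 block of True/False
print(piece)
```

Only x3 in row 2 (value 13) reaches the threshold of 10, so the block holds a single True.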

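Then join the masks with concat, merge the duplicate row labels with any per index label, and use DataFrame.any(axis=1) to test whether at least one value in each row matches: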

# .any(level=0) was removed in pandas 2.0; groupby(level=0).any() is the equivalent
mask_red = pd.concat(red).groupby(level=0).any().any(axis=1)
# print (mask_red)


blue = [df.loc[df['type'].eq(k.split('_')[0]), 
           df.columns.str.startswith(k.split('_')[1])] >= v for k, v in d1['blue'].items()]

mask_blue = pd.concat(blue).groupby(level=0).any().any(axis=1)
# print (mask_blue)
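To see what the concat step collapses, here is a variant (an adjustment, not the answer's exact code) that first reduces each block to one True/False per row with .any(axis=1), so no NaN-filled columns are created by concat. groupby(level=0).any() then merges the x result and the y result of each row. The reindex with fill_value=False is an extra safeguard, assumed here, for rows whose type has no thresholds.

```python
import pandas as pd

df = pd.DataFrame({
    'type': list('aaabbb'),
    'x1': [1, 2, 7, 10, 5, 1],
    'x2': [3, 7, 9, 3, 4, 2],
    'x3': [5, 9, 13, 5, 2, 3],
    'y1': [11, 23, 36, 51, 21, 9],
    'y2': [13, 67, 24, 19, 12, 7],
    'y3': [9, 35, 8, 10, 11, 8],
    'z': ['qaz', 'qeq', 'rfc', 'qwe', 'erg', 'gbt'],
})

red_limits = {'a_x': 10, 'a_y': 50, 'b_x': 8, 'b_y': 40}

# Reduce each (rows, columns) block to one boolean per row
per_piece = [
    (df.loc[df['type'].eq(k.split('_')[0]),
            df.columns.str.startswith(k.split('_')[1])] >= v).any(axis=1)
    for k, v in red_limits.items()
]

# Every row label now appears twice (x block and y block); merge them with any()
mask_red = (pd.concat(per_piece)
              .groupby(level=0).any()
              .reindex(df.index, fill_value=False))
print(mask_red.tolist())  # [False, True, True, True, False, False]
```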

Finally, filter the rows that match the red thresholds:

df1 = df[mask_red]
print (df1)
  type  x1  x2  x3  y1  y2  y3    z
1    a   2   7   9  23  67  35  qeq
2    a   7   9  13  36  24   8  rfc
3    b  10   3   5  51  19  10  qwe

and the rows that match the blue thresholds, excluding those already in the red DataFrame:

df2 = df[mask_blue & ~mask_red]
print (df2)
  type  x1  x2  x3  y1  y2  y3    z
0    a   1   3   5  11  13   9  qaz
4    b   5   4   2  21  12  11  erg
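If a single label column is more convenient than two separate DataFrames, a minimal sketch (not part of the original answer) uses np.select, which picks the choice of the first condition that is True, so red automatically takes precedence over blue. The masks are hard-coded to the values the answer's code produces for the sample data, and the column name level is a hypothetical choice.

```python
import numpy as np
import pandas as pd

# Row keys from the sample data and the masks the answer's code yields for it
df = pd.DataFrame({'z': ['qaz', 'qeq', 'rfc', 'qwe', 'erg', 'gbt']})
mask_red = pd.Series([False, True, True, True, False, False])
mask_blue = pd.Series([True, True, True, True, True, False])

# np.select takes the first True condition per row, so red wins over blue
df['level'] = np.select([mask_red, mask_blue], ['red', 'blue'], default='none')
print(df)
```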

To avoid repeating code, you can build a dictionary of masks with a dictionary comprehension:

d = {'a_x': {'red':10, 'blue':5},
     'a_y': {'red':50, 'blue':15},
     'b_x': {'red':8, 'blue':4},
     'b_y': {'red':40, 'blue':10}}


d1 = defaultdict(dict)
for k, v in d.items():
    for k1, v1 in v.items():
        d1[k1][k] = v1

print (d1)
defaultdict(<class 'dict'>, {'red': {'a_x': 10, 'a_y': 50, 'b_x': 8, 'b_y': 40}, 
                             'blue': {'a_x': 5, 'a_y': 15, 'b_x': 4, 'b_y': 10}})

# groupby(level=0).any() replaces .any(level=0), which pandas 2.0 removed
masks = {k1: pd.concat([df.loc[df['type'].eq(k.split('_')[0]),
               df.columns.str.startswith(k.split('_')[1])] >= v
           for k, v in v1.items()]).groupby(level=0).any().any(axis=1)
           for k1, v1 in d1.items()}

print (masks)
{'red': 0    False
1     True
2     True
3     True
4    False
5    False
dtype: bool, 'blue': 0     True
1     True
2     True
3     True
4     True
5    False
dtype: bool}

df1 = df[masks['red']]
print (df1)
  type  x1  x2  x3  y1  y2  y3    z
1    a   2   7   9  23  67  35  qeq
2    a   7   9  13  36  24   8  rfc
3    b  10   3   5  51  19  10  qwe

df2 = df[masks['blue'] & ~masks['red']]
print (df2)
  type  x1  x2  x3  y1  y2  y3    z
0    a   1   3   5  11  13   9  qaz
4    b   5   4   2  21  12  11  erg

Votes: 1

Stack Overflow user

Posted on 2020-03-31 09:34:25

You can declare some conditional filters, for example:

a_red_filter = (df['type'] == 'a') & ((df[['x1','x2','x3']] >= 10).any(axis=1) |
                                      (df[['y1','y2','y3']] >= 50).any(axis=1))

b_red_filter = (df['type'] == 'b') & ((df[['x1','x2','x3']] >= 8).any(axis=1) |
                                      (df[['y1','y2','y3']] >= 40).any(axis=1))

If you then run:

df[a_red_filter | b_red_filter]

you will get the DataFrame of red points:

type  x1   x2  x3  y1  y2  y3  z
a     2    7   9   23  67  35  qeq
a     7    9   13  36  24  8   rfc
b     10   3   5   51  19  10  qwe

The blue filters work the same way, combined with ~ applied to the red filters so that red rows are excluded.
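A sketch of the blue counterpart, assuming the corrected red filters above; the helper breaches is a hypothetical name that folds the four per-type filters into one function.

```python
import pandas as pd

df = pd.DataFrame({
    'type': list('aaabbb'),
    'x1': [1, 2, 7, 10, 5, 1],
    'x2': [3, 7, 9, 3, 4, 2],
    'x3': [5, 9, 13, 5, 2, 3],
    'y1': [11, 23, 36, 51, 21, 9],
    'y2': [13, 67, 24, 19, 12, 7],
    'y3': [9, 35, 8, 10, 11, 8],
    'z': ['qaz', 'qeq', 'rfc', 'qwe', 'erg', 'gbt'],
})

x_cols = ['x1', 'x2', 'x3']
y_cols = ['y1', 'y2', 'y3']


def breaches(frame, row_type, x_limit, y_limit):
    """Hypothetical helper: rows of row_type with any x >= x_limit or any y >= y_limit."""
    return (frame['type'] == row_type) & (
        (frame[x_cols] >= x_limit).any(axis=1) | (frame[y_cols] >= y_limit).any(axis=1)
    )


red_filter = breaches(df, 'a', 10, 50) | breaches(df, 'b', 8, 40)
# Same check with the blue limits, minus anything already red
blue_filter = (breaches(df, 'a', 5, 15) | breaches(df, 'b', 4, 10)) & ~red_filter
print(df[blue_filter])
```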

Votes: 0

Stack Overflow user

Posted on 2020-03-31 10:53:57

Hopefully this is easier to read than the mess I posted earlier.

# Reshape the table and add a column holding just the first letter of each column name
M = (df.melt(['type','z'])
     .assign(letters=lambda x: x.variable.str[0])
     )

red_a = M.query('type=="a" and ((letters=="x" and value >=10) or (letters=="y" and value >=50))')
red_b = M.query('type=="b" and ((letters=="x" and value >=8) or (letters=="y" and value >=40))')
blue_a = M.query('type=="a" and ((letters=="x" and value >=5) or (letters=="y" and value >=15))')
blue_b = M.query('type=="b" and ((letters=="x" and value >=4) or (letters=="y" and value >=10))')

red_only = (pd.concat([red_a,red_b])
            .filter(['type','z'])
            .drop_duplicates('z')
           )

red_only_z = red_only["z"].tolist()

blue_only = (pd.concat([blue_a,blue_b])
             .filter(['type','z'])
              #if row already belongs in red, filter it out
             .query('z != @red_only_z')
             .drop_duplicates('z')
             )
#extract tables from the original dataframe
cond_red = df['z'].isin(red_only_z) & (df['type'].isin(red_only['type']))
cond_blue = df['z'].isin(blue_only['z']) & (df['type'].isin(blue_only['type']))
red_table = df.loc[cond_red]
blue_table = df.loc[cond_blue]

print(blue_table)
    type    x1  x2  x3  y1  y2  y3  z
0   a       1   3   5   11  13  9   qaz
4   b       5   4   2   21  12  11  erg

print(red_table)
    type    x1  x2  x3  y1  y2  y3  z
1   a       2   7   9   23  67  35  qeq
2   a       7   9   13  36  24  8   rfc
3   b      10   3   5   51  19  10  qwe
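Extracting the rows with isin on z and type assumes z is unique, and the two isin checks are independent rather than paired. A sketch of a variant (an adjustment, not the answer's code) keeps the original row label as an extra id column during the melt, so hit rows can be selected by label. The queries are the answer's own; the column name row is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'type': list('aaabbb'),
    'x1': [1, 2, 7, 10, 5, 1],
    'x2': [3, 7, 9, 3, 4, 2],
    'x3': [5, 9, 13, 5, 2, 3],
    'y1': [11, 23, 36, 51, 21, 9],
    'y2': [13, 67, 24, 19, 12, 7],
    'y3': [9, 35, 8, 10, 11, 8],
    'z': ['qaz', 'qeq', 'rfc', 'qwe', 'erg', 'gbt'],
})

# Carry the original row label into the long table as a column named 'row'
M = (df.rename_axis('row').reset_index()
       .melt(['row', 'type', 'z'])
       .assign(letters=lambda x: x.variable.str[0]))

red_hits = pd.concat([
    M.query('type=="a" and ((letters=="x" and value >=10) or (letters=="y" and value >=50))'),
    M.query('type=="b" and ((letters=="x" and value >=8) or (letters=="y" and value >=40))'),
])
blue_hits = pd.concat([
    M.query('type=="a" and ((letters=="x" and value >=5) or (letters=="y" and value >=15))'),
    M.query('type=="b" and ((letters=="x" and value >=4) or (letters=="y" and value >=10))'),
])

# Select rows by their original label instead of matching 'z' and 'type' values
is_red = df.index.isin(red_hits['row'])
is_blue = df.index.isin(blue_hits['row']) & ~is_red
red_table = df[is_red]
blue_table = df[is_blue]
print(red_table)
print(blue_table)
```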

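Votes: 0

The original page content is provided by Stack Overflow. Translation support is provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: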

https://stackoverflow.com/questions/60946784
