Q: Using np.select with a column containing a range of values

Stack Overflow user

Asked 2022-03-15 13:16:22

Answers: 3, Views: 134, Followers: 0, Score: 1

I have this code:

import numpy as np
import pandas as pd
df = pd.DataFrame({'r': {0: '01', 1: '02', 2: '03', 3: '04', 4: ''},
                   'an': {0: 'a', 1: 'b,c', 2: '', 3: 'c,a,b', 4: ''}})

which produces the following data:

    r   an
0   01  a
1   02  b,c
2   03  
3   04  c,a,b
4

Using np.select, the desired output is:

    r   an    s
0   01  a     13
1   02  b,c   [88,753]
2   03  
3   04  c,a,b [789,48,89] 
4

I tried the following code:

conditions=[
     (df['an']=='a')&(df['r']=='01'),
     (df['an']=='b')&(df['r']=='01'),
     (df['an']=='c')&(df['r']=='01'),
     (df['an']=='d')&(df['r']=='01'),
     (df['an']=='')&(df['r']=='01'),
     (df['an']=='a')&(df['r']=='02'),
     (df['an']=='b')&(df['r']=='02'),
     (df['an']=='c')&(df['r']=='02'),
     (df['an']=='d')&(df['r']=='02'),
     (df['an']=='')&(df['r']=='02'),
     (df['an']=='a')&(df['r']=='03'),
     (df['an']=='b')&(df['r']=='03'),
     (df['an']=='c')&(df['r']=='03'),
     (df['an']=='d')&(df['r']=='03'),
     (df['an']=='')&(df['r']=='03'),
     (df['an']=='a')&(df['r']=='04'),
     (df['an']=='b')&(df['r']=='04'),
     (df['an']=='c')&(df['r']=='04'),
     (df['an']=='d')&(df['r']=='04'),
     (df['an']=='')&(df['r']=='04')
      ]
      
choices=[
    13,
    75,
    6,
    89,
    '-',
    45,
    88,
    753,
    75,
    '-',
    0.2,
    15,
    79,
    63,
    '-',
    48,
    89,
    789,
    15,
    '-',
    ]
    
df['s']=np.select(conditions, choices)

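Unfortunately, the code above returns the expected output only for row 0 (a single value); every other row falls back to 0. Is it possible to use np.select with a range of values?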
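
For context, a minimal sketch of why the other rows fall back to 0: each condition is an exact string comparison, so a cell such as 'b,c' equals neither 'b' nor 'c', and np.select then uses its default of 0. The sketch uses only two conditions and numeric choices, which is a simplification of the code above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'r': ['01', '02', '03', '04', ''],
                   'an': ['a', 'b,c', '', 'c,a,b', '']})

# Exact equality: 'b,c' is a different string from 'b', so nothing matches.
print((df['an'] == 'b').tolist())  # [False, False, False, False, False]

# Reduced set of conditions (assumption: two are enough to show the effect).
conditions = [(df['an'] == 'a') & (df['r'] == '01'),
              (df['an'] == 'b') & (df['r'] == '02')]
choices = [13, 88]

# Rows where no condition is True receive the default, which is 0.
result = np.select(conditions, choices)
print(result.tolist())  # [13, 0, 0, 0, 0]
```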

Tags: pandas, numpy, select, range, python

3 Answers

Stack Overflow user

Accepted answer

Posted 2022-03-15 13:36:01

IIUC, keep the matches in a container (Series/DataFrame/dictionary), then look them up in a loop:

# mapping the references, can be any value
df_map = pd.DataFrame({'a': ['sa01', 'sa02', 'sa03', 'sa04'],
                       'b': ['sb01', 'sb02', 'sb03', 'sb04'],
                       'c': ['sc01', 'sc02', 'sc03', 'sc04'],
                       'd': ['sd01', 'sd02', 'sd03', 'sd04'],
                        '': ['s01', 's02', 's03', 's04'],     # optional
                       }, index=['01', '02', '03', '04']
                       )
# derive a dictionary
# (you could also manually define the dictionary
#  if not all combinations are needed)
d = df_map.stack().to_dict()
# {('01', 'a'): 'sa01',
#  ('01', 'b'): 'sb01',
#  ('01', 'c'): 'sc01',
#  ('01', 'd'): 'sd01',
#  ('01', ''): 's01',
#  ('02', 'a'): 'sa02',
#  ...}

# map the values
df['s'] = [l if len(l:=[d.get((r, e)) for e in s.split(',')])>1 else l[0]
           for r,s in zip(df['r'], df['an'])]

Output:

    r     an                   s
0  01      a                sa01
1  02    b,c        [sb02, sc02]
2  03                        s03
3  04  c,a,b  [sc04, sa04, sb04]
4                           None
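
The one-liner above relies on the walrus operator (Python 3.8+). As a rough sketch, the same lookup can be unrolled into a small helper, which may be easier to read; the helper name `lookup` is hypothetical, and `df_map` is a trimmed copy of the mapping above without the 'd' column.

```python
import pandas as pd

df = pd.DataFrame({'r': ['01', '02', '03', '04', ''],
                   'an': ['a', 'b,c', '', 'c,a,b', '']})

df_map = pd.DataFrame({'a': ['sa01', 'sa02', 'sa03', 'sa04'],
                       'b': ['sb01', 'sb02', 'sb03', 'sb04'],
                       'c': ['sc01', 'sc02', 'sc03', 'sc04'],
                       '': ['s01', 's02', 's03', 's04']},
                      index=['01', '02', '03', '04'])

# Keys are (index label, column label) tuples, e.g. ('01', 'a').
d = df_map.stack().to_dict()

def lookup(r, an):
    # One lookup per comma-separated element; missing pairs give None.
    values = [d.get((r, key)) for key in an.split(',')]
    # Return a list only when there is more than one element.
    return values if len(values) > 1 else values[0]

df['s'] = [lookup(r, an) for r, an in zip(df['r'], df['an'])]
print(df)
```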

Score: 1

Stack Overflow user

Posted 2022-03-15 13:21:36

A solution that zips the two columns together and uses if/elif/else to handle single-element and empty lists:

# dictionary mapping (r, element of an) pairs to values
d = {('01','a'):10,
    ('02','b'):20,
    ('02','c'):50,
    ('04','a'):100,
    ('04','b'):200,
    ('04','c'):500}


def f(a, b):
    #if no match return -
    L = [d.get((a, c), '-') if a!='' and c!='' else '' for c in b.split(',')]
    if len(L) == 1:
        return L[0]
    elif not bool(L):
        return ''
    else:
        return L
     
df['new'] = [f(a, b) for a, b in zip(df['r'], df['an'])]
print (df)
    r     an              new
0  01      a               10
1  02    b,c         [20, 50]
2  03                        
3  04  c,a,b  [500, 100, 200]
4
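
A side note on the empty rows, as a small sketch: `str.split` with an explicit separator always returns at least one element, so the blank cells are produced by the `c != ''` guard rather than by the empty-list branch. The trimmed dictionary and the shortened `f` below are simplifications for illustration, not the answer's exact code.

```python
# Splitting an empty string on ',' gives one empty element, not an empty list.
parts = ''.split(',')
print(parts)       # ['']
print(len(parts))  # 1

# Trimmed mapping (assumption: two pairs are enough for the demonstration).
d = {('02', 'b'): 20, ('02', 'c'): 50}

def f(a, b):
    # Same lookup as above, without the empty-list branch.
    L = [d.get((a, c), '-') if a != '' and c != '' else '' for c in b.split(',')]
    return L[0] if len(L) == 1 else L

print(repr(f('03', '')))  # ''
print(f('02', 'b,c'))     # [20, 50]
print(repr(f('01', 'a'))) # '-' (this pair is not in the trimmed mapping)
```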

Score: 1

Stack Overflow user

Posted 2022-03-15 13:30:05

IIUC, try:

df = pd.DataFrame({'r': {0: '01', 1: '02', 2: '03', 3: '04', 4:''},
                   'an': {0: 'a', 1: 'b,c', 2: '', 3: 'c,a,b',4:''}})

df["an"] = df["an"].str.split(",")
df = df.explode("an")

conditions = [df["an"].eq("a")&df["r"].eq("01"),
              df["an"].eq("b")&df["r"].eq("01"),
              df["an"].eq("c")&df["r"].eq("01"),
              df["an"].eq("d")&df["r"].eq("01"),
              df["an"].eq("a")&df["r"].eq("02"),
              df["an"].eq("b")&df["r"].eq("02"),
              df["an"].eq("c")&df["r"].eq("02"),
              df["an"].eq("d")&df["r"].eq("02"),
              df["an"].eq("a")&df["r"].eq("03"),
              df["an"].eq("b")&df["r"].eq("03"),
              df["an"].eq("c")&df["r"].eq("03"),
              df["an"].eq("d")&df["r"].eq("03"),
              df["an"].eq("a")&df["r"].eq("04"),
              df["an"].eq("b")&df["r"].eq("04"),
              df["an"].eq("c")&df["r"].eq("04"),
              df["an"].eq("d")&df["r"].eq("04")]

choices = [13, 75, 6, 89,
           45, 88, 753, 75,
           0.2, 15, 79, 63,
           48, 89, 789, 15]

df["s"] = np.select(conditions, choices, np.nan)
output = df.groupby("r").agg({"an": ",".join, "s": list}).reset_index()

>>> output
    r     an                    s
0                           [nan]
1  01      a               [13.0]
2  02    b,c        [88.0, 753.0]
3  03                       [nan]
4  04  c,a,b  [789.0, 48.0, 89.0]
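
If listing one condition per (r, an) pair gets unwieldy, one possible alternative is to keep the pairs in a long-format lookup table and left-merge it onto the exploded frame. This swaps np.select for `DataFrame.merge`, so it is a different technique from the answer above; the `lookup` table holds only the pairs that occur in the sample data, and the names `lookup`, `exploded` and `merged` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'r': ['01', '02', '03', '04', ''],
                   'an': ['a', 'b,c', '', 'c,a,b', '']})

# Long-format lookup: one row per (r, an) pair, values taken from the question.
lookup = pd.DataFrame({'r': ['01', '02', '02', '04', '04', '04'],
                       'an': ['a', 'b', 'c', 'a', 'b', 'c'],
                       's': [13, 88, 753, 48, 89, 789]})

# One row per comma-separated element.
exploded = df.assign(an=df['an'].str.split(',')).explode('an')

# Left merge keeps every exploded row; unmatched pairs get NaN in 's'.
merged = exploded.merge(lookup, on=['r', 'an'], how='left')

# Collapse back to one row per 'r', keeping the original row order.
output = (merged.groupby('r', sort=False)
                .agg({'an': ','.join, 's': list})
                .reset_index())
print(output)
```

Because unmatched rows introduce NaN, the 's' values come back as floats, as in the np.select version above.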

Score: 1

Original page content provided by Stack Overflow. Translation support provided by the 腾讯云小微 IT-domain engine.

Original link: https://stackoverflow.com/questions/71482924
