Q: How to correctly assign a slice of a pandas DataFrame to values in another DataFrame

Stack Overflow user

Asked 2021-10-19 07:58:57

Answers: 1, Views: 64, Followers: 0, Votes: 1

Although I have already solved this problem, I would like to know whether there is a more direct way to accomplish my task.

import pandas as pd
df1 = pd.DataFrame({'position': ['20', '8000', '8000'],
                   'SNP_ID': ['rs01', 'rs02', 'rs03'],
                   'SNP_ref': ['A', 'C', 'T'],
                   'SNP_alts': ['G', 'T','A,G,']})

df2 = pd.DataFrame({'position': ['400', '8000', '90000'],
                   'SNP_ID': ['', '', ''],
                   'SNP_ref': ['', '', ''],
                   'SNP_alts': ['', '',''],
                   'check_ref':['T','T','A'],
                   'check_alts':['T','G','A'],
                   'other_data': ['xx','yy','zz']})

c1 = ['SNP_ID','SNP_ref','SNP_alts']

for i in range(len(df2)):

    SNVs = df1[df1['position'] == df2['position'].loc[i]]

    if not SNVs.empty:
        df2.loc[df2.index[i],c1] = SNVs.loc[SNVs['SNP_ref'] == df2['check_ref'].loc[i],c1].iloc[0]

        print(df2)

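So essentially, based on some conditions (more than are shown here), I want to assign the values of three columns of a given row to three columns in another df. I could only get this to work using .tolist().

Is there a simpler way to achieve this?

*Note: I know that looping over the rows of a df is not good practice, but I currently cannot think of a better solution, because I have to make further comparisons to decide which rows to copy. For now my dfs are fairly small, so run time is not a big issue.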

Thanks, Hagen

*Update: Based on the answer, I revised my code again using a more realistic dataset and got it working without .tolist().

import pandas as pd
df1 = pd.DataFrame({'position': ['20', '8000', '8000'],
                   'SNP_ID': ['rs01', 'rs02', 'rs03'],
                   'SNP_ref': ['A', 'C', 'T'],
                   'SNP_alts': ['G', 'T','A,G,']})

df2 = pd.DataFrame({'position': ['400', '8000', '90000'],
                   'SNP_ID': ['', '', ''],
                   'SNP_ref': ['', '', ''],
                   'SNP_alts': ['', '',''],
                   'check_ref':['T','T','A'],
                   'check_alts':['T','G','A'],
                   'other_data': ['xx','yy','zz']})

c1 = ['SNP_ID','SNP_ref','SNP_alts']

for i in range(len(df2)):

    SNVs = df1[df1['position'] == df2['position'].loc[i]]

    if not SNVs.empty:
        df2.loc[df2.index[i],c1] = SNVs.loc[SNVs['SNP_ref'] == df2['check_ref'].loc[i],c1].iloc[0]

print(df2)
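
The question mentions that .tolist() was needed at first. One plausible explanation (an assumption, since that earlier code is not shown here) is label alignment: when a Series is assigned through .loc, pandas matches the Series labels against the target column labels rather than assigning by position. The sketch below illustrates this with a hypothetical frame `target` and a hypothetical row `row` whose labels `name`, `ref`, `alts` are made up for the illustration.

```python
import pandas as pd

# Hypothetical target frame with the three columns to fill.
target = pd.DataFrame({'SNP_ID': ['', ''], 'SNP_ref': ['', ''], 'SNP_alts': ['', '']})
cols = ['SNP_ID', 'SNP_ref', 'SNP_alts']

# Hypothetical source row whose labels differ from the target columns.
row = pd.Series({'name': 'rs03', 'ref': 'T', 'alts': 'A,G,'})

# Labels do not match: pandas aligns on labels, finds no match, and writes NaN.
target.loc[0, cols] = row
mismatched = target.loc[0, cols].isna().all()

# Dropping the labels with tolist() makes the assignment purely positional.
target.loc[0, cols] = row.tolist()

# With matching labels the Series can be assigned directly, as in the update above.
renamed = row.rename({'name': 'SNP_ID', 'ref': 'SNP_ref', 'alts': 'SNP_alts'})
target.loc[1, cols] = renamed

print(target)
```

In the updated code both frames use the same column names, which would explain why the direct assignment works there.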

*Update 2: now also checking whether the letters ('A', 'T', etc.) in *_alts match, where SNP_alts can contain multiple comma-separated sequences (e.g. A,T,G,AA,GG)

import pandas as pd
df1 = pd.DataFrame({'position': ['20', '8000', '8000'],
                   'SNP_ID': ['rs01', 'rs02', 'rs03'],
                   'SNP_ref': ['A', 'C', 'T'],
                   'SNP_alts': ['G', 'T','A,G,']})

df2 = pd.DataFrame({'position': ['400', '8000', '90000'],
                   'SNP_ID': ['', '', ''],
                   'SNP_ref': ['', '', ''],
                   'SNP_alts': ['', '',''],
                   'check_ref':['T','T','A'],
                   'check_alts':['T','G','A'],
                   'other_data': ['xx','yy','zz']})

c1 = ['SNP_ID','SNP_ref','SNP_alts']

for i in range(len(df2)):

    SNVs = df1[df1['position'] == df2['position'].loc[i]]

    if not SNVs.empty:
        bm1 = SNVs['SNP_ref'] == df2['check_ref'].loc[i]
        bm2 = SNVs['SNP_alts'].apply(lambda x: df2['check_alts'].loc[i] in x.split(','))

        if len(SNVs.loc[bm1 & bm2,c1])>0:
            df2.loc[df2.index[i],c1] = SNVs.loc[bm1 & bm2,c1].iloc[0]

print(df2)
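
The loop in Update 2 could probably also be written without iterating over rows. The following is a minimal sketch under stated assumptions, not the asker's method: it pairs rows with merge, filters on the comma-separated alts, and keeps the first hit per df2 row. The names `pairs`, `alt_ok`, `hits` and the helper column `df2_row` are introduced here for illustration only, and the sketch assumes that merge keeps the df1 rows for each key in their original order.

```python
import pandas as pd

df1 = pd.DataFrame({'position': ['20', '8000', '8000'],
                    'SNP_ID': ['rs01', 'rs02', 'rs03'],
                    'SNP_ref': ['A', 'C', 'T'],
                    'SNP_alts': ['G', 'T', 'A,G,']})

df2 = pd.DataFrame({'position': ['400', '8000', '90000'],
                    'SNP_ID': ['', '', ''],
                    'SNP_ref': ['', '', ''],
                    'SNP_alts': ['', '', ''],
                    'check_ref': ['T', 'T', 'A'],
                    'check_alts': ['T', 'G', 'A'],
                    'other_data': ['xx', 'yy', 'zz']})

c1 = ['SNP_ID', 'SNP_ref', 'SNP_alts']

# Pair each df2 row with the df1 rows sharing its position and reference allele.
# The df2 row label is kept in the helper column 'df2_row'.
keys = df2[['position', 'check_ref', 'check_alts']].rename_axis('df2_row').reset_index()
pairs = keys.merge(df1, left_on=['position', 'check_ref'],
                   right_on=['position', 'SNP_ref'])

# Keep pairs whose check_alts is one of the comma-separated SNP_alts entries.
alt_ok = pd.Series(
    [alt in alts.split(',') for alt, alts in zip(pairs['check_alts'], pairs['SNP_alts'])],
    index=pairs.index, dtype=bool)

# First hit per df2 row, indexed by the original df2 row label.
hits = pairs.loc[alt_ok].drop_duplicates('df2_row').set_index('df2_row')

# Write the matched values back; rows without a hit stay untouched.
df2.loc[hits.index, c1] = hits[c1]

print(df2)
```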

dataframe

python

pandas

1 Answer

Stack Overflow user

Answered 2021-10-19 08:02:33

Use DataFrame.update together with renamed columns so that the values are matched correctly:

c1 = ['SNP_ID','SNP_ref','SNP_alts']
c2 = ['name','ref','alts']
d = dict(zip(c2, c1))

# index both frames by the matching keys so that update() aligns rows on them
df11 = df1.set_index(['position','SNP_ref'])
df22 = df2.set_index(['position','check_ref'])
    
df22.update(df11.rename(columns=d))
df22 = df22.reset_index().reindex(df2.columns, axis=1)
print (df22)

  position SNP_ID SNP_ref SNP_alts check_ref other_data
0      400                                 T         xx
1     8000   rs03                A         T         yy
2    90000                                 A         zz
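
For reference, here is a self-contained sketch of the same DataFrame.update idea applied to the data from the question's update, where both frames already share the column names, so no renaming is needed. It is an adaptation under assumptions rather than the answerer's exact code: the helper column `ref_key` is introduced here so that SNP_ref remains a regular column and is copied as well, and like the answer it does not check check_alts.

```python
import pandas as pd

df1 = pd.DataFrame({'position': ['20', '8000', '8000'],
                    'SNP_ID': ['rs01', 'rs02', 'rs03'],
                    'SNP_ref': ['A', 'C', 'T'],
                    'SNP_alts': ['G', 'T', 'A,G,']})

df2 = pd.DataFrame({'position': ['400', '8000', '90000'],
                    'SNP_ID': ['', '', ''],
                    'SNP_ref': ['', '', ''],
                    'SNP_alts': ['', '', ''],
                    'check_ref': ['T', 'T', 'A'],
                    'check_alts': ['T', 'G', 'A'],
                    'other_data': ['xx', 'yy', 'zz']})

# Index both frames by (position, reference allele). 'ref_key' is a copy of
# SNP_ref, so SNP_ref itself stays available as a column to be copied.
src = df1.assign(ref_key=df1['SNP_ref']).set_index(['position', 'ref_key'])
dst = df2.set_index(['position', 'check_ref'])

# update() aligns on index values and column labels and overwrites dst in place
# wherever src has a non-missing value for the same key.
dst.update(src)

# Restore the key columns and the original column order.
result = dst.reset_index().reindex(df2.columns, axis=1)
print(result)
```

Both indexes must be unique for this alignment to be unambiguous, which holds for the sample data.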

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69627049
