Q: Using unique values for a column in pandas

Stack Overflow user

Asked 2019-03-01 01:04:45

1 answer · 28 views · 0 followers · 0 votes

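I have a dataframe in pandas with five columns: contig, length, identity, percent, and hit. The data is parsed from BLAST output and sorted by contig length and percent match. My goal is for the output to contain only one row for each unique contig. Example of the current output: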

   contig        length  identity     percent  hit                                                                             
   contig-100_0  5485    [1341/1341]  [100.%]  ['hit1']
   contig-100_0  5485    [5445/5445]  [100.%]  ['hit2']
   contig-100_0  5485        [59/59]  [100.%]  ['hit3']
   contig-100_1  2865    [2865/2865]  [100.%]  ['hit1']
   contig-100_2  2800    [2472/2746]  [90.0%]  ['hit1']
   contig-100_3  2417    [2332/2342]  [99.5%]  ['hit1']
   contig-100_4  2204    [2107/2107]  [100.%]  ['hit1']
   contig-100_4  2000    [1935/1959]  [98.7%]  ['hit2']

I would like the output above to look like this instead:

   contig        length  identity     percent  hit                                                                             
   contig-100_0  5485    [1341/1341]  [100.%]  ['hit1']
   contig-100_1  2865    [2865/2865]  [100.%]  ['hit1']
   contig-100_2  2800    [2472/2746]  [90.0%]  ['hit1']
   contig-100_3  2417    [2332/2342]  [99.5%]  ['hit1']
   contig-100_4  2204    [2107/2107]  [100.%]  ['hit1']

Here is the code I used to generate the first output:

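import pandas as pd

# `path` and `i` (directory and file name) are defined elsewhere in the script.
df = pd.read_csv(path + i, sep='\t', header=None, engine='python',
                 names=['contig', 'length', 'identity', 'percent', 'hit'])
# Sort by contig length, then percent match, both descending.
df = df.sort_values(['length', 'percent'], ascending=[False, False])
top_hits = df.to_string(justify='left', index=False)
with open('sorted_contigs', 'a') as sortedfile:
    sortedfile.write(top_hits + "\n")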

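I know about the unique() method in pandas and think the syntax I need is df.contig.unique(), but I am not sure where to put it in my code. I am still learning pandas, so any help is greatly appreciated! Thanks.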

python

pandas

sorting

unique

1 Answer

Stack Overflow user

Accepted answer

Posted 2019-03-01 01:09:25

You can do this with DataFrame.groupby(<colname>).head(<num_of_rows>):

df.groupby('contig').head(1)

Output:

          contig    length  identity    percent hit
0   contig-100_0    5485    [1341/1341] [100.%] ['hit1']
3   contig-100_1    2865    [2865/2865] [100.%] ['hit1']
4   contig-100_2    2800    [2472/2746] [90.0%] ['hit1']
5   contig-100_3    2417    [2332/2342] [99.5%] ['hit1']
6   contig-100_4    2204    [2107/2107] [100.%] ['hit1']
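
To make the difference from unique() concrete, here is a minimal sketch rather than a definitive implementation. It assumes the frame is already sorted as in the question, and it builds the sample rows by hand instead of reading the BLAST file. The variable names contig_names, top_hits, and top_hits_alt are illustrative only. It also shows drop_duplicates as an alternative that selects the same rows for this data.

```python
import pandas as pd

# Sample rows copied from the question; a real run would load them with pd.read_csv.
df = pd.DataFrame(
    {
        "contig": ["contig-100_0", "contig-100_0", "contig-100_0", "contig-100_1",
                   "contig-100_2", "contig-100_3", "contig-100_4", "contig-100_4"],
        "length": [5485, 5485, 5485, 2865, 2800, 2417, 2204, 2000],
        "identity": ["[1341/1341]", "[5445/5445]", "[59/59]", "[2865/2865]",
                     "[2472/2746]", "[2332/2342]", "[2107/2107]", "[1935/1959]"],
        "percent": ["[100.%]", "[100.%]", "[100.%]", "[100.%]",
                    "[90.0%]", "[99.5%]", "[100.%]", "[98.7%]"],
        "hit": ["['hit1']", "['hit2']", "['hit3']", "['hit1']",
                "['hit1']", "['hit1']", "['hit1']", "['hit2']"],
    }
)

# unique() returns only the distinct contig names, not the rest of each row.
contig_names = df["contig"].unique()

# Keep the first row of each contig; the rows stay in the frame's existing order.
top_hits = df.groupby("contig").head(1)

# drop_duplicates with keep="first" selects the same rows for this data.
top_hits_alt = df.drop_duplicates(subset="contig", keep="first")

print(top_hits.to_string(justify="left", index=False))
```

Because head(1) keeps whichever row comes first within each group, the sort_values call from the question has to run before it. The string produced by to_string can then be written to the sorted_contigs file exactly as in the question's code.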

Votes: 3

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54930819
