Q: Pandas: splitting and editing a file based on a dictionary

Stack Overflow user

Asked 2014-09-23 19:56:16

Answers 1 · Views 253 · Followers 0 · Votes 1

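I'm new to pandas and am having a bit of trouble with the following problem. I have two files that I need to combine to create an output. The first file contains a list of functions and their associated genes. An example of the file (obviously with completely made-up data):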

File 1:

Function    Genes
Emotions    HAPPY,SAD,GOOFY,SILLY
Walking    LEG,MUSCLE,TENDON,BLOOD
Singing    VOCAL,NECK,BLOOD,HAPPY

I read it into a dictionary with:

from collections import defaultdict

FunctionsWithGenes = defaultdict(list)

def read_functions_file(File):
    Header = File.readline()  # skip the header row
    for Line in File:
        # each row holds a function name followed by its comma-separated genes
        Function, Genes = Line.split()
        FunctionsWithGenes[Function] = Genes.split(",")

The second table is a .txt file that contains all the information I need, including a Gene column, for example:

chr    start    end    Gene    Value   MoreData
chr1    123    123    HAPPY    41.1    3.4
chr1    342    355    SAD    34.2    9.0
chr1    462    470    LEG    20.0    2.7

I read it in with:

import pandas as pd 

df = pd.read_table(File)

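The dataframe contains several columns, one of which is "Gene". This column can contain a variable number of entries. I would like to split the dataframe by the "Function" keys of the FunctionsWithGenes dictionary. So far I have: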

df = df[df["Gene"].isin(FunctionsWithGenes.keys())] # to remove all rows with no matching entries

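Now I need to split the dataframe by gene function. I was thinking of adding a new column with the gene's function, but I'm not sure that would work, since some genes can have more than one function.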

python

dictionary

pandas

dataframe

1 Answer

Stack Overflow user

Accepted answer

Posted 2014-09-24 00:40:09

I'm a little confused by your last line of code:

 df = df[df["Gene"].isin(FunctionsWithGenes.keys())]

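because the keys of FunctionsWithGenes are the actual functions (Emotions, etc.), while the Gene column holds gene names. The resulting DataFrame will always be empty.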
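
A minimal sketch of that mismatch, assuming a hand-typed copy of the example data (the names `by_keys`, `all_genes`, and `by_genes` are illustrative and do not come from the question). Filtering on the dictionary's keys matches nothing, whereas filtering on its flattened values keeps the rows whose gene appears under at least one function:

```python
import pandas as pd

# Hand-typed stand-ins for the question's data (assumed values).
FunctionsWithGenes = {
    "Emotions": ["HAPPY", "SAD", "GOOFY", "SILLY"],
    "Walking": ["LEG", "MUSCLE", "TENDON", "BLOOD"],
    "Singing": ["VOCAL", "NECK", "BLOOD", "HAPPY"],
}
df = pd.DataFrame({"Gene": ["HAPPY", "SAD", "LEG"], "Value": [3.4, 4.3, 5.55]})

# The keys are function names, so no gene name can match them.
by_keys = df[df["Gene"].isin(list(FunctionsWithGenes.keys()))]

# Flattening the values gives every gene listed under any function.
all_genes = sorted({gene for genes in FunctionsWithGenes.values() for gene in genes})
by_genes = df[df["Gene"].isin(all_genes)]

print(len(by_keys), len(by_genes))
```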

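If I understand you correctly, you want to split the table into parts so that all the genes belonging to one function end up in one table. If so, you can use a simple dictionary comprehension. I set up some variables similar to yours: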

>>> for function, genes in FunctionsWithGenes.items():
...     print(function, genes)
... 
Walking ['LEG', 'MUSCLE', 'TENDON', 'BLOOD']
Singing ['VOCAL', 'NECK', 'BLOOD', 'HAPPY']
Emotions ['HAPPY', 'SAD', 'GOOFY', 'SILLY']
>>> df
    Gene  Value
0  HAPPY   3.40
1    SAD   4.30
2    LEG   5.55

Then I split the DataFrame like this:

>>> FunctionsWithDf = {function: df[df['Gene'].isin(genes)]
...     for function, genes in FunctionsWithGenes.items()}

FunctionsWithDf is now a dictionary that maps each Function to a DataFrame containing all the rows whose Gene value appears in FunctionsWithGenes[Function].

For example:

>>> FunctionsWithDf['Emotions']
    Gene  Value
0  HAPPY    3.4
1    SAD    4.3
>>> FunctionsWithDf['Singing']
    Gene  Value
0  HAPPY    3.4
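
For reference, the approach can be put together as a self-contained Python 3 sketch, assuming hand-typed data that mirrors the session above. The final part goes beyond the answer and is only one possible take on the "new column" idea from the question: it builds a hypothetical `pairs` table of (Function, Gene) rows and merges it in, so a gene with several functions simply appears on several rows.

```python
import pandas as pd

# Hand-typed stand-ins for the data shown above (assumed values).
FunctionsWithGenes = {
    "Emotions": ["HAPPY", "SAD", "GOOFY", "SILLY"],
    "Walking": ["LEG", "MUSCLE", "TENDON", "BLOOD"],
    "Singing": ["VOCAL", "NECK", "BLOOD", "HAPPY"],
}
df = pd.DataFrame({"Gene": ["HAPPY", "SAD", "LEG"], "Value": [3.4, 4.3, 5.55]})

# One sub-DataFrame per function, keeping rows whose gene belongs to it.
FunctionsWithDf = {
    function: df[df["Gene"].isin(genes)]
    for function, genes in FunctionsWithGenes.items()
}

for function, sub_df in FunctionsWithDf.items():
    print(function, sub_df["Gene"].tolist())

# Illustrative alternative: a long table with one row per (gene, function)
# pair, so genes with several functions are repeated rather than lost.
pairs = pd.DataFrame(
    [(function, gene)
     for function, genes in FunctionsWithGenes.items()
     for gene in genes],
    columns=["Function", "Gene"],
)
annotated = df.merge(pairs, on="Gene", how="inner")
print(annotated)
```

With the long table, `annotated.groupby("Function")` gives the same per-function grouping without keeping a separate dictionary of DataFrames.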

Votes 1

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/26003662
