Pandas: using the row name when applying a lambda to each column

Stack Overflow user

Asked on 2018-10-30 00:35:03

Answers: 1  Views: 92  Followers: 0  Votes: 0

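I am trying to use the index (the row name) while performing an operation on every column of a dataframe. Here is the structure of my dataframe: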

gene    6   6   6   6   6   6   8   8   8   10  ... 28  67  67  67  67  67  67  35  35  35                                                                                  
mn:1:chr1:un    0   1   0   0   0   0   3   0   1   2   ... 17  8   8   6   8   7   14  9   17  15
pl:1:chr1:un    0   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
mn:2:chr1:un    1   0   0   0   0   1   0   0   0   0   ... 16  2   3   4   3   6   12  11  10  4
mn:3:chr1:un    7   16  10  9   8   7   11  10  15  9   ... 295 153 130 173 194 187 181 265 269 271

What I want to do is apply a normalization function, like this:

count = count.apply(lambda x: (x * 114 * 1000000) / (np.sum(x) * lengthDict[rowname]), axis=0)

Simplified version:

dataframe = for each element in dataframe: {perform some operation involving constant on element ÷ (sum of column containing element × dictionary[row index])}

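Here count is my dataframe, and x should be a single element in each column. The problem is lengthDict, a dictionary that holds a numeric value for each row. In effect, for each element I am trying to take the sum of its column and multiply it by the value returned from lengthDict, which depends on the index. I tried x.name, but that returns the column name. Is there an efficient way to do this?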
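As a small illustration of why x.name gives the column label: with axis=0, apply passes each whole column to the function as a Series, so x.name is the column label and x.index holds the row labels. The sketch below uses a made-up toy frame and a hypothetical helper named inspect; it is not the data from this question.

```python
import pandas as pd

# Toy frame; column labels and values are invented for illustration.
df = pd.DataFrame(
    {"s1": [0, 0, 1], "s2": [1, 0, 0]},
    index=["mn:1:chr1:un", "pl:1:chr1:un", "mn:2:chr1:un"],
)

seen = []

def inspect(col):
    # col is an entire column (a Series), not a single element:
    # col.name is the column label, col.index holds the row labels.
    seen.append((col.name, list(col.index), type(col).__name__))
    return col

df.apply(inspect, axis=0)
print(seen[0])
```

So the row names are available inside the function, but as a whole Index (one label per row) rather than as a single name.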

Edit: here is the structure of lengthDict - {'mn:1:chr1:un': 1680,'mn:2:chr1:un': 1000,'mn:3:chr1:un': 10040,'pl:1:chr1:un': 2960,'mn:5:chr1:un': 14000}. It essentially maps each index label to a numeric value.

Here is how I initialize and set up the dataframe itself:

count = pd.read_csv("count.csv")
count = count.set_index('gene') 

Intended output:
gene        6   6   6   6   6   6   8   8   8   10  ... 28  67  67  67  67  67  67  35  35  35                                                                                  
    mn:1:chr1:un    0.000000    16.534392   0.000000    0.000000    0.000000    0.000000    29.614697   0.000000    10.126420   27.466967   ... 9.467610    9.224107    9.082131    6.759914    6.741892    5.856967    11.921943   5.707930    10.533360   9.566057
    pl:1:chr1:un    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
    mn:2:chr1:un    27.893320   0.000000    0.000000    0.000000    0.000000    32.167043   0.000000    0.000000    0.000000    0.000000    ... 14.969962   3.874125    5.721743    7.571104    4.247392    8.434032    17.167597   11.720283   10.409438   4.285593
    mn:3:chr1:un    19.447534   44.267375   28.098445   28.521137   25.638344   22.427221   18.169974   16.413099   25.416912   20.682298   ... 27.490903   29.518980   24.695436   32.614565   27.357040   26.181341   25.791294   28.122737   27.889829   28.919219

Using x.index instead produces the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-78-da4ea45fc265> in <module>()
      9 #count = count.T
---> 10 count = count.apply(lambda x: (x * 114 * 1000000) / (np.sum(x) * lengthDict[x.index]), axis=0)
     11 count = count.groupby(by=count.columns, axis=1).median()

/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6012                          args=args,
   6013                          kwds=kwds)
-> 6014         return op.get_result()
   6015 
   6016     def applymap(self, func):

/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in get_result(self)
    316                                       *self.args, **self.kwds)
    317 
--> 318         return super(FrameRowApply, self).get_result()
    319 
    320     def apply_broadcast(self):

/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in get_result(self)
    140             return self.apply_raw()
    141 
--> 142         return self.apply_standard()
    143 
    144     def apply_empty_result(self):

/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
    246 
    247         # compute the result using the series generator
--> 248         self.apply_series_generator()
    249 
    250         # wrap results

/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in apply_series_generator(self)
    275             try:
    276                 for i, v in enumerate(series_gen):
--> 277                     results[i] = self.f(v)
    278                     keys.append(v.name)
    279             except Exception as e:

<ipython-input-78-da4ea45fc265> in <lambda>(x)
      9 #count = count.T
     10 #count = (count * 114 * 1000000) / (genes[5] * count.sum())
---> 11 count = count.apply(lambda x: (x * 114 * 1000000) / (np.sum(x) * lengthDict[x.index]), axis=0)
     12 #count = count.T
     13 count = count.groupby(by=count.columns, axis=1).median()

/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in __hash__(self)
   2060 
   2061     def __hash__(self):
-> 2062         raise TypeError("unhashable type: %r" % type(self).__name__)
   2063 
   2064     def __setitem__(self, key, value):

TypeError: ("unhashable type: 'Index'", 'occurred at index 6')
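The traceback arises because lengthDict[x.index] uses the whole Index object as a single dictionary key, and an Index is not hashable. One possible way around it, sketched here on a made-up toy frame (the column names a and b and the snake_case variable names are assumptions, not from the question), is to look up every row label at once with Index.map and divide by the resulting array:

```python
import pandas as pd

length_dict = {"mn:1:chr1:un": 1680, "pl:1:chr1:un": 2960, "mn:2:chr1:un": 1000}

# Toy counts; values are invented for illustration.
count = pd.DataFrame(
    {"a": [0, 0, 1], "b": [1, 0, 3]},
    index=["mn:1:chr1:un", "pl:1:chr1:un", "mn:2:chr1:un"],
)

# Reproduce the failure: the whole Index is used as one dict key.
try:
    length_dict[count.index]
    raised = False
except TypeError:
    raised = True

# Map each row label to its length, then divide element by element.
# to_numpy() drops the labels so the division is purely positional.
normalized = count.apply(
    lambda x: (x * 114 * 1_000_000)
    / (x.sum() * x.index.map(length_dict).to_numpy()),
    axis=0,
)
print(normalized)
```

This keeps the shape of the original lambda; a row label that is missing from the dictionary would come back as NaN rather than raising.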

python

pandas

dataframe

1 Answer

Stack Overflow user

Answered on 2018-10-30 23:16:43

I decided to go with a cruder, less elegant approach. The code is as follows:

sumCount = count.sum()
sumCount = sumCount.tolist()

count = count * (fragLength * 1000000)
length = count.index.to_series().map(lengthDict)
length = length.tolist()
scaleMatrix = np.zeros(shape=(len(sumCount),len(length)))

for i in range(0, len(sumCount)):
    for k in range(0, len(length)):
        scaleMatrix[i,k] = sumCount[i] * length[k]

scaleDataframe = pd.DataFrame(data = scaleMatrix.T, columns=count.columns, index=count.index)
count = count.divide(scaleDataframe)

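Instead of operating on the dataframe directly, I created a separate dataframe containing the scaling factors and divided the original dataframe by this "scalingFactor" dataframe. This seems to work, but it still does not explain why I cannot access the row names when using lambda/apply.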
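The nested loop that fills scaleMatrix computes an outer product of the row lengths and the column sums, so it can probably be replaced with np.outer. The following is a minimal sketch on a made-up toy frame; frag_length = 114 is assumed from the constant in the question, and the other names are illustrative rather than taken from the code above.

```python
import numpy as np
import pandas as pd

frag_length = 114  # assumed: the constant 114 used in the question
length_dict = {"mn:1:chr1:un": 1680, "pl:1:chr1:un": 2960, "mn:2:chr1:un": 1000}

# Toy counts; values are invented for illustration.
count = pd.DataFrame(
    {"a": [0, 0, 1], "b": [1, 0, 3]},
    index=["mn:1:chr1:un", "pl:1:chr1:un", "mn:2:chr1:un"],
)

lengths = count.index.map(length_dict).to_numpy()  # one length per row
col_sums = count.sum().to_numpy()                  # one sum per column

# scale[k, i] = lengths[k] * col_sums[i], same shape as count
scale = np.outer(lengths, col_sums)

normalized = count * (frag_length * 1_000_000) / scale
print(normalized)
```

Because the scale array already has the same shape as count, no transpose or intermediate dataframe is needed, and the division is positional, so repeated column labels like those in the question should not matter.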

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/53050012
