Python weighted quantile as R wtd.quantile()

Stack Overflow user

Asked 2022-07-12 20:18:29

Answers 1 | Views 125 | Follows 0 | Votes 0

I want to convert the R function Hmisc::wtd.quantile() to Python.

Here is an example in R:

I used this as a reference, but its logic seems to differ from R's:

import numpy as np

# First function
def weighted_quantile(values, quantiles, sample_weight=None,
                      values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights.
    NOTE: quantiles should be in [0, 1]!
    :param values: numpy.array with data
    :param quantiles: array-like with many quantiles needed
    :param sample_weight: array-like of the same length as `values`
    :param values_sorted: bool, if True, skip sorting the input
    :param old_style: accepted but unused in this version
    :return: numpy.array with computed quantiles.
    """
    values = np.array(values)
    quantiles = np.array(quantiles)
    if sample_weight is None:
        sample_weight = np.ones(len(values))
    sample_weight = np.array(sample_weight)
    assert np.all(quantiles >= 0) and np.all(quantiles <= 1), \
        'quantiles should be in [0, 1]'

    if not values_sorted:
        sorter = np.argsort(values)
        values = values[sorter]
        sample_weight = sample_weight[sorter]

    # Interpolation nodes: cumulative weight share of each sorted value
    # (k/n for equal weights)
    weighted_quantiles = np.cumsum(sample_weight) / np.sum(sample_weight)
    return np.interp(quantiles, weighted_quantiles, values)

weighted_quantile(values = [0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319],
                  quantiles = np.arange(0, 1 + 1 / 5, 1 / 5),
                  sample_weight = [1,1,1,1,1])

>> array([0.2136325, 0.2136325, 0.4079128, 0.4890342, 0.5083345, 0.6197319])
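With equal weights, the first function's interpolation nodes are the cumulative weight shares k/n (0.2, 0.4, ..., 1.0). Linearly interpolating the empirical CDF at those nodes is Hyndman and Fan type 4, i.e. R's quantile(type = 4), not R's default. A quick check, assuming NumPy 1.22 or later (where np.quantile accepts the method argument):

```python
import numpy as np

x = np.array([0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319])
probs = np.linspace(0, 1, 6)  # 0, 0.2, ..., 1.0 without np.arange round-off

# Nodes k/n with linear interpolation = H&F type 4 ("interpolated_inverted_cdf")
type4 = np.quantile(x, probs, method="interpolated_inverted_cdf")
print(type4)
```

This should print the same six values as the first function's output.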

# Second function
def weighted_percentile(data, weights, perc):
    """
    perc : percentile in [0-1]!
    """
    data = np.array(data)
    weights = np.array(weights)
    ix = np.argsort(data)
    data = data[ix] # sort data
    weights = weights[ix] # sort weights
    cdf = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights) # 'like' a CDF function
    return np.interp(perc, cdf, data)

weighted_percentile([0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319], [1,1,1,1,1], np.arange(0, 1 + 1 / 5, 1 / 5))

>> array([0.2136325 , 0.31077265, 0.4484735 , 0.49868435, 0.5640332 ,
       0.6197319 ])
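The second function shifts each node back by half of its own weight. With equal weights the nodes become (k - 0.5)/n = 0.1, 0.3, ..., 0.9, which is the Hazen definition (Hyndman and Fan type 5, R's quantile(type = 5)). So p = 0.2 falls exactly halfway between the first two sorted values. Under the same NumPy 1.22+ assumption, here is a sketch that compares NumPy's "hazen" method with a hand-built version of those nodes:

```python
import numpy as np

x = np.array([0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319])
probs = np.linspace(0, 1, 6)

# NumPy's built-in H&F type 5
hazen = np.quantile(x, probs, method="hazen")

# Same idea by hand: equal-weight nodes (k - 0.5)/n, as in the second function
n = len(x)
nodes = (np.arange(1, n + 1) - 0.5) / n
manual = np.interp(probs, nodes, np.sort(x))

print(hazen)
print(manual)
```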

Both of them give results that differ from R's. Any ideas?

Tags: python, numpy, quantile, hmisc

1 Answer

Stack Overflow user

Accepted answer

Posted 2022-07-12 23:22:21

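I'm Python-illiterate, but from what I can see after some quick checks, I can tell you the following.

You are using uniform (sampling) weights here, so you could just as well use the plain quantile() function. Unsurprisingly, it gives the same result as wtd.quantile() with equal weights: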

x <- c(0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319)
n <- length(x)
x <- sort(x)
quantile(x, probs = seq(0,1,0.2))
#       0%       20%       40%       60%       80%      100% 
# 0.2136325 0.3690567 0.4565856 0.4967543 0.5306140 0.6197319
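For comparison on the Python side, NumPy documents its default np.quantile method, "linear", as method 7 of Hyndman and Fan. That is also R's default type 7, so the unweighted call below should reproduce the R output:

```python
import numpy as np

x = np.array([0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319])
probs = np.linspace(0, 1, 6)

# method="linear" is NumPy's default and corresponds to H&F type 7 (R's default)
r_like = np.quantile(x, probs)
print(r_like)
```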

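R's quantile() function (with its default type = 7) obtains quantiles in the "textbook" way. For each probability q it determines a fractional index h = (n - 1)q + 1 and interpolates between the neighbouring sorted observations. In your case:

(n - 1)*seq(0, 1, 0.2) + 1
# 1.0 1.8 2.6 3.4 4.2 5.0

Of course, since you have 5 values/obs and want quintiles, the interior indices are not integers. But you know, for example, that the first quintile (h = 1.8) lies between obs 1 and obs 2. More precisely, it is a linear combination of the two observations. The "weights" come from the fractional part of the index: 0.8 goes on obs 2, and 1 - 0.8 = 0.2 goes on obs 1:

0.2*x[1] + 0.8*x[2]
# 0.3690567

You can do the same for all the quintiles based on their indices:

q <-
    c(min(x), ## 1.0: exactly the first obs
      0.2*x[1] + 0.8*x[2], ## 1.8: quintile lies between obs 1 and 2
      0.4*x[2] + 0.6*x[3], ## 2.6: quintile lies between obs 2 and 3
      0.6*x[3] + 0.4*x[4], ## 3.4: quintile lies between obs 3 and 4
      0.8*x[4] + 0.2*x[5], ## 4.2: quintile lies between obs 4 and 5
      max(x)  ## 5.0: exactly the last obs
      )
q
# 0.2136325 0.3690567 0.4565856 0.4967543 0.5306140 0.6197319

As you can see, this is exactly the output of quantile() and wtd.quantile().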
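The index-and-interpolate recipe can be written out in Python as a small sketch. The helper name type7_quantile is made up for illustration. It uses R's default type 7 index h = (n - 1)q + 1, shifted to 0-based indexing, and puts weight frac(h) on the upper neighbour and 1 - frac(h) on the lower one:

```python
import numpy as np

def type7_quantile(values, probs):
    """Unweighted quantiles by R's default type 7 rule (hypothetical helper name)."""
    xs = np.sort(np.asarray(values, dtype=float))
    probs = np.asarray(probs, dtype=float)
    # 0-based version of R's fractional index h = (n - 1) * q + 1
    h = (len(xs) - 1) * probs
    lo = np.floor(h).astype(int)            # lower neighbouring observation
    hi = np.minimum(lo + 1, len(xs) - 1)    # upper neighbour, clamped at the last obs
    frac = h - lo                           # weight on the upper neighbour
    return (1 - frac) * xs[lo] + frac * xs[hi]

x = [0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319]
print(type7_quantile(x, np.linspace(0, 1, 6)))
```

For this data, it should agree with both the R output and np.quantile's default method.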

If, instead of 0.2*x[1] + 0.8*x[2], we consider the following:

0.5*x[1] + 0.5*x[2]
# 0.3107726

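Then we get the output of your second Python function. Your second function seems to use equal "weights" (0.5 and 0.5) when combining the two observations (to be clear, I'm not talking about sampling weights here). That's the issue, at least for the second Python function. I know these are just a few insights, but I hope they help.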
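If you want to keep the np.interp structure of the question's functions, here is a minimal sketch of one possible adjustment. Each sorted value is placed at the node S_{k-1}/S_{n-1}, where S_k is the cumulative weight up to the k-th sorted value. With equal weights these nodes are (k - 1)/(n - 1), the type 7 positions, so p = 0.2 no longer falls halfway between obs 1 and 2. The function name is hypothetical. For unequal weights this is just one possible generalisation and is not assumed to match wtd.quantile():

```python
import numpy as np

def interp_quantile_prev_weight(values, probs, weights=None):
    """Hypothetical np.interp-based weighted quantile (one possible scheme, not Hmisc's)."""
    values = np.asarray(values, dtype=float)
    w = np.ones_like(values) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, w = values[order], w[order]
    before = np.cumsum(w) - w       # S_{k-1}: total weight of the values sorted before each one
    nodes = before / before[-1]     # first node 0, last node 1; (k-1)/(n-1) if weights are equal
    return np.interp(probs, nodes, values)

x = [0.4890342, 0.4079128, 0.5083345, 0.2136325, 0.6197319]
print(interp_quantile_prev_weight(x, np.linspace(0, 1, 6)))
```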

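EDIT: Note that the difference between the two is not necessarily a "problem" with the Python code. There are several quantile estimators (and weighted versions of them), and the Python functions may rely on a different estimator than Hmisc::wtd.quantile(). I think the latter uses a weighted version of the Harrell-Davis quantile estimator. If you really want to implement it, you should check the source code of Hmisc::wtd.quantile() and try to translate it "directly" into Python.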
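Here is a rough sketch of what such a "direct" translation might look like. The function name is hypothetical. The logic is an assumption based on a reading of the Hmisc source for the default type = 'quantile', so verify it against the version you have installed. It treats weights as frequency counts, uses the total weight in place of n in the type 7 index 1 + (n - 1)p, and looks up the two neighbouring order statistics on the cumulative weights. The equal-weight agreement with quantile() shown above fits such a type 7-style rule. The normwt flag imitates the wtd.quantile() argument of the same name, which rescales the weights to sum to length(x):

```python
import numpy as np

def wtd_quantile_sketch(x, weights, probs, normwt=False):
    """Hypothetical Python port of Hmisc::wtd.quantile(type = 'quantile').

    ASSUMPTION: mirrors a reading of the Hmisc source (frequency weights,
    type 7 index on the total weight); verify against your Hmisc version.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    probs = np.asarray(probs, dtype=float)
    idx = np.argsort(x)
    x, w = x[idx], w[idx]
    if normwt:
        # Rescale so the weights sum to len(x) (assumed normwt = TRUE behaviour)
        w = w * len(x) / w.sum()
    cumw = np.cumsum(w)
    n = cumw[-1]                            # total weight plays the role of n
    h = 1 + (n - 1) * probs                 # type 7 index, 1-based as in R
    low = np.maximum(np.floor(h), 1)
    high = np.minimum(low + 1, n)
    frac = h % 1                            # weight on the higher order statistic

    def order_stat(pos):
        # Smallest sorted x whose cumulative weight reaches pos; clipping the
        # index to the last element keeps positions past the total in range
        i = np.searchsorted(cumw, pos, side="left")
        return x[np.minimum(i, len(x) - 1)]

    return (1 - frac) * order_stat(low) + frac * order_stat(high)
```

With integer weights, the result should match np.quantile(np.repeat(x, weights), probs), i.e. ordinary type 7 on the expanded data. This makes a handy sanity check. With equal weights it reduces to the quantile() output above.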

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/72957948
