Speeding up a custom continuous random variable

Stack Overflow user
Asked 2017-11-22 03:16:41
Answers: 1 · Views: 356 · Followers: 0 · Votes: 2

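I've created a scipy.stats.rv_continuous subclass that seems to do what I want, but it is extremely slow. The code and test results are below.

The distribution function I'm using (a broken power law) is easy to integrate and to compute properties of, so is there another internal method that I should override with analytic values to make it faster? The documentation is unclear about how the rvs are actually drawn, but presumably it is done by inverting the cdf.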

import numpy as np
import scipy as sp
import scipy.stats


class Broken_Power_Law(sp.stats.rv_continuous):

    def __init__(self, slopes, breaks, name='Broken_Power_Law'):
        """
        Here `slopes` are the power-law indices for each section, and
        `breaks` are the edges of each section such that `slopes[0]` applies
        between `breaks[0]` and `breaks[1]`, etc.
        """
        super().__init__(a=np.min(breaks), b=np.max(breaks), name=name)
        nums = len(slopes)

        # Calculate the proper normalization of the PDF semi-analytically
        pdf_norms = np.array([np.power(breaks[ii], slopes[ii-1] - slopes[ii]) if ii > 0 else 1.0
                              for ii in range(nums)])
        pdf_norms = np.cumprod(pdf_norms)

        # The additive offsets to calculate CDF values
        cdf_offsets = np.array([(an/(alp+1))*(np.power(breaks[ii+1], alp+1) -
                                              np.power(breaks[ii], alp+1))
                                for ii, (alp, an) in enumerate(zip(slopes, pdf_norms))])

        off_sum = cdf_offsets.sum()
        cdf_offsets = np.cumsum(cdf_offsets)
        pdf_norms /= off_sum
        cdf_offsets /= off_sum

        self.breaks = breaks
        self.slopes = slopes
        self.pdf_norms = pdf_norms
        self.cdf_offsets = cdf_offsets
        self.num_segments = nums
        return

    def _pdf(self, xx):
        mm = np.atleast_1d(xx)
        yy = np.zeros_like(mm)
        # For each power-law, calculate the distribution in that region 
        for ii in range(self.num_segments):
            idx = (self.breaks[ii] < mm) & (mm <= self.breaks[ii+1])
            aa = self.slopes[ii]
            an = self.pdf_norms[ii]
            yy[idx] = an * np.power(mm[idx], aa)

        return yy

    def _cdf(self, xx):
        mm = np.atleast_1d(xx)
        yy = np.zeros_like(mm)
        off = 0.0
        # For each power-law, calculate the cumulative dist in that region
        for ii in range(self.num_segments):
            # incorporate the cumulative offset from previous segments
            off = self.cdf_offsets[ii-1] if ii > 0 else 0.0
            idx = (self.breaks[ii] < mm) & (mm <= self.breaks[ii+1])
            aa = self.slopes[ii]
            an = self.pdf_norms[ii]
            ap1 = aa + 1
            yy[idx] = (an/(ap1)) * (np.power(mm[idx], ap1) - np.power(self.breaks[ii], ap1)) + off

        return yy

When testing:

> test1 = sp.stats.norm()
> %timeit rvs = test1.rvs(size=100)
46.3 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

> test2 = Broken_Power_Law([-1.3, -2.2, -2.7], [0.08, 0.5, 1.0, 150.0])
> %timeit rvs = test2.rvs(size=100)
200 ms ± 8.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is, roughly 4000 times slower!


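1 Answer

Stack Overflow user

Accepted answer

Answered 2017-11-22 04:39:04

One solution is to override the _rvs method itself and draw samples by inverse transform sampling, using the analytic formula for the inverse CDF.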

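def _rvs(self, size=None, random_state=None):
    """Invert the CDF (semi-)analytically to draw samples from the distribution.
    """
    # Older SciPy versions store the requested size in `self._size`; newer ones
    #    pass `size` and `random_state` as keyword arguments
    if size is None:
        size = self._size
    rng = np.random if random_state is None else random_state
    rands = rng.uniform(size=size)
    samps = np.zeros_like(rands)
    # Go over each segment and select the random numbers that fall in it, based on
    #    the cumulative offsets (same convention as in `_cdf`)
    for ii in range(self.num_segments):
        lo = self.cdf_offsets[ii-1] if ii > 0 else 0.0
        hi = self.cdf_offsets[ii]
        idx = (lo <= rands) & (rands < hi)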

        mlo = self.breaks[ii]
        aa = self.slopes[ii]
        an = self.pdf_norms[ii]
        ap1 = aa + 1

        vals = (ap1/an) * (rands[idx] - lo) + np.power(mlo, ap1)
        samps[idx] = np.power(vals, 1.0/ap1)

    return samps
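
For reference, the inversion used in _rvs can be written out from the piecewise CDF in the question's _cdf. The symbols are notation chosen for this note rather than names from the code: A_i stands for pdf_norms[i], \alpha_i for slopes[i], x_i for breaks[i], and C_i for cdf_offsets[i], with C_{-1} = 0. The result assumes \alpha_i \neq -1 on every segment.

```latex
\begin{align}
  % CDF on segment i, for x_i < x \le x_{i+1}
  F(x) &= C_{i-1} + \frac{A_i}{\alpha_i + 1}
          \left( x^{\alpha_i + 1} - x_i^{\alpha_i + 1} \right), \\
  % Setting F(x) = u with C_{i-1} \le u < C_i and solving for x
  x &= \left[ \frac{\alpha_i + 1}{A_i} \left( u - C_{i-1} \right)
        + x_i^{\alpha_i + 1} \right]^{1 / (\alpha_i + 1)} .
\end{align}
```

Drawing u uniformly on [0, 1) and applying the second line segment by segment is what the loop in _rvs does.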

This is almost as fast as the built-in sampling:

> %timeit rvs = test3.rvs(size=100)
56.8 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
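
As a way to sanity-check the same idea outside of SciPy, here is a minimal NumPy-only sketch of the analytic inverse CDF. The function names `broken_power_law_ppf` and `sample_broken_power_law` are hypothetical helpers, not part of the question's class or of SciPy, and the sketch assumes the PDF is continuous at each break and that no slope equals -1.

```python
import numpy as np


def broken_power_law_ppf(q, slopes, breaks):
    """Analytic inverse CDF of a continuous broken power law (illustrative helper)."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    slopes = np.asarray(slopes, dtype=float)
    breaks = np.asarray(breaks, dtype=float)

    # Relative PDF norms that keep the PDF continuous at each interior break
    norms = np.ones_like(slopes)
    norms[1:] = np.cumprod(breaks[1:-1] ** (slopes[:-1] - slopes[1:]))

    # Integral of each segment (assumes no slope equals -1)
    ap1 = slopes + 1.0
    seg = (norms / ap1) * (breaks[1:] ** ap1 - breaks[:-1] ** ap1)
    total = seg.sum()
    norms = norms / total

    # CDF value at every break: 0 at breaks[0], about 1 at breaks[-1]
    cdf_edges = np.concatenate([[0.0], np.cumsum(seg) / total])

    # Index of the segment that each quantile falls into
    idx = np.searchsorted(cdf_edges, q, side="right") - 1
    idx = np.clip(idx, 0, len(slopes) - 1)

    # Invert the piecewise CDF within the selected segment
    vals = (ap1[idx] / norms[idx]) * (q - cdf_edges[idx]) + breaks[idx] ** ap1[idx]
    return vals ** (1.0 / ap1[idx])


def sample_broken_power_law(size, slopes, breaks, rng=None):
    """Draw samples by inverse transform sampling (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    return broken_power_law_ppf(rng.uniform(size=size), slopes, breaks)


if __name__ == "__main__":
    # Same parameters as the test in the question
    samples = sample_broken_power_law(
        1000, [-1.3, -2.2, -2.7], [0.08, 0.5, 1.0, 150.0]
    )
    print(samples.min(), samples.max())
```

Because the default sampling in rv_continuous draws uniform numbers and passes them through the percent-point function, the same formula could probably go into a _ppf override instead of _rvs, which would also speed up ppf calls. Treat that as an assumption to check against the SciPy version in use.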
Votes: 4
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/47426241
