Q: Python - How to normalize time series data

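Stack Overflow user

Asked 2013-10-08 19:46:52

5 answers · 22.6K views · 0 followers · 10 votes

I have a dataset of time series examples. I want to compute the similarity between the various time series, but I do not want differences caused by scaling to count (that is, I want to compare the shapes of the time series, not their absolute values). To do that, I need a way to normalize the data, so that every time series lies within a fixed range, for example [0, 100]. Can anyone tell me how to do this in Python?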

python

time-series

5 Answers

Stack Overflow user

Accepted answer

Posted 2013-10-08 20:35:22

Assuming your timeseries is an array, try something like this:

(timeseries-timeseries.min())/(timeseries.max()-timeseries.min())

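This confines your values to the range 0 to 1; multiply the result by 100 to get the 0 to 100 range mentioned in the question.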
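As a minimal sketch of the same idea wrapped in a function: the name `min_max_scale`, the `low`/`high` parameters, and the handling of a constant series are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def min_max_scale(timeseries, low=0.0, high=1.0):
    """Linearly rescale a 1-D series so its minimum maps to `low` and its maximum to `high`."""
    x = np.asarray(timeseries, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # A constant series has no shape to preserve; map it to the lower bound
        # instead of dividing by zero (a choice made for this sketch).
        return np.full_like(x, low)
    return low + (x - x.min()) / span * (high - low)

# Hypothetical data, scaled to the 0..100 range from the question.
scaled = min_max_scale([3.0, 5.0, 4.0, 9.0], low=0, high=100)
```

Because the mapping is linear, two series that differ only by an offset and a positive scale factor end up identical after this step.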

Votes: 11

Stack Overflow user

Posted 2017-05-09 15:30:38

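The solutions given work well for a series that is neither increasing nor decreasing (that is, stationary). For financial time series, or any other series with a bias (trend), the formula given is not appropriate. Such a series should first be detrended, or be scaled using only the most recent 100 to 200 samples.

If the time series does not come from a normal distribution (as is the case in finance), it is also advisable to apply a non-linear function (for example, a standard normal CDF) to compress the outliers.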
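One way to read "scaling on the latest samples" is a trailing-window min-max. The sketch below is an assumption about what that could look like in pandas; the function name `rolling_min_max` and the choice to include the current point in the window are illustrative only.

```python
import numpy as np
import pandas as pd

def rolling_min_max(series, window=100):
    """Scale each point by the min and max of the trailing `window` samples (current point included)."""
    s = pd.Series(series, dtype=float)
    lo = s.rolling(window).min()
    hi = s.rolling(window).max()
    span = (hi - lo).replace(0, np.nan)  # flat windows give NaN instead of a division by zero
    return (s - lo) / span               # NaN until a full window is available

# Hypothetical steadily rising series: every point is the maximum of its own
# trailing window, so the trend no longer dominates the scaled values.
trend = pd.Series(np.arange(300, dtype=float))
scaled = rolling_min_max(trend, window=100)
```

With a global min-max the same series would simply ramp from 0 to 1; with the trailing window each value is judged only against its recent history.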

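Aronson and Masters (Statistically Sound Machine Learning for Algorithmic Trading) use the following formula on 200-day chunks:

V = 100 * N(0.5 * (X - F50) / (F75 - F25)) - 50

where:

X: the data point

F50: the median (50th percentile) of the latest 200 points

F75: the 75th percentile

F25: the 25th percentile

N: the normal CDF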
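The formula can be sketched for a single data point as follows. This assumes F50 is the 50th percentile (median), as its name alongside F25 and F75 suggests; the function name `compressed_score` and the sample data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def compressed_score(x, history):
    """V = 100 * N(0.5 * (x - F50) / (F75 - F25)) - 50, with the quantiles taken from `history`."""
    f25, f50, f75 = np.percentile(history, [25, 50, 75])
    return 100.0 * norm.cdf(0.5 * (x - f50) / (f75 - f25)) - 50.0

# 200 hypothetical points standing in for the latest 200 observations.
history = np.arange(200.0)

at_median = compressed_score(np.median(history), history)  # a point at the median scores 0
far_above = compressed_score(1e6, history)                 # an extreme outlier saturates near +50
```

Since N ranges over 0 to 1, V is bounded between -50 and 50, which is how the formula compresses outliers.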

Votes: 14

Stack Overflow user

Posted 2017-05-09 20:43:40

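Following up on my previous answer, here is a (non-optimized) Python function that performs the scaling and/or normalization. It expects a pandas DataFrame as input and does not check the type, so it raises an error if given any other kind of object. To use it with a list or a numpy.array, either modify the function or convert the object with pandas.DataFrame() first.

The function is slow, so it is best to run it once and store the result.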

    import pandas as pd
    from scipy.stats import norm

    def get_NormArray(df, n, mode='total', linear=False):
        '''
        Normalize each value using the statistics of the n values preceding it
        (modes: 'total' or 'scale'), following the formulas from the book
        "Statistically sound machine learning..." (Aronson and Masters). The
        decision to apply non-linear scaling is left to the user.
        The output is rescaled relative to the book: the default mode returns
        values between -0.5 and 0.5 instead of -50 and 50.
        df is an input DataFrame (only its first column is normalized);
        the result is also a DataFrame.
        n is the number of data points used for the median and the quartiles.
        modes: 'scale': scale without centering. 'total': center and scale.
        '''
        if len(df) <= n:
            raise ValueError('df must contain more than n rows')
        col = df.iloc[:, 0]
        temp = []

        for i in range(len(df))[::-1]:
            if i >= n:
                # Rolling statistics over the n rows before row i. The first n
                # rows reuse the last values computed here (those for i == n).
                window = col.iloc[i - n:i]
                F50 = window.quantile(0.5)
                F75 = window.quantile(0.75)
                F25 = window.quantile(0.25)

            x = col.iloc[i]
            if linear and mode == 'total':
                v = 0.5 * ((x - F50) / (F75 - F25)) - 0.5
            elif linear and mode == 'scale':
                v = 0.25 * x / (F75 - F25) - 0.5
            elif not linear and mode == 'scale':
                v = 0.5 * norm.cdf(0.25 * x / (F75 - F25)) - 0.5
            else:
                # Default, also used for unrecognized arguments:
                # full normalization with non-linear compression.
                v = norm.cdf(0.5 * (x - F50) / (F75 - F25)) - 0.5

            temp.append(float(v))
        return pd.DataFrame(temp[::-1])
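Since the loop above is slow, a vectorized variant of its default mode (center, scale, compress) may be worth considering. This is a sketch under assumptions, not a drop-in replacement: the name `rolling_norm` is hypothetical, it takes a one-dimensional series, and it leaves the first n values as NaN rather than normalizing them with the earliest available statistics.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

def rolling_norm(series, n):
    """Center on the median and scale by the IQR of the n points preceding each sample,
    then compress with the normal CDF so the result lies between -0.5 and 0.5."""
    s = pd.Series(series, dtype=float)
    # shift(1) excludes the current sample, so row i only sees rows i-n .. i-1.
    prev = s.shift(1).rolling(n)
    f50 = prev.quantile(0.5)
    f25 = prev.quantile(0.25)
    f75 = prev.quantile(0.75)
    z = 0.5 * (s - f50) / (f75 - f25)
    return pd.Series(norm.cdf(z.to_numpy()) - 0.5, index=s.index)

# Hypothetical random-walk data for illustration.
rng = np.random.default_rng(0)
data = pd.Series(rng.normal(size=50).cumsum())
out = rolling_norm(data, n=20)
```

The three rolling quantiles are computed once over the whole series, which avoids re-slicing the data on every iteration.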

Votes: 7

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/19256930
