Top of the peak
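Use the states of X and Y to detect an anomaly where the value of X has reached its peak.
Take a subset of the data around the anomaly, for example the 5 rows before and the 5 rows after it.
The anomaly may also be the start of a local trend within the global trend. Essentially, extract a subsequence of the time series and examine this local trend for more information, in particular to confirm that the signal of the local trend has not reversed.
Identify and verify the local trend by determining that X is at its highest point (X is an oscillating value). This is similar to the center value of a histogram. We need to confirm that the values before and after the X peak are all lesser values than the peak. Ideally, we want to confirm several values on each side.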
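The confirmation rule described above (several lesser values on each side of the peak) can be sketched as follows. This is only an illustration under stated assumptions: the function name `confirmed_peaks` and the parameter `k` (how many neighbours to check on each side) are hypothetical and not part of the original code.

```python
import pandas as pd

def confirmed_peaks(x: pd.Series, k: int = 3) -> pd.Series:
    """Hypothetical helper: True where x is strictly greater than each of
    the k values before it and each of the k values after it."""
    is_peak = pd.Series(True, index=x.index)
    for lag in range(1, k + 1):
        # shift(lag) brings in the value `lag` rows earlier,
        # shift(-lag) the value `lag` rows later. Comparisons against
        # the NaN padding are False, so rows within k of either edge
        # are never confirmed.
        is_peak &= (x > x.shift(lag)) & (x > x.shift(-lag))
    return is_peak

# The X column from the sample data in this post.
x = pd.Series([-0.27, -0.28, -0.33, -0.37, -0.60, -0.90, -0.99, -0.94,
               -0.85, -0.75, -0.64, -0.51, -0.35, -0.21, 1.78, 1.98,
               2.08, 2.42, 2.56, 2.51, 2.57, 2.53, 2.37, 2.24, 2.11,
               2.01, 1.82, 1.64])

peaks_k1 = x[confirmed_peaks(x, k=1)].index.tolist()
peaks_k3 = x[confirmed_peaks(x, k=3)].index.tolist()
print(peaks_k1, peaks_k3)
```

With a single neighbour on each side, the brief dip at row 19 makes row 18 look like a peak as well; requiring three neighbours leaves only row 20. A larger `k` is stricter but cannot confirm peaks close to the start or end of the series.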
Sample data
import pandas as pd
df = pd.DataFrame({
'X': [-0.27, -0.28, -0.33, -0.37, -0.60, -0.90, -0.99, -0.94, -0.85, -0.75, -0.64, -0.51, -0.35, -0.21, 1.78, 1.98, 2.08, 2.42, 2.56, 2.51, 2.57, 2.53, 2.37, 2.24, 2.11, 2.01, 1.82, 1.64, ],
'X_State': ['3', '3', '3', '3', '5', '5', '5', '5', '5', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '6', '6', '6', '6', '6', ],
'Y_State': ['23', '23', '23', '23', '24', '24', '24', '24', '24', '23', '23', '23', '22', '22', '18', '18', '18', '17', '17', '18', '17', '17', '18', '18', '18', '18', '18', '19', ],
})
df2 = pd.DataFrame()  # create new empty dataframe
The second dataframe is used to store the subsets of data we find.
Code
Label = []
# Get Previous
df['X_STATE_Previous_Value'] = df.X_State.shift(1)
df['Y_STATE_Previous_Value'] = df.Y_State.shift(1)
df['Y_STATE_Change'] = (df.Y_State.ne(df.Y_State.shift())).astype(int)
for index, row in df.iterrows():
    if (row['Y_State'] == '17' and row['Y_STATE_Previous_Value'] == '18'):
        Label.append('Index Position: ' + str(index))
        # Select 5 rows before and after
        df2 = df2.append(df.iloc[index-5:index+5])
        # Find where X peaked
        for i, row2 in df2.iterrows():
            # get index position of the first instance of the largest value
            peak = df2.X.idxmax()
            # Go back and label where X peaked
            df.loc[peak, 'Label'] = 'Top of Peak'
    else:
        Label.append('...')
df['Label'] = Label
df2['Max_Label'] = peak
print(df)
print(df2)
#del df2
Help needed
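First, the 'Top of Peak' label does not update df, even though the assignment references df. It shows up in df2 instead, and df2 is only meant to be temporary, to help us find the peak.
Second, I am looking for a better way to determine the top of the peak. Taking the max value of the subset does not actually confirm that the values before and after it are lesser.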
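On the first point, one likely explanation (an inference from reading the code above, not something confirmed in the post) is that `df['Label'] = Label` runs after the loop and replaces the whole `Label` column, including the 'Top of Peak' entry written earlier with `df.loc`. The label survives in df2 only because later subsets are copied from df before that final assignment. The small sketch below reproduces the effect on a made-up three-row frame; the names `overwritten` and `fixed` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'X': [1.0, 3.0, 2.0]})

# Label written row by row, as inside the loop in the question.
df.loc[df.X.idxmax(), 'Label'] = 'Top of Peak'

# Assigning the whole column afterwards replaces every entry,
# so the peak label is lost.
overwritten = df.copy()
overwritten['Label'] = ['...', '...', '...']

# Sketch of a fix: create the default column first,
# then write the peak label last.
fixed = df[['X']].copy()
fixed['Label'] = '...'
fixed.loc[fixed.X.idxmax(), 'Label'] = 'Top of Peak'

print(overwritten)
print(fixed)
```

Ordering the assignments this way keeps the row-level label without needing a second dataframe for the labels.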
Posted on 2018-05-04 14:23:31
If I understand correctly, here is how I would do what you want:
import pandas as pd
df = pd.DataFrame({
'X': [-0.27, -0.28, -0.33, -0.37, -0.60, -0.90, -0.99, -0.94, -0.85, -0.75, -0.64, -0.51, -0.35, -0.21, 1.78, 1.98, 2.08, 2.42, 2.56, 2.51, 2.57, 2.53, 2.37, 2.24, 2.11, 2.01, 1.82, 1.64, ],
'X_State': ['3', '3', '3', '3', '5', '5', '5', '5', '5', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '6', '6', '6', '6', '6', ],
'Y_State': ['23', '23', '23', '23', '24', '24', '24', '24', '24', '23', '23', '23', '22', '22', '18', '18', '18', '17', '17', '18', '17', '17', '18', '18', '18', '18', '18', '19', ],
})
df['X_STATE_Previous_Value'] = df.X_State.shift(1)
df['Y_STATE_Previous_Value'] = df.Y_State.shift(1)
df['Y_STATE_Change'] = (df.Y_State.ne(df.Y_State.shift())).astype(int)
df['Label'] = ''  # or '...' if you prefer
# get the index values where the abnormality occurs:
abnormal_idx = df[(df['Y_State'] == '17') & (df['Y_STATE_Previous_Value'] == '18')].index
# write it in column Label:
df.loc[abnormal_idx, 'Label'] = 'abnormality'
# get a subset of +/- 5 rows around the abnormalities
df2 = df[min(abnormal_idx)-5:max(abnormal_idx)+5]
# and the index of the max of X on this subset
peak_idx = df2.X.idxmax()
# you don't really need df2, you can do directly: peak_idx = df[min(abnormal_idx)-5:max(abnormal_idx)+5].X.idxmax()
# add this number in a column, not sure why?
df['Max_Label'] = peak_idx
Let me know if this works for what you want.
EDIT: to flag the max of each subset, you can do:
df['subset_max'] = ''
for idx in abnormal_idx:
    idx_max = df[idx-5:idx+6].X.idxmax()
    # note the +6 instead of +5, as the upper bound is not included, sorry for that
    if idx == idx_max:
        df.loc[idx, 'subset_max'] = 'max of the subset'
    else:
        df.loc[idx, 'subset_max'] = 'subset max at %s' % idx_max
https://stackoverflow.com/questions/50175886