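I have an array with 5 columns of DateTime values, each holding 100k+ values.
The values are DateTimes spanning 2015-01-15 00:30 to 2020-12-31 23:00 at 30-minute intervals.
Basically, I want to iterate over the values in the array and check whether each value is exactly one 30-minute timestep after the previous one. In an array with columns, the previous value is the one directly above the value being inspected.
There are probably several ways to do this, but I have included a pseudocode example of how I am thinking about it: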
for row in _the_whole_array:
    for cell in row:
        if cell == the cell above it + 30 minutes:
            continue iterating
        else:
            store that value
return the smallest timestep found, and the biggest timestep found
I have looked at for loops and nditer, but I get errors when iterating over datetimes, and I would also like to know how to get the value of the cell above the current one.
Thank you very much for your help.
Posted on 2021-09-26 15:30:00
Try converting your NumPy array to a pandas DataFrame, since it lets you iterate over its rows. Here is a code snippet that may help.
import pandas as pd
import numpy as np

# Create a DataFrame from the NumPy array
df = pd.DataFrame(your_array)

# Iterate through the DataFrame row by row
for index, row in df.iterrows():
    # Print the current row
    print(row)
    # Print the previous row (df[index - 1] would select a column, so use iloc)
    print(df.iloc[index - 1] if index != 0 else 'no previous row')
Posted on 2021-09-26 16:51:09
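Even though you use rows in your pseudocode, I am not sure about your axes. The general idea is the same either way: if your dtype is datetime, you can do basic arithmetic on it.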
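As a small illustration of that arithmetic, here is a minimal sketch that assumes the array has dtype `datetime64` and that each column is one time series, so the "previous" value sits one row above. The sample data and the names `arr`, `expected`, `steps`, and `bad` are made up for the example.

```python
import numpy as np

# Hypothetical sample: 4 timestamps in each of 2 columns.
# The second column has a 90-minute gap between row 1 and row 2.
arr = np.array(
    [
        ["2015-01-15T00:30", "2015-01-15T00:30"],
        ["2015-01-15T01:00", "2015-01-15T01:00"],
        ["2015-01-15T01:30", "2015-01-15T02:30"],
        ["2015-01-15T02:00", "2015-01-15T03:00"],
    ],
    dtype="datetime64[m]",
)

expected = np.timedelta64(30, "m")

# Subtracting datetime64 values yields timedelta64 values.
# axis=0 compares every cell with the cell directly above it.
steps = np.diff(arr, axis=0)

# (row, column) positions of steps that are not exactly 30 minutes.
bad = np.argwhere(steps != expected)
bad[:, 0] += 1  # shift so the row index points at the later of the two cells

print(steps.min(), steps.max())
print(bad)
```

If the array holds Python `datetime` objects instead (dtype `object`), converting it first with `arr.astype("datetime64[m]")` should give the same kind of array, though that conversion is an assumption about how the data is stored.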
import numpy as np

x, y = _the_whole_array.shape

# Steps between neighbouring values along each row
for row in range(x):
    diff = _the_whole_array[row][1:] - _the_whole_array[row][:-1]
    print(np.amin(diff), np.amax(diff))

# Steps between neighbouring values down each column
for column in range(y):
    diff = _the_whole_array[:, column][1:] - _the_whole_array[:, column][:-1]
    print(np.amin(diff), np.amax(diff))
Posted on 2021-09-26 19:16:59
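As @Heidiki mentioned, consider using pandas, since it makes large data easier to handle.
To solve your problem, you can keep a temporary variable that stores the previous row while you loop. On each iteration you compute the differences and check them against the overall min and max, just as in your pseudocode.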
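The previous-row comparison can also be handed to pandas, because `DataFrame.diff()` subtracts from each row the row above it. This is only a minimal sketch of that alternative, assuming the columns are already datetime-typed; the sample data and the names `steps`, `smallest`, `largest`, and `irregular` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: two datetime columns, three rows each.
df = pd.DataFrame({
    "a": pd.to_datetime(["2019-01-02 13:00", "2019-01-02 13:30", "2019-01-02 14:00"]),
    "b": pd.to_datetime(["2019-01-02 13:00", "2019-01-02 13:45", "2019-01-02 14:15"]),
})

# Each cell minus the cell above it; the first row has no predecessor
# and comes back as NaT, so it is dropped.
steps = df.diff().iloc[1:]

# Smallest and largest step across all columns.
smallest = steps.min().min()
largest = steps.max().max()

# Rows where at least one column deviates from a 30-minute step.
irregular = steps[(steps != pd.Timedelta(minutes=30)).any(axis=1)]

print(smallest, largest)
print(irregular)
```

This avoids `iterrows()`, which tends to be slow on 100k+ rows, at the cost of building the full table of differences in memory.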
Here is an example with test data, a 3x5 matrix. Please check whether the output matches what you need.
from datetime import datetime, timedelta
import pandas as pd
import numpy as np

# test data
arr = np.array([
    [datetime(2019, 1, 2, 13, 12), datetime(2019, 1, 2, 15, 19),
     datetime(2019, 1, 2, 15, 59), datetime(2019, 1, 2, 17, 23),
     datetime(2019, 1, 2, 15, 18)],
    [datetime(2019, 1, 2, 13, 34), datetime(2019, 1, 2, 15, 57),
     datetime(2019, 1, 2, 18, 53), datetime(2019, 1, 2, 17, 34),
     datetime(2019, 1, 2, 15, 29)],
    [datetime(2019, 1, 2, 13, 49), datetime(2019, 1, 2, 16, 35),
     datetime(2019, 1, 2, 21, 18), datetime(2019, 1, 2, 17, 59),
     datetime(2019, 1, 2, 15, 46)]
])
def timedelta_to_minutes(dt: timedelta) -> int:
    # Whole minutes in the timedelta (leftover seconds are dropped)
    return (dt.days * 24 * 60) + (dt.seconds // 60)

def min_max_timestep(data: np.ndarray) -> tuple:
    prev_row = min_step = max_step = None
    df = pd.DataFrame(data)
    for idx, row in df.iterrows():
        # The first row has nothing above it, so just remember it
        if not idx:
            prev_row = row
            continue
        # Step from the row above, computed for every column at once
        diff = row - prev_row
        min_diff, max_diff = min(diff), max(diff)
        if min_step is None or min_diff < min_step:
            min_step = min_diff
        if max_step is None or max_diff > max_step:
            max_step = max_diff
        prev_row = row
    # convert timedelta to minutes
    return timedelta_to_minutes(min_step), timedelta_to_minutes(max_step)

result = min_max_timestep(arr)
https://stackoverflow.com/questions/69336199