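I have a map like {:datetime [unix-timestamp] :count [longs]}.
:datetime and :count contain the same number of elements.
:datetime has no fixed interval; it is usually tick data. I would like to resample the data so that it has a defined interval, for example 5 minutes, and aggregate :count over each interval.
Example: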
{
 :datetime [timestamp every minute]
 :count [1 1 1 1 1 ...]
}
should be resampled to
{
 :datetime [timestamp every 5 minutes]
 :count [5 5 5 5 5 ...]
}

Answered 2014-07-07 06:06:15
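You want to take every fifth element from the timestamp vector and sum the count vector in groups of five (this assumes evenly spaced ticks, as in the example). Something like the following will do it: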
(defn resample [m]
  (let [{dt :datetime ct :count} m
        ;; keep the first timestamp of each group of five
        newdt (map first (partition 5 dt))
        ;; sum each group of five counts
        newct (map (partial apply +) (partition 5 ct))]
    {:datetime newdt
     :count newct}))

Answered 2014-07-07 17:54:16
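A brief usage sketch for the `resample` function from the first answer, assuming ticks exactly one minute apart (the timestamps are made-up millisecond values). It also illustrates that `partition` silently drops a trailing group of fewer than five elements; `partition-all` is one way to keep it. The name `resample-all` is hypothetical and not part of that answer.

```clojure
;; Copy of resample from the first answer, repeated so the sketch is self-contained.
(defn resample [m]
  (let [{dt :datetime ct :count} m]
    {:datetime (map first (partition 5 dt))
     :count    (map (partial apply +) (partition 5 ct))}))

;; Hypothetical variant: partition-all keeps the incomplete final group.
(defn resample-all [{dt :datetime ct :count}]
  {:datetime (map first (partition-all 5 dt))
   :count    (map (partial apply +) (partition-all 5 ct))})

;; Seven one-minute ticks (made-up values, in milliseconds).
(def ticks {:datetime (vec (range 0 (* 7 60000) 60000))
            :count    [1 1 1 1 1 1 1]})

(resample ticks)      ; the last two ticks are dropped
(resample-all ticks)  ; the last two ticks form a second, shorter group
```

Neither variant looks at the timestamps themselves, so both are only meaningful when the ticks really are evenly spaced.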
Here is something fancier, though probably not efficient:
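(defn resample-5 [{:keys [datetime count]}]
  (letfn [;; round a millisecond timestamp down to its 5-minute boundary
          (floor-5 [dt] (- dt (mod dt (* 5 60 1000))))
          ;; [bucket pairs] -> [bucket total-count]
          (sum-counts [[time pairs]]
            [time (reduce + (map second pairs))])]
    (let [pairs (partition 2 (interleave datetime count))
          pair-groups (group-by #(floor-5 (first %)) pairs)
          ;; group-by returns an unordered map, so sort buckets by timestamp
          sums (map sum-counts (sort-by first pair-groups))]
      {:datetime (map first sums)
       :count (map second sums)})))

Note how many operations it performs on the collection: interleave, partition, group-by, sort-by, map + reduce, and then map twice more.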
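As an aside to the `group-by` version above, the same bucketing idea can be written as a single `reduce` into a sorted map, which keeps the buckets in timestamp order and does not require sorted input. This is only a sketch: `resample-by` and its `interval-ms` parameter are hypothetical names, and timestamps are assumed to be in milliseconds.

```clojure
(defn resample-by
  "Sums counts per bucket of interval-ms milliseconds.
   Hypothetical helper; assumes millisecond timestamps."
  [interval-ms {:keys [datetime count]}]
  (let [floor   (fn [dt] (- dt (mod dt interval-ms)))
        ;; bucket start -> running sum, kept sorted by bucket start
        buckets (reduce (fn [acc [dt c]]
                          (update-in acc [(floor dt)] (fnil + 0) c))
                        (sorted-map)
                        (map vector datetime count))]
    {:datetime (vec (keys buckets))
     :count    (vec (vals buckets))}))

;; Irregular, made-up ticks; 300000 ms = 5 minutes.
(resample-by 300000
             {:datetime [0 70000 299999 300000 610000]
              :count    [1 2 3 4 5]})
;; => {:datetime [0 300000 600000], :count [6 4 5]}
```

Passing the interval as an argument avoids hard-coding five minutes, at the cost of one sorted-map update per tick.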
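Here is something more efficient, which scans the collection only once (it relies on the timestamps being in ascending order):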
(defn resample-5 [{:keys [datetime count]}]
  (letfn [;; append a finished bucket to the result; skips the initial nil bucket
          (add-tick [result dt c]
            (if dt
              (-> result
                  (update-in [:datetime] conj dt)
                  (update-in [:count] conj c))
              result))]
    (loop [datetimes datetime
           counts count
           rounded-last nil   ; start of the bucket being accumulated
           count-last 0       ; running sum for that bucket
           result {:datetime [] :count []}]
      (if (empty? datetimes)
        ;; flush the last open bucket
        (add-tick result rounded-last count-last)
        (let [dt (first datetimes)
              c (first counts)
              ;; 5-minute boundary, assuming millisecond timestamps
              rounded (- dt (mod dt (* 5 60 1000)))]
          (if (= rounded-last rounded)
            ;; same bucket: keep accumulating
            (recur (rest datetimes) (rest counts) rounded (+ count-last c) result)
            ;; new bucket: emit the previous one and start over
            (recur (rest datetimes) (rest counts) rounded c (add-tick result rounded-last count-last))))))))

https://stackoverflow.com/questions/24600107