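For each row, I need to find the usage over the last 3 hours, grouped by User and ID_option (usage is one of the columns in the dataset). Each row represents one record (in a 3-minute interval). For example (including the desired column sum_usage_3hr):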
User ID_option time usage sum_usage_3hr
1 a1 12OCT2017:11:20:32 3 10
1 a1 12OCT2017:10:23:24 7 14
1 b1 12OCT2017:09:34:55 12 12
2 b1 12OCT2017:08:55:06 4 6
1 a1 12OCT2017:07:59:53 7 7
2 b1 12OCT2017:06:59:12 2 2
I used the following code with a hash table:
data want;
  if _n_=1 then do;
    if 0 then set have(rename=(usage=_usage));
    declare hash h(dataset:'have(rename=(usage=_usage))',hashexp:20);
    h.definekey('user','id_option','time');
    h.definedata('_usage');
    h.definedone();
  end;
  set have;
  sum_usage_3hr=0;
  do i=time-3*3600 to time ;
    if h.find(key:user,key:id_option,key:i)=0 then sum_usage_3hr+_usage;
  end;
  drop _usage i;
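run;
But I get an error: "Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid." If I add "output; end;" just above "run;", I get the error "No matching DO/SELECT statement." Does anyone know what is causing this? I also have a version that sorts the table first, and it gives the same error. Thanks.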
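On the error message itself: SAS reports "Invalid DO loop control information" when the start or stop expression of an iterative DO evaluates to missing. One possible cause here, which the post does not confirm, is a row of have whose time is missing. The sketch below reproduces that situation on a hypothetical two-row table (demo, checked and n_seconds are illustrative names, not from the post) and guards the loop so the step keeps running.

```sas
/* Hypothetical input: the second row has a missing time. */
data demo;
  input usage time :datetime18.;
  format time datetime18.;
  cards;
3 12OCT2017:11:20:32
7 .
;
run;

data checked;
  set demo;
  n_seconds = 0;
  /* Guard: without this test, the row with a missing time stops the */
  /* step with "Invalid DO loop control information".                */
  if not missing(time) then do i = time - 3*3600 to time;
    n_seconds + 1;   /* the loop runs once per second of the window */
  end;
  drop i;
run;
```

Note that a loop written this way steps through every second of the window (10,801 iterations per row), so a hash lookup keyed on the exact datetime only hits when a record exists at that precise second.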
After running the code from the answer:
User ID_option time usage sum_usage_3hr col_i_got
1 a1 12OCT2017:11:22:32 3 12 3
1 a1 12OCT2017:11:20:24 0 9 3
1 a1 12OCT2017:10:34:55 2 9 2
1 a1 12OCT2017:09:55:06 0 7 2
1 a1 12OCT2017:09:43:45 0 7 0
1 a1 12OCT2017:08:59:53 7 7 7
1 a1 12OCT2017:06:59:12 0 0 7
Posted on 2017-11-28 11:35:53
Try this:
Problem 1:
Input:
data have;
input User ID_option $ time usage ;
informat time datetime18.;
format time datetime18.;
cards;
1 a1 12OCT2017:11:20:32 3
1 a1 12OCT2017:10:23:24 7
1 b1 12OCT2017:09:34:55 12
2 b1 12OCT2017:08:55:06 4
1 a1 12OCT2017:07:59:53 7
2 b1 12OCT2017:06:59:12 2
;
run;
Code:
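proc sort data=have out=have1;
  by user id_option time;
run;
data have2;
  set have1;
  by user id_option;
  format previous_time datetime18.;
  /* values from the row immediately before this one */
  previous_time = lag(time);
  previous_usage = lag(usage);
  /* the first row of a user/id_option group has no predecessor */
  if first.ID_option then previous_time=.;
  /* add the previous row's usage when it lies within the last 3 hours; */
  /* only that single row is considered, not every row in the window    */
  if previous_time ~= . and intnx("hour",time,-3,"s") <= previous_time <= time then sum_usage_3hr=usage+previous_usage;
  else sum_usage_3hr = usage;
  drop previous_time previous_usage;
run;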
proc sort data=have2 out=want;
  by descending time ;
run;
Output:
User ID_option time usage sum_usage_3hr
1 a1 12Oct2017 11:20:32 3 10
1 a1 12Oct2017 10:23:24 7 14
1 b1 12Oct2017 9:34:55 12 12
2 b1 12Oct2017 8:55:06 4 6
1 a1 12Oct2017 7:59:53 7 7
2 b1 12Oct2017 6:59:12 2 2
Problem 2:
Input:
data have;
input user1 ID_option $ time usage ;
informat time datetime18.;
format time datetime18.;
cards;
1 a1 12OCT2017:11:22:32 3
1 a1 12OCT2017:11:20:24 0
1 a1 12OCT2017:10:34:55 2
1 a1 12OCT2017:09:55:06 0
1 a1 12OCT2017:09:43:45 0
1 a1 12OCT2017:08:59:53 7
1 a1 12OCT2017:06:59:12 0
;
run;
Code:
proc sql;
create table want as
select user1,id_option,time,min(usage) as usage,sum(usage1) as sum_usage_3hr
from
(
select a.*,b.time as time1 ,b.usage as usage1
from
have a
left join
have b
on a.user1 = b.user1 and a.id_option = b.id_option and b.time <= a.time
where intck("hour",a.time ,b.time) >= -3
)
group by 1,2,3
order by time desc;
quit;
Output:
user1 ID_option time usage sum_usage_3hr
1 a1 12Oct2017 11:22:32 3 12
1 a1 12Oct2017 11:20:24 0 9
1 a1 12Oct2017 10:34:55 2 9
1 a1 12Oct2017 9:55:06 0 7
1 a1 12Oct2017 9:43:45 0 7
1 a1 12Oct2017 8:59:53 7 7
1 a1 12Oct2017 6:59:12 0 0
Let me know if you have any questions.
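One caveat about the INTCK condition in the Problem 2 query: by default INTCK counts the hour boundaries crossed between two datetimes, not elapsed time, so a record from 08:01:00 still satisfies intck("hour", a.time, b.time) >= -3 for a row at 11:59:00, although it is almost 4 hours old. If an exact 3-hour window is wanted, a variant along the following lines may be closer. It is only a sketch that assumes the same have table as in Problem 2; want_exact is a hypothetical output name.

```sas
/* Rolling 3-hour sum with an exact window, per user1 and id_option. */
proc sql;
  create table want_exact as
  select a.user1,
         a.id_option,
         a.time,
         a.usage,
         sum(b.usage) as sum_usage_3hr
  from have as a
       inner join have as b
       on  a.user1     = b.user1
       and a.id_option = b.id_option
       /* datetimes are stored in seconds, so 3 hours = 3*3600 */
       and b.time between a.time - 3*3600 and a.time
  group by a.user1, a.id_option, a.time, a.usage
  order by a.time desc;
quit;
```

On the Problem 2 sample data this yields the same totals as the output above, because no record there falls between 3 and 4 hours before another one. The two conditions differ only when such a record exists.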
https://stackoverflow.com/questions/47530030