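I am migrating from SAS to R and need help figuring out how to summarize weather data over date ranges. In SAS, I take the date ranges, use a data step to create one record for every date in each range (with startdate, enddate, date), merge with the weather data, and then summarize (VAR hdd cdd; CLASS=startdate enddate sum=) to total the values for each date range.
R code: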
startdate <- c(100,103,107)
enddate <- c(105,104,110)
billperiods <- data.frame(startdate, enddate)
to get:
> billperiods
  startdate enddate
1       100     105
2       103     104
3       107     110
R code:
weatherdate <- c(100:103,105:110)
hdd <- c(0,0,4,5,0,0,3,1,9,0)
cdd <- c(4,1,0,0,5,6,0,0,0,10)
weather <- data.frame(weatherdate, hdd, cdd)
to get:
> weather
   weatherdate hdd cdd
1          100   0   4
2          101   0   1
3          102   4   0
4          103   5   0
5          105   0   5
6          106   0   6
7          107   3   0
8          108   1   0
9          109   9   0
10         110   0  10
Note: weatherdate = 104 is missing. I may not have weather data for every day.
I can't figure out how to get to:
> billweather
  startdate enddate sumhdd sumcdd
1       100     105      9     10
2       103     104      5      0
3       107     110     13     10
where sumhdd is the sum of hdd in the weather data.frame from startdate to enddate (and likewise sumcdd for cdd).
Any ideas?
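For reference, the SAS workflow described above (expand each range to one row per date, merge with weather, then sum by period) might be sketched in base R roughly as follows. This is only an illustrative sketch under the assumption that dates are plain integers as in the example; the names expanded, merged and sums are made up for this illustration.

```r
# Example data copied from the question
billperiods <- data.frame(startdate = c(100, 103, 107),
                          enddate   = c(105, 104, 110))
weather <- data.frame(weatherdate = c(100:103, 105:110),
                      hdd = c(0, 0, 4, 5, 0, 0, 3, 1, 9, 0),
                      cdd = c(4, 1, 0, 0, 5, 6, 0, 0, 0, 10))

# Step 1: one row per date in each billing period (the SAS data-step loop)
expanded <- do.call(rbind, lapply(seq_len(nrow(billperiods)), function(i) {
  data.frame(startdate = billperiods$startdate[i],
             enddate   = billperiods$enddate[i],
             date      = seq(billperiods$startdate[i], billperiods$enddate[i]))
}))

# Step 2: left join to weather; dates without weather (e.g. 104) get NA
merged <- merge(expanded, weather,
                by.x = "date", by.y = "weatherdate", all.x = TRUE)

# Step 3: sum within each period; na.rm = TRUE makes missing days contribute nothing
sums <- aggregate(merged[c("hdd", "cdd")],
                  by = merged[c("startdate", "enddate")],
                  FUN = sum, na.rm = TRUE)
names(sums)[3:4] <- c("sumhdd", "sumcdd")

# aggregate() does not keep the input order, so sort explicitly
billweather <- sums[order(sums$startdate, sums$enddate), ]
rownames(billweather) <- NULL
billweather
```

The expansion step creates one row per day per period, so for long periods or many bills the approaches in the answers, which filter the weather table directly, may be lighter on memory.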
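Posted on 2013-03-26 05:30:16
Here is an approach using IRanges and data.table. For this particular problem the answer may look like overkill, but in general I find IRanges convenient for working with intervals, however simple they are.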
# load packages
require(IRanges)
require(data.table)
# convert data.frames to data.tables
dt1 <- data.table(billperiods)
dt2 <- data.table(weather)
# construct Ranges to get overlaps
ir1 <- IRanges(dt1$startdate, dt1$enddate)
ir2 <- IRanges(dt2$weatherdate, width=1) # start = end
# find Overlaps
olaps <- findOverlaps(ir1, ir2)
# Hits of length 10
# queryLength: 3
# subjectLength: 10
#      queryHits subjectHits
#      <integer>   <integer>
#  1           1           1
#  2           1           2
#  3           1           3
#  4           1           4
#  5           1           5
#  6           2           4
#  7           3           7
#  8           3           8
#  9           3           9
#  10          3          10
# get billweather (final output)
billweather <- cbind(dt1[queryHits(olaps)],
                     dt2[subjectHits(olaps),
                         list(hdd, cdd)])[, list(sumhdd = sum(hdd),
                                                 sumcdd = sum(cdd)),
                                          by = list(startdate, enddate)]
#    startdate enddate sumhdd sumcdd
# 1:       100     105      9     10
# 2:       103     104      5      0
# 3:       107     110     13     10
Breaking down that last statement: first, I build an intermediate data.table using queryHits, subjectHits and cbind; then I group by startdate, enddate and take the sums of hdd and cdd. It is easier to follow when the two steps are written separately, as shown here.
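# split for easier understanding
# step 1: pair each bill period with the weather rows it overlaps
billweather <- cbind(dt1[queryHits(olaps)],
                     dt2[subjectHits(olaps),
                         list(hdd, cdd)])
# step 2: sum hdd and cdd within each bill period
billweather <- billweather[, list(sumhdd = sum(hdd),
                                  sumcdd = sum(cdd)),
                           by = list(startdate, enddate)]
Posted on 2013-03-26 05:16:08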
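The one-liner below packs the filtering and summing into a single nested expression. Unpacked into a named helper, the same logic might look like the following sketch; sum_period is a hypothetical helper name used only for illustration, and the data is copied from the question.

```r
# Example data copied from the question
billperiods <- data.frame(startdate = c(100, 103, 107),
                          enddate   = c(105, 104, 110))
weather <- data.frame(weatherdate = c(100:103, 105:110),
                      hdd = c(0, 0, 4, 5, 0, 0, 3, 1, 9, 0),
                      cdd = c(4, 1, 0, 0, 5, 6, 0, 0, 0, 10))

# Hypothetical helper: sum hdd and cdd over one billing period.
# Days with no weather row (such as 104) simply match nothing.
sum_period <- function(start, end) {
  in_range <- weather$weatherdate >= start & weather$weatherdate <= end
  colSums(weather[in_range, c("hdd", "cdd")])
}

# Call the helper once per (startdate, enddate) pair.
# mapply returns a 2 x n matrix, so transpose to get one row per period.
sums <- t(mapply(sum_period, billperiods$startdate, billperiods$enddate))
colnames(sums) <- c("sumhdd", "sumcdd")

billweather <- cbind(billperiods, sums)
billweather
```

Because the helper filters the weather table directly, a period with no matching weather rows yields sums of 0 rather than an error.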
# for each bill period (row), sum the hdd and cdd columns of the matching weather rows
billweather <- cbind(billperiods,
                     t(apply(billperiods, 1, function(x) {
                       colSums(weather[weather[, 1] %in% c(x[1]:x[2]), 2:3])
                     })))
Posted on 2013-03-26 05:20:26
cbind(billperiods, t(sapply(apply(billperiods, 1, function(x)
  weather[weather$weatherdate >= x[1] &
          weather$weatherdate <= x[2], c("hdd", "cdd")]), colSums)))
  startdate enddate hdd cdd
1       100     105   9  10
2       103     104   5   0
3       107     110  13  10
https://stackoverflow.com/questions/15624706