Using one data frame to sum ranges of data in another data frame in R


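I am migrating from SAS to R and need help figuring out how to sum weather data over date ranges. In SAS, I take the date ranges, use a DATA step to create one record for each date in the range (with startdate, enddate, date), merge that with the weather data, and then summarize (VAR hdd cdd; CLASS=startdate enddate sum=) to total the values for each date range.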

R code:

startdate <- c(100,103,107)
enddate <- c(105,104,110)
billperiods <-data.frame(startdate,enddate);
weatherdate <- c(100:103,105:110)
hdd <- c(0,0,4,5,0,0,3,1,9,0)
cdd <- c(4,1,0,0,5,6,0,0,0,10)
weather <- data.frame(weatherdate,hdd,cdd)
Note: weatherdate = 104 is missing. I may not have weather data for every day.

I can't figure out how to get to:

> billweather
  startdate enddate sumhdd sumcdd
1       100     105      9     10
2       103     104      5      0
3       107     110     13     10
where sumhdd is the sum of hdd in the weather data.frame from startdate to enddate (and sumcdd is the same for cdd).

Any ideas?
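For comparison, the SAS approach described in the question (one row per date in each period, merge with the weather, then sum by period) can be sketched in base R roughly as follows. This is a minimal sketch on the question's sample data, not taken from either answer below; the intermediate names expanded and merged are illustrative.

```r
# Sample data from the question
billperiods <- data.frame(startdate = c(100, 103, 107),
                          enddate   = c(105, 104, 110))
weather <- data.frame(weatherdate = c(100:103, 105:110),
                      hdd = c(0, 0, 4, 5, 0, 0, 3, 1, 9, 0),
                      cdd = c(4, 1, 0, 0, 5, 6, 0, 0, 0, 10))

# Like the SAS DATA step: one row per date in each bill period
expanded <- do.call(rbind, lapply(seq_len(nrow(billperiods)), function(i) {
  data.frame(startdate = billperiods$startdate[i],
             enddate   = billperiods$enddate[i],
             date      = seq(billperiods$startdate[i], billperiods$enddate[i]))
}))

# Inner merge with the weather data; dates without weather (e.g. 104) drop out
merged <- merge(expanded, weather, by.x = "date", by.y = "weatherdate")

# Like the summary step with CLASS startdate enddate: sum hdd and cdd per period
billweather <- aggregate(cbind(hdd, cdd) ~ startdate + enddate,
                         data = merged, FUN = sum)
names(billweather)[3:4] <- c("sumhdd", "sumcdd")

# aggregate() sorts the groups itself, so restore the order by startdate
billweather <- billweather[order(billweather$startdate), ]
rownames(billweather) <- NULL
billweather
```

One caveat of this sketch: a bill period with no weather days at all would disappear from the result, because the inner merge drops it before aggregation. Expanding every period to one row per day can also get large when periods are long or numerous.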

One option is to apply over the rows of billperiods, sum the matching weather rows with colSums, and cbind the transposed result back onto billperiods:
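# For each bill period (row x), sum hdd and cdd over the weather rows dated x[1] to x[2]
# (the summed columns keep the names hdd and cdd)
billweather <- cbind(billperiods,
                     t(apply(billperiods, 1, function(x) {
                       colSums(weather[weather[, 1] %in% c(x[1]:x[2]), 2:3])
                     })))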
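The row-by-row subsetting in the apply version can also be done for all bill periods at once: build a logical matrix marking which weather days fall inside which period, then matrix-multiply it by the hdd/cdd columns. This is a sketch, not part of the original answers; the name in_range is illustrative, and because the matrix has one cell per (period, weather day) pair, memory use grows quickly for large inputs.

```r
# Sample data from the question
billperiods <- data.frame(startdate = c(100, 103, 107),
                          enddate   = c(105, 104, 110))
weather <- data.frame(weatherdate = c(100:103, 105:110),
                      hdd = c(0, 0, 4, 5, 0, 0, 3, 1, 9, 0),
                      cdd = c(4, 1, 0, 0, 5, 6, 0, 0, 0, 10))

# in_range[i, j] is TRUE when weather row j falls inside bill period i
in_range <- outer(billperiods$startdate, weather$weatherdate, "<=") &
            outer(billperiods$enddate,   weather$weatherdate, ">=")

# %*% treats TRUE/FALSE as 1/0, so each row of the product is a per-period sum
sums <- in_range %*% as.matrix(weather[, c("hdd", "cdd")])

billweather <- data.frame(billperiods,
                          sumhdd = sums[, "hdd"],
                          sumcdd = sums[, "cdd"])
billweather
```

Because it compares dates with <= and >=, this version also works for dates that are not whole numbers, and a period with no weather days simply gets zero sums.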
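In the apply version, the rows can also be selected with direct comparisons instead of %in%, i.e. weather$weatherdate >= x[1] combined with an upper-bound check against x[2]; that does not rely on the dates being whole numbers.

Here is an approach using IRanges and data.table. For this problem it is probably overkill, but in general I find IRanges convenient for working with time intervals, because it keeps the interval logic simple.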

# load packages
require(IRanges)
require(data.table)

# convert data.frames to data.tables
dt1 <- data.table(billperiods)
dt2 <- data.table(weather)

# construct Ranges to get overlaps
ir1 <- IRanges(dt1$startdate, dt1$enddate)
ir2 <- IRanges(dt2$weatherdate, width=1) # start = end

# find Overlaps
olaps <- findOverlaps(ir1, ir2)

# Hits of length 10
# queryLength: 3
# subjectLength: 10
#    queryHits subjectHits 
#     <integer>   <integer> 
#  1          1           1 
#  2          1           2 
#  3          1           3 
#  4          1           4 
#  5          1           5 
#  6          2           4 
#  7          3           7 
#  8          3           8 
#  9          3           9 
#  10         3          10 

# get billweather (final output)
billweather <- cbind(dt1[queryHits(olaps)],
                     dt2[subjectHits(olaps), list(hdd, cdd)])[,
                       list(sumhdd = sum(hdd), sumcdd = sum(cdd)),
                       by = list(startdate, enddate)]

#    startdate enddate sumhdd sumcdd
# 1:       100     105      9     10
# 2:       103     104      5      0
# 3:       107     110     13     10
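If IRanges is not needed for anything else, the same overlap can probably be computed with data.table alone using a non-equi join. A minimal sketch, assuming data.table 1.9.8 or later (the version that introduced non-equi joins); the names bill_dt, weather_dt and sums are illustrative.

```r
library(data.table)

# Sample data from the question (all numeric so the join column types match)
bill_dt <- data.table(startdate = c(100, 103, 107),
                      enddate   = c(105, 104, 110))
weather_dt <- data.table(weatherdate = as.numeric(c(100:103, 105:110)),
                         hdd = c(0, 0, 4, 5, 0, 0, 3, 1, 9, 0),
                         cdd = c(4, 1, 0, 0, 5, 6, 0, 0, 0, 10))

# Non-equi join: for each row of bill_dt, match the weather rows with
# startdate <= weatherdate <= enddate; by = .EACHI evaluates j once per
# bill_dt row, in bill_dt's order
sums <- weather_dt[bill_dt,
                   on = .(weatherdate >= startdate, weatherdate <= enddate),
                   .(sumhdd = sum(hdd), sumcdd = sum(cdd)),
                   by = .EACHI]

# The join columns in sums are named after weather_dt's column, so keep
# only the sums and bind them back onto the bill periods
billweather <- cbind(bill_dt, sums[, .(sumhdd, sumcdd)])
billweather
```

With the default nomatch = NA, a bill period that has no weather days at all would likely come out with NA sums rather than zero.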

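Thanks for the quick response! I tried it on a larger data frame (12356 rows); it took 6.75 seconds and worked well!

Thanks for the quick response! I tried it on the larger data frame (12356 rows) as well; it took 7.89 seconds and the results were great! I'm surprised how quickly people responded. This is my first time asking a question here.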
# split for easier understanding
billweather <- cbind(dt1[queryHits(olaps)], 
            dt2[subjectHits(olaps), 
            list(hdd, cdd)])
billweather <- billweather[, list(sumhdd = sum(hdd), 
            sumcdd = sum(cdd)), 
            by=list(startdate, enddate)]