R-根据第二个数据帧中最接近的匹配指定列值
我有两个数据帧,logger和df(时间为数字):R-根据第二个数据帧中最接近的匹配指定列值,r,loops,dataframe,matching,closest,R,Loops,Dataframe,Matching,Closest,我有两个数据帧,logger和df(时间为数字): logger您可以使用data.table库。这也将有助于更高效地处理大数据量- library(data.table) logger <- data.frame( time = c(1280248354:1280248413), temp = runif(60,min=18,max=24.5) ) df <- data.frame( obs = c(1:10), time = runif(10,min=1280
logger您可以使用data.table
库。这也将有助于更高效地处理大数据量-
library(data.table)
logger <- data.frame(
time = c(1280248354:1280248413),
temp = runif(60,min=18,max=24.5)
)
df <- data.frame(
obs = c(1:10),
time = runif(10,min=1280248354,max=1280248413)
)
logger <- data.table(logger)
df <- data.table(df)
setkey(df,time)
setkey(logger,time)
df2 <- logger[df, roll = "nearest"]
我会使用data.table
来实现这一点。它使您可以超轻松、超快速地按键
。甚至还有一个非常有用的roll=“nearest”
参数,可以精确地说明您正在寻找的行为(除了在示例数据中,它不是必需的,因为df
中的所有时间都出现在logger
中)。在下面的示例中,我将df$time
重命名为df$time1
,以明确哪个列属于哪个表
# Load package
require( data.table )
# Make data.frames into data.tables with a key column
ldt <- data.table( logger , key = "time" )
dt <- data.table( df , key = "time1" )
# Join based on the key column of the two tables (time & time1)
# roll = "nearest" gives the desired behaviour
# list( obs , time1 , temp ) gives the columns you want to return from dt
ldt[ dt , list( obs , time1 , temp ) , roll = "nearest" ]
# time obs time1 temp
# 1: 1280248361 8 1280248361 18.07644
# 2: 1280248366 4 1280248366 21.88957
# 3: 1280248370 3 1280248370 19.09015
# 4: 1280248376 5 1280248376 22.39770
# 5: 1280248381 6 1280248381 24.12758
# 6: 1280248383 10 1280248383 22.70919
# 7: 1280248385 1 1280248385 18.78183
# 8: 1280248389 2 1280248389 18.17874
# 9: 1280248393 9 1280248393 18.03098
#10: 1280248403 7 1280248403 22.74372
#加载包
要求(数据表)
#使用键列将data.frames设置为data.tables
ldt+1用于具有样本数据的可复制示例,显示您想要什么,以及您尝试了什么。顺便说一句-下次使用进行随机采样的数据时,首先运行命令set.seed(x)
,其中x
是任意整数(大多数人使用1
)。这样,每个复制您的示例的人都将得到相同的数据集。
library(data.table)
logger <- data.frame(
time = c(1280248354:1280248413),
temp = runif(60,min=18,max=24.5)
)
df <- data.frame(
obs = c(1:10),
time = runif(10,min=1280248354,max=1280248413)
)
logger <- data.table(logger)
df <- data.table(df)
setkey(df,time)
setkey(logger,time)
df2 <- logger[df, roll = "nearest"]
> df2
time temp obs
1: 1280248356 22.81437 7
2: 1280248360 24.08711 10
3: 1280248366 22.31738 2
4: 1280248367 18.61222 5
5: 1280248388 19.46300 4
6: 1280248393 18.26535 6
7: 1280248400 20.61901 9
8: 1280248402 21.92584 1
9: 1280248410 19.36526 8
10: 1280248410 19.36526 3
# Load package
require( data.table )
# Make data.frames into data.tables with a key column
ldt <- data.table( logger , key = "time" )
dt <- data.table( df , key = "time1" )
# Join based on the key column of the two tables (time & time1)
# roll = "nearest" gives the desired behaviour
# list( obs , time1 , temp ) gives the columns you want to return from dt
ldt[ dt , list( obs , time1 , temp ) , roll = "nearest" ]
# time obs time1 temp
# 1: 1280248361 8 1280248361 18.07644
# 2: 1280248366 4 1280248366 21.88957
# 3: 1280248370 3 1280248370 19.09015
# 4: 1280248376 5 1280248376 22.39770
# 5: 1280248381 6 1280248381 24.12758
# 6: 1280248383 10 1280248383 22.70919
# 7: 1280248385 1 1280248385 18.78183
# 8: 1280248389 2 1280248389 18.17874
# 9: 1280248393 9 1280248393 18.03098
#10: 1280248403 7 1280248403 22.74372