
R: Match each observation with every other observation


I have the following data frame with 1051 observations:

customer_id  long       lat
11111        111.320    110.574 
11112        111.243    110.311
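I need to manipulate the data frame so that each observation is matched with every other observation. This will let me compute the distance between each pair of observations: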

customer_id_a  long_a   lat_a    customer_id_b  long_b    lat_b
11111          111.320  110.574  11112          111.243   110.311

How can I do this in R?

We can use dcast from data.table:

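library(data.table)
# Give every row the same id, label the rows a, b, ... with rowid(), then
# spread them into one wide row (letters only provides 26 distinct labels)
dcast(setDT(df1)[, newid := 1], newid ~ letters[rowid(newid)], 
     value.var = c('customer_id', 'long', 'lat'))[, newid := NULL][]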
#    customer_id_a customer_id_b long_a  long_b   lat_a   lat_b
#1:         11111         11112 111.32 111.243 110.574 110.311

Or, using base R:

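# Same idea in base R: a constant id plus a per-row label a, b, ...
df2 <- transform(df1, newid = 1)
df2$Seq <- with(df2, letters[ave(newid, newid, FUN = seq_along)])
# Reshape to a single wide row and drop the leading newid column
reshape(df2, idvar = 'newid', timevar = 'Seq', direction = 'wide')[-1]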
#  customer_id.a long.a   lat.a customer_id.b  long.b   lat.b
#1         11111 111.32 110.574         11112 111.243 110.311

A solution in base R. First, I create some toy data:

n <- 50
df <- data.frame(customer_id = sprintf("1%0.5d", 1:n),
                 long = rnorm(n)+105, lat = rnorm(n)+110)
head(df)
#  customer_id     long      lat
#1      100001 105.7532 109.4935
#2      100002 102.0772 110.9918
#3      100003 102.8655 110.7422
#4      100004 103.3984 111.1385
#5      100005 102.8614 111.8068
#6      100006 105.1860 110.3117

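merge(df, df, by = NULL) will give you a Cartesian join (with the default by, merge(df, df, all = TRUE) joins on every shared column and simply returns df). Possible duplicate.

Don't reinvent the wheel: check e.g. geosphere::distm; no reshaping needed.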
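Then use combn() to pair every row with every other row:
# Each column of cs holds one unordered pair of row indices (i < j)
cs <- combn(nrow(df), 2)
# Rows of each pair side by side; cbind() prefixes the columns with a. and b.
new_df <- cbind(a = df[cs[1,], ], b = df[cs[2,], ])
rownames(new_df) <- NULL  # Remove default rownames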

head(new_df)
#  a.customer_id   a.long    a.lat b.customer_id   b.long    b.lat
#1        100001 105.7532 109.4935        100002 102.0772 110.9918
#2        100001 105.7532 109.4935        100003 102.8655 110.7422
#3        100001 105.7532 109.4935        100004 103.3984 111.1385
#4        100001 105.7532 109.4935        100005 102.8614 111.8068
#5        100001 105.7532 109.4935        100006 105.1860 110.3117
#6        100001 105.7532 109.4935        100007 103.8722 111.2530
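combn(n, 2) enumerates the pairs in the order (1,2), (1,3), ..., (1,n), (2,3), ..., which is the same column-wise order in which base R's dist() stores the lower triangle of its distance matrix. The distances can therefore be attached to new_df in one vectorized step. A minimal sketch, assuming planar (Euclidean) distance in raw coordinate units is acceptable; that is only a rough approximation for longitude/latitude, so a great-circle formula would be needed for real geographic distances. With the question's 1051 rows, combn() yields choose(1051, 2) = 551,775 pairs.

```r
# Toy data in the same shape as above (hypothetical values, fixed seed).
set.seed(1)
n <- 5
df <- data.frame(customer_id = sprintf("1%0.5d", 1:n),
                 long = rnorm(n) + 105, lat = rnorm(n) + 110)

# All unordered pairs of row indices, one pair per column.
cs <- combn(nrow(df), 2)
new_df <- cbind(a = df[cs[1, ], ], b = df[cs[2, ], ])
rownames(new_df) <- NULL

# dist() stores the lower triangle column by column, i.e. pairs
# (1,2), (1,3), ..., (2,3), ..., matching the column order of combn(n, 2).
new_df$dist <- as.vector(dist(df[, c("long", "lat")]))

head(new_df)
```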