R: Cross-reference two tables using the closest matching value


I need to cross-reference two tables and create a new variable in the first table based on the second. The two tables are:

> dput(df)
structure(list(PlayerName = "Example", DateOfBirth = structure(1069113600, class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), DateOfTest = structure(1476316800, class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Stature = 151.7, SittingHeight = 77, 
    BodyMass = 74, Age = 12.9034907597536, LegLength = 74.7, 
    year_from_phv = -0.993206850280964, AgeAtPHV = 13.8966976100346, 
    Maturation_stat = "Average"), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -1L))

> dput(reference)
structure(list(year_from_phv = c(-1, -0.8, -0.6, -0.4, -0.2, 
0, 0.2, 0.4, 0.6, 0.8, 1, -1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 
0.4, 0.6, 0.8, 1, -1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 
0.8, 1), Maturation_stat = c("Early", "Early", "Early", "Early", "Early", 
"Early", "Early", "Early", "Early", "Early", "Early", "Average", 
"Average", "Average", "Average", "Average", "Average", "Average", 
"Average", "Average", "Average", "Average", "Late", "Late", "Late", 
"Late", "Late", "Late", "Late", "Late", "Late", "Late", "Late"
), cm = c("27.66", "26.24", "24.68", "22.96", "21.07", "19.04", 
"16.96", "14.92", "13.01", "11.26", "9.6999999999999993", "24.36", 
"22.99", "21.51", "19.88", "18.09", "16.16", "14.21", "12.35", 
"10.65", "9.1199999999999992", "7.78", "20.22", "18.96", "17.68", 
"16.31", "14.76", "13.05", "11.32", "9.7100000000000009", "8.27", 
"6.94", "5.7")), row.names = c(NA, -33L), class = c("tbl_df", 
"tbl", "data.frame"))

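With these, I need to:

Look at df$Maturation_stat and filter reference to the rows where reference$Maturation_stat is the same.

Then look at df$year_from_phv and find the closest value in reference$year_from_phv.

Based on those two filters, return the value of reference$cm and store it as a variable in df. For the sample data in df, it should return 24.36.

If possible, could this, or part of it, also be wrapped in a function?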

Something like this:

add_cm <- function(df, reference) {
    # Filter for equal Maturation_stat
    filter1 <- reference[reference$Maturation_stat==df$Maturation_stat, ]
    # Calculate absolute difference of year_from_phv from reference and df 
    filter2 <- transform(filter1, diff=abs(year_from_phv-df$year_from_phv))
    # Add cm with minimum absolute difference
    df$cm <- filter2$cm[which.min(filter2$diff)]     
    df
}

add_cm(df, reference)

  PlayerName DateOfBirth DateOfTest Stature SittingHeight BodyMass      Age
1    Example  2003-11-18 2016-10-13   151.7            77       74 12.90349
  LegLength year_from_phv AgeAtPHV Maturation_stat    cm
1      74.7    -0.9932069  13.8967         Average 24.36


As a first attempt, you can loop over each row of df and apply your logic to find the matching row of reference:

# create the extra column of df
df$cm <- NA
for (i in seq_len(nrow(df))) {
    # find rows in reference with the same Maturation_stat
    reference_ss <- reference[reference$Maturation_stat == df$Maturation_stat[i], ]

    # find the closest year_from_phv (compare against every remaining reference row)
    reference_ss <- reference_ss[which.min(abs(df$year_from_phv[i] - reference_ss$year_from_phv)), ]

    # extract the cm and store it
    df$cm[i] <- reference_ss$cm[1]
}
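
Since the question also asks whether this can be wrapped in a function, here is a minimal sketch of the same row-by-row logic as a function. The name add_cm_rows and the small stand-in tables are hypothetical. It assumes cm should come back as a number (in the question's reference table cm is stored as character, hence the as.numeric) and returns NA when no reference row shares the player's Maturation_stat.

```r
# Hypothetical helper: same logic as the loop, wrapped in a function
add_cm_rows <- function(df, reference) {
  df$cm <- vapply(seq_len(nrow(df)), function(i) {
    # rows of reference with the same Maturation_stat as row i of df
    ref_ss <- reference[reference$Maturation_stat == df$Maturation_stat[i], ]
    if (nrow(ref_ss) == 0) return(NA_real_)
    # index of the reference row whose year_from_phv is closest
    closest <- which.min(abs(ref_ss$year_from_phv - df$year_from_phv[i]))
    as.numeric(ref_ss$cm[closest])
  }, numeric(1))
  df
}

# Stand-in data: a few rows copied from the question's reference table
ref_small <- data.frame(
  year_from_phv   = c(-1, -0.8, 0.2, 0.4, -1, -0.8, 0.2, 0.4),
  Maturation_stat = rep(c("Average", "Late"), each = 4),
  cm              = c("24.36", "22.99", "14.21", "12.35",
                      "20.22", "18.96", "11.32", "9.71"),
  stringsAsFactors = FALSE
)
players_small <- data.frame(
  PlayerName      = c("A", "B"),
  year_from_phv   = c(-0.9932, 0.31),
  Maturation_stat = c("Average", "Late"),
  stringsAsFactors = FALSE
)

result <- add_cm_rows(players_small, ref_small)
```

This handles any number of rows in df, but it still filters reference once per row, so it may be slow on large tables.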


This works when df has only one row, but eventually I need it to work for multiple rows. Thanks.

I ended up using a rolling join with data.table. Thanks.
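
The thread does not show the rolling join itself, so the following is only a sketch of how it might look, not the commenter's actual code. It assumes the data.table package is installed, uses small hypothetical stand-in tables (players, ref) with a few rows copied from the question's reference table, and stores cm as numeric rather than character.

```r
library(data.table)

# Hypothetical stand-ins for df and reference
players <- data.table(
  PlayerName      = c("A", "B"),
  year_from_phv   = c(-0.9932, 0.31),
  Maturation_stat = c("Average", "Late")
)
ref <- data.table(
  year_from_phv   = c(-1, -0.8, 0.2, 0.4, -1, -0.8, 0.2, 0.4),
  Maturation_stat = rep(c("Average", "Late"), each = 4),
  cm              = c(24.36, 22.99, 14.21, 12.35, 20.22, 18.96, 11.32, 9.71)
)

# Exact match on Maturation_stat; the last column listed in `on`
# (year_from_phv) is rolled to the nearest reference value
matched <- ref[players, on = .(Maturation_stat, year_from_phv), roll = "nearest"]

# The join returns one row per row of players, in the same order
players[, cm := matched$cm]
```

The order of the columns in on matters here: only the last one is rolled, so the grouping column must come first and the numeric column to be matched approximately must come last.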