R: cross-referencing two tables using the closest matching value

I need to cross-reference two tables and, based on the second table, create another variable in the first one. The two tables are:

> dput(df)
structure(list(PlayerName = "Example", DateOfBirth = structure(1069113600, class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), DateOfTest = structure(1476316800, class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Stature = 151.7, SittingHeight = 77, 
    BodyMass = 74, Age = 12.9034907597536, LegLength = 74.7, 
    year_from_phv = -0.993206850280964, AgeAtPHV = 13.8966976100346, 
    Maturation_stat = "Average"), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -1L))

> dput(reference)
structure(list(year_from_phv = c(-1, -0.8, -0.6, -0.4, -0.2, 
0, 0.2, 0.4, 0.6, 0.8, 1, -1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 
0.4, 0.6, 0.8, 1, -1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 
0.8, 1), Maturation_stat = c("Early", "Early", "Early", "Early", "Early", 
"Early", "Early", "Early", "Early", "Early", "Early", "Average", 
"Average", "Average", "Average", "Average", "Average", "Average", 
"Average", "Average", "Average", "Average", "Late", "Late", "Late", 
"Late", "Late", "Late", "Late", "Late", "Late", "Late", "Late"
), cm = c("27.66", "26.24", "24.68", "22.96", "21.07", "19.04", 
"16.96", "14.92", "13.01", "11.26", "9.6999999999999993", "24.36", 
"22.99", "21.51", "19.88", "18.09", "16.16", "14.21", "12.35", 
"10.65", "9.1199999999999992", "7.78", "20.22", "18.96", "17.68", 
"16.31", "14.76", "13.05", "11.32", "9.7100000000000009", "8.27", 
"6.94", "5.7")), row.names = c(NA, -33L), class = c("tbl_df", 
"tbl", "data.frame"))
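
Between them, I need to:

  • Look at df$Maturation_stat and filter reference down to the rows where reference$Maturation_stat is the same, then:
  • Look at df$year_from_phv and find the closest value in reference$year_from_phv
  • Based on the two filters above, return the value of reference$cm and add it as a variable in df. For the sample data in df, it should return 24.36
  • If possible, could this, or part of it, also be wrapped in a function?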

    Something like this:

    add_cm <- function(df, reference) {
        # Filter for equal Maturation_stat
        filter1 <- reference[reference$Maturation_stat==df$Maturation_stat, ]
        # Calculate absolute difference of year_from_phv from reference and df 
        filter2 <- transform(filter1, diff=abs(year_from_phv-df$year_from_phv))
        # Add cm with minimum absolute difference
        df$cm <- filter2$cm[which.min(filter2$diff)]     
        df
    }
    
    add_cm(df, reference)
    
      PlayerName DateOfBirth DateOfTest Stature SittingHeight BodyMass      Age
    1    Example  2003-11-18 2016-10-13   151.7            77       74 12.90349
      LegLength year_from_phv AgeAtPHV Maturation_stat    cm
    1      74.7    -0.9932069  13.8967         Average 24.36
    
    As a first attempt, you can loop over each row of df and implement your logic to find the matching row of reference:

    # create the extra column of df
    df$cm <- NA
    for (i in seq_len(nrow(df))) {
        # keep only the rows of reference with the same Maturation_stat
        reference_ss <- reference[reference$Maturation_stat == df$Maturation_stat[i], ]
    
        # keep the row whose year_from_phv is closest to this row's value
        reference_ss <- reference_ss[which.min(abs(df$year_from_phv[i] - reference_ss$year_from_phv)), ]
    
        # extract the cm and store it (cm is stored as character in reference)
        df$cm[i] <- reference_ss$cm[1]
    }
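
The same per-row lookup can also be wrapped in a function that handles any number of rows, which is what the question asks for. Below is a minimal base R sketch. `lookup_cm` and the small `ref` and `players` tables are hypothetical names and data made up for illustration (only the column names and a few reference values come from the question), and converting `cm` to numeric is an assumption about what is wanted.

```r
# Hypothetical helper, not part of the original answers: for each row of df,
# return the cm of the reference row with the same Maturation_stat and the
# closest year_from_phv.
lookup_cm <- function(df, reference) {
  vapply(seq_len(nrow(df)), function(i) {
    # rows of reference with the same maturation status
    ref_ss <- reference[reference$Maturation_stat == df$Maturation_stat[i], ]
    if (nrow(ref_ss) == 0) return(NA_real_)  # no matching status
    # index of the closest year_from_phv within that subset
    nearest <- which.min(abs(ref_ss$year_from_phv - df$year_from_phv[i]))
    # cm is stored as character in the question's reference table
    as.numeric(ref_ss$cm[nearest])
  }, numeric(1))
}

# Cut-down stand-in for the reference table (values taken from the question)
ref <- data.frame(
  year_from_phv   = c(-1, -0.8, -0.6, -1, -0.8, -0.6),
  Maturation_stat = rep(c("Early", "Average"), each = 3),
  cm              = c("27.66", "26.24", "24.68", "24.36", "22.99", "21.51"),
  stringsAsFactors = FALSE
)

# Made-up multi-row input; only the first row mirrors the question's sample
players <- data.frame(
  PlayerName      = c("Example", "Other", "Unknown"),
  year_from_phv   = c(-0.9932069, -0.72, 0.1),
  Maturation_stat = c("Average", "Early", "Missing"),
  stringsAsFactors = FALSE
)

players$cm <- lookup_cm(players, ref)
```

Rows whose `Maturation_stat` has no counterpart in the reference table get `NA` here; whether that is the desired behaviour is a judgment call.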
    

    This works when df has only one row, but ultimately I need it to work for multiple rows. Thanks.

    I ended up using a data.table rolling join. Thanks.
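
The comment above does not show the rolling join itself, so the following is only a sketch of how it might look, assuming the data.table package is installed. `ref_dt` and `players_dt` are hypothetical tables built for illustration (a cut-down copy of the reference values plus a made-up second player), and the conversion of `cm` to numeric is an assumption. The join matches `Maturation_stat` exactly and rolls to the nearest `year_from_phv`, because `roll` applies to the last column listed in `on`.

```r
library(data.table)

# Cut-down stand-in for the reference table (values taken from the question)
ref_dt <- data.table(
  year_from_phv   = c(-1, -0.8, -0.6, -1, -0.8, -0.6),
  Maturation_stat = rep(c("Early", "Average"), each = 3),
  cm              = c("27.66", "26.24", "24.68", "24.36", "22.99", "21.51")
)

# Made-up multi-row input; only the first row mirrors the question's sample
players_dt <- data.table(
  PlayerName      = c("Example", "Other"),
  year_from_phv   = c(-0.9932069, -0.72),
  Maturation_stat = c("Average", "Early")
)

# Exact match on Maturation_stat, nearest match on year_from_phv.
# The result has one row per row of players_dt, in the same order.
matched <- ref_dt[players_dt,
                  on = .(Maturation_stat, year_from_phv),
                  roll = "nearest"]

# Add the looked-up value to the player table by reference
players_dt[, cm := as.numeric(matched$cm)]
```

Compared with a row-by-row loop, this does the lookup for all rows in one call, which tends to matter once the player table grows.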