R: Extracting and using information from other rows in a data frame?


So, I have a data frame that looks like this:

id age  friend1 friend2
01  15   02      05    
02  23   01      05    
03  51   04            
04  41   03            
05  33   01      02

How can I compute the average age of each person's friends and store it in a new column? Ideally, the result would look like this:

id age  friend1 friend2 AvgAgeF
01  15   02      05        28
02  23   01      05        24
03  51   04                41
04  41   03                51
05  33   01      02        19

Currently, I can run the following code to do this:

inx <- grep("friend", names(dat))
tmp <- sapply(inx, function(i) dat$age[dat[[i]]])
dat$AvgAgeF <- rowMeans(tmp, na.rm = TRUE)

Subset the column age using friend1 and friend2 as indices, then cbind the resulting vectors. After that, just call rowMeans:

tmp <- with(dat, cbind(age[friend1], age[friend2]))
dat$AvgAgeF <- rowMeans(tmp, na.rm = TRUE)
rm(tmp)

dat
#  id age friend1 friend2 AvgAgeF
#1  1  15       2       5      28
#2  2  23       1       5      24
#3  3  51       4      NA      41
#4  4  41       3      NA      51
#5  5  33       1       2      19
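
To make the indexing step easier to follow, here is a small sketch of the intermediate vectors. It rebuilds the example data under the name dat_demo (a name chosen here for illustration, not from the original post) and assumes, as the code above does, that each id equals its row number.

```r
# Illustrative copy of the example data; dat_demo is a hypothetical name.
dat_demo <- data.frame(
  id      = 1:5,
  age     = c(15, 23, 51, 41, 33),
  friend1 = c(2, 1, 4, 3, 1),
  friend2 = c(5, 5, NA, NA, 2)
)

# Indexing age by a friend column looks up the age stored in that row.
age_f1 <- dat_demo$age[dat_demo$friend1]  # 23 15 41 51 15
age_f2 <- dat_demo$age[dat_demo$friend2]  # 33 33 NA NA 23 (NA index gives NA)

# Bind the lookups column-wise and average each row, skipping NA.
avg <- rowMeans(cbind(age_f1, age_f2), na.rm = TRUE)
avg  # 28 24 41 51 19
```

The NA entries in friend2 simply propagate as NA ages, and na.rm = TRUE drops them from the mean.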

Now, using grep to get the friend columns:
inx <- grep("friend", names(dat))
tmp <- sapply(inx, function(i) dat$age[dat[[i]]])
dat$AvgAgeF <- rowMeans(tmp, na.rm = TRUE)
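
One thing to watch with a bare "friend" pattern is that it matches any column whose name contains that string. As a rough sketch, the grep step could be wrapped in a small helper with an anchored pattern. The helper name add_avg_friend_age, the pattern default, and the extra friend_count column are assumptions made up for this illustration; the lookup is still positional, as in the code above.

```r
# Hypothetical helper, not part of the original answer.
# Assumes ids equal row numbers (positional lookup) and at least two rows.
add_avg_friend_age <- function(df, pattern = "^friend[0-9]+$") {
  inx <- grep(pattern, names(df))
  # One column of looked-up ages per matched friend column.
  ages <- sapply(inx, function(i) df$age[df[[i]]])
  df$AvgAgeF <- rowMeans(ages, na.rm = TRUE)
  df
}

demo <- data.frame(
  id           = 1:5,
  age          = c(15, 23, 51, 41, 33),
  friend1      = c(2, 1, 4, 3, 1),
  friend2      = c(5, 5, NA, NA, 2),
  friend_count = c(2, 2, 1, 1, 2)  # would be matched by a bare "friend" pattern
)

res <- add_avg_friend_age(demo)
res$AvgAgeF  # 28 24 41 51 19
```

With the anchored pattern, friend_count is left out of the average; with grep("friend", ...) it would be treated as another column of friend ids.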

Edit 2 - id is a string.
If the columns id and friend* are strings rather than numbers, use match to index the column age:

inx <- grep("friend", names(dat2))
tmp <- sapply(inx, function(i) {
  x <- as.character(dat2[[i]])
  y <- as.character(dat2$id)
  dat2$age[match(x, y)]
  })
dat2$AvgAgeF <- rowMeans(tmp, na.rm = TRUE)
rm(tmp)

dat2
#     id age friend1 friend2 AvgAgeF
#1   Bob  15    Jack     Sam      37
#2  Jack  23     Sam     Bob      33
#3   Sam  51     Bob    Jack      19
#4  Sara  41   Henry    <NA>      33
#5 Henry  33    Sara    <NA>      41
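
The same match idea should also cover numeric ids that do not line up with row numbers, such as the case from the comments where id 05 is replaced by 06. The sketch below uses made-up data under the hypothetical name dat3 to contrast the two lookups.

```r
# Hypothetical data: id 5 has been replaced by 6 everywhere.
dat3 <- data.frame(
  id      = c(1, 2, 3, 4, 6),
  age     = c(15, 23, 51, 41, 33),
  friend1 = c(2, 1, 4, 3, 1),
  friend2 = c(6, 6, NA, NA, 2)
)

# Positional lookup: there is no 6th row, so age[6] is NA.
positional <- dat3$age[dat3$friend2]  # NA NA NA NA 23

# match() translates each friend id into the row where that id is stored.
inx <- grep("friend", names(dat3))
tmp <- sapply(inx, function(i) dat3$age[match(dat3[[i]], dat3$id)])
dat3$AvgAgeF <- rowMeans(tmp, na.rm = TRUE)
dat3$AvgAgeF  # 28 24 41 51 19
```

Note that the positional version does not raise an error here; it silently drops the missing friend from the average, which is why the problem can be easy to miss.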

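Comments:

Is there a simpler way to do this if I have 20 "friend" columns and don't want to do it by hand?

I'm using this code and realized that it stops working correctly if the ids are not in numeric order. The problem I ran into: I replaced id 05 with 06, along with every reference to it. When that happens, the code treats that id as NA. Is there a way to fix this? I realize I could convert all the ids to avoid the problem, but I can't do that with the data I'm working with, because the ids carry other information.

@Jack, I don't understand: the code only works on the other columns, so why is id relevant? Can you edit the question with an example?

I edited the post. In that case the id/identifier is no longer numeric, and when that happens the code fails to run correctly.
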
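# Demo with more friend columns: add random friend3 to friend6 to dat
set.seed(1234)

tmp <- matrix(sample(c(NA, 1:5), 20, TRUE), nrow = 5)
colnames(tmp) <- paste0("friend", 3:6)
dat <- cbind(dat, tmp)

# The grep-based code is unchanged: it picks up every column whose name contains "friend"
inx <- grep("friend", names(dat))
tmp <- sapply(inx, function(i) dat$age[dat[[i]]])
dat$AvgAgeF <- rowMeans(tmp, na.rm = TRUE)

rm(tmp)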


# Data: numeric ids (leading zeros are dropped when read as integers)
dat <- read.table(text = "
id age  friend1 friend2
01  15   02      05
02  23   01      05
03  51   04      NA
04  41   03      NA
05  33   01      02
", header = TRUE)

# Data for Edit 2: character ids
dat2 <- read.table(text = "
id    age  friend1 friend2
Bob   15   Jack    Sam
Jack  23   Sam     Bob
Sam   51   Bob     Jack
Sara  41   Henry   NA
Henry 33   Sara    NA
", header = TRUE)