Data reorganization in R
I have data of the following kind:
Person <- c("A", "B", "C", "AB", "BC", "AC", "D", "E")
Father <- c(NA, NA, NA, "A", "B", "C", NA, "D")
Mother <- c(NA, NA, NA, "B", "C", "A", "C", NA)
var1 <- c( 1, 2, 3, 4, 2, 1, 6, 9)
var2 <- c(1.4, 2.3, 4.3, 3.4, 4.2, 6.1, 2.6, 8.2)
myd <- data.frame (Person, Father, Mother, var1, var2)
  Person Father Mother var1 var2
1      A   <NA>   <NA>    1  1.4
2      B   <NA>   <NA>    2  2.3
3      C   <NA>   <NA>    3  4.3
4     AB      A      B    4  3.4
5     BC      B      C    2  4.2
6     AC      C      A    1  6.1
7      D   <NA>      C    6  2.6
8      E      D   <NA>    9  8.2
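I would represent your problem as a graph and then design a graph traversal algorithm to collect all the trios you are looking for.
For example, here is a subset of the trios from the question: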
A    B    C
 \  / \  /
  vv   vv
  AB   BC
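You can start from the vertices that have no outgoing edges (AB and BC) and build a trio from each of them and its parents. Then move up to the parents and repeat the process. You will need a way to keep track of the vertices (persons) you have already visited, so that you do not explore the same vertex more than once.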
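The traversal described above could be sketched in base R roughly as follows. This is only an illustration under assumptions: `collect_trios` is a hypothetical helper name, the name columns are assumed to be character vectors as in the question, and a "trio" is taken to mean a person together with whichever parents are recorded for them.

```r
# Data from the question, with names kept as character strings.
myd <- data.frame(
  Person = c("A", "B", "C", "AB", "BC", "AC", "D", "E"),
  Father = c(NA, NA, NA, "A", "B", "C", NA, "D"),
  Mother = c(NA, NA, NA, "B", "C", "A", "C", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: start at the people who are nobody's parent,
# record one trio per visited person, then move up to the parents.
collect_trios <- function(d) {
  parents <- c(d$Father, d$Mother)
  queue   <- setdiff(d$Person, parents)  # vertices without outgoing edges
  visited <- character(0)
  trios   <- list()
  while (length(queue) > 0) {
    p     <- queue[1]
    queue <- queue[-1]
    if (p %in% visited) next             # never explore a person twice
    visited <- c(visited, p)
    row   <- d[match(p, d$Person), ]
    known <- c(row$Father, row$Mother)
    known <- known[!is.na(known)]
    if (length(known) > 0) {
      # Only people with at least one recorded parent yield a trio.
      trios[[p]] <- c(Person = p, Father = row$Father, Mother = row$Mother)
    }
    queue <- c(queue, known)             # continue with the parents
  }
  trios
}

trios <- collect_trios(myd)
names(trios)
# "AB" "BC" "AC" "E" "D"
```

The `visited` vector plays the role of the bookkeeping mentioned above: B and C are each reached from two children but are only processed once.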
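R has several packages for working with graphs that you could take a look at.
Something like the parentage function in the code further down is roughly what you want.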
If you want a data frame rather than a list, you can use the plyr package, as in the adply call at the end of the code below.
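+1. This solution may be less efficient than a graph-based one when the number of people is large, because the parentage function has to scan all persons for every person passed to it. But it is a much simpler solution than the one I proposed, and it will be similarly efficient when the number of people is not very high.
@Betabando, that sounds interesting in theory, but I don't know how to actually implement it... thanks.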
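On the efficiency point raised in the comment above: one way to avoid scanning all persons for every person is to resolve the parent names with match(), which looks up all of them in a single vectorized pass. The following is a minimal sketch under that assumption, not the method from the answer; `father_row`, `mother_row` and `trios_long` are hypothetical names chosen for illustration.

```r
myd <- data.frame(
  Person = c("A", "B", "C", "AB", "BC", "AC", "D", "E"),
  Father = c(NA, NA, NA, "A", "B", "C", NA, "D"),
  Mother = c(NA, NA, NA, "B", "C", "A", "C", NA),
  var1   = c(1, 2, 3, 4, 2, 1, 6, 9),
  var2   = c(1.4, 2.3, 4.3, 3.4, 4.2, 6.1, 2.6, 8.2),
  stringsAsFactors = FALSE
)

# Row index of each person's father and mother (NA where unknown).
father_row <- match(myd$Father, myd$Person)
mother_row <- match(myd$Mother, myd$Person)

# Interleave child, father and mother row indices, one triple per person.
n    <- nrow(myd)
rows <- c(rbind(seq_len(n), father_row, mother_row))
trio <- rep(seq_len(n), each = 3)

# Drop the slots of unknown parents and stack everything in long format.
keep <- !is.na(rows)
trios_long <- cbind(myd[rows[keep], ], Trio = trio[keep])
rownames(trios_long) <- NULL
```

The result has one row per person plus one row per recorded parent, labelled by the row index of the child in `Trio`.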
Person <- c("A", "B", "C", "AB", "BC", "AC", "D", "E")
Father <- c(NA, NA, NA, "A", "B", "C", NA, "D")
Mother <- c(NA, NA, NA, "B", "C", "A", "C", NA)
var1 <- c( 1, 2, 3, 4, 2, 1, 6, 9)
var2 <- c(1.4, 2.3, 4.3, 3.4, 4.2, 6.1, 2.6, 8.2)
myd <- data.frame(Person, Father, Mother, var1, var2, stringsAsFactors = FALSE)
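# Return row x of myd together with the rows of that person's known parents.
parentage <- function(x, myd) {
  y <- myd[x, ]
  # Extract the parent names as plain strings (also safe for factor columns).
  p1 <- as.character(y$Father)
  p2 <- as.character(y$Mother)
  out <- y
  if (!is.na(p1)) {
    out <- rbind(out, myd[myd$Person == p1, ])  # append the father's row
  }
  if (!is.na(p2)) {
    out <- rbind(out, myd[myd$Person == p2, ])  # append the mother's row
  }
  out$Trio <- x  # label every row with the row index of the child
  out
}
# One list element per person.
ans <- lapply(seq_along(myd$Person), parentage, myd)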
> ans
[[1]]
  Person Father Mother var1 var2 Trio
1      A   <NA>   <NA>    1  1.4    1

[[2]]
  Person Father Mother var1 var2 Trio
2      B   <NA>   <NA>    2  2.3    2

[[3]]
  Person Father Mother var1 var2 Trio
3      C   <NA>   <NA>    3  4.3    3

[[4]]
   Person Father Mother var1 var2 Trio
4      AB      A      B    4  3.4    4
2       A   <NA>   <NA>    1  1.4    4
21      B   <NA>   <NA>    2  2.3    4

[[5]]
  Person Father Mother var1 var2 Trio
5     BC      B      C    2  4.2    5
2      B   <NA>   <NA>    2  2.3    5
3      C   <NA>   <NA>    3  4.3    5

[[6]]
   Person Father Mother var1 var2 Trio
6      AC      C      A    1  6.1    6
3       C   <NA>   <NA>    3  4.3    6
31      A   <NA>   <NA>    1  1.4    6

[[7]]
  Person Father Mother var1 var2 Trio
7      D   <NA>      C    6  2.6    7
3      C   <NA>   <NA>    3  4.3    7

[[8]]
  Person Father Mother var1 var2 Trio
8      E      D   <NA>    9  8.2    8
7      D   <NA>      C    6  2.6    8
library(plyr)
ans<-adply(seq_along(myd$Person),1,parentage,myd)
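If you would rather not depend on plyr, a similar data frame can probably be built in base R by stacking the list elements with do.call(rbind, ...). The sketch below is an assumption-laden variant, not the code from the answer: it repeats a condensed version of parentage() so that it runs on its own, and `ans_list` and `ans_df` are hypothetical names.

```r
myd <- data.frame(
  Person = c("A", "B", "C", "AB", "BC", "AC", "D", "E"),
  Father = c(NA, NA, NA, "A", "B", "C", NA, "D"),
  Mother = c(NA, NA, NA, "B", "C", "A", "C", NA),
  var1   = c(1, 2, 3, 4, 2, 1, 6, 9),
  var2   = c(1.4, 2.3, 4.3, 3.4, 4.2, 6.1, 2.6, 8.2),
  stringsAsFactors = FALSE
)

# Condensed variant of parentage(): the person's row plus known parents' rows.
parentage <- function(x, myd) {
  y <- myd[x, ]
  parents <- c(y$Father, y$Mother)
  parents <- parents[!is.na(parents)]
  out <- rbind(y, myd[match(parents, myd$Person), ])
  out$Trio <- x
  out
}

ans_list <- lapply(seq_len(nrow(myd)), parentage, myd)
ans_df   <- do.call(rbind, ans_list)  # stack the per-person pieces
rownames(ans_df) <- NULL              # discard the inherited row labels
```

Resetting the row names is optional; it only removes the labels carried over from the original rows of `myd`.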