How to append a variable from one observation to another observation in the same dataset in R


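My problem is as follows:

Suppose I have a person-year dataset containing information on marital status (cStatus), race, year, spouse id (pID), and city of residence: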

data<-data.frame(cbind(c(rep(1,5),rep(2,5),rep(3,5),rep(4,5),rep(5,5)),c(rep(c(1,2,3,4,5),5)),c(NA,NA,NA,NA,NA,NA,NA,3,3,NA,NA,NA,2,2,7,6,6,6,6,6,NA,NA,NA,NA,NA),c(0,0,0,0,0,0,0,1,1,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0),c(1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1),c(rep(1,5),rep(1,2),rep(2,3),rep(2,4),1,rep(1,5),rep(1,5))))
names(data)<- c("id","year","pID","cStatus","race","city")

> head(data)  
id year pID cStatus race city
 1    1    NA       0    1    1
 1    2    NA       0    1    1
 1    3    NA       0    1    1
 1    4    NA       0    1    1
 1    5    NA       0    1    1
 2    1    NA       0    1    1
Note that the result will contain NA where the pID does not exist in the data (there is no corresponding id) or where there is no spouse.

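If you want to account for spouses changing from year to year, just add year to the helper data frame and to the by argument of merge. As @joran pointed out, merge can take multiple columns to merge on, similar to SQL.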

# create a data frame with one unique entry per person and year, with their race
spouses.yearly <- unique(data[c("id", "year", "race")])
# rename all three columns so that "id" lines up with "pID" in the main data
names(spouses.yearly) <- c("pID", "year", "pRace")

# merge race via spouse id and year
data <- merge(data, spouses.yearly, by=c("pID", "year"), all.x=TRUE)
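Merging on year as well as pID matters most for spouse attributes that change over time. As a minimal sketch, the same pattern can attach the spouse's city in the given year. The toy data frame, its values, and the names `toy`, `spouse_city` and `pCity` are made up for illustration and are not part of the original dataset.

```r
# Toy person-year data (hypothetical values, not the asker's dataset)
toy <- data.frame(
  id   = c(1, 1, 2, 2),
  year = c(1, 2, 1, 2),
  pID  = c(2, 2, 1, 1),
  city = c(10, 10, 10, 20)
)

# Helper: one row per person-year, renamed so "id" lines up with "pID"
spouse_city <- unique(toy[c("id", "year", "city")])
names(spouse_city) <- c("pID", "year", "pCity")

# Left join on both keys, so each row gets the spouse's city in that year
toy_merged <- merge(toy, spouse_city, by = c("pID", "year"), all.x = TRUE)

# merge() reorders rows by the key columns, so restore person-year order
toy_merged <- toy_merged[order(toy_merged$id, toy_merged$year), ]
toy_merged
```

Here person 2 moves to city 20 in year 2, so person 1's pCity is 10 in year 1 and 20 in year 2. Merging on pID alone would have produced two candidate rows per spouse and duplicated the person-year rows.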

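It sounds like you just need to use merge.
@joran - that is what Andy's answer below suggests. It works, but I also have cases where people change partners, and I want the partner's race in a given year.
You can merge on multiple variables. merge provides roughly the same functionality as SQL joins.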
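To make the SQL comparison concrete, here is a small sketch of how the `all.x` argument of `merge` maps onto join types. The `people` and `races` data frames are hypothetical toy inputs, not the asker's data.

```r
# Hypothetical toy inputs: person 3 has no spouse (pID is NA)
people <- data.frame(id = c(1, 2, 3), pID = c(2, 1, NA))
races  <- data.frame(pID = c(1, 2), pRace = c(0, 1))

# Default: keep only rows with a match in both tables (like an INNER JOIN)
inner <- merge(people, races, by = "pID")

# all.x = TRUE: keep every row of `people`, filling pRace with NA
# where there is no match (like a LEFT JOIN)
left <- merge(people, races, by = "pID", all.x = TRUE)

nrow(inner)  # 2
nrow(left)   # 3
```

Passing a vector such as `by = c("pID", "year")` corresponds to joining on several columns at once.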
Warning messages:
1: In `[<-.data.frame`(`*tmp*`, data$id == i, , value = list(id = c(1,  :
  provided 8 variables to replace 7 variables
# create a dataframe that has unique entries for each person and their race
spouses <- unique(data[c("id", "race")])
names(spouses) <- c("pID", "pRace")

# merge race via spouse id
data <- merge(data, spouses, by="pID", all.x=TRUE)
> data
   pID id year cStatus race city pRace
1    2  3    4       1    0    2     1
2    2  3    3       1    0    2     1
3    3  2    4       1    1    2     0
4    3  2    3       1    1    2     0
5    6  4    2       1    0    1    NA
6    6  4    1       1    0    1    NA
7    6  4    3       1    0    1    NA
8    6  4    5       1    0    1    NA
9    6  4    4       1    0    1    NA
10   7  3    5       1    0    1    NA
11  NA  1    1       0    1    1    NA
12  NA  1    2       0    1    1    NA
[...]
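In the output above, pRace is NA both for people without a spouse (pID is NA) and for people whose spouse id does not occur in the data (pID 6 and 7). If that distinction matters, one possible way to label the rows is sketched below. The `merged` data frame is a hypothetical three-row stand-in for the merged result, the `pRaceStatus` column is an invented name, and the sketch assumes that race is never missing for people who are in the data.

```r
# Hypothetical stand-in for the merged result, one row per case
merged <- data.frame(
  pID   = c(2, 6, NA),
  id    = c(3, 4, 1),
  pRace = c(1, NA, NA)
)

# Label why pRace is (or is not) missing; assumes race itself is never NA
merged$pRaceStatus <- ifelse(
  is.na(merged$pID), "no spouse",
  ifelse(is.na(merged$pRace), "spouse not in data", "matched")
)
merged
```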