Applying plyr in R to fill in missing values

I have a data frame containing person-year observations on a number of variables. It looks like this:

   year     serial moved urban.rural.code   
15 1982     1000_1     0                0
16 1983     1000_1     0                0
17 1984     1000_1     0                0
18 1985     1000_1     1                0
19 1986     1000_1     1                1
20 1981     1000_2     0                1
21 1982     1000_2     0                1
22 1983     1000_2     0                1
23 1984     1000_2     0                0
24 1985     1000_2     0                9   
25 1996     1000_2     0                1
26 1993     1000_3     0                1
27 1994     1000_3     0                1
28 1984     1000_4     0                0
29 1985     1000_4     0                7  
30 1987     1000_5     0                1
31 1984     1000_6     0                0
32 1999     1000_6     0                8

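For each serial, if an observation was recorded in 1985 and the 1985 value of moved is 0, I want to set the 1984 urban.rural.code to the 1985 value. In the example above, only rows 23 and 28 should change: their urban.rural.code should become 9 and 7, respectively.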

I am using ddply together with a helper function, as follows:

fill1984 <- function(group) {
    if((1984 %in% group$year) & (group[group$year == 1985, 'moved'] == 0)) {
        group[group$year == 1984, 'urban.rural.code'] <- group[group$year == 1985, 'urban.rural.code']
    }
    return(group)
}

data <- ddply(data, 'serial', fill1984, .parallel=TRUE)

I'm not sure where I'm going wrong. How can I edit urban.rural.code within each serial group?

This uses dplyr and could probably be cleaned up a bit, but it seems to work:

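library(dplyr)
newdf <- data %>%
          group_by(serial) %>%
          mutate(
            # flag the 1985 row of each serial when moved is 0
            cidx = year == 1985 & moved == 0,
            # copy the 1985 code into the 1984 row only when that flag is TRUE
            urban.rural.code = ifelse(year == 1984 & isTRUE(cidx[year==1985]),
                                      urban.rural.code[year == 1985],
                                      urban.rural.code)
          )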
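
As a complement to the dplyr answer, here is a minimal sketch of how the question's own helper could be repaired. It assumes the original call fails because group[group$year == 1985, 'moved'] == 0 has length zero for any serial without a 1985 row, which makes if() stop with an "argument is of length zero" error. The sample data frame is rebuilt from the rows shown in the question (row names omitted), the name filled is only illustrative, and base R's split, lapply and rbind stand in for ddply so the sketch runs without extra packages.

```r
# Sample rows copied from the question (row names 15-32 omitted).
data <- data.frame(
  year   = c(1982, 1983, 1984, 1985, 1986,
             1981, 1982, 1983, 1984, 1985, 1996,
             1993, 1994, 1984, 1985, 1987, 1984, 1999),
  serial = c(rep("1000_1", 5), rep("1000_2", 6), rep("1000_3", 2),
             rep("1000_4", 2), "1000_5", rep("1000_6", 2)),
  moved  = c(0, 0, 0, 1, 1, rep(0, 13)),
  urban.rural.code = c(0, 0, 0, 0, 1,
                       1, 1, 1, 0, 9, 1,
                       1, 1, 0, 7, 1, 0, 8),
  stringsAsFactors = FALSE
)

fill1984 <- function(group) {
  is84 <- group$year == 1984
  is85 <- group$year == 1985
  # any() and all() each return a single TRUE/FALSE, and && short-circuits,
  # so the condition is well defined even when a year is absent.
  if (any(is84) && any(is85) && all(group$moved[is85] == 0)) {
    # Assumption: at most one 1985 row per serial; [1] guards against more.
    group$urban.rural.code[is84] <- group$urban.rural.code[is85][1]
  }
  group
}

# Base R split-apply-combine: one data frame per serial, then re-stacked.
filled <- do.call(rbind, lapply(split(data, data$serial), fill1984))
rownames(filled) <- NULL

# Inspect the 1984 rows to see which codes were overwritten.
print(filled[filled$year == 1984, ])
```

With plyr loaded, ddply(data, 'serial', fill1984) should give the same result. Passing .parallel=TRUE additionally requires a registered foreach backend, so it is left out of this sketch.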
