R: Add rows to a data frame if observations are missing


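I have a data frame df1 in which each person (id) answers several questionnaires (measure) at specific time points (date). Normally each person should complete three questionnaires per session (first, pre, post). Some participants did not complete all three and answered only one or two of them. The possible patterns are therefore: complete (participant A), missing "post" (participant B), missing "first" (participant C), missing "pre" (participant D), or only one of the three answered (participants E, F, G).

See df1: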

df1 <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L,  4L, 5L, 6L, 7L), .Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"), measure = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L, 2L, 1L, 3L, 2L), .Label = c("first", "post", "pre"), class = "factor"), date = structure(c(17558, 17558, 17558,  17558, 17559, 17559, 17559, 17559, 17558, 17558, 17558, 17558 ), class = "Date"), result = c(1, 5, 4, 7, 8, 7, 2, 1, 3, 5, 7, 7)), class = "data.frame", row.names = c(NA, -12L))
I tried:

df1 %>% ... %>% ungroup()

I also tried:

final <- df1 %>% complete(id, nesting(measure, date))
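
A note on what this second attempt does, as a minimal sketch on made-up data (the `toy` data frame below is an assumption for illustration, not df1): `nesting(measure, date)` keeps only the (measure, date) pairs that occur in the data and crosses each pair with every id, so an id can receive rows for dates on which it never answered. Crossing `id` with `measure` alone gives one row per id and questionnaire.

```r
library(tidyr)

# Hypothetical miniature of df1: A answered everything, B is missing "post".
toy <- data.frame(
  id      = c("A", "A", "A", "B", "B"),
  measure = factor(c("first", "pre", "post", "first", "pre"),
                   levels = c("first", "pre", "post")),
  date    = as.Date(c("2018-01-27", "2018-01-27", "2018-01-27",
                      "2018-01-27", "2018-01-28")),
  result  = c(1, 5, 4, 7, 8)
)

# Four distinct (measure, date) pairs exist, each crossed with both ids.
by_pair <- complete(toy, id, nesting(measure, date))

# Three measure levels crossed with both ids; date and result become NA
# for the questionnaire that B did not answer.
by_measure <- complete(toy, id, measure)

nrow(by_pair)     # 8
nrow(by_measure)  # 6
```

In `by_pair`, participant B gets an extra "pre" row dated 2018-01-27 in addition to the real one dated 2018-01-28, which is probably not what is wanted here.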

To complicate matters further, participants can attend more than one session, so each id may have x * (first, post, pre).
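
For the multi-session case, one possible approach is sketched below. It assumes the data carries a column that identifies the session; the `session` column and the `toy` data are hypothetical and not part of df1. Wrapping `id` and `session` in `nesting()` restricts the expansion to id/session pairs that actually occurred, and each of those pairs is then crossed with all levels of `measure`.

```r
library(tidyr)

# `session` is a hypothetical column; df1 in the question does not have one.
toy <- data.frame(
  id      = c("A", "A", "A", "A", "B"),
  session = c(1, 1, 1, 2, 1),
  measure = factor(c("first", "pre", "post", "pre", "first"),
                   levels = c("first", "pre", "post")),
  result  = c(1, 5, 4, 6, 7)
)

# Observed id/session pairs: (A, 1), (A, 2), (B, 1).
# Each pair gets one row per measure level; unanswered ones have result = NA.
filled <- complete(toy, nesting(id, session), measure)

nrow(filled)               # 9
sum(is.na(filled$result))  # 4
```

Participant B gets no rows for session 2, because B never attended it.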

This should be achievable with
complete(df1, id, measure)
Try this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

df1 <- structure(list(
  id = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L,  4L, 5L, 6L, 7L), 
                 .Label = c("A", "B", "C", "D", "E", "F", "G"), 
                 class = "factor"), 
  measure = structure(c(1L, 3L, 2L, 1L, 3L, 3L, 2L, 1L, 2L, 1L, 3L, 2L), 
                      .Label = c("first", "post", "pre"), 
                      class = "factor"), 
  date = structure(c(17558, 17558, 17558,  17558, 17559, 17559, 17559, 17559, 17558, 17558, 17558, 17558 ), class = "Date"), 
  result = c(1, 5, 4, 7, 8, 7, 2, 1, 3, 5, 7, 7)), class = "data.frame", row.names = c(NA, -12L))

df2 <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"), measure = structure(c(1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L), .Label = c("first", "post", "pre"), class = "factor"), date = structure(c(17558, 17558, 17558, 17558, 17559, NA, NA, 17559, 17559, 17559, NA, 17558, 17558, NA, NA, NA, 17558, NA, NA, NA, 17558), class = "Date"), result = c(1, 5, 4, 7, 8, NA, NA, 7, 2, 1, NA, 3, 5, NA, NA, NA, 7, NA, NA, NA, 7)), class = "data.frame", row.names = c(NA, -21L))

# Result with complete(df1, id, measure) and setting order of measure
complete(df1, id, measure) %>% 
  mutate(measure = factor(measure, levels = c("first", "pre", "post"))) %>% 
  arrange(id, measure, date) %>% 
  as.data.frame()
#>    id measure       date result
#> 1   A   first 2018-01-27      1
#> 2   A     pre 2018-01-27      5
#> 3   A    post 2018-01-27      4
#> 4   B   first 2018-01-27      7
#> 5   B     pre 2018-01-28      8
#> 6   B    post       <NA>     NA
#> 7   C   first       <NA>     NA
#> 8   C     pre 2018-01-28      7
#> 9   C    post 2018-01-28      2
#> 10  D   first 2018-01-28      1
#> 11  D     pre       <NA>     NA
#> 12  D    post 2018-01-27      3
#> 13  E   first 2018-01-27      5
#> 14  E     pre       <NA>     NA
#> 15  E    post       <NA>     NA
#> 16  F   first       <NA>     NA
#> 17  F     pre 2018-01-27      7
#> 18  F    post       <NA>     NA
#> 19  G   first       <NA>     NA
#> 20  G     pre       <NA>     NA
#> 21  G    post 2018-01-27      7

# Desired output
df2 %>% 
  mutate(measure = factor(measure, levels = c("first", "pre", "post"))) %>% 
  arrange(id, measure, date)
#>    id measure       date result
#> 1   A   first 2018-01-27      1
#> 2   A     pre 2018-01-27      5
#> 3   A    post 2018-01-27      4
#> 4   B   first 2018-01-27      7
#> 5   B     pre 2018-01-28      8
#> 6   B    post       <NA>     NA
#> 7   C   first       <NA>     NA
#> 8   C     pre 2018-01-28      7
#> 9   C    post 2018-01-28      2
#> 10  D   first 2018-01-28      1
#> 11  D     pre       <NA>     NA
#> 12  D    post 2018-01-27      3
#> 13  E   first 2018-01-27      5
#> 14  E     pre       <NA>     NA
#> 15  E    post       <NA>     NA
#> 16  F   first       <NA>     NA
#> 17  F     pre 2018-01-27      7
#> 18  F    post       <NA>     NA
#> 19  G   first       <NA>     NA
#> 20  G     pre       <NA>     NA
#> 21  G    post 2018-01-27      7

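Created on 2020-03-09 by the reprex package (v0.3.0)

Looks like a good solution, thanks. I still have to check it on the "real" data. However, the order of measure in the output is not first ➜ pre ➜ post. Could that be achieved if the date variable were more precise? Or should I recode first, pre, post as 1, 2, 3?

Yes. Recode measure according to the desired order, e.g. `factor(measure, levels = c("first", "pre", "post"))`. I only rearranged the rows to compare the solutions. (;

`df1 %>% mutate(measure = factor(measure, levels = c("first", "pre", "post")))` changes the factor levels. But `arrange(id, date, measure)` does not give the correct order, because the "time" is missing after complete.

Hi. Just change the order of the variables in arrange, i.e. `arrange(id, measure, date)`. I have just edited my post to include the code for reordering the factor and the order of the variables used in arrange.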
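
The ordering issue raised in the comments can be reproduced on a small made-up example (the `toy` data frame is an assumption modelled on participant C after `complete()`). `arrange()` places NA values last, so sorting by date before measure pushes the undated "first" row to the end of the group.

```r
library(dplyr)

# Participant C after complete(): the "first" row has no date.
toy <- data.frame(
  id      = "C",
  measure = factor(c("first", "pre", "post"),
                   levels = c("first", "pre", "post")),
  date    = as.Date(c(NA, "2018-01-28", "2018-01-28")),
  result  = c(NA, 7, 2)
)

# Sorting by date first: the NA date sorts last.
by_date <- arrange(toy, id, date, measure)

# Sorting by the releveled factor first: the intended order.
by_measure <- arrange(toy, id, measure, date)

as.character(by_date$measure)     # "pre" "post" "first"
as.character(by_measure$measure)  # "first" "pre" "post"
```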