
R complete.cases on grouped cases rather than observations?

Suppose I have tidy data:

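library(dplyr)
library(tidyr)

df = expand.grid(Name=c("Sub1","Sub2","Sub3"),Vis=c("Yes","No")) %>%
       mutate(KPR_mean=c(NA,1,3,2,3,2),KPR_range=c(NA,4,4,2,6,5)) %>%
       filter(complete.cases(.))

I want to filter out incomplete factor combinations so that only a complete factorial design remains. Currently I do it like this: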

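df %>% 
  unite(KPR_mean_range, KPR_mean, KPR_range) %>%
  spread(Vis, KPR_mean_range) %>%
  filter(complete.cases(.)) %>%
  # gather back into the original Vis column
  gather(Vis, KPR_mean_range, -Name) %>%
  # convert = TRUE turns the split strings back into numbers
  separate(KPR_mean_range, c("KPR_mean", "KPR_range"), sep = "_", convert = TRUE)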

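But this seems very verbose, and it would be hard to extend once there are several factors and more variables. Is there a way to filter on groups rather than rows? That is, for each level of Name, if filter(complete.cases(.)) would remove a row from that group, remove the whole group instead?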

With the new data, complete all the cases, group by the variable you want complete cases for, and filter out the groups that contain NAs:

df %>% complete(Vis, Name) %>% group_by(Name) %>% filter(!any(is.na(KPR_mean)))
# Source: local data frame [4 x 4]
# Groups: Name [2]
# 
#      Vis   Name KPR_mean KPR_range
#   (fctr) (fctr)    (dbl)     (dbl)
# 1    Yes   Sub2        1         4
# 2    Yes   Sub3        3         4
# 3     No   Sub2        3         6
# 4     No   Sub3        2         5
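Since the question asks how this scales to more variables, here is a minimal self-contained sketch of the same complete() / group_by() / filter() idea that checks every measurement column instead of only KPR_mean. Passing several columns to base R's complete.cases() is my own adaptation, not part of the answer above; it assumes current CRAN versions of dplyr and tidyr. Additional factors would go into complete() and additional measures into complete.cases().

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question (Sub1 has no "Yes" row)
df <- expand.grid(Name = c("Sub1", "Sub2", "Sub3"), Vis = c("Yes", "No")) %>%
  mutate(KPR_mean = c(NA, 1, 3, 2, 3, 2), KPR_range = c(NA, 4, 4, 2, 6, 5)) %>%
  filter(complete.cases(.))

res <- df %>%
  # Make every Name x Vis combination explicit; missing ones get NA measures
  complete(Name, Vis) %>%
  group_by(Name) %>%
  # Keep a group only if every row is complete in all listed measure columns
  # (assumption: KPR_mean and KPR_range are the only measures to check)
  filter(all(complete.cases(KPR_mean, KPR_range))) %>%
  ungroup()

print(res)
```

Sub1 is dropped because its completed "Yes" row is all NA, while Sub2 and Sub3 keep both of their rows.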

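Here is an option with data.table. We convert the 'data.frame' to a 'data.table' while setting the key columns (setDT(df, …)), do a cross join (CJ), group by 'Name', and return the Subset of Data.table (.SD) if there are no NA values in 'KPR_mean':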
library(data.table)
setDT(df, key = c("Name", "Vis"))[CJ(Name, Vis, unique=TRUE)][,
             if(all(!is.na(KPR_mean))) .SD , Name]
#   Name Vis KPR_mean KPR_range
#1: Sub2 Yes        1         4
#2: Sub2  No        3         6
#3: Sub3 Yes        3         4
#4: Sub3  No        2         5
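The same data.table idea can be sketched so that it checks all measure columns at once by calling complete.cases() on .SD. This generalization is an assumption on my part rather than the answer's exact code; it rebuilds the question's data in base R so that only data.table is needed.

```r
library(data.table)

# Same example data as in the question, built without dplyr
df <- expand.grid(Name = c("Sub1", "Sub2", "Sub3"), Vis = c("Yes", "No"))
df$KPR_mean  <- c(NA, 1, 3, 2, 3, 2)
df$KPR_range <- c(NA, 4, 4, 2, 6, 5)
df <- df[complete.cases(df), ]

# Key on the factor columns, then cross join so missing combinations appear as NA rows
dt <- setDT(df, key = c("Name", "Vis"))
full <- dt[CJ(Name, Vis, unique = TRUE)]

# Per Name, return .SD only if every row is complete in every non-grouping column
# (assumption: all columns other than Name should count toward completeness)
res_dt <- full[, if (all(complete.cases(.SD))) .SD, by = Name]
print(res_dt)
```

Groups for which the if() returns NULL are simply omitted from the result, which is what removes Sub1.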

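Could you share your data reproducibly (for example, with dput())? Reading printed example data with column classes into R is a pain. I don't think this works for you either, because the unite step concatenates NA and NA into the string "NA_NA", and complete.cases() does not treat the character value "NA_NA" as missing. I would probably work out how many rows each subject should have and filter out the groups with fewer rows than that. Something like
n_expected = length(unique(df$Vis)); group_by(df, Name) %>% filter(n() == n_expected)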
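@Gregor: Oops, I never realized you had to do that. I've replaced it with some made-up data of the same form. Changing the data changes the result of the code.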
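Gregor's row-counting suggestion, sketched as a runnable block. It assumes the data already had incomplete rows removed with complete.cases() and contains no duplicate Name x Vis rows; n_distinct() is used here in place of length(unique()), which gives the same count.

```r
library(dplyr)

df <- expand.grid(Name = c("Sub1", "Sub2", "Sub3"), Vis = c("Yes", "No")) %>%
  mutate(KPR_mean = c(NA, 1, 3, 2, 3, 2), KPR_range = c(NA, 4, 4, 2, 6, 5)) %>%
  filter(complete.cases(.))

# Rows a complete subject should have: one per distinct Vis value
# (assumption: no duplicate Name x Vis rows)
n_expected <- n_distinct(df$Vis)

res_n <- df %>%
  group_by(Name) %>%
  # Drop any subject that is missing one or more Vis rows
  filter(n() == n_expected) %>%
  ungroup()

print(res_n)
```

If some Vis level could be absent for every subject, nlevels(df$Vis) would count the factor levels instead of the values actually observed.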