R: Including/excluding rows in a data frame based on specific conditions

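I have a large dataset containing pathology test data for many individuals. I have provided a reduced dataset that describes the types of cases.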

library(plyr)
library(tidyr)
library(dplyr)
library(lubridate)

options(stringsAsFactors = FALSE)
dat <- structure(list(
  PersID = c("am1", "am2", "am2", "am3", "am3", "am4", "am4", "am4", "am4", "am4", "am4"),
  Sex = c("M", "F", "F", "M", "M", "F", "F", "F", "F", "F", "F"),
  DateTested = c("21/10/2015", "9/07/2010", "24/09/2010", "23/10/2013", "25/10/2013",
                 "28/04/2010", "23/06/2010", "21/07/2010", "20/10/2010", "4/03/2011",
                 "2/12/2011"),
  Res = c("NR", "R", "R", "NR", "R", "R", "R", "R", "R", "R", "R"),
  Status = c("Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  DateOrder = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 3L, 4L, 5L, 6L)),
  .Names = c("PersID", "Sex", "DateTested", "Res", "Status", "DateOrder"),
  class = "data.frame", row.names = c(NA, -11L))

In base R, I would approach it as follows:

# convert the 'DateTested' column to a date-format
dat$DateTested <- as.Date(dat$DateTested, format = "%d/%m/%Y")
# calculate the difference in days with the previous observation in the group
dat$tdiff <- unlist(tapply(dat$DateTested, INDEX = dat$PersID,
                           FUN = function(x) c(0, `units<-`(diff(x), "days"))))
# filter the observations that have either a time difference of zero or more than 30 days
dat[(dat[,"tdiff"]==0 | dat[,"tdiff"] > 30),]
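One caveat with the tapply() version: unlist(tapply(...)) returns the values grouped by PersID in sorted order, so the assignment only lines up when the rows are already sorted by PersID, as they are in the sample data. A minimal sketch of an order-independent alternative using ave() follows. The toy data frame and its column names (id, date) are made up for illustration and are not part of the question.

```r
# Hypothetical toy data (not the question's data), deliberately not sorted by id
toy <- data.frame(
  id   = c("b", "a", "b", "a", "b"),
  date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-10",
                   "2020-03-01", "2020-03-15"))
)

# Days since the previous test within each id, 0 for the first test.
# ave() returns values aligned with the original rows, so sorting by id
# is not needed; dates must still be chronological within each id.
toy$tdiff <- ave(as.numeric(toy$date), toy$id,
                 FUN = function(x) c(0, diff(x)))

# Keep first tests and tests more than 30 days after the previous one
kept <- toy[toy$tdiff == 0 | toy$tdiff > 30, ]
print(kept)
```

The same two-step pattern (compute the gap per group, then filter) is what the data.table and dplyr versions do; only the grouping mechanism differs.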

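Using the data.table package:

library(data.table)
# convert the 'data.frame' to a 'data.table'
# and convert the 'DateTested' column to a date-format
setDT(dat)[, DateTested := as.Date(DateTested, format = "%d/%m/%Y")]
# calculate the difference in days with the previous observation in the group
dat[, tdiff := c(0, `units<-`(diff(DateTested), "days")), by = PersID]
# filter the observations that have either a time difference of zero or more than 30 days
dat[(tdiff==0 | tdiff > 30)]

This gives you the same result. You can also chain it together as follows: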

setDT(dat)[, DateTested := as.Date(DateTested, format = "%d/%m/%Y")
           ][, tdiff := c(0, `units<-`(diff(DateTested), "days")), by = PersID
             ][(tdiff==0 | tdiff > 30)]

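And using dplyr:

library(dplyr)
dat %>%
  mutate(DateTested = as.Date(DateTested, format = "%d/%m/%Y")) %>%
  group_by(PersID) %>%
  mutate(tdiff = c(0, `units<-`(diff(DateTested), "days"))) %>%
  filter(tdiff == 0 | tdiff > 30)

This will also give you the same result.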
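All of the variants above rely on the expression c(0, `units<-`(diff(x), "days")), which can look cryptic at first. The sketch below unpacks it step by step in base R, using the first three am4 dates from the question. The intermediate variable names (gaps, gaps_days) are mine, and as far as I know diff() on Date values already reports days, so the `units<-` call mainly makes the unit explicit.

```r
# First three test dates of person am4 from the question, in chronological order
x <- as.Date(c("28/04/2010", "23/06/2010", "21/07/2010"), format = "%d/%m/%Y")

# diff() on dates gives a 'difftime' object that is one element shorter than x
gaps <- diff(x)

# `units<-`(gaps, "days") is the functional form of: units(gaps) <- "days"
gaps_days <- `units<-`(gaps, "days")

# Prepending 0 marks the first test of the group; because the first argument
# of c() is a plain number, the result is a plain numeric vector
tdiff <- c(0, gaps_days)
print(tdiff)

# The filter then keeps the first test and any test more than 30 days later
print(x[tdiff == 0 | tdiff > 30])
```

Here the third date is only 28 days after the second, so it is the one dropped by the filter, matching the am4 row with DateOrder 3 in the question's data.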

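With version 1.9.8 (25 Nov 2016), the data.table package gained the inrange() function, which uses a non-equi join to perform a range join.

Using inrange() or the %inrange% operator, respectively, the filtering can be written as: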

library(data.table) # CRAN version 1.10.4-2 used
data.table(dat)[, DateTested := as.IDate(DateTested, "%d/%m/%Y")][
  , .SD[!DateTested %inrange% list(DateTested + 1L, DateTested + 30L)], by = PersID]
For each PersID, this looks up any other entries that fall within the date range [next day, 30 days later]. These entries are excluded from the result.
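To make the behaviour of %inrange% concrete, here is a minimal sketch on a single hypothetical person with three made-up test dates (these are not taken from the question's data). Each date is checked against every window [date + 1, date + 30] built from the same vector, and it is flagged if it falls inside any of them.

```r
library(data.table)

# Hypothetical test dates for one person, stored as integer-backed IDate values
d <- as.IDate(c("2020-01-01", "2020-01-10", "2020-03-15"))

# TRUE where a date lies within 1 to 30 days after ANY date in d
flag <- d %inrange% list(d + 1L, d + 30L)
print(flag)

# Negating the flag keeps only dates that do not follow another test
# within 30 days, which is the filter used in the answer
print(d[!flag])
```

Only the second date is flagged, because it falls 9 days after the first; the third is more than 30 days after both earlier dates and is kept.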

The excluded rows can be shown with:

data.table(dat)[, DateTested := as.IDate(DateTested, "%d/%m/%Y")][
  , .SD[DateTested %inrange% list(DateTested + 1L, DateTested + 30L)], by = PersID]

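This is a very elegant and thorough solution. I went with the dplyr version. I really appreciate this.

I think SBista's is more concise and elegant. The OP is clearly using dplyr, so this answer carries quite a bit of cruft.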
Output of the first %inrange% call above (the rows that are kept):

   PersID Sex DateTested Res Status DateOrder
1:    am1   M 2015-10-21  NR    Yes         1
2:    am2   F 2010-07-09   R     No         1
3:    am2   F 2010-09-24   R     No         2
4:    am3   M 2013-10-23  NR    Yes         1
5:    am4   F 2010-04-28   R     No         1
6:    am4   F 2010-06-23   R     No         2
7:    am4   F 2010-10-20   R     No         4
8:    am4   F 2011-03-04   R     No         5
9:    am4   F 2011-12-02   R     No         6

Output of the second call (the excluded rows):

   PersID Sex DateTested Res Status DateOrder
1:    am3   M 2013-10-25   R    Yes         2
2:    am4   F 2010-07-21   R     No         3