R: create indicators for specific changes between rows in multiple columns
I want to create indicators for transitions between specific values in consecutive rows, for several different columns of a data frame. Some sample data:
df <- structure(list(Year = 1998:2007, Pregnant = structure(c(2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"),
Infection = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
1L), .Label = c("Negative", "Positive"), class = "factor"),
Keep = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L)), .Names = c("Year",
"Pregnant", "Infection", "Keep"), class = "data.frame", row.names = c(NA,
-10L))
# Year Pregnant Infection Keep
# 1 1998 Yes Positive 0
# 2 1999 Yes Positive 0
# 3 2000 No Negative 0
# 4 2001 No Negative 1 # Infection changes from Negative to Positive
# 5 2002 No Positive 1
# 6 2003 No Positive 0
# 7 2004 No Negative 0
# 8 2005 No Negative 1 # Pregnant changes from No to Yes
# 9 2006 Yes Negative 1
# 10 2007 Yes Negative 0
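I want to flag the rows where a change occurs in a specific direction. For example, the Pregnant value changes from "No" (row 8) to "Yes" (row 9), and the Infection value changes from "Negative" (row 4) to "Positive" (row 5), so I want to flag these rows (the Keep column marks the flagged rows with 1).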
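There are other changes in these columns, such as Pregnant going from Yes to No and Infection going from Positive to Negative, but those changes don't matter; I only want to flag sequences of values in a specific order: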
Variable - Pregnant, From - 'No', To - 'Yes'
Variable - Infection, From - 'Negative', To - 'Positive'
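I have more than 20 columns, and I want to detect specific changes in each of them and create a corresponding indicator variable for each.
How about something like this: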
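library(dplyr)
df %>%
  mutate(
    # 1 on the row just before an increase in level code (No -> Yes, Negative -> Positive)
    grp.Preg = c(diff(as.numeric(Pregnant)) > 0, 0),
    grp.Infc = c(diff(as.numeric(Infection)) > 0, 0),
    # flag both the row before the change and the row after it
    flagChangePreg = abs(grp.Preg - lag(grp.Preg, default = 0)),
    flagChangeInfc = abs(grp.Infc - lag(grp.Infc, default = 0))) %>%
  select(-grp.Preg, -grp.Infc)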
# Year Pregnant Infection Keep flagChangePreg flagChangeInfc
#1 1998 Yes Positive 0 0 0
#2 1999 Yes Positive 0 0 0
#3 2000 No Negative 0 0 0
#4 2001 No Negative 1 0 1
#5 2002 No Positive 1 0 1
#6 2003 No Positive 0 0 0
#7 2004 No Negative 0 0 0
#8 2005 No Negative 1 1 0
#9 2006 Yes Negative 1 1 0
#10 2007 Yes Negative 0 0 0
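The columns flagChangePreg and flagChangeInfc flag the rows where Pregnant changes from "No" to "Yes" and where Infection changes from "Negative" to "Positive", respectively.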
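Both flags rely on as.numeric() returning the factor's internal level codes, and factor() sorts the levels alphabetically unless told otherwise. A quick base-R check on a toy vector (not the question's data) shows that the direction of a change depends entirely on the level order:

```r
x <- c("No", "No", "Yes")

f_default <- factor(x)                             # levels sorted alphabetically
levels(f_default)                                  # "No" "Yes"
diff(as.numeric(f_default))                        # 0 1: No -> Yes counts as an increase

f_reversed <- factor(x, levels = c("Yes", "No"))   # same data, opposite level order
diff(as.numeric(f_reversed))                       # 0 -1: the same change now looks like a decrease
```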
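Thanks @Maurits, your solution works well; it takes the order of the values to be alphabetical ("No" to "Yes"). But is it possible to specify the order instead of relying on alphabetical order?
@JeanVuda The order is determined by the order of the factor levels (alphabetical by default); to define a specific order, I would set the factor levels explicitly in that order.
First, explicitly set all factor levels to the desired from-to order (rather than "hoping" they agree with alphabetical sorting ;) ). By creating an ordered factor, you can then compare consecutive rows with the < operator.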
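For a single column, the comparison of consecutive rows works like this. This is a toy vector for illustration; the level order c("No", "Yes") is what makes "No" < "Yes" true regardless of spelling:

```r
# Ordered factor with the levels listed in from -> to order
preg <- factor(c("Yes", "No", "No", "Yes"), levels = c("No", "Yes"), ordered = TRUE)

preg[1] < preg[2]                  # FALSE: Yes -> No is not the direction of interest
preg[3] < preg[4]                  # TRUE:  No -> Yes

# Compare every row with the next one in a single vectorized step
head(preg, -1) < tail(preg, -1)    # FALSE FALSE TRUE
```

The data-frame version below applies the same comparison to all columns at once by comparing the data frame without its last row against the data frame without its first row.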
# select relevant columns from original data
d <- df[ , 2:3]
# or, assuming that 'Keep' is not in original data, just remove the first column 'Year'
# d <- df[ , -1]
# set factor levels in order of from-to
d$Pregnant <- factor(d$Pregnant, levels = c("No", "Yes"), ordered = TRUE)
d$Infection <- factor(d$Infection, levels = c("Negative", "Positive"), ordered = TRUE)
# check if factor levels are 'increasing' between rows
m <- d[-nrow(d), ] < d[-1, ]
# add a FALSE row to restore dimensions
m <- rbind(rep(FALSE, ncol(m)), m)
# get indices of changes
ix <- which(m, arr.ind = TRUE)
# also set the row preceding each change to TRUE
m[cbind(ix[ , 1] - 1, ix[ , 2])] <- TRUE
dimnames(m) <- list(NULL, paste0(colnames(m), "_diff"))
m <- m + 0
cbind(df, Keep2 = as.integer(rowSums(m) != 0), m)
# Year Pregnant Infection Keep Keep2 Pregnant_diff Infection_diff
# 1 1998 Yes Positive 0 0 0 0
# 2 1999 Yes Positive 0 0 0 0
# 3 2000 No Negative 0 0 0 0
# 4 2001 No Negative 1 1 0 1
# 5 2002 No Positive 1 1 0 1
# 6 2003 No Positive 0 0 0 0
# 7 2004 No Negative 0 0 0 0
# 8 2005 No Negative 1 1 1 0
# 9 2006 Yes Negative 1 1 1 0
# 10 2007 Yes Negative 0 0 0 0
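With 20+ columns, it may be easier to keep the wanted transitions in a small specification table (one Variable/From/To row per column, as listed in the question) and match the values directly instead of relying on level order. This also avoids flagging unrelated increases in columns with more than two levels. A base-R sketch under those assumptions; spec and flag_transition() are hypothetical names, and, like the Keep column, the indicator marks both the "from" row and the following "to" row:

```r
# Sample data from the question (Keep omitted)
df <- data.frame(
  Year      = 1998:2007,
  Pregnant  = c("Yes", "Yes", "No", "No", "No", "No", "No", "No", "Yes", "Yes"),
  Infection = c("Positive", "Positive", "Negative", "Negative", "Positive",
                "Positive", "Negative", "Negative", "Negative", "Negative"),
  stringsAsFactors = TRUE
)

# Hypothetical specification table: one row per wanted transition
spec <- data.frame(
  Variable = c("Pregnant", "Infection"),
  From     = c("No", "Negative"),
  To       = c("Yes", "Positive"),
  stringsAsFactors = FALSE
)

# 1 on row i and row i + 1 when x[i] == from and x[i + 1] == to, else 0
flag_transition <- function(x, from, to) {
  x   <- as.character(x)
  n   <- length(x)
  hit <- c(x[-n] == from & x[-1] == to, FALSE)   # TRUE on the "from" row
  hit[is.na(hit)] <- FALSE                       # treat missing values as "no transition"
  as.integer(hit | c(FALSE, hit[-n]))            # also mark the following "to" row
}

# One indicator column per row of spec
flags <- mapply(function(v, from, to) flag_transition(df[[v]], from, to),
                spec$Variable, spec$From, spec$To)
colnames(flags) <- paste0(spec$Variable, "_diff")

out <- cbind(df, Keep2 = as.integer(rowSums(flags) > 0), flags)
out
```

Adding another column only requires another row in spec; the loop itself does not change.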