R: Creating indicators for specific changes between rows in multiple columns

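I want to create indicators for transitions between specific values in consecutive rows, for several different columns of a data frame.

Some sample data: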

df <- structure(list(Year = 1998:2007, Pregnant = structure(c(2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"), 
    Infection = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 
    1L), .Label = c("Negative", "Positive"), class = "factor"), 
    Keep = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L)), .Names = c("Year", 
"Pregnant", "Infection", "Keep"), class = "data.frame", row.names = c(NA, 
-10L))

#    Year Pregnant Infection Keep
# 1  1998      Yes  Positive    0
# 2  1999      Yes  Positive    0
# 3  2000       No  Negative    0
# 4  2001       No  Negative    1 # Infection changes from Negative to Positive 
# 5  2002       No  Positive    1
# 6  2003       No  Positive    0
# 7  2004       No  Negative    0
# 8  2005       No  Negative    1 # Pregnant changes from No to Yes
# 9  2006      Yes  Negative    1
# 10 2007      Yes  Negative    0

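I want to flag the rows where a change occurs in a specific direction. For example, the Pregnant column changes from "No" (row 8) to "Yes" (row 9), and the Infection column changes from "Negative" (row 4) to "Positive" (row 5). I want to flag both rows of each such change (the Keep column shows the flagged rows as 1).

There are other changes in the columns, such as Pregnant going from "Yes" to "No" and Infection going from "Positive" to "Negative", but these do not matter; I only want to flag value sequences in the specified direction: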

Variable - Pregnant, From - 'No', To - 'Yes' 
Variable - Infection, From - 'Negative', To - 'Positive'

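I have more than 20 columns, and I want to detect a particular change in each of them and create the corresponding indicator variables.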

How about something like this:

library(dplyr)

df %>%
    mutate(
        grp.Preg = c(diff(as.numeric(Pregnant)) > 0, 0),
        grp.Infc = c(diff(as.numeric(Infection)) > 0, 0),
        flagChangePreg = abs(grp.Preg - lag(grp.Preg, default = 0)),
        flagChangeInfc = abs(grp.Infc - lag(grp.Infc, default = 0))) %>%
    select(-grp.Preg, -grp.Infc)
#   Year Pregnant Infection Keep flagChangePreg flagChangeInfc
#1  1998      Yes  Positive    0              0              0
#2  1999      Yes  Positive    0              0              0
#3  2000       No  Negative    0              0              0
#4  2001       No  Negative    1              0              1
#5  2002       No  Positive    1              0              1
#6  2003       No  Positive    0              0              0
#7  2004       No  Negative    0              0              0
#8  2005       No  Negative    1              1              0
#9  2006      Yes  Negative    1              1              0
#10 2007      Yes  Negative    0              0              0

The columns flagChangePreg and flagChangeInfc flag the rows where Pregnant changes from "No" to "Yes" and Infection changes from "Negative" to "Positive", respectively.
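
The code above relies on the integer codes of the factor levels, so it only detects the wanted direction when the levels happen to be ordered from-to. As a hedged alternative sketch, the from and to values can be named directly and compared with dplyr's lag() and lead(). The helper name flag_transition is hypothetical (it is not part of dplyr or of the original answer), and the sample data is rebuilt compactly so the snippet runs on its own.

```r
library(dplyr)

# Same sample data as in the question, rebuilt compactly (Keep column omitted)
df <- data.frame(
  Year = 1998:2007,
  Pregnant = factor(c("Yes", "Yes", "No", "No", "No", "No", "No", "No", "Yes", "Yes")),
  Infection = factor(c("Positive", "Positive", "Negative", "Negative", "Positive",
                       "Positive", "Negative", "Negative", "Negative", "Negative"))
)

# Hypothetical helper: returns 1 on both rows of a from -> to transition, else 0.
# %in% is used instead of == so that NA values simply yield 0.
flag_transition <- function(x, from, to) {
  x <- as.character(x)
  starts <- x %in% from & lead(x) %in% to   # row holding 'from', next row holds 'to'
  ends   <- x %in% to   & lag(x)  %in% from # row holding 'to', previous row holds 'from'
  as.integer(starts | ends)
}

res <- df %>%
  mutate(
    flagChangePreg = flag_transition(Pregnant, "No", "Yes"),
    flagChangeInfc = flag_transition(Infection, "Negative", "Positive")
  )
```

On the sample data this flags rows 8 and 9 for Pregnant and rows 4 and 5 for Infection. Because the values are compared as character strings, the result does not depend on how the factor levels are ordered.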
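Thanks @Maurits, your solution works well when the values are in alphabetical order ('No' to 'Yes'). But is it possible to specify the order, instead of letting it default to alphabetical order?

@JeanVuda The order is determined by the order of the factor levels (alphabetical by default); to define a specific order, I would set the factor levels explicitly.

First, explicitly set all factor levels in the desired from-to order (rather than "hoping" that they coincide with alphabetical sorting ;). By creating an ordered factor, you can then compare consecutive rows.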
# select relevant columns from original data
d <- df[ , 2:3]
# or, assuming that 'Keep' is not in original data, just remove the first column 'Year'
# d <- df[ , -1]

# set factor levels in order of from-to
d$Pregnant <- factor(d$Pregnant, levels = c("No", "Yes"), ordered = TRUE)
d$Infection <- factor(d$Infection, levels = c("Negative", "Positive"), ordered = TRUE)

# check if factor levels are 'increasing' between rows
m <- d[-nrow(d), ] < d[-1, ]

# add a FALSE row to restore dimensions
m <- rbind(rep(FALSE, ncol(m)), m)

# get indices of changes
ix <- which(m, arr.ind = TRUE)

# also set the preceding rows to TRUE
m[cbind(ix[ , 1] - 1, ix[ , 2])] <- TRUE

dimnames(m) <- list(NULL, paste0(colnames(m), "_diff"))
m <- m + 0

cbind(df, Keep2 = as.integer(rowSums(m) != 0), m) 

#     Year Pregnant Infection Keep Keep2 Pregnant_diff Infection_diff
# 1  1998      Yes  Positive    0     0             0              0
# 2  1999      Yes  Positive    0     0             0              0
# 3  2000       No  Negative    0     0             0              0
# 4  2001       No  Negative    1     1             0              1
# 5  2002       No  Positive    1     1             0              1
# 6  2003       No  Positive    0     0             0              0
# 7  2004       No  Negative    0     0             0              0
# 8  2005       No  Negative    1     1             1              0
# 9  2006      Yes  Negative    1     1             1              0
# 10 2007      Yes  Negative    0     0             0              0
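
With 20+ columns, setting each factor by hand gets tedious. One possible way to scale this up, sketched here in base R under the assumption that every monitored column has exactly one from/to pair, is to keep the pairs in a named list and loop over it. The names spec, flags, and out are hypothetical, and the values are compared as character strings rather than as ordered factors, so this is a variation on the approach above and not the same method.

```r
# Same sample data as in the question, rebuilt compactly (Keep column omitted)
df <- data.frame(
  Year = 1998:2007,
  Pregnant = factor(c("Yes", "Yes", "No", "No", "No", "No", "No", "No", "Yes", "Yes")),
  Infection = factor(c("Positive", "Positive", "Negative", "Negative", "Positive",
                       "Positive", "Negative", "Negative", "Negative", "Negative"))
)

# Hypothetical spec: one from/to pair per column to monitor; extend as needed
spec <- list(
  Pregnant  = c(from = "No",       to = "Yes"),
  Infection = c(from = "Negative", to = "Positive")
)

flags <- sapply(names(spec), function(col) {
  x <- as.character(df[[col]])
  n <- length(x)
  # hit[i] is TRUE when row i holds 'from' and row i + 1 holds 'to'
  hit <- x[-n] %in% spec[[col]][["from"]] & x[-1] %in% spec[[col]][["to"]]
  # mark both rows of each transition
  as.integer(c(hit, FALSE) | c(FALSE, hit))
})
colnames(flags) <- paste0(names(spec), "_diff")

# Keep2 is 1 when any monitored column has a flagged transition in that row
out <- cbind(df, Keep2 = as.integer(rowSums(flags) > 0), flags)
```

For the sample data, Keep2 is 1 in rows 4, 5, 8, and 9, which matches the Keep column in the question. Adding a column to monitor only requires a new entry in spec.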