R: fill NAs based on multiple conditions on previous rows
I have a large data frame of transactions. The fields are id (user ID), interval (an integer from 0 upward), creation (transaction date), expiry (subscription expiry date) and subscription (a character value, either "one year" or "two years"). I need to fill in the missing expiry values according to several conditions based on the same row or the previous row.
df <- data.frame(id = id,
                 interval = interval,
                 creation = creation,
                 expiry = expiry,
                 subscription = subscription)
df <- df[order(df[, 1], df[, 3]), ]
# loop over all rows of the ordered df (by id and creation date)
for (i in 2:nrow(df)) {
  # only rows with a missing expiry need work
  if (is.na(df[i, 4])) {
    # if the previous row has the same id and interval, treat this row as a
    # change to the existing subscription and reuse its expiry
    if (df[i-1, 1] == df[i, 1] && df[i-1, 2] == df[i, 2]) {
      df[i, 4] <- df[i-1, 4]
    # otherwise it is a new one- or two-year subscription, so add days to the creation date
    } else if (df[i, 5] == "one year") {
      df[i, 4] <- df[i, 3] + 365
    } else if (df[i, 5] == "two years") {
      df[i, 4] <- df[i, 3] + 730  # 2 * 365 days
    }
  }
}
df

I think this might help you:
df <- data.frame(id = id,
                 interval = interval,
                 creation = creation,
                 expiry = expiry,
                 subscription = subscription)
df <- df[order(df[, 1], df[, 3]), ]
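library(dplyr)  # for lag()
# TRUE where id and interval are the same as in the previous row
df$match_previous <- (df[, 1] == lag(df[, 1]) & df[, 2] == lag(df[, 2]))
df$match_previous[1] <- FALSE  # the first row has no previous row
# Note: ifelse() drops attributes such as the Date class. If expiry is a Date,
# convert it back afterwards, e.g. as.Date(df[, 4], origin = "1970-01-01").
df[, 4] <- ifelse(!is.na(df[, 4]),
                  df[, 4],
                  ifelse(df$match_previous,
                         lag(df[, 4]),
                         ifelse(df[, 5] == "one year",
                                df[, 3] + 365, df[, 3] + 730)))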
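df

Could you provide a mock-up of the data, showing what it looks like now and what you want it to look like?

Thanks Kevin, your solution worked in the end. In some cases, though, there are several consecutive NAs, and where match_previous = TRUE the script leaves some of them unfilled. Running the ifelse snippet in a loop got rid of all of them.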
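
For anyone who hits the same problem with runs of consecutive NAs, here is a rough sketch of the "repeat the fill" idea from the comment above, in base R only. The sample data is made up for illustration (only the column names come from the question), and it assumes creation and expiry are Date columns. Unlike the original loop, it also fills the first row if its expiry is missing.

```r
# Hypothetical sample data; only the column names follow the question.
df <- data.frame(
  id           = c(1, 1, 1, 1, 2),
  interval     = c(0, 0, 0, 0, 0),
  creation     = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01",
                           "2020-04-01", "2020-01-15")),
  expiry       = as.Date(c("2020-12-31", NA, NA, NA, NA)),
  subscription = c("one year", "one year", "one year", "one year", "two years"),
  stringsAsFactors = FALSE
)
df <- df[order(df$id, df$creation), ]

# TRUE where id and interval equal those of the previous row.
n <- nrow(df)
same_as_prev <- c(FALSE,
                  df$id[-1] == df$id[-n] & df$interval[-1] == df$interval[-n])

# Step 1: rows that start a new subscription get creation + term length.
term_days <- ifelse(df$subscription == "one year", 365, 730)
new_sub <- is.na(df$expiry) & !same_as_prev
df$expiry[new_sub] <- df$creation[new_sub] + term_days[new_sub]

# Step 2: copy the previous row's expiry. Each pass fills one more row of a
# run of consecutive NAs, so repeat until a pass changes nothing.
repeat {
  prev_expiry <- c(as.Date(NA), df$expiry[-n])
  fill <- is.na(df$expiry) & same_as_prev & !is.na(prev_expiry)
  if (!any(fill)) break
  df$expiry[fill] <- prev_expiry[fill]
}

df
```

Assigning into a subset with `df$expiry[fill] <- ...` keeps the Date class, which nested `ifelse()` calls do not. If tidyr is available, a grouped `fill()` would probably be a tidier way to do step 2, but I have not tested that against the real data.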