R - Fill in missing values (blanks) based on values in other columns of the same row
I'm using R and have the following sample data frame, in which all variables are factors:
first     second            third
social    birth control     high
          birth control     high
medical   Anorexia Nervosa  low
medical   Anorexia Nervosa  low
          Alcoholism        high
family    Alcoholism        high
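Basically, I need a function that fills in the blanks in the first column based on the values in the second and third columns.
For example, if the second column is "birth control" and the third column is "high", I need to fill the blank in the first column with "social". If the second and third columns are "Alcoholism" and "high", respectively, I need to fill the blank in the first column with "family".

One approach is to create some kind of lookup list (for example, using a named vector, a factor, or something similar) and then replace any "" values with the values from the lookup list.
Here is an example (though I think your problem is not fully defined and may be oversimplified).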
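A minimal base R sketch of the lookup idea, assuming the columns are stored as character vectors (not factors) and that each "second_third" combination maps to exactly one value of first. The names lookup, key, and blank are illustrative only.

```r
# Sample data from the question, stored as character columns (an assumption)
mydf <- data.frame(
  first  = c("social", "", "medical", "medical", "", "family"),
  second = c("birth control", "birth control", "Anorexia Nervosa",
             "Anorexia Nervosa", "Alcoholism", "Alcoholism"),
  third  = c("high", "high", "low", "low", "high", "high"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: "second_third" key -> value to use for `first`
lookup <- c(
  "birth control_high"   = "social",
  "Anorexia Nervosa_low" = "medical",
  "Alcoholism_high"      = "family"
)

# Build the key for every row, then fill only the blank entries
key   <- paste(mydf$second, mydf$third, sep = "_")
blank <- mydf$first == ""
mydf$first[blank] <- unname(lookup[key[blank]])

mydf
```

A combination that is missing from lookup would end up as NA rather than "", which makes unmatched rows easy to spot.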
A "data.table" approach would follow the same steps, but has the advantage of modifying by reference rather than copying:
library(data.table)
as.data.table(mydf)[
  , condition := sprintf("%s_%s", second, third)][
  , condition := as.character(
      factor(condition,
             c("birth control_high", "Anorexia Nervosa_low", "Alcoholism_high"),
             c("social", "medical", "family")))][
  first == "", first := condition][
  , condition := NULL][]
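From the data shown, it is not entirely clear whether "first" can hold more than one distinct value for each combination of "second" and "third". If there is only one value per combination and you need to replace the '' with that value, you could try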
library(data.table)
setDT(df1)[, replace(first, first=='', first[first!='']),
list(second, third)]
Or, more efficiently,
setDT(df1)[, first:= first[first!=''] , list(second, third)]
#      first           second third
#1:   social    birth control  high
#2:   social    birth control  high
#3:  medical Anorexia Nervosa   low
#4:  medical Anorexia Nervosa   low
#5:   family       Alcoholism  high
#6:   family       Alcoholism  high
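The group-wise fill above relies on each combination of "second" and "third" having a single known value of "first". A small base R sketch of how that assumption could be checked before filling is shown here; the names known, combo_counts, and ambiguous are illustrative, and the data is rebuilt with character columns so the block runs on its own.

```r
# Same sample data as in the thread, with character columns (an assumption)
df1 <- data.frame(
  first  = c("social", "", "medical", "medical", "", "family"),
  second = c("birth control", "birth control", "Anorexia Nervosa",
             "Anorexia Nervosa", "Alcoholism", "Alcoholism"),
  third  = c("high", "high", "low", "low", "high", "high"),
  stringsAsFactors = FALSE
)

# Keep the distinct rows whose `first` is already known
known <- unique(df1[df1$first != "", ])

# Count how many distinct `first` values each (second, third) pair has
combo_counts <- table(paste(known$second, known$third, sep = "_"))

# Pairs with more than one candidate value cannot be filled unambiguously
ambiguous <- names(combo_counts)[combo_counts > 1]

if (length(ambiguous) > 0) {
  warning("Ambiguous combinations: ", paste(ambiguous, collapse = ", "))
}
```

If ambiguous is empty, every blank has at most one candidate value and the fill is well defined.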
Data: df1 (its structure() definition appears further down).

Another approach, using dplyr and adapting @akrun's very nice solution:
library(dplyr)
df1 %>% group_by(second, third) %>%
mutate(first=replace(first, first=='', first[first!=''])) %>% ungroup
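A possible variant of the dplyr approach, offered as a sketch rather than as part of the original answer: it converts factor columns to character first (the question says all variables are factors) and takes the first non-blank value in each group, so a group with no known value would yield NA instead of an error. The name filled is illustrative.

```r
library(dplyr)

# Sample data stored as factors, as described in the question
df1 <- data.frame(
  first  = c("social", "", "medical", "medical", "", "family"),
  second = c("birth control", "birth control", "Anorexia Nervosa",
             "Anorexia Nervosa", "Alcoholism", "Alcoholism"),
  third  = c("high", "high", "low", "low", "high", "high"),
  stringsAsFactors = TRUE
)

# Work with character columns so that "" comparisons and NA fills are simple
df1[] <- lapply(df1, as.character)

filled <- df1 %>%
  group_by(second, third) %>%
  # first[first != ""][1] is the first known value in the group (NA if none)
  mutate(first = ifelse(first == "", first[first != ""][1], first)) %>%
  ungroup()

filled
```

Taking only the first known value silently hides groups that contain several different values, so this variant suits data where each combination is known to be unambiguous.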
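Beyond that... please post what you have tried, with code. Do you also have a list of conditions?
You could also find some inspiration from it.
Personally, I find replace() inefficient and hard to read compared with the approach that uses :=.
Thanks, everyone... every one of these solutions was very helpful for my problem!

Data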
df1 <- structure(list(first = c("social", "", "medical", "medical",
"", "family"), second = c("birth control", "birth control",
"Anorexia Nervosa",
"Anorexia Nervosa", "Alcoholism", "Alcoholism"), third = c("high",
"high", "low", "low", "high", "high")), .Names = c("first", "second",
"third"), class = "data.frame", row.names = c(NA, -6L))