R: Create a new column based on whether rows in two data frames match
This seems simple, but I can't figure it out. I want to create a new column in df2 (impute_id) that identifies whether a value (measurement) was imputed or whether it is an original observation from df1. If a row matches, the new column impute_id in df2 should get the string observed; if it does not match, it should get the string imputed. I'd like to use dplyr if possible. Also note that the rows in the two data frames may be in a different order, even though they are in the same order in the example.
Example (the original data df1 and the imputed data df2 are given as reproducible data below)

Desired output:
df2
time protocol measurement_type sample measurement impute_id
1 0 HPLC cis,cis-Muconic acid a 0.57561 observed
2 0 HPLC D-Glucose a 33.95529 imputed
3 0 HPLC cis,cis-Muconic acid a 0.57561 imputed
4 0 HPLC D-Glucose b 33.95529 imputed
5 0 OD600 Optical Density b 0.14430 observed
6 22 HPLC cis,cis-Muconic acid b 0.57561 imputed
7 22 HPLC D-Glucose a 33.95529 imputed
8 22 OD600 Optical Density a 0.14430 imputed
9 24 HPLC cis,cis-Muconic acid a 0.57561 imputed
10 24 HPLC D-Glucose b 33.95529 observed
Answer

Maybe something like:
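library(dplyr)
df1 %>%
  group_by(measurement_type) %>%
  # Flag the rows that were missing in df1 (computed before measurement is overwritten)
  mutate(impute_id = ifelse(is.na(measurement), "imputed", "observed"),
         # then replace each value with the group's smallest non-missing measurement
         measurement = min(measurement, na.rm = TRUE))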
time protocol measurement_type sample measurement impute_id
1 0 HPLC cis,cis-Muconic acid a 0.57561 observed
2 0 HPLC D-Glucose a 33.95529 imputed
3 0 HPLC cis,cis-Muconic acid a 0.57561 imputed
4 0 HPLC D-Glucose b 33.95529 imputed
5 0 OD600 Optical Density b 0.14430 observed
6 22 HPLC cis,cis-Muconic acid b 0.57561 imputed
7 22 HPLC D-Glucose a 33.95529 imputed
8 22 OD600 Optical Density a 0.14430 imputed
9 24 HPLC cis,cis-Muconic acid a 0.57561 imputed
10 24 HPLC D-Glucose b 33.95529 observed
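The answer above works from df1 alone: it flags the NAs and re-imputes them with each measurement_type's minimum, which reproduces df2 here only because every measurement_type has exactly one observed value. Since the question says the rows may be ordered differently, a content-based comparison of df1 and df2 may be closer to what was asked. Below is a minimal sketch, assuming both frames have the same five columns and that observed values were copied into df2 exactly (a join on a double column uses exact equality). Rows 1 and 3 of df2 are identical, so a plain join would mark both as observed; the helper column dup_id (a hypothetical name, not from the question) numbers repeated rows so that each observed row in df1 can be claimed by at most one row in df2.

```r
library(dplyr)

# Rebuild the question's df1 and df2 (same values as the dput() output)
types <- c("cis,cis-Muconic acid", "D-Glucose", "Optical Density")
df1 <- data.frame(
  time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 24L),
  protocol = factor(c("HPLC", "OD600")[c(1, 1, 1, 1, 2, 1, 1, 2, 1, 1)],
                    levels = c("HPLC", "OD600")),
  measurement_type = factor(types[c(1, 2, 1, 2, 3, 1, 2, 3, 1, 2)], levels = types),
  sample = c("a", "a", "a", "b", "b", "b", "a", "a", "a", "b"),
  measurement = c(0.57561, NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)
)
df2 <- df1
df2$measurement <- c(0.57561, 33.95529, 0.57561, 33.95529, 0.1443,
                     0.57561, 33.95529, 0.1443, 0.57561, 33.95529)

# Observed rows of df1, numbered within groups of identical rows
# (dup_id is a hypothetical helper column)
observed <- df1 %>%
  filter(!is.na(measurement)) %>%
  group_by(time, protocol, measurement_type, sample, measurement) %>%
  mutate(dup_id = row_number()) %>%
  ungroup() %>%
  mutate(impute_id = "observed")

# Number df2's identical rows the same way, then match on all columns + dup_id;
# rows of df2 without a partner in df1 are labelled imputed
result <- df2 %>%
  group_by(time, protocol, measurement_type, sample, measurement) %>%
  mutate(dup_id = row_number()) %>%
  ungroup() %>%
  left_join(observed,
            by = c("time", "protocol", "measurement_type", "sample",
                   "measurement", "dup_id")) %>%
  mutate(impute_id = coalesce(impute_id, "imputed")) %>%
  select(-dup_id)
```

left_join() keeps df2's row order, so the flags line up with df2 whatever order df1 is in. Identical rows are interchangeable, so which copy is labelled observed depends only on their order of appearance; what the approach guarantees is that the number of observed copies never exceeds the number of matching observed rows in df1.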
Reproducible data

Original data (df1):
df1 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L,
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L,
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid",
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a",
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561,
NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)), row.names = c(NA,
-10L), class = "data.frame")
Imputed data (df2):
df2 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L,
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L,
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid",
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a",
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561,
33.95529, 0.57561, 33.95529, 0.1443, 0.57561, 33.95529, 0.1443,
0.57561, 33.95529)), row.names = c(NA, -10L), class = "data.frame")
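If the rows of df1 and df2 are known to line up one-to-one (as they do in the reproducible data above), the comparison can be positional instead: a value in df2 was imputed exactly where df1 has NA. A minimal sketch, assuming identical row order; the stopifnot() guard checks that assumption on the key columns and on the observed values before flagging, and key_cols is a hypothetical name for the non-measurement columns.

```r
library(dplyr)

# Rebuild the question's df1 and df2 (same values as the dput() output)
types <- c("cis,cis-Muconic acid", "D-Glucose", "Optical Density")
df1 <- data.frame(
  time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 24L),
  protocol = factor(c("HPLC", "OD600")[c(1, 1, 1, 1, 2, 1, 1, 2, 1, 1)],
                    levels = c("HPLC", "OD600")),
  measurement_type = factor(types[c(1, 2, 1, 2, 3, 1, 2, 3, 1, 2)], levels = types),
  sample = c("a", "a", "a", "b", "b", "b", "a", "a", "a", "b"),
  measurement = c(0.57561, NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)
)
df2 <- df1
df2$measurement <- c(0.57561, 33.95529, 0.57561, 33.95529, 0.1443,
                     0.57561, 33.95529, 0.1443, 0.57561, 33.95529)

key_cols <- c("time", "protocol", "measurement_type", "sample")

# Assumption: df2 is df1 with its NAs filled in, row for row. Check it first.
stopifnot(
  nrow(df1) == nrow(df2),
  identical(df1[key_cols], df2[key_cols]),
  all(df1$measurement == df2$measurement, na.rm = TRUE)
)

# A value in df2 is imputed exactly where df1 had NA
flagged <- df2 %>%
  mutate(impute_id = if_else(is.na(df1$measurement), "imputed", "observed"))
```

This avoids joining on floating-point values, but it silently gives wrong flags if the rows are reordered, which is why the guard is there; when the order cannot be trusted, a join on the row contents is the safer choice.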