
R: create a new column based on whether rows in two data frames match


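This seems simple, but I can't figure it out. I want to create a new column in df2 (impute_id) that identifies whether a value (measurement) was imputed or is an original observation from df1. If a row in df2 matches a row in df1, the new column impute_id should get the string observed; if it does not match, it should get the string imputed. I would like to use dplyr if possible. Also note that the rows in the two data frames may not be in the same order, even though they are in the example.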


Example

Original data

Imputed data


Desired output

df2
   time protocol     measurement_type sample measurement  impute_id
1     0     HPLC cis,cis-Muconic acid      a     0.57561   observed
2     0     HPLC            D-Glucose      a    33.95529    imputed
3     0     HPLC cis,cis-Muconic acid      a     0.57561    imputed
4     0     HPLC            D-Glucose      b    33.95529    imputed
5     0    OD600      Optical Density      b     0.14430   observed
6    22     HPLC cis,cis-Muconic acid      b     0.57561    imputed
7    22     HPLC            D-Glucose      a    33.95529    imputed
8    22    OD600      Optical Density      a     0.14430    imputed
9    24     HPLC cis,cis-Muconic acid      a     0.57561    imputed
10   24     HPLC            D-Glucose      b    33.95529   observed
Reproducible data

Original data

Using df1, this could be something like:

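library(dplyr)

df1 %>%
  group_by(measurement_type) %>%
  # Flag each row before filling: an NA in df1 means the value gets imputed
  mutate(impute_id = ifelse(is.na(measurement), "imputed", "observed"),
         # Fill with the group minimum (here each group has one observed value)
         measurement = min(measurement, na.rm = TRUE))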

   time protocol     measurement_type sample measurement  impute_id
1     0     HPLC cis,cis-Muconic acid      a     0.57561 observed
2     0     HPLC            D-Glucose      a    33.95529  imputed
3     0     HPLC cis,cis-Muconic acid      a     0.57561  imputed
4     0     HPLC            D-Glucose      b    33.95529  imputed
5     0    OD600      Optical Density      b     0.14430 observed
6    22     HPLC cis,cis-Muconic acid      b     0.57561  imputed
7    22     HPLC            D-Glucose      a    33.95529  imputed
8    22    OD600      Optical Density      a     0.14430  imputed
9    24     HPLC cis,cis-Muconic acid      a     0.57561  imputed
10   24     HPLC            D-Glucose      b    33.95529 observed
# Original data (df1): NA marks values that were not observed
df1 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L, 
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid", 
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a", 
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561, 
NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)), row.names = c(NA, 
-10L), class = "data.frame")
# Imputed data (df2): same rows with the NAs filled in
df2 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L, 
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid", 
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a", 
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561, 
33.95529, 0.57561, 33.95529, 0.1443, 0.57561, 33.95529, 0.1443, 
0.57561, 33.95529)), row.names = c(NA, -10L), class = "data.frame")
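The question asks for a comparison of df2 against df1 that does not depend on row order. One way to sketch that with dplyr is a left join on all shared columns. This is only an illustration, not the approach from the answer above: the helper names keys, observed and result are made up here, and the data frames are rebuilt with character columns instead of factors to keep the block short.

```r
library(dplyr)

# Same values as the df1/df2 dput output in the question
# (assumption: character columns instead of factors, for brevity).
mt <- c("cis,cis-Muconic acid", "D-Glucose", "Optical Density")
keys <- data.frame(
  time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 24L),
  protocol = c("HPLC", "OD600")[c(1, 1, 1, 1, 2, 1, 1, 2, 1, 1)],
  measurement_type = mt[c(1, 2, 1, 2, 3, 1, 2, 3, 1, 2)],
  sample = c("a", "a", "a", "b", "b", "b", "a", "a", "a", "b"),
  stringsAsFactors = FALSE
)
df1 <- mutate(keys, measurement = c(0.57561, NA, NA, NA, 0.1443,
                                    NA, NA, NA, NA, 33.95529))
df2 <- mutate(keys, measurement = c(0.57561, 33.95529, 0.57561, 33.95529,
                                    0.1443, 0.57561, 33.95529, 0.1443,
                                    0.57561, 33.95529))

# Rows of df1 that hold a real observation, de-duplicated so the join
# cannot multiply rows of df2, and tagged with the label to assign.
observed <- df1 %>%
  filter(!is.na(measurement)) %>%
  distinct() %>%
  mutate(impute_id = "observed")

# Joining on every shared column makes row order irrelevant. Rows of df2
# with no partner in `observed` get NA, which is then set to "imputed".
result <- df2 %>%
  left_join(observed,
            by = c("time", "protocol", "measurement_type",
                   "sample", "measurement")) %>%
  mutate(impute_id = coalesce(impute_id, "imputed"))

result
```

On the example data this labels rows 1, 3, 5 and 10 of df2 as observed. Row 3 differs from the desired output because rows 1 and 3 of df2 are identical in every column, so a match on row content alone cannot tell them apart. The join also compares measurement by exact equality, which works here because the observed values are carried over unchanged.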