Comparing the rows of a dataset with another dataset in R
I have dataset 1 with 1400 rows and 25 columns, and dataset 2 with 400 rows and 5 columns. Both datasets have a column named ID. As a small example, I can illustrate them as follows:
Dataset 1:
ID c1 c2 c3 c4
12 m n 5 1/2/2015
5 c x 4 2/3/2015
45 g t 47 4/23/2015
45 j t 3 1/1/2016
61 t y 12 7/3/2015
3 r n 18 3/3/2015
Dataset 2:
ID a1 a2
45 1 1/1/2015
3 5 2/2/2016
12 12 4/29/2016
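(As you can see, the IDs in dataset2 are a subset of the IDs in dataset1.)
What I want is this: for each row of dataset1, if the value in column ID equals a value in column ID of dataset2, copy the corresponding value of column a2 from that row of dataset2 into a new column of dataset1, as shown below: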
ID c1 c2 c3 c4 c5
12 m n 5 1/2/2015 4/29/2016
5 c x 4 2/3/2015 NA
45 g t 47 4/23/2015 1/1/2015
45 j t 3 1/1/2016 1/1/2015
61 t y 12 7/3/2015 NA
3 r n 18 3/3/2015 2/2/2016
I appreciate your help.
As @42 mentioned, you can use match. Here is an example using match:
# match the ID of df1 with that of df2
# then returns the index of df2 that
# matches df1
# then subset the a2 column using the above index
# then store in a new column in df1
df1$c5 <- df2$a2[match(df1$ID, df2$ID)]
df1
  ID c1 c2 c3 c4 c5
1 12 m n 5 01/02/2015 4/29/2016
2 5 c x 4 01/02/2015 <NA>
3 45 g t 47 01/02/2015 01/01/2015
4 45 j t 3 01/02/2015 01/01/2015
5 61 t y 12 01/02/2015 <NA>
6 3 r n 18 01/02/2015 02/02/2016
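To see the match approach run end to end on the question's sample data, here is a minimal sketch. The names df1 and df2 follow the answer above. The data frames are rebuilt here from the question's tables, so treat the construction as an assumption about how the real data looks.

```r
# Sample data rebuilt from the question's tables (an assumption about the real data)
df1 <- data.frame(
  ID = c(12, 5, 45, 45, 61, 3),
  c1 = c("m", "c", "g", "j", "t", "r"),
  c2 = c("n", "x", "t", "t", "y", "n"),
  c3 = c(5, 4, 47, 3, 12, 18),
  c4 = c("1/2/2015", "2/3/2015", "4/23/2015",
         "1/1/2016", "7/3/2015", "3/3/2015"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c(45, 3, 12),
  a1 = c(1, 5, 12),
  a2 = c("1/1/2015", "2/2/2016", "4/29/2016"),
  stringsAsFactors = FALSE
)

# For each ID in df1, the position of its first match in df2 (NA if absent)
idx <- match(df1$ID, df2$ID)

# Index a2 by those positions; an NA position yields an NA value
df1$c5 <- df2$a2[idx]
df1
```

Note that match only returns the first matching position, so if an ID appeared more than once in df2, only its first a2 value would be copied.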
din's answer is perfect. Another way to approach it is to merge the data frames.
Data preparation
# Dataset 1 from the question
ex_data1 <- data.frame(ID = c(12, 5, 45, 45, 61, 3),
                       c1 = c("m", "c", "g", "j", "t", "r"),
                       c2 = c("n", "x", "t", "t", "y", "n"),
                       c3 = c(5, 4, 47, 3, 12, 18),
                       c4 = c("1/2/2015", "2/3/2015", "4/23/2015",
                              "1/1/2016", "7/3/2015", "3/3/2015"),
                       stringsAsFactors = FALSE)
# Dataset 2 from the question
ex_data2 <- data.frame(ID = c(45, 3, 12),
                       a1 = c(1, 5, 12),
                       a2 = c("1/1/2015", "2/2/2016", "4/29/2016"),
                       stringsAsFactors = FALSE)
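Welcome to StackOverflow! Please read the related information on how to ask a question and how to provide an example. This will make it much easier for others to help you. You probably need match or which, but I generally don't answer unless an example is added. Help me to help you!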
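# Approach 1: base R merge
# Keep only ID and a2 from ex_data2, and rename a2 to c5
ex_data3 <- ex_data2[, c("ID", "a2")]
names(ex_data3) <- c("ID", "c5")
# all = TRUE is a full outer join; it keeps every row of ex_data1 here because
# the IDs in ex_data2 are a subset (all.x = TRUE would request a left join)
m_data <- merge(ex_data1, ex_data3, by = "ID", all = TRUE)

# Approach 2: dplyr left join
library(dplyr)
m_data <- ex_data1 %>%
  left_join(ex_data2, by = "ID") %>%  # keep all rows of ex_data1
  select(-a1, c5 = a2)                # drop a1 and rename a2 to c5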
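One practical difference between the two approaches: merge sorts its result by the key column by default, so the rows come back ordered by ID rather than in the original order of ex_data1, whereas left_join keeps the order of the left table. If the original order matters and you want to stay in base R, one possible sketch is to carry a row index through the merge. The helper column .row and the name lookup are hypothetical, introduced only for this illustration.

```r
# Sample data rebuilt from the question's tables
ex_data1 <- data.frame(
  ID = c(12, 5, 45, 45, 61, 3),
  c1 = c("m", "c", "g", "j", "t", "r"),
  c2 = c("n", "x", "t", "t", "y", "n"),
  c3 = c(5, 4, 47, 3, 12, 18),
  c4 = c("1/2/2015", "2/3/2015", "4/23/2015",
         "1/1/2016", "7/3/2015", "3/3/2015"),
  stringsAsFactors = FALSE
)
ex_data2 <- data.frame(
  ID = c(45, 3, 12),
  a1 = c(1, 5, 12),
  a2 = c("1/1/2015", "2/2/2016", "4/29/2016"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column that records the original row order
ex_data1$.row <- seq_len(nrow(ex_data1))

# Lookup table holding only the key and the value to copy, renamed to c5
lookup <- setNames(ex_data2[, c("ID", "a2")], c("ID", "c5"))

# all.x = TRUE keeps every row of ex_data1 (a left join)
m_data <- merge(ex_data1, lookup, by = "ID", all.x = TRUE)

# merge() sorted the rows by ID; restore the original order, drop the helper
m_data <- m_data[order(m_data$.row), ]
m_data$.row <- NULL
rownames(m_data) <- NULL
m_data
```

With the sample data this gives the same c5 column as the match approach, in the same row order as the question's expected output.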