R: Join tables where t1.key1 = t2.key1 and t1.key2 partially matches t2.key2
I need to do an interesting join in R. Here are my two tables.
Table 1:
Date Name
2016-01-02 10:18:00 CARDOSO, RAMON
2016-01-02 15:02:00 HARRISON, KATHYANNE M
2016-01-02 15:02:00 PALEO, SHERI
2016-01-03 02:09:00 PHANOR, RENALDY
2016-01-03 09:42:00 GUAMAN, ANGEL
2016-01-03 18:47:00 AIME, MADELINE
2016-01-03 18:47:00 CADET, GARDY
2016-01-03 19:31:00 REID, ARTHUR D
2016-01-03 22:11:00 HERNANDEZ-JONES, FREDRICK JOSHUA
2016-01-04 12:32:00 AGUERO, RAUL
Table 2:
Date ID Name
2016-01-02 10:18:00 16-22-AR CARDOSO, RAMON
2016-01-02 15:02:00 16-24-AR HARRISON, KATHYANNE M", " PALEO, SHERI"
2016-01-02 15:02:00 16-25-AR HARRISON, KATHYANNE M", " PALEO, SHERI"
2016-01-03 02:09:00 16-31-AR PHANOR, RENALDY
2016-01-03 09:42:00 16-32-AR GUAMAN, ANGEL
2016-01-03 18:47:00 16-39-AR AIME, MADELINE", " CADET, GARDY"
2016-01-03 18:47:00 16-40-AR AIME, MADELINE", " CADET, GARDY"
2016-01-03 19:31:00 16-42-AR REID, ARTHUR D
2016-01-03 22:11:00 16-44-AR HERNANDEZ-JONES, FREDRICK JOSHUA
2016-01-04 12:32:00 16-49-AR AGUERO, RAUL
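My goal is to get the ID into Table 1 as its own column. To do that, I need to join on Date and somehow match on Name, looking up each name from Table 1 inside the Name column of Table 2.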
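One way to picture the join described above is an equi-join on Date followed by a literal substring test on Name. This is only a minimal sketch in base R: `t1`, `t2`, `candidates`, and `matched` are hypothetical names, and the data is a three-row excerpt of the tables above.

```r
# Hypothetical excerpts of Table 1 and Table 2.
t1 <- data.frame(
  Date = c("2016-01-02 10:18:00", "2016-01-02 15:02:00", "2016-01-02 15:02:00"),
  Name = c("CARDOSO, RAMON", "HARRISON, KATHYANNE M", "PALEO, SHERI"),
  stringsAsFactors = FALSE
)
t2 <- data.frame(
  Date = c("2016-01-02 10:18:00", "2016-01-02 15:02:00", "2016-01-02 15:02:00"),
  ID   = c("16-22-AR", "16-24-AR", "16-25-AR"),
  Name = c("CARDOSO, RAMON",
           "HARRISON, KATHYANNE M\", \" PALEO, SHERI\"",
           "HARRISON, KATHYANNE M\", \" PALEO, SHERI\""),
  stringsAsFactors = FALSE
)

# Equi-join on Date; the clashing Name columns become Name.t1 and Name.t2.
candidates <- merge(t1, t2, by = "Date", suffixes = c(".t1", ".t2"))

# Keep pairs where the Table 1 name occurs literally inside the Table 2 name.
# fixed = TRUE stops characters in the names from being read as regex syntax.
is_match <- mapply(grepl, candidates$Name.t1, candidates$Name.t2,
                   MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
matched <- candidates[is_match, c("Date", "Name.t1", "ID")]
```

With this excerpt, each of the two 15:02 names still matches both Table 2 rows, so the name match narrows the candidates but cannot pick a single ID by itself.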
Update:
The original dataset looks like this:
2016-01-02 10:18:00 16-22-AR CARDOSO, RAMON
2016-01-02 15:02:00 16-24-AR, 16-25-AR HARRISON, KATHYANNE M", " PALEO, SHERI"
2016-01-03 02:09:00 16-31-AR PHANOR, RENALDY
2016-01-03 09:42:00 16-32-AR GUAMAN, ANGEL
2016-01-03 18:47:00 16-39-AR, 16-40-AR AIME, MADELINE", " CADET, GARDY"
2016-01-03 19:31:00 16-42-AR REID, ARTHUR D
2016-01-03 22:11:00 16-44-AR HERNANDEZ-JONES, FREDRICK JOSHUA
2016-01-04 12:32:00 16-49-AR AGUERO, RAUL
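The goal is to have each name on its own row with its own ID. The IDs are in the same order as the names, so the first ID goes with the first name.
I hope this clarification helps.
I assume that when you have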
2016-01-02 15:02:00 16-24-AR HARRISON, KATHYANNE M", " PALEO, SHERI"
2016-01-02 15:02:00 16-25-AR HARRISON, KATHYANNE M", " PALEO, SHERI"
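the first ID corresponds to the first name and the second ID corresponds to the second name. One approach, then, is to create a new column containing the correct name: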
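# Position of each row within its run of consecutive identical Date + Name rows
# (1 for the first row of a run, 2 for the second, and so on).
d$order <- sequence(rle(paste0(d$Date, d$Name))$lengths)
# Split a combined Name on the literal '", "' separator and return the piece at position `order`.
split_names <- function(name, order = 1) {
  parts <- strsplit(name, '\\", \\"')[[1]]  # Split on '", "'
  parts <- gsub('^\\s|\\"', "", parts)      # Drop a leading space and any remaining quotes
  parts[order]
}
d$Newname <- mapply(split_names, d$Name, d$order, USE.NAMES = FALSE)
d[, c("Date", "ID", "Newname")]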
# Date ID Newname
# 1 2016-01-02 10:18:00 16-22-AR CARDOSO, RAMON
# 2 2016-01-02 15:02:00 16-24-AR HARRISON, KATHYANNE M
# 3 2016-01-02 15:02:00 16-25-AR PALEO, SHERI
# 4 2016-01-03 02:09:00 16-31-AR PHANOR, RENALDY
# 5 2016-01-03 09:42:00 16-32-AR GUAMAN, ANGEL
# 6 2016-01-03 18:47:00 16-39-AR AIME, MADELINE
# 7 2016-01-03 18:47:00 16-40-AR CADET, GARDY
# 8 2016-01-03 19:31:00 16-42-AR REID, ARTHUR D
# 9 2016-01-03 22:11:00 16-44-AR HERNANDEZ-JONES, FREDRICK JOSHUA
# 10 2016-01-04 12:32:00 16-49-AR AGUERO, RAUL
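The ordering step may be easier to follow on a toy vector. The sketch below assumes a hypothetical `key` vector standing in for `paste0(d$Date, d$Name)`, and uses base R's `rle()` and `sequence()` to number each row within its run of identical neighbours.

```r
# Hypothetical keys, one per row; equal keys sit next to each other.
key <- c("a", "b", "b", "c", "d", "d", "d")

# rle() finds runs of consecutive identical values.
runs <- rle(key)
runs$lengths   # run sizes: 1 2 1 3

# sequence() counts 1..n inside each run, giving the row's position in its run.
ord <- sequence(runs$lengths)
ord            # 1 1 2 1 1 2 3
```

Because `rle()` only groups adjacent values, this relies on rows with the same Date and Name being next to each other. If they might not be, sorting by those columns first would be one way to restore that assumption.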
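Use code-block formatting ({}) for the tables; don't use a snippet.
You can split the names into multiple columns and then join on date and name.
@joel.wilson I'm not sure what you mean by "confusing"?
I'm having trouble understanding how the ordering works, and trouble implementing this solution in my own code.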
# Data used in the code above, as produced by dput(d)
d <- structure(list(Date = c("2016-01-02 10:18:00", "2016-01-02 15:02:00",
"2016-01-02 15:02:00", "2016-01-03 02:09:00", "2016-01-03 09:42:00",
"2016-01-03 18:47:00", "2016-01-03 18:47:00", "2016-01-03 19:31:00",
"2016-01-03 22:11:00", "2016-01-04 12:32:00"), ID = c("16-22-AR",
"16-24-AR", "16-25-AR", "16-31-AR", "16-32-AR", "16-39-AR", "16-40-AR",
"16-42-AR", "16-44-AR", "16-49-AR"), Name = c("CARDOSO, RAMON",
"HARRISON, KATHYANNE M\", \" PALEO, SHERI\"", "HARRISON, KATHYANNE M\", \" PALEO, SHERI\"",
"PHANOR, RENALDY", "GUAMAN, ANGEL", "AIME, MADELINE\", \" CADET, GARDY\"",
"AIME, MADELINE\", \" CADET, GARDY\"", "REID, ARTHUR D", "HERNANDEZ-JONES, FREDRICK JOSHUA",
"AGUERO, RAUL")), .Names = c("Date", "ID", "Name"), row.names = c(NA,
-10L), class = "data.frame")
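For the original layout shown in the update, where a single row carries several IDs and several names, the pairing by position can also be done directly. This is a minimal base R sketch, not the answer's method: `raw`, `ids`, `name_parts`, and `long` are hypothetical names, and it assumes IDs are separated by `, ` and names by the literal `", "` sequence, as they appear in the listing.

```r
# Hypothetical data frame mirroring two rows of the original dataset.
raw <- data.frame(
  Date = c("2016-01-02 10:18:00", "2016-01-02 15:02:00"),
  ID   = c("16-22-AR", "16-24-AR, 16-25-AR"),
  Name = c("CARDOSO, RAMON", "HARRISON, KATHYANNE M\", \" PALEO, SHERI\""),
  stringsAsFactors = FALSE
)

# Split IDs on ", " and names on the literal '", "' separator.
ids <- strsplit(raw$ID, ", ", fixed = TRUE)
name_parts <- strsplit(raw$Name, "\", \"", fixed = TRUE)

# Remove leftover quotes and surrounding whitespace from each name.
name_parts <- lapply(name_parts, function(x) trimws(gsub("\"", "", x, fixed = TRUE)))

# Pairing by position only makes sense if every row has as many IDs as names.
stopifnot(all(lengths(ids) == lengths(name_parts)))

# One output row per (ID, name) pair, repeating Date as needed.
long <- data.frame(
  Date = rep(raw$Date, lengths(ids)),
  ID   = unlist(ids),
  Name = unlist(name_parts),
  stringsAsFactors = FALSE
)
```

The `stopifnot()` check is there because a row with mismatched counts would otherwise be paired silently and incorrectly.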