R: Joining two data frames on a non-unique column


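Summary

I already have a solution that uses a for loop, but I would like to know whether there is a more elegant way, perhaps with dplyr or base R.

Existing data

Two data frames. Both contain the same non-unique markers in exactly the same order, except that eeg also contains an unpredictable number of zeros. The behavioural data set behav has a stimulus number stim associated with each marker. In reality each data frame has many more columns, which I have left out for simplicity.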

behav = data.frame(
  marker = c(1,2,3,1,2,3,7,13),
  stim   = c(168,168,168,78,78,78,23,55)
)

eeg = data.frame(
  marker = c(0,0,1,0,0,2,0,0,3,0,0,1,0,0,2,0,0,3,0,7,0,13)
)

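Requirements

I need to label the eeg data with the stimulus numbers from behav. The row order must be preserved.

The result should look like this: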

eeg2 = data.frame(
  marker = c(0,0,1,0,0,2,0,0,3,0,0,1,0,0,2,0,0,3,0,7,0,13),
  stim   = c(0,0,168,0,0,168,0,0,168,0,0,78,0,0,78,0,0,78,0,23,0,55)
)

My solution

This gets the job done, and performance is acceptable even for large eeg data sets.

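eeg2 <- eeg
eeg2$stim <- 0  # rows without a matching marker keep 0, as in the expected result

lrow <- 1  # first eeg row not yet consumed by an earlier match
for (i in seq_len(nrow(behav))) {
  if (lrow > nrow(eeg)) break  # no eeg rows left to scan
  behav_marker <- behav[i, "marker"]

  # scan forward from lrow for the next eeg row carrying this marker
  for (j in lrow:nrow(eeg)) {
    if (eeg[j, "marker"] == behav_marker) {
      eeg2[j, "stim"] <- behav[i, "stim"]
      lrow <- j + 1
      break
    }
  }
}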

Question:

Can my solution be made more elegant using dplyr or base R functions?

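If the only difference between the two data frames is the zero rows, and the remaining rows are identical and in exactly the same order, you can initialise the stim column to all zeros and then fill the rows whose marker is non-zero with the corresponding behav values.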
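The code for this answer did not survive on this page. A minimal base R sketch of what the paragraph describes might look like the following. It assumes, as the question states, that the non-zero markers in eeg match behav$marker one-to-one and in the same order; the variable name `nz` is only illustrative.

```r
behav <- data.frame(
  marker = c(1, 2, 3, 1, 2, 3, 7, 13),
  stim   = c(168, 168, 168, 78, 78, 78, 23, 55)
)

eeg <- data.frame(
  marker = c(0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 7, 0, 13)
)

# Logical index of the eeg rows that carry a real (non-zero) marker
nz <- eeg$marker != 0

# Assumption check: the non-zero eeg markers must line up with behav exactly
stopifnot(identical(eeg$marker[nz], behav$marker))

eeg2 <- eeg
eeg2$stim <- 0              # start with zeros everywhere
eeg2$stim[nz] <- behav$stim # fill the non-zero rows positionally
```

Because the assignment is purely positional, the stopifnot() check is what guards against the two data frames drifting out of alignment.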

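For completeness, since a base R solution has already been provided, here is my approach using dplyr:

Use dplyr::left_join to merge eeg and behav, then use dplyr::mutate to replace the NAs with 0:

eeg2 <- dplyr::left_join(eeg, behav, by = c("marker"))

eeg2 <- dplyr::mutate(eeg2, stim = dplyr::if_else(is.na(stim), 0, stim))

In this particular case, though, I would suggest using magrittr's pipe %>%. It adds a little overhead, but makes the code shorter and easier to follow:

eeg2 <- dplyr::left_join(eeg, behav, by = c("marker")) %>% 
  dplyr::mutate(stim = dplyr::if_else(is.na(stim), 0, stim))

Thank you, Kath. You made it so simple; I had been banging my head against this for days :-

Output of the left_join on the example data: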

   marker stim
1       0    0
2       0    0
3       1  168
4       1   78
5       0    0
6       0    0
7       2  168
8       2   78
9       0    0
10      0    0
11      3  168
12      3   78
13      0    0
14      0    0
15      1  168
16      1   78
17      0    0
18      0    0
19      2  168
20      2   78
21      0    0
22      0    0
23      3  168
24      3   78
25      0    0
26      7   23
27      0    0
28     13   55
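The output above has 28 rows rather than the 22 in eeg: markers 1, 2 and 3 each occur twice in behav, so a join on marker alone pairs every matching eeg row with both stim values. One possible way around this is to add an occurrence counter to both data frames and join on marker and counter together. The sketch below relies on the question's assumption that the n-th occurrence of a marker in eeg corresponds to its n-th occurrence in behav. The column name `occ` is hypothetical and is dropped again at the end.

```r
library(dplyr)

behav <- data.frame(
  marker = c(1, 2, 3, 1, 2, 3, 7, 13),
  stim   = c(168, 168, 168, 78, 78, 78, 23, 55)
)

eeg <- data.frame(
  marker = c(0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 7, 0, 13)
)

# Number each marker value by order of appearance (1st, 2nd, ... occurrence).
# `occ` is an illustrative helper column, not part of the original data.
behav_occ <- behav %>%
  group_by(marker) %>%
  mutate(occ = row_number()) %>%
  ungroup()

eeg_occ <- eeg %>%
  group_by(marker) %>%
  mutate(occ = row_number()) %>%
  ungroup()

# Joining on both columns makes each key unique, so no rows are duplicated.
# Zero markers never occur in behav, so they come back as NA and are set to 0.
eeg2 <- eeg_occ %>%
  left_join(behav_occ, by = c("marker", "occ")) %>%
  mutate(stim = coalesce(stim, 0)) %>%
  select(-occ) %>%
  as.data.frame()
```

left_join keeps the rows of eeg in their original order, so the row-order requirement from the question is still met.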
