Lookup from the same table in dplyr

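I have data from multiple teams in which team members rate one another. Each person has their own ID number, but also a team ID and a rater number within the team, like this: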

  StudyID TeamID CATMERater Rated   Rating
    (int)  (int)      (int) (dbl)    (dbl)
1    2930    551          1     1 5.000000 #How rater 1 rated 1 (themselves)
2    2938    551          2     1 3.800000 #How rater 2 rated 1
3    2939    551          3     1 5.000000 #How rater 3 rated 1
4    2930    551          1     2 3.666667 #How rater 1 rated 2
5    2938    551          2     2 4.000000 #...
6    2939    551          3     2 3.866667
...

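And so on. I used tidyr to get the data into this format, and I am trying to add a new column holding the StudyID of the person being rated, that is, the StudyID from the row with the same TeamID whose CATMERater equals Rated. This is what I tried, but it does not work because I am not sure how to refer back to the same table: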

edges %>% mutate(RatedStudyID = filter(edges, TeamID == TeamID & Rated == CATMERater))

Hopefully this makes sense; I would appreciate a pointer in the right direction. If this is a job for left_join, how do I say TeamID == TeamID?

Here is what I would like to end up with (mainly the last column):

Per @akron, here is the dput output of a data set that gave an error:

structure(list(StudyID = c(2930L, 2938L, 2939L, 2930L, 2938L, 
2939L, 2930L, 2938L, 2939L, 2930L, 2938L, 2939L, 2930L, 2938L, 
2939L, 2930L, 2938L, 2939L, 2920L, 2941L, 2989L, 2920L, 2941L, 
2989L, 2920L, 2941L, 2989L, 2920L, 2941L, 2989L, 2920L, 2941L, 
2989L, 2920L, 2941L, 2989L, 2922L, 2924L, 2943L, 2922L, 2924L, 
2943L, 2922L, 2924L, 2943L, 2922L, 2924L, 2943L, 2922L, 2924L
), TeamID = c(551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 
551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 551L, 552L, 
552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 552L, 
552L, 552L, 552L, 552L, 552L, 552L, 553L, 553L, 553L, 553L, 553L, 
553L, 553L, 553L, 553L, 553L, 553L, 553L, 553L, 553L), CATMERater = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 
2L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L), Rated = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 
6, 6, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 1, 
1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5), Rating = c(5, 3.8, 5, 
3.66666666666667, 4, 3.86666666666667, 4.53333333333333, 4, 4.8, 
NaN, NaN, NaN, NaN, NaN, NaN, NA, NA, NA, 3.93333333333333, 5, 
5, 5, 5, 5, 5, 5, 5, NaN, NaN, NaN, NaN, NaN, NaN, NA, NA, NA, 
4, 4, 4, 4, 4, 4, 4, 3.86666666666667, 4, NaN, NaN, NaN, NaN, 
NaN)), .Names = c("StudyID", "TeamID", "CATMERater", "Rated", 
"Rating"), class = c("tbl_df", "data.frame"), row.names = c(NA, 
-50L))

With data.table:
library(data.table)
setDT(edges)[ , RatedStudyID := StudyID[CATMERater == Rated] , .(Rated, TeamID)]
edges
#   StudyID TeamID CATMERater Rated   Rating RatedStudyID
#1:    2930    551          1     1 5.000000         2930
#2:    2938    551          2     1 3.800000         2930
#3:    2939    551          3     1 5.000000         2930
#4:    2930    551          1     2 3.666667         2938
#5:    2938    551          2     2 4.000000         2938
#6:    2939    551          3     2 3.866667         2938

In the new dataset there are some groups in which no row has CATMERater equal to Rated. For those groups we can use an if/else condition to return NA:
setDT(df1)[, RatedStudyID :=if(!any(CATMERater==Rated)) NA_integer_
             else StudyID[CATMERater ==Rated], .(Rated, TeamID)]

I think you can solve this with a join:
edges %>%
  select(TeamID, Rated = CATMERater, RaterStudyID = StudyID) %>%
  inner_join(edges, by = c("TeamID", "Rated"))
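If rating rows whose rated number has no matching rater should be kept rather than dropped, a left_join against a small lookup table is one option. The sketch below is only an illustration under assumptions: edges_demo and its values are made up, lookup is a hypothetical helper name, and the new column is called RatedStudyID as in the question.

```r
library(dplyr)

# Toy data in the same shape as `edges` (values invented for illustration).
edges_demo <- tibble(
  StudyID    = c(2930L, 2938L, 2930L, 2938L, 2930L, 2938L),
  TeamID     = 551L,
  CATMERater = c(1L, 2L, 1L, 2L, 1L, 2L),
  Rated      = c(1, 1, 2, 2, 3, 3)  # nobody on this team has rater number 3
)

# Lookup table: one row per (TeamID, rater number), giving that rater's StudyID.
# Renaming CATMERater to Rated lets the join match on the rated person's number.
lookup <- edges_demo %>%
  select(TeamID, Rated = CATMERater, RatedStudyID = StudyID) %>%
  distinct()

# left_join keeps every rating row; rated numbers without a match get NA.
result <- edges_demo %>%
  left_join(lookup, by = c("TeamID", "Rated"))

print(result)
```

Joining on both TeamID and Rated is what expresses "same team, and rater number equals rated number"; using distinct() on the lookup avoids multiplying rows when a rater appears more than once.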

From the comments:
library(dplyr)
x %>%
   group_by(Rated, TeamID) %>% #group by each team/rated individual
   filter(any(CATMERater == Rated)) %>% #filter out any groups with unrated individuals
   mutate(new = StudyID[CATMERater == Rated]) #make the new column

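The new column is created by subsetting within each group; it is the same expression as x$StudyID[x$CATMERater == x$Rated] on the whole data frame, but evaluated per group. As long as there is a single row in the group where the condition is true (i.e. the self-rating), that value is assigned to every member of the group.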
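When a group has no self-rating row, the subset has length zero and cannot be recycled to the group size. One possible variant, sketched here on made-up data (x_demo and its values are assumptions), takes the first element of the subset so that such groups get NA instead of being filtered out.

```r
library(dplyr)

# Toy data in the same shape as the question's table (values invented).
x_demo <- tibble(
  StudyID    = c(2930L, 2938L, 2930L, 2938L, 2930L, 2938L),
  TeamID     = 551L,
  CATMERater = c(1L, 2L, 1L, 2L, 1L, 2L),
  Rated      = c(1, 1, 2, 2, 3, 3)  # Rated == 3 has no self-rating row
)

out <- x_demo %>%
  group_by(Rated, TeamID) %>%
  # Indexing with [1] returns the self-rater's StudyID when one exists,
  # and NA when the subset is empty, so every group yields a length-1 value.
  mutate(new = StudyID[CATMERater == Rated][1]) %>%
  ungroup()

print(out)
```

This keeps all rows, at the cost of silently taking the first match if a group ever contains more than one self-rating row.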
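See how to share sample data in a better way, to make it easier to help you. Can you dput the data frame?
edges %>% group_by(Rated, TeamID) %>% mutate(new = StudyID[CATMERater == Rated])?
@jeremycg I tried this, but it gives an error: Error: incompatible size (0), expecting 3 (the group size) or 1
Is it possible that there are multiple rows with the same value for each Rated and TeamID?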