R: How can I easily merge a table with its own unique elements?

r merge

I have this data frame:

df <- data.frame(group=rep(1:3,each=3),
                 question=c("1.1.1. question 1","1.1.1.1. question1 with conditional","2.2.2.2. question2 with condtional", "2.2.2. question2","1.1.1.10. question 1 with conditional","3.3.3. question 3","3.3.3.2. question 3 with conditional","2.2.2.1. question 2 with conditional","3.3.3. Descirbe section 2.8"),
                 answer=c("yes","no","text","no","text","hello","yes","text","yes"),
                 parent_question=c("1.1.1. question 1","1.1.1. question 1","2.2.2. question2","2.2.2. question2","1.1.1. question 1","3.3.3. question 3","3.3.3. question 3","2.2.2. question2","3.3.3. Descirbe section 2.8"),
                 answer_parent=c("yes","yes","","","","","yes","","yes"))

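We can use str_extract to pull the numeric pattern 1.1.1 or 1.1.1.1 from the start of each string and build a logical 'flag' that is TRUE where the full leading number equals the three-part main number. Grouping by the main question number ("1.1.1", "2.2.2", and so on; 'grp1'), we then create the parent question and parent answer ('p_q', 'p_a') by taking the 'question'/'answer' where 'flag' is TRUE. If all values are FALSE, a blank ("") is returned.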
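To make the two extraction steps concrete, here is a minimal sketch on three of the question labels from the data above. The vector name `question` is only an illustrative stand-in for the `question` column; the regular expressions are the ones used in the answer.

```r
library(stringr)

# Three labels copied from the question's data frame
question <- c("1.1.1. question 1",
              "1.1.1.1. question1 with conditional",
              "1.1.1.10. question 1 with conditional")

# Main question number: exactly three dot-separated integers at the start
grp1 <- str_extract(question, "^([0-9]+\\.){2}[0-9]+")

# Full leading number, with the trailing dot removed
grp2 <- str_remove(str_extract(question, "^[0-9.]+"), "\\.$")

# TRUE only for rows that are themselves a main (parent) question
flag <- grp1 == grp2

data.frame(question, grp1, grp2, flag)
```

Only the first label is flagged, because it is the only one whose full number has exactly three parts.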
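Do you need to group 2.2.2.1 and 2.2.2.2 under 2.2.2?

Do you need df %>% group_by(group, grp = str_extract(question, "^([0-9]+\\.){2}[0-9]+")) %>% arrange(group, question) %>% mutate(parent_question = first(question), parent_ans

2.2.2 would be the parent answer for 2.2.2.1 and 2.2.2.2.

If you could update the example with the expected columns, it would be easier to cross-check.

Some questions look like "4.4.4. Describe each item captured in section 2.7", so I am not sure how to generate the parent question and parent answer fields.

Why are p_q and p_a NA?

@Mel Are you using the updated code? I found that your updated dataset has a trailing dot after 1.1.1., which does not match, so I used str_remove in 'grp2'. 'p_q' and 'p_a' are parent_question and parent_answer.

@Mel I updated. You can check the output of parent_questionNew and answer_parentNew. In your parent_question, the last entry seems to be different. Is it a typo?

@Mel Maybe you have also loaded plyr, which masks dplyr's mutate. Try dplyr::mutate(parent_questionNew = question[which(flag)[1]])

@Mel Are you saying that you do not get the output I showed for parent_questionNew? It could be a function-masking issue. So prefix each mutate with dplyr:: to avoid calling plyr::mutate.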
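The masking problem raised in the comments can be checked directly. The sketch below is only an illustration: the `toy` data frame and its columns are hypothetical, and the first call simply reports which package a bare `mutate` currently resolves to.

```r
library(dplyr)

# Which package does a bare `mutate` resolve to right now?
# "dplyr" is the expected answer; "plyr" would mean plyr was attached later and masks it.
environmentName(environment(mutate))

# Toy data (hypothetical, for illustration only)
toy <- data.frame(g = c("a", "a", "b"), x = 1:3)

# Namespace-qualified calls are not affected by masking
res <- toy %>%
  dplyr::group_by(g) %>%
  dplyr::mutate(first_x = x[1]) %>%
  dplyr::ungroup()
res
```

Qualifying every verb is verbose, so an alternative is to attach plyr before dplyr, or not to attach plyr at all if it is not needed.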
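library(dplyr)
library(stringr)
out <- df %>%
   # grp1: three-part main question number; grp2: full leading number without the trailing dot
   mutate(grp1 = str_extract(question, "^([0-9]+\\.){2}[0-9]+"),
          grp2 = str_remove(str_extract(question, "^[0-9.]+"), "\\.$"),
          flag = grp1 == grp2) %>%   # TRUE on rows that are themselves a main question
   group_by(grp1) %>%
   # parent question: first flagged row within each main question number
   mutate(parent_questionNew = question[which(flag)[1]]) %>%
   # add `group` as a second grouping level (dplyr >= 1.0.0 names this argument `.add`)
   group_by(group, add = TRUE) %>%
   mutate(answer_parentNew = if(any(flag)) answer[which(flag & answer == "yes")[1]]
          else replace(as.character(answer), answer != "yes", "")) %>%
   ungroup()
out %>%
  select(matches('parent'))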
# A tibble: 9 x 4
#  parent_question             answer_parent parent_questionNew answer_parentNew
#  <fct>                       <fct>         <fct>              <chr>           
#1 1.1.1. question 1           "yes"         1.1.1. question 1  "yes"           
#2 1.1.1. question 1           "yes"         1.1.1. question 1  "yes"           
#3 2.2.2. question2            ""            2.2.2. question2   ""              
#4 2.2.2. question2            ""            2.2.2. question2    <NA>           
#5 1.1.1. question 1           ""            1.1.1. question 1  ""              
#6 3.3.3. question 3           ""            3.3.3. question 3   <NA>           
#7 3.3.3. question 3           "yes"         3.3.3. question 3  "yes"           
#8 2.2.2. question2            ""            2.2.2. question2   ""              
#9 3.3.3. Descirbe section 2.8 "yes"         3.3.3. question 3  "yes"
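
Rows 4 and 6 of answer_parentNew come out as <NA> rather than blank. If a blank string is preferred there, one possible follow-up (an assumption about the desired result, not part of the original answer) is dplyr's coalesce(). The vector below is copied from the printed output so the sketch runs on its own.

```r
library(dplyr)

# Values copied from the answer_parentNew column printed above
answer_parentNew <- c("yes", "yes", "", NA, "", NA, "yes", "", "yes")

# coalesce() keeps each value unless it is NA, in which case it falls back to ""
answer_parent_filled <- coalesce(answer_parentNew, "")
answer_parent_filled
```

On the full result the same idea would be written as an extra mutate(answer_parentNew = coalesce(answer_parentNew, "")) step after ungroup().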