R: How to easily merge a table onto its own unique elements?


I have this data frame:

df <- data.frame(group=rep(1:3,each=3),
                 question=c("1.1.1. question 1","1.1.1.1. question1 with conditional","2.2.2.2. question2 with condtional", "2.2.2. question2","1.1.1.10. question 1 with conditional","3.3.3. question 3","3.3.3.2. question 3 with conditional","2.2.2.1. question 2 with conditional","3.3.3. Descirbe section 2.8"),
                 answer=c("yes","no","text","no","text","hello","yes","text","yes"),
                 parent_question=c("1.1.1. question 1","1.1.1. question 1","2.2.2. question2","2.2.2. question2","1.1.1. question 1","3.3.3. question 3","3.3.3. question 3","2.2.2. question2","3.3.3. Descirbe section 2.8"),
                 answer_parent=c("yes","yes","","","","","yes","","yes"))

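We can use str_extract to pull the numeric pattern from the start of each string (1.1.1, 1.1.1.1) and create a logical "flag" that is TRUE when a row is itself a main question. We then group by the main question "1.1.1", "2.2.2", etc. ("grp1") and create "p_q" and "p_a" (named parent_questionNew and answer_parentNew in the code) by extracting the question/answer where "flag" is TRUE. If the flags are all FALSE, a blank ("") is returned.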
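As a small illustration of the flag logic described above, the sketch below applies the same two patterns to three of the question strings from the data frame. The vector name q is made up for this example; the patterns are the ones used in the answer.

```r
library(stringr)

# Three question strings taken from the example data (q is a hypothetical name)
q <- c("1.1.1. question 1",
       "1.1.1.1. question1 with conditional",
       "2.2.2. question2")

# First three numeric levels, e.g. "1.1.1"
grp1 <- str_extract(q, "^([0-9]+\\.){2}[0-9]+")

# Full leading number with the trailing dot removed, e.g. "1.1.1.1"
grp2 <- str_remove(str_extract(q, "^[0-9.]+"), "\\.$")

# TRUE only for rows that are themselves a main (parent) question
flag <- grp1 == grp2

data.frame(q, grp1, grp2, flag)
```

Only the first and third strings are flagged, because their full leading number has exactly three levels; the conditional question has four levels and so differs from its grp1.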


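Do you need 2.2.2.1 and 2.2.2.2 to be grouped under 2.2.2?

Do you need df %>% group_by(group, grp = str_extract(question, "^([0-9]+\\.){2}[0-9]+")) %>% arrange(group, question) %>% mutate(parent_question = first(question), parent_ans

2.2.2 would be the parent answer for 2.2.2.1 and 2.2.2.2.

If you can update with the expected columns for that example, it would be easier to cross-check.

There are some questions like "4.4.4. Describe each item captured in section 2.7", so I'm not sure how to generate the parent question and parent answer fields.

Why are p_q and p_a NA?

@Mel Are you using the updated code? I found that your updated dataset has a dot at the end of 1.1.1., which did not match, so str_remove is used in "grp2". "p_q" and "p_a" are parent_question and parent_answer.

@Mel I updated. You can check the output of parent_questionNew and answer_parentNew. In your parent_question, the last entry seems to be different. Is it a typo?

@Mel It could be that you also loaded plyr, which masks the dplyr mutate. Try dplyr::mutate(parent_questionNew = question[which(flag)[1]])

@Mel Are you saying that you are not getting the output I showed for parent_questionNew? It could be a function-masking issue. So prefix each mutate with dplyr:: to avoid calling plyr::mutate.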
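The masking issue raised in the comments can be checked directly. The sketch below is only an illustration under simple assumptions: toy and n_chars are hypothetical names, and plyr is not loaded here, so the lookup is expected to report dplyr.

```r
library(dplyr)

# Which package supplies the mutate() found first on the search path?
# "plyr" here would indicate that dplyr's mutate() is being masked.
environmentName(environment(mutate))

# Calling the function with an explicit namespace sidesteps masking.
# toy is a hypothetical two-row data frame used only for this example.
toy <- data.frame(question = c("1.1.1. question 1", "2.2.2. question2"),
                  stringsAsFactors = FALSE)
res <- toy %>% dplyr::mutate(n_chars = nchar(question))
res
```

Loading plyr before dplyr, or always writing dplyr::mutate, are two ways to keep the grouped mutate steps from silently running the plyr version.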
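library(dplyr)
library(stringr)

out <- df %>%
  # grp1: first three numeric levels; grp2: full leading number without the trailing dot
  mutate(grp1 = str_extract(question, "^([0-9]+\\.){2}[0-9]+"),
         grp2 = str_remove(str_extract(question, "^[0-9.]+"), "\\.$"),
         flag = grp1 == grp2) %>%
  group_by(grp1) %>%
  # parent question: first flagged question within each main-question group
  mutate(parent_questionNew = question[which(flag)[1]]) %>%
  group_by(group, .add = TRUE) %>%  # dplyr >= 1.0.0; older versions use add = TRUE
  # parent answer: the flagged "yes" answer if the group has a flagged row, else blank out non-"yes" answers
  mutate(answer_parentNew = if (any(flag)) answer[which(flag & answer == "yes")[1]]
                            else replace(as.character(answer), answer != "yes", "")) %>%
  ungroup()

out %>%
  select(matches('parent'))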
# A tibble: 9 x 4
#  parent_question             answer_parent parent_questionNew answer_parentNew
#  <fct>                       <fct>         <fct>              <chr>           
#1 1.1.1. question 1           "yes"         1.1.1. question 1  "yes"           
#2 1.1.1. question 1           "yes"         1.1.1. question 1  "yes"           
#3 2.2.2. question2            ""            2.2.2. question2   ""              
#4 2.2.2. question2            ""            2.2.2. question2    <NA>           
#5 1.1.1. question 1           ""            1.1.1. question 1  ""              
#6 3.3.3. question 3           ""            3.3.3. question 3   <NA>           
#7 3.3.3. question 3           "yes"         3.3.3. question 3  "yes"           
#8 2.2.2. question2            ""            2.2.2. question2   ""              
#9 3.3.3. Descirbe section 2.8 "yes"         3.3.3. question 3  "yes"