R: How to easily merge a table onto its own unique elements?
I have this data frame:
df <- data.frame(group=rep(1:3,each=3),
question=c("1.1.1. question 1","1.1.1.1. question1 with conditional","2.2.2.2. question2 with condtional", "2.2.2. question2","1.1.1.10. question 1 with conditional","3.3.3. question 3","3.3.3.2. question 3 with conditional","2.2.2.1. question 2 with conditional","3.3.3. Descirbe section 2.8"),
answer=c("yes","no","text","no","text","hello","yes","text","yes"),
parent_question=c("1.1.1. question 1","1.1.1. question 1","2.2.2. question2","2.2.2. question2","1.1.1. question 1","3.3.3. question 3","3.3.3. question 3","2.2.2. question2","3.3.3. Descirbe section 2.8"),
answer_parent=c("yes","yes","","","","","yes","","yes"))
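df

We can use str_extract to extract the numeric pattern 1.1.1 or 1.1.1.1 from the start of the string and create a logical 'flag' that marks the rows that are themselves main questions. We then group by the main question number '1.1.1', '2.2.2', etc. ('grp1') and create 'p_q' and 'p_a' (parent_questionNew and answer_parentNew in the updated code) by extracting the 'question' and 'answer' where 'flag' is TRUE. If all flags in a group are FALSE, a blank ("") is returned.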
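To make the flag logic concrete, here is a minimal sketch that applies the same two regular expressions to three question labels taken from the data frame. The vector name `question` simply mirrors the column name; nothing here goes beyond the `stringr` calls already used in the answer.

```r
library(stringr)

# Three question labels taken from the data frame above
question <- c("1.1.1. question 1",
              "1.1.1.1. question1 with conditional",
              "1.1.1.10. question 1 with conditional")

# First three numeric levels: the main-question key
grp1 <- str_extract(question, "^([0-9]+\\.){2}[0-9]+")

# Full numeric prefix with the trailing dot removed
grp2 <- str_remove(str_extract(question, "^[0-9.]+"), "\\.$")

# TRUE only when the row is itself the main question
flag <- grp1 == grp2

data.frame(question, grp1, grp2, flag)
```

Only the first label has identical `grp1` and `grp2`, so it is the only one flagged as a parent; the other two share its `grp1` and therefore fall into the same group.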
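Do you need 2.2.2.1 and 2.2.2.2 to be grouped under 2.2.2?
Do you need df %>% group_by(group, grp = str_extract(question, "^([0-9]+\\.){2}[0-9]+")) %>% arrange(group, question) %>% mutate(parent_question = first(question), parent_ans...
2.2.2 would be the parent answer for 2.2.2.1 and 2.2.2.2.
If you can update the example with the expected columns, it will be easier to cross-check.
Some questions look like "4.4.4. Describe each item captured in section 2.7", so I am not sure how to generate the parent question and parent answer fields.
Why are p_q and p_a NA?
@Mel Are you using the updated code? I found that your updated dataset has a trailing dot in 1.1.1., which did not match, so I used str_remove in 'grp2'. 'p_q' and 'p_a' are the parent question and the parent answer.
@Mel I updated. You can check the output of parent_questionNew and answer_parentNew. In your parent_question column, the last entry seems different. Is it a typo?
@Mel It may be that you also loaded plyr, which masks the dplyr mutate. Try dplyr::mutate(parent_questionNew = question[which(flag)[1]])
@Mel Are you saying that you do not get the output I showed for parent_questionNew? It could be a function-masking problem. So prefix each mutate with dplyr:: to avoid calling plyr::mutate.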
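A quick way to check the masking issue mentioned in the comments is sketched below. It assumes only that `dplyr` is installed; the data frame `toy` and the result `res` are hypothetical names made up for this illustration.

```r
library(dplyr)

# Which package supplies the `mutate` found on the search path right now?
# If this reports "plyr", plyr was attached after dplyr and masks it.
environmentName(environment(mutate))

# A namespace-qualified call does not depend on the attach order
toy <- data.frame(question = c("1.1.1. question 1",
                               "1.1.1.1. question1 with conditional"),
                  flag = c(TRUE, FALSE))

res <- toy %>%
  dplyr::mutate(parent_questionNew = question[which(flag)[1]])

res$parent_questionNew
```

Loading `plyr` before `dplyr`, or not attaching `plyr` at all and calling its functions as `plyr::fun()`, should avoid the conflict in the first place.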
library(dplyr)
library(stringr)

out <- df %>%
  # grp1: first three numeric levels; grp2: full numeric prefix without the trailing dot
  mutate(grp1 = str_extract(question, "^([0-9]+\\.){2}[0-9]+"),
         grp2 = str_remove(str_extract(question, "^[0-9.]+"), "\\.$"),
         flag = grp1 == grp2) %>%   # TRUE when the row is itself a main question
  group_by(grp1) %>%
  # parent question: first flagged question within each main-question group
  mutate(parent_questionNew = question[which(flag)[1]]) %>%
  # add = TRUE is deprecated since dplyr 1.0.0; use .add = TRUE there
  group_by(group, add = TRUE) %>%
  mutate(answer_parentNew = if (any(flag)) answer[which(flag & answer == "yes")[1]]
                            else replace(as.character(answer), answer != "yes", "")) %>%
  ungroup()

out %>%
  select(matches('parent'))
# A tibble: 9 x 4
#   parent_question              answer_parent parent_questionNew answer_parentNew
#   <fct>                        <fct>         <fct>              <chr>
# 1 1.1.1. question 1            "yes"         1.1.1. question 1  "yes"
# 2 1.1.1. question 1            "yes"         1.1.1. question 1  "yes"
# 3 2.2.2. question2             ""            2.2.2. question2   ""
# 4 2.2.2. question2             ""            2.2.2. question2   <NA>
# 5 1.1.1. question 1            ""            1.1.1. question 1  ""
# 6 3.3.3. question 3            ""            3.3.3. question 3  <NA>
# 7 3.3.3. question 3            "yes"         3.3.3. question 3  "yes"
# 8 2.2.2. question2             ""            2.2.2. question2   ""
# 9 3.3.3. Descirbe section 2.8  "yes"         3.3.3. question 3  "yes"
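To cross-check the derived columns against the expected ones, the printed values can be compared directly. This is a rough base R sketch with the values copied by hand from the output above; the names `mismatch_rows`, `na_rows` and `answer_filled` are made up for the illustration.

```r
# Values copied from the printed output above
parent_question <- c("1.1.1. question 1", "1.1.1. question 1", "2.2.2. question2",
                     "2.2.2. question2", "1.1.1. question 1", "3.3.3. question 3",
                     "3.3.3. question 3", "2.2.2. question2",
                     "3.3.3. Descirbe section 2.8")
parent_questionNew <- c("1.1.1. question 1", "1.1.1. question 1", "2.2.2. question2",
                        "2.2.2. question2", "1.1.1. question 1", "3.3.3. question 3",
                        "3.3.3. question 3", "2.2.2. question2", "3.3.3. question 3")
answer_parent    <- c("yes", "yes", "", "", "", "", "yes", "", "yes")
answer_parentNew <- c("yes", "yes", "", NA, "", NA, "yes", "", "yes")

# Rows where the expected and derived parent questions disagree
mismatch_rows <- which(parent_question != parent_questionNew)

# Rows where a flagged parent exists in the group but its answer is not "yes"
na_rows <- which(is.na(answer_parentNew))

# Turning those NAs into blanks reproduces the expected answer_parent column
answer_filled <- ifelse(is.na(answer_parentNew), "", answer_parentNew)
identical(answer_filled, answer_parent)
```

The only parent-question mismatch is the last row, which is the entry asked about in the comments. The `NA` values appear where the main question itself was not answered "yes", so they can be replaced with blanks if the expected column is the target.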