Parsing data from one variable in R using complex rules

r parsing

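I am importing data into R from another source (i.e., I cannot easily change the incoming format/values).

One variable contains one or more of the following possible values:

Mother (biological, adoptive, step, etc.)
Father (biological, adoptive, step, etc.)
Grandparent (biological, foster, step, etc.)
Brother over 18
Sister over 18
Other adult (aunt, uncle, etc.)

All of the values sit inside the same "cell", so the data can look like this:

Sample input data frame (df)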

TIA

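Hey, welcome to Stack Overflow! Here are some links on how to ask better questions on Stack Overflow so that people can help you easily (continued)

Regarding your question, I made a few assumptions and tried to solve it. As Maurits mentioned, you need to provide a reproducible example so that someone can give a specific answer; until then, this is the best answer I can come up with.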

library(tidyr)
library(dplyr)
# create nested lists with the names of mothers and fathers for two people
mother <- list(list("bio_1", "step_1", "foster_1"), list("bio_2", "stp_2", "foster_2"))
father <- list(list("bio_1", "foster_1", "other_1"), list("bio_2", "stp_2", "foster_2"))

# convert to a tibble (data_frame() is deprecated in favour of tibble())
test_object <- tibble(person = c(1, 2), mother = mother, father = father)

# print 
test_object

# A tibble: 2 x 3
  person mother     father    
   <dbl> <list>     <list>    
1      1 <list [3]> <list [3]>
2      2 <list [3]> <list [3]>

# first unnest the outer lists to get to the inner list elements
# then convert from wide to long form
# then unnest once more to get the actual values in long format
# (current tidyr expects the columns to unnest to be named via cols =)
test_object <- test_object %>%
  unnest(cols = c(mother, father)) %>%
  gather(key = "relationship", value = "name", -person) %>%
  unnest(cols = name)

test_object
# A tibble: 12 x 3
   person relationship name    
    <dbl> <chr>        <chr>   
 1      1 mother       bio_1   
 2      1 mother       step_1  
 3      1 mother       foster_1
 4      2 mother       bio_2   
 5      2 mother       stp_2   
 6      2 mother       foster_2
 7      1 father       bio_1   
 8      1 father       foster_1
 9      1 father       other_1 
10      2 father       bio_2   
11      2 father       stp_2   
12      2 father       foster_2
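
The answer above assumes the values already arrive as list columns. If each cell instead holds one delimited string, as the question describes, a similar long format can be reached with `tidyr::separate_rows()`. This is only a minimal sketch: the data frame, the column name `lives.with.whom` (borrowed from a later answer in this thread) and the comma delimiter are all assumptions, because the asker's sample data was not preserved.

```r
library(tidyr)

# Hypothetical input: the values and the comma delimiter are assumptions
df <- data.frame(
  row = 1:3,
  lives.with.whom = c("Mother, Father",
                      "Grandparent",
                      "Mother, Sister, Other adult"),
  stringsAsFactors = FALSE
)

# Split each cell on a comma plus optional whitespace, one row per value
long <- separate_rows(df, lives.with.whom, sep = ",\\s*")
long
```

Each original row is repeated once per value it contained, so the `row` column still identifies where every value came from.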

Here are links to and , which cover many packages and functions that can solve most data manipulation/wrangling problems.

Here is a tidyverse option to get you started

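library(tidyverse)
rel <- list("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
names(rel) <- unlist(rel)
# one 0/1 column per relationship: the unary + turns the logical result of str_detect() into integers
bind_cols(df[, 1, drop = FALSE], map(rel, ~ +str_detect(tolower(df[, 2]), tolower(.x))))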
#  row Mother Father Brother Sister Grandparent Other adult
#1   1      1      1       1      1           1           1
#2   2      0      0       0      0           0           0
#3   3      1      0       0      1           0           0
#4   4      1      1       0      0           0           0
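
Since the asker's `df` is not shown, here is a self-contained sketch of the same idea that can be run as is. The input values are hypothetical, chosen only so that the result lines up with the output printed above; the column name `lives.with.whom` is borrowed from another answer in this thread.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical input: the real data frame was not preserved,
# so these column names and values are assumptions.
df <- data.frame(
  row = 1:4,
  lives.with.whom = c(
    "Mother, Father, Brother, Sister, Grandparent, Other adult",
    "",
    "Mother, Sister",
    "Mother, Father"
  ),
  stringsAsFactors = FALSE
)

rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# One 0/1 indicator vector per relationship, matched case-insensitively
flags <- map(
  set_names(rel),
  ~ as.integer(str_detect(df$lives.with.whom, regex(.x, ignore_case = TRUE)))
)

# Keep the id column and attach the indicator columns next to it
result <- bind_cols(df["row"], flags)
result
```

Note that this is plain substring matching: a label such as "Mother" would also match inside a longer word like "Grandmother", so the patterns may need word boundaries if the real values overlap in that way.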

Try this:
library(dplyr)
rel <- list("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# add one 0/1 column per relationship, named after the relationship itself
for (i in seq_along(rel)) {
  df[[rel[[i]]]] <- if_else(grepl(rel[[i]], df$lives.with.whom), 1, 0)
}
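
For comparison, the same indicator columns can be built in base R without a loop or any extra packages. This is a sketch under assumptions: the data frame below is hypothetical, and case-insensitive matching is a choice made here, not something the answer above specifies.

```r
# Hypothetical input; column names and values are assumptions
df <- data.frame(
  row = 1:3,
  lives.with.whom = c("Mother, Father", "", "Sister, Other adult"),
  stringsAsFactors = FALSE
)

rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# sapply() returns a matrix with one column per relationship,
# using the elements of rel as column names
flags <- sapply(rel, function(r) {
  as.integer(grepl(r, df$lives.with.whom, ignore.case = TRUE))
})

# check.names = FALSE keeps the "Other adult" column name intact
out <- cbind(df, flags, check.names = FALSE)
out
```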

This is easy to do, and it is best demonstrated with specific sample data. Please add representative, copy-and-pasteable sample data to your post.

No follow-up and no update after 2 hours means I have downvoted (I will remove it if you edit/revise your question). Generally, we expect you to stick around after posting a question to respond to any questions/comments. I can see that you have been checking back repeatedly over the past two hours, so if you want help, please take some time to add the key information to your post.

Thank you for the guidance. I have edited it to try to make it clearer.

Better (downvote removed); I have added a solution below that should get you started; please take a look.

Great, tidyverse works very well (after installing dependencies such as libcurl4 on Ubuntu, and getting over the learning curve). I will further abuse my noob status to sincerely thank you and @Suhas for the help and the warm welcome to Stack Overflow.

Very good guidance and welcome. Thank you very much. The information about subsetting was very useful; it is a nice addition, even though my question was unclear.