Splitting a factor in R (R, Categorical Data)


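I have a factor whose values have the form Single (w/children), Married (no children), Single (no children), and so on. I would like to split it into two factors: a multi-valued factor for marital status and a binary factor for children.

How can I do this in R?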

Some example data:

df <- data.frame(status=c("Domestic partners (w/children)", "Married (no children)",
  "Single (no children)"))

Get the children status from the string:

df$ch <- ifelse(grepl("no children" , df$status) , 0 , 1)

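grepl looks for the string "no children" and returns TRUE or FALSE for each element:

grepl("no children" , df$status)

We then use ifelse to turn that into a 0/1 indicator (0 = no children, 1 = children).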
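A minimal sketch of the full split into two factors, assuming the marital status is always the text before the opening parenthesis. The `strsplit()` call, the `" ("` separator and the "no"/"yes" labels are illustrative assumptions, not necessarily what the answer used:

```r
# Example data; keep the strings as character so strsplit() can be applied directly
df <- data.frame(status = c("Domestic partners (w/children)",
                            "Married (no children)",
                            "Single (no children)"),
                 stringsAsFactors = FALSE)

# Binary factor for children: "no" when the string contains "no children"
has_no_children <- grepl("no children", df$status, fixed = TRUE)
df$ch <- factor(ifelse(has_no_children, "no", "yes"), levels = c("no", "yes"))

# Multi-valued factor for marital status: the piece before " ("
s <- strsplit(df$status, " (", fixed = TRUE)
df$married <- factor(sapply(s, "[", 1))

print(df)
```

Splitting on the literal `" ("` rather than on a space keeps two-word statuses such as "Domestic partners" intact.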

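EDIT

Re comment: adding some empty strings ("") to the data. [Note: it is usually better to have these as missing values (NA). You can do this when reading in the data, i.e. by using the na.strings argument of read.table (na.strings = c("NA", "")).]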
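As a rough illustration of reading empty fields in as NA, the sketch below parses a small inline CSV. The `raw` text, the `id` column and the name `df_in` are made up for the example; only `read.table()` and its `na.strings` argument come from the note above:

```r
# Hypothetical inline data; the third record has an empty status field
raw <- "id,status
1,Domestic partners (w/children)
2,Married (no children)
3,
4,Single (no children)"

# Treat both the literal "NA" and the empty string as missing while reading
df_in <- read.table(text = raw, header = TRUE, sep = ",",
                    na.strings = c("NA", ""),
                    stringsAsFactors = FALSE)

print(is.na(df_in$status))
```

Converting at read time means later steps can rely on `is.na()` instead of comparing against "".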

Or, if you managed to set the empty strings to missing:

df$ch[is.na(df$status)] <- NA

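Does marital status always come first in the string, and children second?

@user20650: Yes. Sometimes it is two words, like Domestic partners; sometimes it is just an empty value ("") that I would like to treat as NA.

Are you asking how to split a string in R? Can you provide a reproducible example?

In both columns I have "" values that I would like to treat as missing. Is there a way to modify the above?

@raxacoricofallapatorius: if you can edit your question with example data (as I did in my example: df$status), I will try to update.

I think I'll ask that as a separate question so that I can accept this answer. (It feels a bit like chaining it on after you have already answered the question.)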
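Taking the first element of each split string gives the marital status (s is assumed here to hold the status strings split at the opening parenthesis):

sapply(s , "[" , 1)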


df <- data.frame(status=c("Domestic partners (w/children)", "Married (no children)",
  "Single (no children)", ""))

df$ch[df$status==""] <- NA 

df$ch[is.na(df$status)] <- NA 

#                          status           married ch
# 1 Domestic partners (w/children) Domestic partners  1
# 2          Married (no children)           Married  0
# 3           Single (no children)            Single  0
# 4                                             <NA> NA
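The explicit NA assignment matters because `grepl()` returns FALSE for missing input, so `ifelse()` alone would code a missing status as 1. The sketch below puts the steps together on data with an empty string. Converting "" to NA up front and using `sub()` with a regular expression for the marital status are assumptions of this sketch, not necessarily the answer's approach:

```r
# Example data including an empty string that should be treated as missing
df <- data.frame(status = c("Domestic partners (w/children)",
                            "Married (no children)",
                            "Single (no children)",
                            ""),
                 stringsAsFactors = FALSE)

# Recode empty strings as NA first
df$status[df$status == ""] <- NA

# Children indicator: grepl() gives FALSE for NA, so row 4 is 1 at this point
df$ch <- ifelse(grepl("no children", df$status, fixed = TRUE), 0, 1)

# Overwrite the indicator with NA wherever the status itself is missing
df$ch[is.na(df$status)] <- NA

# Marital status: drop everything from " (" onwards; NA input stays NA
df$married <- sub(" \\(.*$", "", df$status)

print(df)
```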