Recoding data values in a data frame into combined values in R
I'm trying to compare marital status, and my variable has the values "married", "not married", "engaged", "single", and "not married". How can I make the data read only as "married" and "not married"? (Engaged should count as married, and single should count as not married.)
Sample dataset:
df <- data.frame(mstatus = sample(x = c("married",
                                        "not married",
                                        "engaged",
                                        "single",
                                        "not married"),
                                  size = 15, replace = TRUE))
This is what I have so far:
df2 <- df %>%
  mutate(mstatus = tolower(mstatus))
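Before recoding, it can help to list the distinct values that actually occur once the case has been normalised. The following is only a minimal base R sketch: the data frame `demo` and its values (including the mixed case and the trailing space) are made up for illustration and are not from the question.

```r
# Hypothetical example data; the values are illustrative assumptions
demo <- data.frame(mstatus = c("Married", "not married", "ENGAGED",
                               "single ", "Not Married"),
                   stringsAsFactors = FALSE)

# Normalise case and strip surrounding whitespace
demo$mstatus <- tolower(trimws(demo$mstatus))

# Count how often each distinct value occurs
counts <- table(demo$mstatus)
print(counts)
```

Any value that shows up here and is not covered by the recoding rules will need a decision of its own.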
If we need to recode "mstatus", one option is forcats:
library(dplyr)
library(forcats)
df2 %>%
mutate(mstatus = fct_recode(mstatus, married = "engaged",
`not married` = "single"))
# mstatus
#1 married
#2 not married
#3 married
#4 not married
#5 not married
Or, if many values need to be changed, use fct_collapse, which can take a vector of values for each new level:
df2 %>%
mutate(mstatus = fct_collapse(mstatus, married = c('engaged'),
`not married` = c("single")))
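If forcats is not available, base R offers a similar collapse: assigning a named list to `levels()` of a factor merges every old level listed on the right into the new level named on the left. This is a swapped-in base R technique, not the forcats call above, and the data frame `demo` is a hypothetical example built from the values in the question.

```r
# Hypothetical example data using the values from the question
demo <- data.frame(mstatus = c("married", "not married", "engaged",
                               "single", "not married"),
                   stringsAsFactors = FALSE)

# levels<- with a named list needs a factor
demo$mstatus <- factor(demo$mstatus)

# Each list name is a new level; its vector lists the old levels to merge.
# Old levels that are not listed anywhere become NA.
levels(demo$mstatus) <- list(married = c("married", "engaged"),
                             "not married" = c("not married", "single"))

print(demo)
```

Because unlisted levels turn into NA, this form makes it easy to spot values that the mapping forgot.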
Data
df2
You can use mutate() from dplyr with case_when():
df <- df %>% dplyr::mutate(mstatus = case_when(
  mstatus == "married" | mstatus == "engaged" ~ "married",
  mstatus == "not married" | mstatus == "single" ~ "not married"
))
I think the simplest way is to use an ifelse statement:
df2$mstatus_new <- ifelse(df2$mstatus == "engaged" | df2$mstatus == "married", "married", "not married")
Data used for the ifelse example:
df2 <- data.frame(
  mstatus = c("married", "not married", "engaged", "single", "nota married"))
df2
       mstatus
1      married
2  not married
3      engaged
4       single
5 nota married
Result after running the ifelse line:
df2
       mstatus mstatus_new
1      married     married
2  not married not married
3      engaged     married
4       single not married
5 nota married not married
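Note that ifelse sends every value other than "engaged" or "married" to "not married", including the "nota married" entry. If unexpected values should be flagged rather than silently recoded, one possible alternative is a named lookup vector, sketched here in base R. The names `demo` and `lookup` are hypothetical and introduced only for this illustration.

```r
# Same values as the ifelse example, under a hypothetical name
demo <- data.frame(
  mstatus = c("married", "not married", "engaged", "single", "nota married"),
  stringsAsFactors = FALSE)

# Hypothetical lookup table: names are the old values, elements the new ones
lookup <- c("married"     = "married",
            "engaged"     = "married",
            "not married" = "not married",
            "single"      = "not married")

# Index by name; values missing from the lookup come back as NA
demo$mstatus_new <- unname(lookup[as.character(demo$mstatus)])

# Rows whose original value was not covered by the lookup
unmatched <- demo$mstatus[is.na(demo$mstatus_new)]
print(unmatched)
```

The as.character() call guards against the column being a factor, in which case indexing would use the integer codes instead of the labels.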