How do I assign a unique factor to multiple values in R?


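Suppose I have a set of numbers from 0 to 20, and I want to create 3 age groups: 0~9 years old, 10~15 years old, and 16~20 years old.

How can I assign 3 factors to the numbers from 0 to 20 according to their values?

For example, values from 0 to 9 would be assigned the factor "0~9 years old", values from 10 to 15 the factor "10~15 years old", and so on.

How can this be done in R?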

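The case_when function works for this. Try the following:

library(tidyverse)

df <- tibble(age = 1:20)

# Assign the result back to df; otherwise the new column is only printed
df <- df %>%
  mutate(age_categories = case_when(age <= 9 ~ "0~9 years old",
                                    age <= 15 & age > 9 ~ "10~15 years old",
                                    age <= 20 & age > 15 ~ "16~20 years old",
                                    TRUE ~ "Other"))
# case_when() returns a character vector here; wrap it in factor() if a factor is needed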
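
Alternatively, in base R, convert age to a factor and merge its levels:

df$age_categories <- factor(df$age)

# Each list name becomes a new level that replaces the old levels listed for it
levels(df$age_categories) <- list(
  "0~9 years old" = 0:9,
  "10~15 years old" = 10:15,
  "16~20 years old" = 16:20
)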

What have you done so far, and why did you add the python tag?

Use base::cut (R) or pandas.cut (Python):

In R:

df <- data.frame(age = 0:20)
labels <- sprintf("from %s yrs old", c("0~9", "10~15", "16~20"))
# Intervals are right-closed: [0,9], (9,15], (15,20]; include.lowest keeps age 0
df$groups <- cut(
  df$age,
  breaks = c(0, 9, 15, 20),
  include.lowest = TRUE,
  labels = labels
)
df$groups
# [1] from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old  
# [7] from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 10~15 yrs old from 10~15 yrs old
# [13] from 10~15 yrs old from 10~15 yrs old from 10~15 yrs old from 10~15 yrs old from 16~20 yrs old from 16~20 yrs old
# [19] from 16~20 yrs old from 16~20 yrs old from 16~20 yrs old
# Levels: from 0~9 yrs old from 10~15 yrs old from 16~20 yrs old

In Python:

import pandas as pd

df = pd.DataFrame({'age': range(21)})  # ages 0 to 20 inclusive
labels = ['from %s yrs old' % x for x in ['0~9', '10~15', '16~20']]
# Use df['groups'] = ... to create a column; df.groups = ... would only set an attribute
df['groups'] = pd.cut(
  df['age'],
  bins=[0, 9, 15, 20],
  include_lowest=True, labels=labels)
df['groups']
#0       from 0~9 yrs old
#1       from 0~9 yrs old
#2       from 0~9 yrs old
#3       from 0~9 yrs old
#4       from 0~9 yrs old
#5       from 0~9 yrs old
#6       from 0~9 yrs old
#7       from 0~9 yrs old
#8       from 0~9 yrs old
#9       from 0~9 yrs old
#10    from 10~15 yrs old
#11    from 10~15 yrs old
#12    from 10~15 yrs old
#13    from 10~15 yrs old
#14    from 10~15 yrs old
#15    from 10~15 yrs old
#16    from 16~20 yrs old
#17    from 16~20 yrs old
#18    from 16~20 yrs old
#19    from 16~20 yrs old
#20    from 16~20 yrs old
#Name: groups, dtype: category
#Categories (3, object): [from 0~9 yrs old < from 10~15 yrs old < from 16~20 yrs old]
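
To make the interval logic behind the cut calls explicit, here is a minimal pure-Python sketch of the same right-closed binning using only the standard library. The names UPPER_EDGES, LABELS and assign_group are hypothetical helpers made up for illustration; they are not part of pandas or R, and this is not how either library implements cut internally.

```python
from bisect import bisect_left

# Inclusive upper edge of each group, in ascending order (assumed from the question).
UPPER_EDGES = [9, 15, 20]
LABELS = ["0~9 years old", "10~15 years old", "16~20 years old"]


def assign_group(age):
    """Return the label of the bin [0,9], (9,15] or (15,20] containing age, else None."""
    if age < 0 or age > UPPER_EDGES[-1]:
        return None  # outside every bin, comparable to the NA/NaN that cut produces
    # bisect_left returns the index of the first edge >= age,
    # which is the bin whose inclusive upper bound covers age.
    return LABELS[bisect_left(UPPER_EDGES, age)]


groups = [assign_group(a) for a in range(21)]
```

Because the lower bound 0 is checked separately, age 0 lands in the first group, which is the role include.lowest / include_lowest plays in the answers above.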