R: Conditionally changing the values of a categorical survey response column in a data frame

I'm trying to create an object that merges some of the categories in a variable:

background <- NULL
data$y11[data$y11 == "English/Welsh/Scottish/Northern Irish/British"] <- "White"
data$y11[data$y11 == "Gypsy or Irish Traveller"] <- "White"
data$y11[data$y11 == "Any other White background, please describe"] <- "White"
data$y11[data$y11 == "Irish"] <- "White"
data$y11[data$y11 == "Any other Mixed/Multiple ethnic background, please describe"] <- "Mixed"
data$y11[data$y11 == "White and Asian "] <- "Mixed"
data$y11[data$y11 == "White and Black African "] <- "Mixed"
data$y11[data$y11 == "White and Black Caribbean"] <- "Mixed"
data$y11[data$y11 == "Any other Asian background, please describe"] <- "Asian"
data$y11[data$y11 == "Bangladeshi"] <- "Asian"
data$y11[data$y11 == "Chinese"] <- "Asian"
data$y11[data$y11 == "Indian"] <- "Asian"
data$y11[data$y11 == "Pakistani"] <- "Asian"
data$y11[data$y11 == "Arab"] <- "Arab & Other"
data$y11[data$y11 == "Any other ethnic group, please describ"] <- "Arab & Other"
data$y11[data$y11 == "African"] <- "Black"
data$y11[data$y11 == "Any other Black/African/Caribbean background, please describe"] <- "Black"
data$y11[data$y11 == "Caribbean"] <- "Black"

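background

This means your variable is a factor. You can fix this in one of two ways:

  • Convert all the factors to character with the following command:

    data$y11

    Your main problem is that you didn't use stringsAsFactors=FALSE when reading in the data (probably with read.csv). You should therefore add it to your read.csv call.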
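
    As a minimal sketch of the two fixes just described, the example below converts a factor column to character before recoding it, and reads CSV input with stringsAsFactors = FALSE. The names survey, survey2 and csv_text are hypothetical, and the inline text stands in for a real file. Note that from R 4.0.0 onward stringsAsFactors defaults to FALSE, so the problem mainly affects older versions of R.

```r
# Hypothetical example data: y11 stored as a factor, as older R versions
# create by default when reading strings
survey <- data.frame(
  y11 = c("Irish", "Chinese", "Arab"),
  stringsAsFactors = TRUE
)

# Convert the factor to character; plain conditional assignment then works
survey$y11 <- as.character(survey$y11)
survey$y11[survey$y11 == "Irish"] <- "White"

# When reading a file, keep strings as character from the start.
# `text =` lets this sketch run without a file on disk.
csv_text <- "y11\nIrish\nChinese\nArab"
survey2 <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(survey2)
```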

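    There is also a better way to do what you're doing. One approach is to build a "lookup" or "translation" table that maps each category to its replacement, then use merge from base R, or left_join from the "tidyverse", to do the substitution for you without all those conditional assignments.

    We'll make the translation table: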

    data.frame(
      answer = c(
        "African", "Any other Asian background, please describe",
        "Any other Black/African/Caribbean background, please describe",
        "Any other ethnic group, please describ",
        "Any other Mixed/Multiple ethnic background, please describe",
        "Any other White background, please describe", "Arab", "Bangladeshi",
        "Caribbean", "Chinese", "English/Welsh/Scottish/Northern Irish/British",
        "Gypsy or Irish Traveller", "Indian", "Irish", "Pakistani", "White and Asian ",
        "White and Black African ", "White and Black Caribbean"
      ),
      subst = c(
        "Black", "Asian", "Black", "Arab & Other", "Mixed", "White",
        "Arab & Other", "Asian", "Black", "Asian", "White", "White", "Asian",
        "White", "Asian", "Mixed", "Mixed", "Mixed"
      ),
      stringsAsFactors = FALSE
    ) -> trans_tbl
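
    Both merge and left_join match on exact strings, and some of the answers above carry trailing spaces, so it may be worth checking that the table covers every value in the data before joining. The sketch below is one way to do that; the shortened table and the responses vector are hypothetical stand-ins, not the real survey data.

```r
# Hypothetical, shortened translation table (the real one has 18 rows)
trans_tbl <- data.frame(
  answer = c("Irish", "Chinese", "White and Asian "),
  subst  = c("White", "Asian", "Mixed"),
  stringsAsFactors = FALSE
)

# Hypothetical responses; the last one lacks the trailing space
responses <- c("Irish", "Chinese", "White and Asian")

# An answer listed twice in the table would duplicate rows in the join
stopifnot(!anyDuplicated(trans_tbl$answer))

# Answers present in the data but missing from the table; with
# all.x = TRUE these rows would end up with NA in the subst column
unmatched <- setdiff(unique(responses), trans_tbl$answer)
unmatched
```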
    
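    Now we'll simulate some data (I use dat rather than data as the variable name, because naming a variable data will eventually cause you pain: data is also the name of an R function):

    set.seed(2018-11-30) # plain arithmetic: evaluates to 1977, a valid seed
    data.frame(
      y11 = sample(trans_tbl$answer, 100, replace = TRUE),
      stringsAsFactors = FALSE
    ) -> dat
    
    str(dat)
    ## 'data.frame':    100 obs. of  1 variable:
    ##  $ y11: chr  "Caribbean" "Chinese" "Indian" "Any other Black/African/Caribbean background, please describe" ...

    Your data frame has more columns, but you didn't show them to us, so I just created a single-column data frame containing y11. Now we just call merge:

    dat <- merge(dat, trans_tbl, by.x="y11", by.y="answer", all.x=TRUE)
    
    str(dat)
    ## 'data.frame':    100 obs. of  2 variables:
    ##  $ y11  : chr  "African" "African" "African" "African" ...
    ##  $ subst: chr  "Black" "Black" "Black" "Black" ...
    
    # Overwrite y11 with the merged category and drop the helper column
    dat$y11 <- dat$subst
    dat$subst <- NULL
    
    str(dat)
    ## 'data.frame':    100 obs. of  1 variable:
    ##  $ y11: chr  "Black" "Black" "Black" "Black" ...

    We can also use dplyr from the "tidyverse":

    library(tidyverse)
    
    set.seed(2018-11-30)
    tibble( # `tibble()` comes with the tidyverse and replaces the deprecated `data_frame()`; it is NOT `data.frame()` from base R
      y11 = sample(trans_tbl$answer, 100, replace = TRUE)
    ) -> dat
    
    left_join(dat, trans_tbl, by = c("y11"="answer")) %>%
      select(y11 = subst)
    ## # A tibble: 100 x 1
    ##    y11         
    ##    <chr>       
    ##  1 Black       
    ##  2 Asian       
    ##  3 Asian       
    ##  4 Black       
    ##  5 Asian       
    ##  6 Mixed       
    ##  7 Arab & Other
    ##  8 Asian       
    ##  9 Arab & Other
    ## 10 Asian       
    ## # ... with 90 more rows
    
    # Recode through factor(): each level is mapped to its new label
    possible_answers <- c(
      "African", "Any other Asian background, please describe",
      "Any other Black/African/Caribbean background, please describe",
      "Any other ethnic group, please describ",
      "Any other Mixed/Multiple ethnic background, please describe",
      "Any other White background, please describe", "Arab", "Bangladeshi",
      "Caribbean", "Chinese", "English/Welsh/Scottish/Northern Irish/British",
      "Gypsy or Irish Traveller", "Indian", "Irish", "Pakistani", "White and Asian ",
      "White and Black African ", "White and Black Caribbean"
    )
    
    what_they_should_be <- c(
      "Black", "Asian", "Black", "Arab & Other", "Mixed", "White",
      "Arab & Other", "Asian", "Black", "Asian", "White", "White", "Asian",
      "White", "Asian", "Mixed", "Mixed", "Mixed"
    )
    
    set.seed(2018-11-30)
    data.frame(
      y11 = sample(possible_answers, 100, replace = TRUE)
    ) -> dat
    
    dat$y11 <- as.character(factor(
      x = dat$y11,
      levels = possible_answers,
      labels = what_they_should_be
    ))
    
    str(dat)
    ## 'data.frame':    100 obs. of  1 variable:
    ##  $ y11: chr  "Black" "Asian" "Asian" "Black" ...

    We convert the values back to a character vector rather than leaving them as a factor.

    Welcome! It would help to have your data, or a portion of it, along with the desired output. Are you sure you don't want to create a new variable with the recoding? Essentially, the error message is telling you that you have a nominal (also called categorical) variable with a defined set of attributes, and you are now trying to create a new variable with a different set of attributes. Two options not mentioned below are using the forcats package or using a hash.

    Yes, ideally I do want to create a new variable with the recoding. I no longer get the error, but I still get old categories that I don't want in the new variable.

    Then the recoded variable is definitely a factor. The first approach applies more to a character variable.

    Thank you very much! I no longer get the error. But now, when I look at the variable with the table function, it shows up together with other categories that I don't want. I want it recoded only as "White", "Mixed", "Asian", "Arab & Other" and "Black"; does that make sense? Sorry if this is obvious, I'm new to R :)

    Try the following:

    table(data$y11[data$y11 %in% c("White", "Black", "Mixed", "Asian", "Arab & Other")])

    This should restrict the table to the frequencies of those options only. Let me know if I've misunderstood.

    I see, but what I want is to recode the variable so that it contains only those categories. I'm going to use it in graphs and calculations, so I need it to contain only those categories. I don't understand why the other categories show up, since I overwrote them with my code?

    If the variable is still a factor, then the old values are still available factor levels, which is why they may be showing up. It's hard to help because we don't have a sample of the data, so we can't actually reproduce anything you describe.

    But is there a way to have only the new values in the new variable? I don't want the old ones to show up. It's odd, because only some of the old ones show up, even though I tried to overwrite them to fit the new categories.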
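
    The thread leaves two points open: the "hash" option mentioned in the comments, and how to stop old categories from lingering as factor levels. The sketch below is one possible way to handle both in base R and is not taken from the answers above. It uses a named character vector as the lookup, rebuilds the factor from character values, and applies droplevels() to a subsetted factor. All names and the three-category data are hypothetical.

```r
# Hypothetical responses stored as a factor, as in the question
y11 <- factor(c("Irish", "Chinese", "Arab", "Irish"))

# Named character vector used as a lookup ("hash"): answer -> group.
# Only three of the survey's categories are listed, for brevity.
lookup <- c("Irish" = "White", "Chinese" = "Asian", "Arab" = "Arab & Other")

# Index by the character values, not by the factor's integer codes;
# answers missing from the lookup would come back as NA
recoded <- unname(lookup[as.character(y11)])

# Rebuilding the factor from character values keeps only levels that occur
y11_new <- factor(recoded)
levels(y11_new)

# For a factor that has been subsetted, droplevels() removes unused levels
subset_y11 <- y11[y11 != "Arab"]
levels(subset_y11)              # still lists "Arab"
levels(droplevels(subset_y11))  # "Arab" is no longer a level
```

    Unlike merge, which sorts the result by the key column by default, this lookup keeps the rows in their original order.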