R: Conditionally changing the values of a categorical survey response column in a data frame
I am trying to create an object that merges some of the categories of a variable:
background <- NULL
data$y11[data$y11 == "English/Welsh/Scottish/Northern Irish/British"] <-"White"
data$y11[data$y11 == "Gypsy or Irish Traveller"] <-"White"
data$y11[data$y11 == "Any other White background, please describe"] <-"White"
data$y11[data$y11 == "Irish"] <-"White"
data$y11[data$y11 == "Any other Mixed/Multiple ethnic background, please describe"] <-"Mixed"
data$y11[data$y11 == "White and Asian "] <-"Mixed"
data$y11[data$y11 == "White and Black African "] <-"Mixed"
data$y11[data$y11 == "White and Black Caribbean"] <-"Mixed"
data$y11[data$y11 == "Any other Asian background, please describe"] <-"Asian"
data$y11[data$y11 == "Bangladeshi"] <-"Asian"
data$y11[data$y11 == "Chinese"] <-"Asian"
data$y11[data$y11 == "Indian"] <-"Asian"
data$y11[data$y11 == "Pakistani"] <-"Asian"
data$y11[data$y11 == "Arab"] <-"Arab & Other"
data$y11[data$y11 == "Any other ethnic group, please describ"] <-"Arab & Other"
data$y11[data$y11 == "African"] <-"Black"
data$y11[data$y11 == "Any other Black/African/Caribbean background, please describe"] <-"Black"
data$y11[data$y11 == "Caribbean"] <-"Black"
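background
This means your variable is a factor. You can solve this in one of two ways:
Convert all factors to character with the following command:
data$y11
Your main problem is that you did not use stringsAsFactors=FALSE when reading in the data (probably with read.csv). You should therefore add it to the read.csv call.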
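To make the factor issue concrete, here is a minimal sketch. The three-row data frame `svy` is a made-up stand-in for the survey data, and `stringsAsFactors = TRUE` is set explicitly to mimic the default that `read.csv` had before R 4.0.0.

```r
# Hypothetical three-row stand-in for the survey data (not the asker's real data)
svy <- data.frame(
  y11 = c("Irish", "Chinese", "Arab"),
  stringsAsFactors = TRUE  # mimics the pre-R-4.0 default of read.csv()
)

was_factor <- is.factor(svy$y11)
was_factor

# Convert the factor to character first, then recode by exact string match
svy$y11 <- as.character(svy$y11)
svy$y11[svy$y11 == "Irish"]   <- "White"
svy$y11[svy$y11 == "Chinese"] <- "Asian"
svy$y11[svy$y11 == "Arab"]    <- "Arab & Other"

svy$y11

# When reading from a file, the equivalent fix would look like
# read.csv("survey.csv", stringsAsFactors = FALSE)
# where "survey.csv" is a hypothetical file name.
```

Without the `as.character()` step, assigning a value that is not an existing level of the factor produces `NA` and a warning instead of the new label.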
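There is also a better way to do what you are doing. One approach is to create a "lookup" or "translation" table that maps each category to its replacement, then use merge from base R, or left_join from the "tidyverse", to do the substitution for you without all of those conditional assignments.
We'll make the translation table: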
data.frame(
answer = c(
"African", "Any other Asian background, please describe",
"Any other Black/African/Caribbean background, please describe",
"Any other ethnic group, please describ",
"Any other Mixed/Multiple ethnic background, please describe",
"Any other White background, please describe", "Arab", "Bangladeshi",
"Caribbean", "Chinese", "English/Welsh/Scottish/Northern Irish/British",
"Gypsy or Irish Traveller", "Indian", "Irish", "Pakistani", "White and Asian ",
"White and Black African ", "White and Black Caribbean"
),
subst = c(
"Black", "Asian", "Black", "Arab & Other", "Mixed", "White",
"Arab & Other", "Asian", "Black", "Asian", "White", "White", "Asian",
"White", "Asian", "Mixed", "Mixed", "Mixed"
),
stringsAsFactors = FALSE
) -> trans_tbl
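Now we'll simulate some data (I'm using dat rather than data as the variable name, because using data will eventually cause you pain since it is the name of an R function):
Your data frame has more columns, but you didn't show them to us, so I created a single-column data frame with just y11. Now we simply call merge: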
dat <- merge(dat, trans_tbl, by.x="y11", by.y="answer", all.x=TRUE)
str(dat)
## 'data.frame': 100 obs. of 2 variables:
## $ y11 : chr "African" "African" "African" "African" ...
## $ subst: chr "Black" "Black" "Black" "Black" ...
We can also use dplyr from the "tidyverse":
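We get the converted values back as a character vector rather than a factor.

Comments:

Welcome! It would probably help to have your data, or a portion of it, along with the desired output. Are you sure you don't want to create a new variable with the recoding? Basically, what the error message is telling you is that you have a nominal (also called categorical) variable with a defined set of attributes, and you are now trying to create a new variable with a different set of attributes. Two options not mentioned below are using the forcats package or using a hash.

Yes, ideally I want to create a new variable with the recoding. I no longer get the error, but I still get the old categories that I don't want in the new variable.

The recoded variable should definitely be a factor, though. The first approach you could take gives you more of a character variable.

Thank you so much! I'm not getting errors any more. But now, when I look at the variable with the table function, it shows up together with the other categories that I don't want. I want it recoded only as "White", "Mixed", "Asian", "Arab & Other" and "Black". Does that make sense? Sorry if this is obvious, I'm really new to R :)

Try this: table(data$y11[data$y11 %in% c("White", "Black", "Mixed", "Asian", "Arab & Other")]). That should limit the table to the frequencies of just those options. Let me know if I've misunderstood.

I see, but what I want is to recode the variable so that it contains only these categories, because I'm going to use it in graphs and calculations. I don't understand why the other categories show up, since I overwrote them with the code?

If the variable is still a factor, the old values are still available factor levels, which is why they may show up. It is hard to help because we don't have a sample of the data and so can't actually reproduce anything you describe.

But is there a way to have only the new values in the new variable? I don't want the old ones to show up. It's strange, because only some of the old ones show up, even though I tried to overwrite them to fit the new categories.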
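The point about leftover factor levels can be illustrated with a small sketch. The factor `y11_demo` below is made up for illustration, and `droplevels()` from base R is one possible way to discard levels that no longer occur; whether this matches the asker's situation cannot be confirmed without their data.

```r
# Made-up factor standing in for a partially recoded column
y11_demo <- factor(c("White", "Asian", "White", "Arab"))

# Subsetting a factor keeps every original level,
# so "Arab" is still listed by table(), with a count of 0
kept <- y11_demo[y11_demo %in% c("White", "Asian")]
table(kept)

# droplevels() removes the levels that no longer occur in the data
kept <- droplevels(kept)
table(kept)
```

If the column is a character vector rather than a factor, there are no levels to carry over, and `table()` only reports values that are actually present.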
set.seed(2018-11-30)
data.frame(
y11 = sample(trans_tbl$answer, 100, replace = TRUE),
stringsAsFactors = FALSE
) -> dat
str(dat)
## 'data.frame': 100 obs. of 1 variable:
## $ y11: chr "Caribbean" "Chinese" "Indian" "Any other Black/African/Caribbean background, please describe" ...
dat <- merge(dat, trans_tbl, by.x="y11", by.y="answer", all.x=TRUE)
str(dat)
## 'data.frame': 100 obs. of 2 variables:
## $ y11 : chr "African" "African" "African" "African" ...
## $ subst: chr "Black" "Black" "Black" "Black" ...
dat$y11 <- dat$subst
dat$subst <- NULL
str(dat)
## 'data.frame': 100 obs. of 1 variable:
## $ y11: chr "Black" "Black" "Black" "Black" ...
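One thing to watch with `merge()` is that it sorts the result on the key column by default, which is why the `str()` output above begins with a run of "African". If the original row order matters, a lookup with base R's `match()` may be a simpler alternative. This is a minimal sketch with made-up data; `small_tbl` and `small_dat` are illustrative names that do not appear in the original answer.

```r
# Illustrative three-row translation table (a cut-down stand-in for trans_tbl)
small_tbl <- data.frame(
  answer = c("Irish", "Chinese", "Arab"),
  subst  = c("White", "Asian", "Arab & Other"),
  stringsAsFactors = FALSE
)

# Illustrative data whose row order we want to keep
small_dat <- data.frame(
  y11 = c("Chinese", "Arab", "Irish", "Chinese"),
  stringsAsFactors = FALSE
)

# match() returns, for each answer, its row position in the table;
# indexing subst with those positions recodes without reordering rows
small_dat$y11 <- small_tbl$subst[match(small_dat$y11, small_tbl$answer)]
small_dat$y11
```

Answers that are missing from the table come back as `NA`, which makes unmapped categories easy to spot.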
library(tidyverse)
set.seed(2018-11-30)
tibble( # `tibble()` from the tidyverse (it supersedes the deprecated `data_frame()`), NOT `data.frame()` from base R
y11 = sample(trans_tbl$answer, 100, replace = TRUE)
) -> dat
left_join(dat, trans_tbl, by = c("y11"="answer")) %>%
select(y11 = subst)
## # A tibble: 100 x 1
## y11
## <chr>
## 1 Black
## 2 Asian
## 3 Asian
## 4 Black
## 5 Asian
## 6 Mixed
## 7 Arab & Other
## 8 Asian
## 9 Arab & Other
## 10 Asian
## # ... with 90 more rows
possible_answers <- c(
"African", "Any other Asian background, please describe",
"Any other Black/African/Caribbean background, please describe",
"Any other ethnic group, please describ",
"Any other Mixed/Multiple ethnic background, please describe",
"Any other White background, please describe", "Arab", "Bangladeshi",
"Caribbean", "Chinese", "English/Welsh/Scottish/Northern Irish/British",
"Gypsy or Irish Traveller", "Indian", "Irish", "Pakistani", "White and Asian ",
"White and Black African ", "White and Black Caribbean"
)
what_they_should_be <- c(
"Black", "Asian", "Black", "Arab & Other", "Mixed", "White",
"Arab & Other", "Asian", "Black", "Asian", "White", "White", "Asian",
"White", "Asian", "Mixed", "Mixed", "Mixed"
)
set.seed(2018-11-30)
data.frame(
y11 = sample(possible_answers, 100, replace = TRUE)
) -> dat
dat$y11 <- as.character(factor(
x = dat$y11,
levels = possible_answers,
labels = what_they_should_be
))
str(dat)
## 'data.frame': 100 obs. of 1 variable:
## $ y11: chr "Black" "Asian" "Asian" "Black" ...
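The comments report that only some of the old categories survive the recoding. One possible cause, offered as a guess rather than a diagnosis, is a whitespace mismatch: several of the strings above end in a trailing space (for example "White and Asian "), and both `==` and lookup tables match exact strings only. The sketch below uses made-up vectors (`raw_answers` and `known_keys` are hypothetical names) to show how `setdiff()` can list values that would be left unrecoded.

```r
# Hypothetical raw answers; the first one has no trailing space
raw_answers <- c("White and Asian", "Irish", "Chinese")

# Keys as spelled in the recoding code above (note the trailing space)
known_keys <- c("White and Asian ", "Irish", "Chinese")

# Values that an exact match would leave unrecoded
unmatched <- setdiff(raw_answers, known_keys)
unmatched

# Trimming whitespace on both sides before comparing removes the mismatch
unmatched_trimmed <- setdiff(trimws(raw_answers), trimws(known_keys))
unmatched_trimmed
```

Running the same check on the real column against the real list of answers would show which categories, if any, never match.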