R: Conditionally replace values in ordered factor columns without losing levels or other attributes


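Background: I am working with a large number of large survey datasets exported from Qualtrics. Each dataset contains duplicate questions in Spanish and English. The subset of survey questions a participant answered depends on their response to the lang question in the survey. Answers to the Spanish and English questions are recorded in separate columns of the data frame, and the column names for the Spanish answers carry the suffix _sp. See the example data frame below.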

df <- structure(list(id = c(1,2,3,4,5,6,7,8,9,10), lang = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("English / Inglés", "Spanish / Español"), class = c("ordered", "factor")), mob_1 = structure(c(5L, 2L, 6L, 1L, 6L, 8L, 8L, 8L, 8L, 8L), .Label = c("Strongly agree", "Agree", "Somewhat agree", "Neither agree nor disagree", "Somewhat disagree", "Disagree", "Strongly disagree", NA), class = c("ordered", "factor")), mob_2 = structure(c(2L, 3L, 2L, 3L, 5L, 6L, 6L, 6L, 6L, 6L), .Label = c("A lot worse", "A little worse", "The same", "A little better", "A lot better", NA), class = c("ordered", "factor")), mob_1_sp = structure(c(8L, 8L, 8L, 8L, 8L, 5L, 2L, 6L, 1L, 6L), .Label = c("Totalmente de acuerdo", "De acuerdo", "Algo de acuerdo", "Ni de acuerdo ni en desacuerdo", "Algo en desacuerdo", "En desacuerdo", "Totalmente en desacuerdo", NA), class = c("ordered", "factor")), mob_2_sp = structure(c(6L, 6L, 6L, 6L, 6L, 2L, 3L, 2L, 3L, 5L), .Label = c("Mucho peor", "Un poco peor", "Igual", "Un poco mejor", "Mucho mejor", NA), class = c("ordered", "factor"))), row.names = c(NA, -10L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))

# A tibble: 10 x 6
      id lang              mob_1             mob_2          mob_1_sp              mob_2_sp    
   <dbl> <ord>             <ord>             <ord>          <ord>                 <ord>       
 1     1 English / Inglés  Somewhat disagree A little worse NA                    NA          
 2     2 English / Inglés  Agree             The same       NA                    NA          
 3     3 English / Inglés  Disagree          A little worse NA                    NA          
 4     4 English / Inglés  Strongly agree    The same       NA                    NA          
 5     5 English / Inglés  Disagree          A lot better   NA                    NA          
 6     6 Spanish / Español NA                NA             Algo en desacuerdo    Un poco peor
 7     7 Spanish / Español NA                NA             De acuerdo            Igual       
 8     8 Spanish / Español NA                NA             En desacuerdo         Un poco peor
 9     9 Spanish / Español NA                NA             Totalmente de acuerdo Igual       
10    10 Spanish / Español NA                NA             En desacuerdo         Mucho mejor 

> str(df)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    10 obs. of  6 variables:
 $ id      : num  1 2 3 4 5 6 7 8 9 10
 $ lang    : Ord.factor w/ 2 levels "English / Inglés"<..: 1 1 1 1 1 2 2 2 2 2
 $ mob_1   : Ord.factor w/ 8 levels "Strongly agree"<..: 5 2 6 1 6 8 8 8 8 8
 $ mob_2   : Ord.factor w/ 6 levels "A lot worse"<..: 2 3 2 3 5 6 6 6 6 6
 $ mob_1_sp: Ord.factor w/ 8 levels "Totalmente de acuerdo"<..: 8 8 8 8 8 5 2 6 1 6
 $ mob_2_sp: Ord.factor w/ 6 levels "Mucho peor"<"Un poco peor"<..: 6 6 6 6 6 2 3 2 3 5
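I think the problem with the level mapping is that the survey responses have 8 levels, but only 4 unique values occur in my reproducible data frame.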
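The suspected cause can be reproduced in isolation. The following is a minimal sketch on a toy factor (the names lv, x, codes and y are illustrative and do not come from the survey data, and the toy uses only the seven non-missing labels). It suggests that as.ordered() applied to integer codes builds its levels from the values that actually occur, so the codes are renumbered before the labels are reattached.

```r
# Toy ordered factor: 7 possible levels, only 4 of them observed
lv <- c("Strongly agree", "Agree", "Somewhat agree",
        "Neither agree nor disagree", "Somewhat disagree",
        "Disagree", "Strongly disagree")
x <- factor(lv[c(5, 2, 6, 1, 6)], levels = lv, ordered = TRUE)

codes <- as.integer(x)   # 5 2 6 1 6
y <- as.ordered(codes)   # levels come from the observed values only
levels(y)                # "1" "2" "5" "6"
as.integer(y)            # 3 2 4 1 4 (codes were renumbered)

# Reassigning the full label set relabels by position,
# so the original code 5 now shows the label of level 3
levels(y) <- lv
as.character(y)
```

Under these assumptions the first element, originally "Somewhat disagree", ends up labelled "Somewhat agree", which is the same kind of shift visible in the mob_1 column of the result printed for the loop.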


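I would appreciate help pinpointing what went wrong, and knowing whether there is a way to insert the Spanish column values into the English columns without affecting the column attributes.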

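How can you add values from mob_1_sp without changing the levels of the ordered factor? The levels in mob_1_sp are not originally present in mob_1.
@Ronaksha The levels in mob_1_sp are the same as the levels in mob_1, only in Spanish rather than English.
I think R does not know that these levels are the same, with the only difference being Spanish versus English.
@Ronaksha Agreed. However, since the ordered values of the factors are identical between the English and Spanish columns in the original dataset (for example, the label for mob_1 == 1L is "Strongly agree", and the label for mob_1_sp == 1L is "Totalmente de acuerdo", the Spanish translation of "Strongly agree"), can I just insert the integer codes of mob_1_sp into mob_1 without dropping the levels/labels of mob_1?
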
for (i in colnames(df)) {
  if(grepl("_sp", i)) {
    eng_var <- gsub("_sp","",i) #get name of english variable equivalent
    levels(df[[i]]) <- levels(df[[eng_var]]) #assign english levels to spanish variable 
    df[[eng_var]] = as.ordered(ifelse(df$lang=="Spanish / Español",as.numeric(df[[i]]),df[[eng_var]])) #conditionally replace values of english variable
    levels(df[[eng_var]]) <-  levels(df[[i]]) #re-assign english levels from spanish variable
  }
}

> df
# A tibble: 10 x 6
      id lang              mob_1                      mob_2          mob_1_sp          mob_2_sp      
   <dbl> <ord>             <ord>                      <ord>          <ord>             <ord>         
 1     1 English / Inglés  Somewhat agree             A lot worse    NA                NA            
 2     2 English / Inglés  Agree                      A little worse NA                NA            
 3     3 English / Inglés  Neither agree nor disagree A lot worse    NA                NA            
 4     4 English / Inglés  Strongly agree             A little worse NA                NA            
 5     5 English / Inglés  Neither agree nor disagree The same       NA                NA            
 6     6 Spanish / Español Somewhat agree             A lot worse    Somewhat disagree A little worse
 7     7 Spanish / Español Agree                      A little worse Agree             The same      
 8     8 Spanish / Español Neither agree nor disagree A lot worse    Disagree          A little worse
 9     9 Spanish / Español Strongly agree             A little worse Strongly agree    The same      
10    10 Spanish / Español Neither agree nor disagree The same       Disagree          A lot better
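
One possible way to get the intended result is sketched below on a small hypothetical data frame. The objects toy, en_lv and sp_lv and the "label" attribute are illustrative assumptions, not part of the survey export. The idea is to look up the English label by the Spanish integer code and assign it into the existing English factor, rather than rebuilding the factor with as.ordered(). This assumes, as stated in the question, that the Spanish and English levels are in the same order.

```r
# Hypothetical toy data, much smaller than the real export
en_lv <- c("Strongly agree", "Agree", "Disagree")
sp_lv <- c("Totalmente de acuerdo", "De acuerdo", "En desacuerdo")
toy <- data.frame(
  lang     = c("English / Inglés", "English / Inglés",
               "Spanish / Español", "Spanish / Español"),
  mob_1    = factor(c("Disagree", "Agree", NA, NA),
                    levels = en_lv, ordered = TRUE),
  mob_1_sp = factor(c(NA, NA, "Totalmente de acuerdo", "En desacuerdo"),
                    levels = sp_lv, ordered = TRUE)
)
# Stand-in for any extra column attribute that should survive
attr(toy$mob_1, "label") <- "Mobility item 1"

is_sp <- toy$lang == "Spanish / Español"

for (sp_var in grep("_sp$", names(toy), value = TRUE)) {
  eng_var <- sub("_sp$", "", sp_var)  # name of the English counterpart
  # Guard: positional mapping only makes sense with equal level counts
  stopifnot(nlevels(toy[[sp_var]]) == nlevels(toy[[eng_var]]))
  # Translate Spanish integer codes into English labels by position
  eng_labels <- levels(toy[[eng_var]])[as.integer(toy[[sp_var]])]
  # Assigning into the existing factor keeps its levels, class
  # and other attributes
  toy[[eng_var]][is_sp] <- eng_labels[is_sp]
}

toy$mob_1
```

Because the assignment goes into the existing factor, no level is dropped even when some levels never occur in the data. The Spanish columns are left unchanged here and could be removed afterwards if they are no longer needed.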