R: Create a new categorical variable based on a subset of the data


I have a data frame like the one below:

         cnt    bnk qst ans
1  Country 1 Bank 1  q1   1
2  Country 2 Bank 2  q1   1
3  Country 3 Bank 3  q1   3
4  Country 4 Bank 4  q1   1
5  Country 1 Bank 1  q2   1
6  Country 2 Bank 2  q2   2
7  Country 3 Bank 3  q2   3
8  Country 4 Bank 4  q2   4
9  Country 1 Bank 1  q3   1
10 Country 2 Bank 2  q3   1
11 Country 3 Bank 3  q3   2
12 Country 4 Bank 4  q3   1
For reference, q stands for "question", so q2 is "question 2". Similarly, ans is the response.

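Now I would like to create a categorical variable based on the responses to q2. Specifically, I want to assign the following categories:

  • Public
  • Private
  • Mixed
  • Other

So if ans=1 where qst=q2, the category is "Public"; if ans=2 where qst=q2, it is "Private", and so on. The data frame should then look like this: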

             cnt    bnk qst ans   dummy
    1  Country 1 Bank 1  q1   1  Public
    2  Country 2 Bank 2  q1   1 Private
    3  Country 3 Bank 3  q1   3   Mixed
    4  Country 4 Bank 4  q1   1   Other
    5  Country 1 Bank 1  q2   1  Public
    6  Country 2 Bank 2  q2   2 Private
    7  Country 3 Bank 3  q2   3   Mixed
    8  Country 4 Bank 4  q2   4   Other
    9  Country 1 Bank 1  q3   1  Public
    10 Country 2 Bank 2  q3   1 Private
    11 Country 3 Bank 3  q3   2   Mixed
    12 Country 4 Bank 4  q3   1   Other
    
I tried using ifelse, but I could not get the result I wanted. Can anyone give me some advice?

Data

    dput(df)
    structure(list(cnt = c("Country 1", "Country 2", "Country 3", 
    "Country 4", "Country 1", "Country 2", "Country 3", "Country 4", 
    "Country 1", "Country 2", "Country 3", "Country 4"), bnk = c("Bank 1", 
    "Bank 2", "Bank 3", "Bank 4", "Bank 1", "Bank 2", "Bank 3", "Bank 4", 
    "Bank 1", "Bank 2", "Bank 3", "Bank 4"), qst = structure(c(1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("q1", 
    "q2", "q3"), class = "factor"), ans = c(1L, 1L, 3L, 1L, 1L, 2L, 
    3L, 4L, 1L, 1L, 2L, 1L), dummy = c(NA, NA, NA, NA, "Public", 
    "Private", "Mixed", "Other", NA, NA, NA, NA)), .Names = c("cnt", 
    "bnk", "qst", "ans", "dummy"), row.names = c("1", "2", "3", "4", 
    "5", "6", "7", "8", "9", "10", "11", "12"), class = "data.frame")
    

The following labels the q2 rows and sets dummy to NA for all other questions:

    df$dummy <- ifelse(df$ans == 1 & df$qst == 'q2', 'Public', 
                   ifelse(df$ans == 2 & df$qst == 'q2', 'Private', 
                       ifelse(df$ans == 3 & df$qst == 'q2', 'Mixed', 
                            ifelse(df$ans == 4 & df$qst == 'q2', 'Other', NA))))
    
    #         cnt    bnk qst ans   dummy
    #1  Country 1 Bank 1  q1   1    <NA>
    #2  Country 2 Bank 2  q1   1    <NA>
    #3  Country 3 Bank 3  q1   3    <NA>
    #4  Country 4 Bank 4  q1   1    <NA>
    #5  Country 1 Bank 1  q2   1  Public
    #6  Country 2 Bank 2  q2   2 Private
    #7  Country 3 Bank 3  q2   3   Mixed
    #8  Country 4 Bank 4  q2   4   Other
    #9  Country 1 Bank 1  q3   1    <NA>
    #10 Country 2 Bank 2  q3   1    <NA>
    #11 Country 3 Bank 3  q3   2    <NA>
    #12 Country 4 Bank 4  q3   1    <NA>
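The nested ifelse above can also be written with a lookup vector, which may be easier to extend if more categories are added. This is a minimal sketch, not the answerer's code: the data frame is rebuilt by hand from the example (without the dummy column), the name labels is an illustrative choice, and ans is assumed to take only the values 1 to 4 on the q2 rows.

```r
# Rebuild the example data by hand (an assumption: no dummy column yet)
df <- data.frame(
  cnt = rep(paste("Country", 1:4), times = 3),
  bnk = rep(paste("Bank", 1:4), times = 3),
  qst = factor(rep(c("q1", "q2", "q3"), each = 4)),
  ans = c(1L, 1L, 3L, 1L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Lookup vector: position i holds the label for ans == i
labels <- c("Public", "Private", "Mixed", "Other")

# Start with NA everywhere, then fill in only the q2 rows
df$dummy <- NA_character_
is_q2 <- df$qst == "q2"
df$dummy[is_q2] <- labels[df$ans[is_q2]]
```

As with the ifelse version, only the q2 rows receive a label and every other row stays NA.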
    

For a data.frame named df, something like the following should work. It is hard to test without data:

    # construct dummy variable in subset data.frame
    dfCountryQ2 <- df[df$qst=="q2", c("cnt", "ans")]
    dfCountryQ2$dummy <- factor(dfCountryQ2$ans, levels=1:4,
                                labels=c("Public", "Private", "Mixed", "Other"))
    
    # merge on by country
    df <- merge(df, dfCountryQ2[, c("cnt", "dummy")], by="cnt")
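Two things may be worth checking with the merge approach: merge() sorts the result by the key column by default, so the rows come back grouped by country rather than in the original order, and if df already contains a dummy column (as the dput output in the question does), the merged result will carry suffixed dummy.x and dummy.y columns. As a hedged alternative, the same per-country lookup can be sketched with match(), which leaves the row order untouched. The data frame is rebuilt by hand here, the names labels and q2 are illustrative, and each country is assumed to have exactly one q2 row with ans between 1 and 4.

```r
# Rebuild the example data by hand (an assumption: no dummy column yet)
df <- data.frame(
  cnt = rep(paste("Country", 1:4), times = 3),
  bnk = rep(paste("Bank", 1:4), times = 3),
  qst = factor(rep(c("q1", "q2", "q3"), each = 4)),
  ans = c(1L, 1L, 3L, 1L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Label for each possible q2 answer (position i is the label for ans == i)
labels <- c("Public", "Private", "Mixed", "Other")

# One row per country: its answer to q2
q2 <- df[df$qst == "q2", c("cnt", "ans")]

# match() gives, for every row of df, the position of its country in q2;
# that country's q2 answer is then turned into a label
df$dummy <- labels[q2$ans[match(df$cnt, q2$cnt)]]
```

Every row of a country now carries the label derived from that country's q2 answer, regardless of the row's own ans value.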
    
    
The second entry is Private even though ans=1?

Yes, because I am not interested in the ans for q1, only in the ans for q2.