R: Create a set of variables in a data frame based on a set of rules from another data frame


The question sounds a bit general, but I think an example will make it clearer:

I have the following two data frames

data1

group1     group2       group3     Level 
cat         cat          dog        1
dog         parrot       cat        1
mouse       dolphin      dolphin    1
red         blue         blue       2
green       yellow       green      2
black       purple       cat        2
data2

var1        level    Score
cat           1        1
dog           1        1
mouse         1        1
dolphin       1        0
parrot        1        1
red           2        1
blue          2        1
green         2        1
purple        2        1
cat           2        0
black         2        0
yellow        2        1
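I want to add three new columns to data1 (one each for group1, group2 and group3), filled with the values from the Score column of data2 and matched on Level (Level is a factor). So basically I would like to get something like this: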

group1     group2       group3     Level      var1     var2     var3
cat         cat          dog        1          1        1        1
dog         parrot       cat        1          1        1        1
mouse       dolphin      dolphin    1          1        0        0
red         blue         blue       2          1        1        1
green       yellow       green      2          1        1        1
black       purple       cat        2          0        1        0

Sample data

df1 <- structure(list(
  group1 = c("cat", "dog", "mouse", "red", "green", "black"),
  group2 = c("cat", "parrot", "dolphin", "blue", "yellow", "purple"),
  group3 = c("dog", "cat", "dolphin", "blue", "green", "cat"),
  Level = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")),
  row.names = c(NA, -6L), class = "data.frame")

df2 <- structure(list(
  var1 = c("cat", "dog", "mouse", "dolphin", "parrot", "red", "blue", "green", "purple", "cat", "black", "yellow"),
  level = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
  Score = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L)),
  row.names = c(NA, -12L), class = "data.frame")
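
The rule described in the question can be spelled out as a lookup on the pair (name, level). The following is only a minimal base R sketch of that rule, not part of the original question or answers; the helper score_for and the pasted key are hypothetical names introduced for illustration, and the data are rebuilt compactly with the same values as df1 and df2.

```r
# Same values as df1 and df2 in the sample data, rebuilt compactly
df1 <- data.frame(
  group1 = c("cat", "dog", "mouse", "red", "green", "black"),
  group2 = c("cat", "parrot", "dolphin", "blue", "yellow", "purple"),
  group3 = c("dog", "cat", "dolphin", "blue", "green", "cat"),
  Level  = factor(c(1, 1, 1, 2, 2, 2)),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  var1  = c("cat", "dog", "mouse", "dolphin", "parrot", "red",
            "blue", "green", "purple", "cat", "black", "yellow"),
  level = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)),
  Score = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Combined key such as "cat_1" or "cat_2"; it must be unique for a lookup
key <- paste(df2$var1, df2$level, sep = "_")
stopifnot(!anyDuplicated(key))

# Hypothetical helper: score of each name x at level lvl (NA if not in df2)
score_for <- function(x, lvl) {
  df2$Score[match(paste(x, lvl, sep = "_"), key)]
}

score_for("cat", 1)  # 1
score_for("cat", 2)  # 0

# Apply the rule to each of the three group columns
res <- df1
for (i in 1:3) {
  res[[paste0("var", i)]] <- score_for(df1[[paste0("group", i)]], df1$Level)
}
res
```

The two calls on "cat" show why the level has to be part of the match: the same name carries a different score at each level.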

We can reshape the first dataset to "long" format, join it with the second dataset, and then reshape it back to "wide" format

library(dplyr)
library(tidyr)
library(stringr)
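df1 %>%
    # remember the original row so the long data can be spread back later
    mutate(rn = row_number()) %>%
    # one row per (row, group column); the cell value goes into 'var1'
    pivot_longer(cols  = -c(rn, Level), values_to = 'var1') %>% 
    # match df2's column name so both join keys line up
    rename(level = Level) %>% 
    left_join(df2, by = c('var1', 'level')) %>% 
    mutate(name = str_replace(name, 'group', 'varn')) %>% 
    # drop rows without a match in df2; values_fill below turns them into 0
    na.omit %>%
    select(-level, -var1) %>% 
    pivot_wider(names_from = name, values_from = Score, values_fill = list(Score = 0)) %>% 
    select(-rn) %>% 
    bind_cols(df1, .)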
#   group1  group2  group3 Level varn1 varn2 varn3
#1    cat     cat     dog     1     1     1     1
#2    dog  parrot     cat     1     1     1     1
#3  mouse dolphin dolphin     1     1     0     0
#4    red    blue    blue     2     1     1     1
#5  green  yellow   green     2     1     1     1
#6  black  purple     cat     2     0     1     0
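
To make the reshaping step easier to follow, it can help to stop the pipeline right after the join and look at the intermediate "long" table. This is only an illustrative sketch using the same dplyr and tidyr calls as above; the object name long is an assumption, and the data are rebuilt compactly with the same values as the sample data.

```r
library(dplyr)
library(tidyr)

# Same values as df1 and df2 in the sample data, rebuilt compactly
df1 <- data.frame(
  group1 = c("cat", "dog", "mouse", "red", "green", "black"),
  group2 = c("cat", "parrot", "dolphin", "blue", "yellow", "purple"),
  group3 = c("dog", "cat", "dolphin", "blue", "green", "cat"),
  Level  = factor(c(1, 1, 1, 2, 2, 2)),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  var1  = c("cat", "dog", "mouse", "dolphin", "parrot", "red",
            "blue", "green", "purple", "cat", "black", "yellow"),
  level = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)),
  Score = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Intermediate result: one row per original row and group column,
# with the matching Score attached by the join on (var1, level)
long <- df1 %>%
  mutate(rn = row_number()) %>%
  pivot_longer(cols = -c(rn, Level), values_to = "var1") %>%
  rename(level = Level) %>%
  left_join(df2, by = c("var1", "level"))

nrow(long)   # 6 rows x 3 group columns = 18
head(long, 3)
```

Each original row contributes three rows to long (one per group column), and the rn column is what lets pivot_wider put them back on a single row afterwards.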

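I use purrr::reduce() to recursively merge df2 into df1 three times. In this part, I replicate df2 three times and rename the columns of each copy (group1/Score1, group2/Score2, group3/Score3, plus Level) so that they match the corresponding names in df1.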

library(tidyverse)

df2 %>%
  list %>% rep(3) %>%
  imap(~ setNames(.x, c(str_c("group", .y), "Level", str_c("Score", .y)))) %>%
  reduce(left_join, .init = df1)

#   group1  group2  group3 Level Score1 Score2 Score3
# 1    cat     cat     dog     1      1      1      1
# 2    dog  parrot     cat     1      1      1      1
# 3  mouse dolphin dolphin     1      1      0      0
# 4    red    blue    blue     2      1      1      1
# 5  green  yellow   green     2      1      1      1
# 6  black  purple     cat     2      0      1      0
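
To see why the repeated joins line up, the renaming step can be inspected on its own. The sketch below is illustrative only; the object names copies and renamed are assumptions, and df2 is rebuilt compactly with the same values as the sample data.

```r
library(purrr)
library(stringr)

# Same values as df2 in the sample data, rebuilt compactly
df2 <- data.frame(
  var1  = c("cat", "dog", "mouse", "dolphin", "parrot", "red",
            "blue", "green", "purple", "cat", "black", "yellow"),
  level = factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)),
  Score = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Three identical copies of df2
copies <- rep(list(df2), 3)

# For an unnamed list, imap() passes the position (1, 2, 3) as .y,
# so copy i gets the column names group<i>, Level, Score<i>
renamed <- imap(copies, ~ setNames(.x, c(str_c("group", .y), "Level", str_c("Score", .y))))

map(renamed, names)
```

Because copy i shares exactly the columns group<i> and Level with df1, each left_join() inside reduce() matches on that pair and adds one Score<i> column.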

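@akrun: Good point, I have changed it.
@DarrenTsai: It should; I inadvertently wrote it differently.
@DarrenTsai: Another mistake on my part; I have now included yellow in data2. Basically, data2 is created from data1 plus manually filled-in variable scores. The output shows the intent of going back to the data1 layout, with the variable score matched to each group.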