Indexing a data.frame in R using a vector (r, dplyr, purrr)


I have a data.frame containing an ID number and the scaled responses to a survey:

df(responses)

ID    X1    X2    X3    X4
A1    1     1     2     1
B2    0     1     3     0
C3    3     3     2     0

I also have a data.frame that serves as a key:

df(key)

X    Y    Z
2    1    1
3    2    2
4    3    4

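I am trying to write a script that calculates each participant's X, Y and Z scores, where the X score is the sum of the responses to the questions listed under X in the key.

E.g. participant A1's X score is the sum of X2, X3 and X4 in row A1 (1 + 2 + 1 = 4).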

The desired output is:

df(output)

ID    X    Y    Z
A1    4    4    3
B2    4    4    1
C3    5    8    6

However, I am currently struggling to index the responses data.frame using the values in the key. This is where I am so far:
#store scale names
scales <- c(colnames(key))
#loop over every participant
for (i in responses$ID){
    #create temporary data.frame with only participant "i"s responses
    data <- subset(responses, ID == i)
    #loop over each scale and store the relevant response numbers
    for (s in scales){
        relevantResponses <- scales[c(s)]
        #create a temporary storage for the total of each scale
        runningScore <- 0
        #index each response and add it to the total
        for (r in relevantResponses){
             runningScore <- runningScore + data[1,r]
  

Is there a better way to do this indexing than nested loops?
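For comparison, the nested-loop idea can be made to work by looking up the question numbers with key[[s]] instead of scales[c(s)]. The following is only a minimal sketch: the data frames are rebuilt from the tables shown above, and the helper names answers, totals and running are introduced here for illustration.

```r
# Data rebuilt from the tables in the question (integer columns assumed)
responses <- data.frame(
  ID = c("A1", "B2", "C3"),
  X1 = c(1L, 0L, 3L), X2 = c(1L, 1L, 3L),
  X3 = c(2L, 3L, 2L), X4 = c(1L, 0L, 0L)
)
key <- data.frame(X = c(2L, 3L, 4L), Y = c(1L, 2L, 3L), Z = c(1L, 2L, 4L))

# Drop the ID column so that key value 2 refers to X2, and so on
answers <- responses[-1]
output <- data.frame(ID = responses$ID)

for (s in colnames(key)) {
  relevant <- key[[s]]                     # question numbers for scale s
  totals <- numeric(nrow(responses))
  for (i in seq_len(nrow(responses))) {
    running <- 0
    for (r in relevant) {
      running <- running + answers[i, r]   # add response r of participant i
    }
    totals[i] <- running
  }
  output[[s]] <- totals
}

print(output)
```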
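We can use rowSums: loop over the columns of 'key' with lapply, extract the numeric columns of 'responses' by index, take the rowSums, convert the list to a data.frame, and cbind it with the first column of 'responses'.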
cbind(responses[1], data.frame(lapply(key, 
     function(x) rowSums(responses[-1][, na.omit(x)], na.rm = TRUE))))

Output:
#  ID X Y Z
#1 A1 4 4 3
#2 B2 4 4 1
#3 C3 5 8 6
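To see what the function inside lapply does, it may help to run it by hand for a single scale. This is a small sketch for scale X only, with the data rebuilt from the tables in the question; x_cols is a name introduced here for illustration.

```r
# Data rebuilt from the tables in the question (integer columns assumed)
responses <- data.frame(
  ID = c("A1", "B2", "C3"),
  X1 = c(1L, 0L, 3L), X2 = c(1L, 1L, 3L),
  X3 = c(2L, 3L, 2L), X4 = c(1L, 0L, 0L)
)
key <- data.frame(X = c(2L, 3L, 4L), Y = c(1L, 2L, 3L), Z = c(1L, 2L, 4L))

# responses[-1] drops ID; key$X (2, 3, 4) then picks columns X2, X3, X4
x_cols <- responses[-1][, key$X]
print(names(x_cols))

# One total per participant for scale X
x_score <- rowSums(x_cols)
print(x_score)
```

lapply simply repeats this for every column of key, which is why the result has one column per scale.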


Or using the tidyverse:

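library(dplyr)
library(purrr)

# note: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick()
imap(key, ~ responses %>%
     transmute(ID, !!.y :=  rowSums(select(cur_data()[-1], na.omit(.x)),
          na.rm = TRUE))) %>% 
     reduce(inner_join, by = "ID")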

Output:
#  ID X Y Z
#1 A1 4 4 3
#2 B2 4 4 1
#3 C3 5 8 6


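Or another option is mutate with across. Note that this relies on key and responses having the same number of rows, as they do here: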

key %>%
   mutate(across(everything(), 
       ~ rowSums(responses[-1][na.omit(.)], na.rm = TRUE)), 
          ID = responses$ID, .before = 1)
#  ID X Y Z
#1 A1 4 4 3
#2 B2 4 4 1
#3 C3 5 8 6

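Here is another way to approach this problem. I just wanted to challenge myself with my favourite kind of solution, but it is not as concise and brilliant as the one suggested by dear @akrun, who is the person who taught me how to use the purrr family of functions: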
library(dplyr)
library(purrr)

responses %>% 
  select(X1:X4) %>% 
  pmap_dfr(., ~ map_dfc(1:length(key), function(x) sum(c(...)[key[, x]]))) %>%
  bind_cols(responses$ID) %>%
  set_names(c("x", "y", "z", "ID")) %>% 
  relocate(ID)

  ID        x     y     z
  <chr> <int> <int> <int>
1 A1        4     4     3
2 B2        4     4     1
3 C3        5     8     6

Using reduce:
map_dfc(key, ~ responses[-1][.x] %>% reduce(`+`))

# A tibble: 3 x 3
      X     Y     Z
  <int> <int> <int>
1     4     4     3
2     4     4     1
3     5     8     6
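The reduce call adds the selected columns together element by element. The same idea can be sketched in base R with Reduce, assuming the data from the question; answers, x_score and scores are names introduced here for illustration.

```r
# Data rebuilt from the tables in the question (integer columns assumed)
responses <- data.frame(
  ID = c("A1", "B2", "C3"),
  X1 = c(1L, 0L, 3L), X2 = c(1L, 1L, 3L),
  X3 = c(2L, 3L, 2L), X4 = c(1L, 0L, 0L)
)
key <- data.frame(X = c(2L, 3L, 4L), Y = c(1L, 2L, 3L), Z = c(1L, 2L, 4L))

answers <- responses[-1]

# For one scale: (X2 + X3) + X4, computed element-wise across participants
x_score <- Reduce(`+`, answers[key$X])
print(x_score)

# For all scales, with the ID column put back in front
scores <- cbind(
  responses["ID"],
  as.data.frame(lapply(key, function(idx) Reduce(`+`, answers[idx])))
)
print(scores)
```

Unlike rowSums(..., na.rm = TRUE), adding columns with `+` propagates any NA in the responses, so the two approaches differ when responses are missing.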

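My "key" data.frame has NAs in it (some scales use more responses than others). I think this is causing an "undefined columns" error. Is there a way around this?

@Deanw1 The update should help.

I think the rowSums approach is more direct, or you can use reduce together with +, i.e. map_dfc(key, ~ responses[-1][.x] %>% reduce(`+`))

You are very right, as always. Since I am practising my reduce skills at the moment, let me try that approach, haha.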
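To illustrate the NA case raised in the comments, here is a minimal sketch of the rowSums approach applied to a key whose scales have different lengths. The key_na data frame is hypothetical (the commenter's actual key is not shown), and drop = FALSE is an addition so that a scale with a single question still reaches rowSums as a data.frame.

```r
# Responses rebuilt from the table in the question (integer columns assumed)
responses <- data.frame(
  ID = c("A1", "B2", "C3"),
  X1 = c(1L, 0L, 3L), X2 = c(1L, 1L, 3L),
  X3 = c(2L, 3L, 2L), X4 = c(1L, 0L, 0L)
)

# Hypothetical key: shorter scales are padded with NA
key_na <- data.frame(X = c(2L, 3L, 4L), Y = c(1L, 2L, NA), Z = c(1L, NA, NA))

scores <- lapply(key_na, function(idx) {
  idx <- idx[!is.na(idx)]   # drop the NA padding before indexing
  rowSums(responses[-1][, idx, drop = FALSE], na.rm = TRUE)
})

out <- cbind(responses["ID"], data.frame(scores))
print(out)
```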