Matching words in a data frame against a character vector in R


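I have a data frame from a recall task in which participants recalled as many words as they could from a list they had studied earlier. Here is a mock-up of the data. Each row is a subject, and each column (w1-w5) holds one recalled word: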

df <- data.frame(subject = 1:5,
  w1 = c("screen", "toad", "toad", "witch", "toad"), 
  w2 = c("package", "tuna", "tuna", "postage", "dinosaur"), 
  w3 = c("tuna", "postage", "toast", "athlete", "ranch"), 
  w4 = c("toad", "witch", "tuna", "package", "NA"), 
  w5 = c("windwo", "mermaid", "NA", "NA", "NA")
)
I want to match each word produced (columns w1-w5) against the list of correct words, which is:

words <- c("screen", "package", "tuna", "toad", "window", 
  "postage", "witch", "mermaid", "toast", "dinosaur")
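Subject 1 would score 4, because they misspelled one word.

Subject 2 would score 5.

Subject 3 would score 3, because they repeated "tuna" and omitted one word.

Subject 4 would score 3, because they gave one incorrect word and omitted one word.

Subject 5 would score 2, because they gave one incorrect word and omitted two words.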

data.frame(subject = df$subject
           , nCorrect = apply(df[, -1], 1, function(x) sum(unique(x) %in% words)))

#   subject nCorrect
# 1       1        4
# 2       2        5
# 3       3        3
# 4       4        3
# 5       5        2
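
To see what the function inside apply() does for a single row, here is a minimal sketch for subject 3 only. The variable names (responses, distinct, is_correct, score) are illustrative and do not come from the answer itself.

```r
# Correct word list from the question
words <- c("screen", "package", "tuna", "toad", "window",
           "postage", "witch", "mermaid", "toast", "dinosaur")

# Responses of subject 3, copied from the example data
# ("NA" is a literal string in that data, not a missing value)
responses <- c("toad", "tuna", "toast", "tuna", "NA")

distinct   <- unique(responses)    # drops the repeated "tuna"
is_correct <- distinct %in% words  # "NA" is not in the word list, so it is FALSE
score      <- sum(is_correct)      # each TRUE counts as 1

print(distinct)
print(score)
```

Because unique() runs before %in%, a repeated word is counted once, and anything not in words (a misspelling, an intrusion, or the "NA" placeholder) contributes nothing.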
The same result can be obtained with data.table (see the setDT(df) snippet further down).


Another option is to convert the data to long format, group by subject, and use dplyr::summarise to count the correctly matched answers:

library(tidyverse)

words <- c("screen", "package", "tuna", "toad", "window", 
           "postage", "witch", "mermaid", "toast", "dinosaur")

df %>% gather(key, value, -subject) %>%
  group_by(subject) %>%
  summarise(nCorrect = sum(unique(value) %in% words))
# # A tibble: 5 x 2
#   subject nCorrect
#    <int>    <int>
# 1       1        4
# 2       2        5
# 3       3        3
# 4       4        3
# 5       5        2
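
The long-format idea does not depend on the tidyverse. As a rough sketch under that assumption, the same reshape-then-group steps can be written in base R only. The names long, n_correct and result are illustrative, and stringsAsFactors = FALSE is added here so the word columns are character vectors on any R version.

```r
# Example data and word list from the question
df <- data.frame(subject = 1:5,
  w1 = c("screen", "toad", "toad", "witch", "toad"),
  w2 = c("package", "tuna", "tuna", "postage", "dinosaur"),
  w3 = c("tuna", "postage", "toast", "athlete", "ranch"),
  w4 = c("toad", "witch", "tuna", "package", "NA"),
  w5 = c("windwo", "mermaid", "NA", "NA", "NA"),
  stringsAsFactors = FALSE
)
words <- c("screen", "package", "tuna", "toad", "window",
           "postage", "witch", "mermaid", "toast", "dinosaur")

# Long format: one row per (subject, response).
# unlist() stacks the columns w1, w2, ... in order, so the subject ids
# have to be repeated once per word column to stay aligned.
long <- data.frame(
  subject = rep(df$subject, times = ncol(df) - 1),
  value   = unlist(df[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Group by subject and count the distinct responses that are in the word list
n_correct <- tapply(long$value, long$subject,
                    function(v) sum(unique(v) %in% words))

result <- data.frame(subject  = as.integer(names(n_correct)),
                     nCorrect = as.vector(n_correct))
print(result)
```

For the tidyverse route, note that the tidyr documentation marks gather() as superseded by pivot_longer(); gather() still works, as in the answer above.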
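# data.table version of the same count
library(data.table)

setDT(df)  # convert df to a data.table by reference

# per subject: number of distinct responses that appear in words
df[, .(nCorrect = sum(unique(unlist(.SD)) %in% words)), by = subject]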