Matching words in a data frame against a character vector in R

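I have a data frame from a recall task in which participants recalled as many words as they could from a list they had studied earlier. Here is a mock-up of the data. Each row is a subject, and each column (w1-w5) holds a recalled word: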

df <- data.frame(subject = 1:5,
  w1 = c("screen", "toad", "toad", "witch", "toad"), 
  w2 = c("package", "tuna", "tuna", "postage", "dinosaur"), 
  w3 = c("tuna", "postage", "toast", "athlete", "ranch"), 
  w4 = c("toad", "witch", "tuna", "package", "NA"), 
  w5 = c("windwo", "mermaid", "NA", "NA", "NA")
)

I want to match each recalled word (columns w1-w5) against the list of correct words, which are:

words <- c("screen", "package", "tuna", "toad", "window", 
  "postage", "witch", "mermaid", "toast", "dinosaur")

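Subject 1 would score 4 because they misspelled one word ("windwo").

Subject 2 would score 5.

Subject 3 would score 3 because they repeated "tuna" and left one word blank.

Subject 4 would score 3 because they gave one incorrect word and left one word blank.

Subject 5 would score 2 because they gave one incorrect word and left two words blank.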

# For each row, drop repeated words with unique(), then count how many are in `words`
data.frame(subject = df$subject,
           nCorrect = apply(df[, -1], 1, function(x) sum(unique(x) %in% words)))

#   subject nCorrect
# 1       1        4
# 2       2        5
# 3       3        3
# 4       4        3
# 5       5        2
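To see why a repeated word is counted only once, it may help to walk through a single row by hand. The following is a minimal sketch for subject 3, using only base R; the variable names `recalled`, `distinct`, `hits`, and `n_correct` are illustrative and do not appear in the original answer.

```r
words <- c("screen", "package", "tuna", "toad", "window",
           "postage", "witch", "mermaid", "toast", "dinosaur")

# Subject 3's responses, copied from the example data ("NA" is a string here)
recalled <- c("toad", "tuna", "toast", "tuna", "NA")

# unique() drops the second "tuna", so a repeat cannot score twice
distinct <- unique(recalled)    # "toad" "tuna" "toast" "NA"

# %in% marks each distinct response that appears in the answer key
hits <- distinct %in% words     # TRUE TRUE TRUE FALSE

# Summing the logical vector counts the TRUE values
n_correct <- sum(hits)
print(n_correct)                # 3
```

The string "NA" and misspellings such as "windwo" simply fail the `%in%` test, so they add nothing to the score.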

Another option is to convert the data to long format, group by subject, and use dplyr::summarise to count the correctly matched answers:

library(tidyverse)

words <- c("screen", "package", "tuna", "toad", "window", 
           "postage", "witch", "mermaid", "toast", "dinosaur")

df %>% gather(key, value, -subject) %>%
  group_by(subject) %>%
  summarise(nCorrect = sum(unique(value) %in% words))
# # A tibble: 5 x 2
#   subject nCorrect
#    <int>    <int>
# 1       1        4
# 2       2        5
# 3       3        3
# 4       4        3
# 5       5        2
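In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. The sketch below is one way the same long-format approach might be written with it, assuming dplyr and a recent tidyr are installed; the column names `position` and `word` and the object `result` are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(subject = 1:5,
  w1 = c("screen", "toad", "toad", "witch", "toad"),
  w2 = c("package", "tuna", "tuna", "postage", "dinosaur"),
  w3 = c("tuna", "postage", "toast", "athlete", "ranch"),
  w4 = c("toad", "witch", "tuna", "package", "NA"),
  w5 = c("windwo", "mermaid", "NA", "NA", "NA"),
  stringsAsFactors = FALSE  # keep words as character on older R versions
)

words <- c("screen", "package", "tuna", "toad", "window",
           "postage", "witch", "mermaid", "toast", "dinosaur")

result <- df %>%
  # One row per subject/word pair; `position` holds w1..w5, `word` the response
  pivot_longer(cols = -subject, names_to = "position", values_to = "word") %>%
  group_by(subject) %>%
  # Count distinct responses that appear in the answer key
  summarise(nCorrect = sum(unique(word) %in% words))

print(result)
```

The scoring logic is unchanged from the `gather()` version; only the reshaping step differs.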

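With data.table (same result):

library(data.table)

# Convert df to a data.table by reference
setDT(df)

# .SD holds the w1-w5 columns for each subject; count the distinct words found in `words`
df[, .(nCorrect = sum(unique(unlist(.SD)) %in% words)), by = subject]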
