Create a new set of variables equal to the level of a factor in dplyr

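I have a data.frame with 100 columns that follow the naming convention word and word_answer:

df <- data.frame(apple = "57%", apple_answer = "22%", dog = "82%", dog_answer = "16%")

The value in the corresponding apple_answer column is level 6 of the factor:

> which(levels(df$apple) == "22%")
[1] 6

The value of apple is level 2, so in this case the distance score is 2 - 6 = -4.

How can I calculate the distance score for every variable in the dataset?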

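You can split the data into two groups: the word columns and their corresponding answer columns. Use match to get the position of each value within the factor levels, subtract one position from the other, and store the results in new columns.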

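# Positions of the *_answer columns; the remaining columns are the word columns
answer_cols <- grep('_answer', names(df))
new_cols <- paste0(names(df)[-answer_cols], '_dist')

# For each word/answer pair, subtract the level index of the answer from that of the word
# (assumes the columns are factors and that both groups list the words in the same order)
df[new_cols] <- Map(function(x, y) match(x, levels(x)) - match(y, levels(x)),
                    df[-answer_cols], df[answer_cols])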

df
#  apple apple_answer dog dog_answer apple_dist dog_dist
#1   57%          22% 82%        16%         -4       -6
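
The one-row data frame in the question does not carry the full set of factor levels, so the output above cannot be reproduced from it directly. Since R 4.0.0, data.frame() also keeps strings as character vectors by default, in which case levels() returns NULL. The following is a minimal, self-contained sketch of the same Map/match idea. The level set shared_levels and the two rows of values are hypothetical, chosen only to make the arithmetic easy to follow; they are not the asker's data.

```r
# Hypothetical shared level set (an assumption, not taken from the question)
shared_levels <- c("10%", "20%", "30%", "40%", "50%")

# Hypothetical data: every column is an explicit factor over the same levels
df <- data.frame(
  apple        = factor(c("20%", "50%"), levels = shared_levels),
  apple_answer = factor(c("40%", "10%"), levels = shared_levels),
  dog          = factor(c("30%", "10%"), levels = shared_levels),
  dog_answer   = factor(c("30%", "50%"), levels = shared_levels)
)

# Anchor the pattern at the end of the name so only true *_answer columns match
answer_cols <- grep("_answer$", names(df))
new_cols <- paste0(names(df)[-answer_cols], "_dist")

# Level index of the word value minus level index of the answer value
df[new_cols] <- Map(function(x, y) match(x, levels(x)) - match(y, levels(x)),
                    df[-answer_cols], df[answer_cols])

print(df)
```

With these made-up values, the first row of apple_dist is 2 - 4 = -2 and the second is 5 - 1 = 4. An answer value that is not among the levels of its word column would produce NA rather than an error.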

You can also use the apply function, as follows:

df$apple_dist = apply(df[,1:2], 1, function(x) {
    which(levels(df$apple) == x[1]) - which(levels(df$apple) == x[2])
})

df$dog_dist = apply(df[,3:4], 1, function(x) {
    which(levels(df$dog) == x[1]) - which(levels(df$dog) == x[2])
})

> df
  apple apple_answer dog dog_answer apple_dist dog_dist
1   57%          22% 82%        16%         -4       -6
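
Writing one apply call per pair does not scale well to 100 columns, and pairing columns by position depends on their order. As a sketch of one way to generalize, the loop below pairs each word column with its answer column by name. The data, the shared_levels vector, and the deliberately shuffled column order are hypothetical and serve only as an illustration.

```r
# Hypothetical shared level set and data (assumptions, not the asker's data);
# the columns are intentionally out of order to show that pairing is by name
shared_levels <- c("10%", "20%", "30%", "40%", "50%")
df <- data.frame(
  apple_answer = factor(c("40%", "10%"), levels = shared_levels),
  dog          = factor(c("30%", "10%"), levels = shared_levels),
  apple        = factor(c("20%", "50%"), levels = shared_levels),
  dog_answer   = factor(c("30%", "50%"), levels = shared_levels)
)

# Derive the word names from the *_answer column names
words <- sub("_answer$", "", grep("_answer$", names(df), value = TRUE))

for (w in words) {
  word_levels <- levels(df[[w]])
  # Level index of the word value minus level index of the answer value
  df[[paste0(w, "_dist")]] <-
    match(df[[w]], word_levels) - match(df[[paste0(w, "_answer")]], word_levels)
}

print(df)
```

Because each pair is looked up by name, the result does not change if the columns are reordered, at the cost of an explicit loop.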

Ah, thank you! Very impressive. I spent far too much time trying to figure out how to do this in dplyr, but this is a much simpler approach. Thanks.