R: How to calculate the magnitude (percentage) of values over and under a specific number in another column?
I have this dataset:
study_ID title experiment question_ID participant_ID estimate_level estimate correct_answer question type category age gender
<dbl> <chr> <dbl> <chr> <int> <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <chr>
1 11 Dallacker_Parents'_co… 1 1 1 individual 3 10 How many sugar cubes does or… unlim… nutriti… 32 Female
2 11 Dallacker_Parents'_co… 1 2 1 individual 10 11.5 How many sugar cubes does a … unlim… nutriti… 32 Female
3 11 Dallacker_Parents'_co… 1 3 1 individual 7 6.5 How many sugar cubes does a … unlim… nutriti… 32 Female
4 11 Dallacker_Parents'_co… 1 4 1 individual 1 16.5 How many sugar cubes does a … unlim… nutriti… 32 Female
5 11 Dallacker_Parents'_co… 1 5 1 individual 7 11 How many sugar cubes does a … unlim… nutriti… 32 Female
6 11 Dallacker_Parents'_co… 1 6 1 individual 5 2.5 How many sugar cubes does a … unlim… nutriti… 32 Female
7 11 Dallacker_Parents'_co… 1 1 2 individual 2 10 How many sugar cubes does or… unlim… nutriti… 29 Female
8 11 Dallacker_Parents'_co… 1 2 2 individual 10 11.5 How many sugar cubes does a … unlim… nutriti… 29 Female
9 11 Dallacker_Parents'_co… 1 3 2 individual 1.5 6.5 How many sugar cubes does a … unlim… nutriti… 29 Female
10 11 Dallacker_Parents'_co… 1 4 2 individual 2 16.5 How many sugar cubes does a … unlim… nutriti… 29 Female
Here is a dplyr approach:
library(dplyr)
df %>%
  group_by(question_ID) %>%
  summarize(prop_over = mean(estimate > correct),
            prop_under = mean(estimate < correct),
            prop_correct = mean(estimate == correct)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 1 x 4
# question_ID prop_over prop_under prop_correct
# <dbl> <dbl> <dbl> <dbl>
# 1 1 0 1 0
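As a hedged variation on the template above, the sketch below assumes the column holding the true value is named correct_answer (as in the printed data) and that percentages rather than proportions are wanted. The small df is a hypothetical sample rebuilt from the ten rows shown in the question, and the pct_* names are made up for illustration.

```r
library(dplyr)

# Hypothetical sample: the ten rows printed in the question, relevant columns only
df <- tibble(
  question_ID    = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
  participant_ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  estimate       = c(3, 10, 7, 1, 7, 5, 2, 10, 1.5, 2),
  correct_answer = c(10, 11.5, 6.5, 16.5, 11, 2.5, 10, 11.5, 6.5, 16.5)
)

# Percentage of estimates over, under, and equal to the correct answer, per question
pct <- df %>%
  group_by(question_ID) %>%
  summarize(
    pct_over    = 100 * mean(estimate > correct_answer),
    pct_under   = 100 * mean(estimate < correct_answer),
    pct_correct = 100 * mean(estimate == correct_answer),
    .groups = "drop"  # return an ungrouped tibble and silence the message
  )
pct
```

In this sample, question 3 has one estimate above and one below the correct answer, so pct_over and pct_under are both 50 for that row.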
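Please provide a dput, but I think a group by should solve this quite easily.
The data frame has 1800 rows, so I can't copy and paste the entire dput. Unless I'm doing something wrong. I'm not very knowledgeable, sorry!
Then please do a dput(head(df, 10)).
@dampfy @ekoam I just added it, I hope this is what you need.
@Bruno: Thank you for your reply! What would this be looking for?
estimate > correct compares the column named estimate with the column named correct. The data you shared didn't have a column named correct_answer, so I can't tell you much more, but I trust you can use this template to compare whichever columns you need. mean is calculating a proportion. When the estimate is greater than the correct value, estimate > correct is TRUE, otherwise it is FALSE. When you do math on TRUE/FALSE values, TRUE is 1 and FALSE is 0. So the sum of a TRUE/FALSE column is the count of TRUE values, and the mean of a TRUE/FALSE column is the proportion of TRUE values. Of course, if you want to turn a proportion into a percentage, you can multiply by 100. In many programming languages, sum is the standard way to count things and mean is the standard way to calculate proportions, as long as the input is binary.
Thanks! That worked. Sorry, I thought I had mentioned that before.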
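The point about sum and mean on TRUE/FALSE values can be checked on a tiny example. This is only a sketch: the two vectors are copied from the first four rows of the question's data, and the names n_over, prop_over, and pct_over are made up for illustration.

```r
# Values taken from the first four rows printed in the question
estimate       <- c(3, 10, 7, 1)
correct_answer <- c(10, 11.5, 6.5, 16.5)

# Element-wise comparison gives a logical vector: only 7 > 6.5 is TRUE
over <- estimate > correct_answer

n_over    <- sum(over)        # TRUE counts as 1, so this is the number of overestimates
prop_over <- mean(over)       # proportion of overestimates
pct_over  <- 100 * prop_over  # the same proportion as a percentage
```

Here one of the four estimates is above the correct answer, so the count is 1, the proportion is 0.25, and the percentage is 25.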
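# Base R: split the data by question_ID and compute the percentages for each piece.
# `correct` is the column holding the true value (correct_answer in the question's data).
list1 <- lapply(split(DF, DF$question_ID), function (x) {
  overestimated <- 100 * length(which(x$estimate > x$correct)) / length(x$estimate)
  underestimated <- 100 * length(which(x$estimate < x$correct)) / length(x$estimate)
  correct <- 100 * length(which(x$estimate == x$correct)) / length(x$estimate)
  data.frame(overestimated, underestimated, correct)
})
# Attach each question_ID (the names of the split list) to its one-row result
list2 <- mapply(function (x, y) {
  x$question_ID <- y
  return (x)
}, x = list1, y = names(list1), SIMPLIFY = FALSE)
# Stack the one-row data frames and move question_ID to the first column
Percent_Data <- do.call("rbind", list2)
Percent_Data <- Percent_Data[, c(which(colnames(Percent_Data) == "question_ID"), which(colnames(Percent_Data) != "question_ID"))]
Percent_Data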
# question_ID overestimated underestimated correct
# 1 1 0 100 0
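The same split, apply, and combine steps can probably be written more compactly by taking mean() of the logical comparisons and setting question_ID inside the function. The sketch below is one possible variation, not the answer's own code: DF is a hypothetical data frame rebuilt from the ten rows printed in the question, and the full column name correct_answer is used to avoid relying on partial matching with $.

```r
# Hypothetical sample: the ten rows printed in the question, relevant columns only
DF <- data.frame(
  question_ID    = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
  estimate       = c(3, 10, 7, 1, 7, 5, 2, 10, 1.5, 2),
  correct_answer = c(10, 11.5, 6.5, 16.5, 11, 2.5, 10, 11.5, 6.5, 16.5)
)

# One row of percentages per question_ID
per_question <- lapply(split(DF, DF$question_ID), function(x) {
  data.frame(
    question_ID    = x$question_ID[1],
    overestimated  = 100 * mean(x$estimate > x$correct_answer),
    underestimated = 100 * mean(x$estimate < x$correct_answer),
    correct        = 100 * mean(x$estimate == x$correct_answer)
  )
})

# Stack the one-row data frames into a single result
Percent_Data <- do.call(rbind, per_question)
Percent_Data
```

Building question_ID inside the function removes the need for the separate mapply step and the column reordering.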