R: calculating a percentage under a given condition

Tags: r, dplyr, conditional, percentage, mutate

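I'm new to this site and to coding, and I'm hoping someone here can help me.

I need to find the top 5 movies by rating distribution, that is, by the percentage of each movie's ratings that are 4 stars or higher.

So far I have only managed to count the number of ratings per movie with dplyr.

Is it possible to calculate this with dplyr, in a style similar to my code?

I'm not sure whether I need mutate to get there, or whether there is another way to do it.

My code so far: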

dfAux1 <- na.omit(dfAux)
dfAux1 %>%
  group_by(movie) %>%
  summarise(tot = n()) %>%
  arrange(desc(tot))%>%
  head(5)

And this is my result so far:

# A tibble: 5 x 2
                              movie   tot
                             <fctr> <int>
1                         Toy Story    17
2          The Silence of the Lambs    16
3         Star Wars IV - A New Hope    15
4 Star Wars VI - Return of the Jedi    14
5                  Independence Day    13

I am using data.table instead of dplyr.

library(data.table)
setDT(dfAux1)  # make dfAux1 as data table by reference

# calculate total number by movies, then compute percent for `Rating >= 4` by movies and then sort `tot` by descending order and also eliminating duplicates in movies using `.SD[1]` which gives the first row in each movie. 
dfAux1[, .(Rating, tot = .N), by = movie ][Rating >= 4, .(percent = .N/tot, tot), by = movie ][order(-tot), .SD[1], by = movie]

#                                movie    percent tot
# 1:                         Toy Story 0.35294118  17
# 2:          The Silence of the Lambs 0.43750000  16
# 3:         Star Wars IV - A New Hope 0.53333333  15
# 4: Star Wars VI - Return of the Jedi 0.35714286  14
# 5:                  Independence Day 0.30769231  13
# 6:                         Gladiator 0.50000000  12
# 7:                      Total Recall 0.08333333  12
# 8:                     Groundhog Day 0.41666667  12
# 9:                        The Matrix 0.41666667  12
# 10:                  Schindler's List 0.33333333  12
# 11:                   The Sixth Sense 0.33333333  12
# 12:               Saving Private Ryan 0.36363636  11
# 13:                      Pulp Fiction 0.36363636  11
# 14:                       Stand by Me 0.36363636  11
# 15:               Shakespeare in Love 0.27272727  11
# 16:           Raiders of the Lost Ark 0.27272727  11
# 17:                      Forrest Gump 0.30000000  10
# 18:          The Shawshank Redemption 0.70000000  10
# 19:                              Babe 0.40000000  10
# 20:                      Blade Runner 0.44444444   9

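Overview

I used the dplyr package to group the data by the movie column and performed the calculations on the Rating column.

Inside summarise(), I created three new columns:

Total_Review: counts the total number of reviews for each movie
FourPlus_Rating: counts the subset of reviews with a rating of 4 or higher
Per_FourPlus_Rating: divides FourPlus_Rating by Total_Review

Then I arranged the data in descending order of Per_FourPlus_Rating. Finally, I called head() to specify that I only want the first 5 rows returned.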
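Reproducible example

# install necessary package
install.packages( pkgs = "dplyr" )

# load necessary package
library( dplyr )

# view first six rows
head( x = df )
#   Rating                     movie
# 1      1 Star Wars IV - A New Hope
# 2      5 Star Wars IV - A New Hope
# 5      4 Star Wars IV - A New Hope
# 6      2 Star Wars IV - A New Hope
# 8      4 Star Wars IV - A New Hope
# 9      5 Star Wars IV - A New Hope

# perform calculations using 
# dplyr functions
df %>%
  group_by( movie ) %>%
  summarise( Total_Review              = n()
             , FourPlus_Rating         = length( Rating[ which( Rating >= 4 ) ] )
             , Per_FourPlus_Rating     = length( Rating[ which( Rating >= 4 ) ] ) / n() ) %>%
  arrange( desc( Per_FourPlus_Rating ) ) %>%
  head( n = 5 )
# A tibble: 5 x 4
# movie               Total_Review FourPlus_Rating Per_FourPlus_Rati…
# <fct>                      <int>           <int>              <dbl>
# 1 The Shawshank Rede…           10               7              0.700
# 2 Star Wars IV - A N…           15               8              0.533
# 3 Gladiator                     12               6              0.500
# 4 Blade Runner                   9               4              0.444
# 5 The Silence of the…           16               7              0.438

# end of script #
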
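As a side note, R treats TRUE as 1 and FALSE as 0, so the count of ratings at 4 or above can also be written as sum(Rating >= 4) and the share as mean(Rating >= 4). The sketch below is only an illustration of that idea: the ratings data frame is made-up toy data rather than the OP's data set, and pct_by_movie is a hypothetical name.

```r
library(dplyr)

# Toy data for illustration only (NOT the OP's data set)
ratings <- data.frame(
  movie  = c(rep("A", 4), rep("B", 3), rep("C", 3)),
  Rating = c(5, 4, 2, 1,   5, 4, 3,   1, 2, 3),
  stringsAsFactors = FALSE
)

pct_by_movie <- ratings %>%
  group_by(movie) %>%
  summarise(Total_Review        = n(),                # all reviews per movie
            FourPlus_Rating     = sum(Rating >= 4),   # TRUE counts as 1
            Per_FourPlus_Rating = mean(Rating >= 4)   # share of TRUE values
  ) %>%
  arrange(desc(Per_FourPlus_Rating))

pct_by_movie
```

If Rating contains missing values, both sum() and mean() return NA for that movie unless na.rm = TRUE is passed, which is presumably why the OP runs na.omit() first.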
A one-line solution using data.table and the data from the OP could look like this:
library(data.table)
setDT(dfAux1)[, .(pct = sum(Rating>=4)/.N), by=movie][order(-pct)][1:5]
                  movie        pct
1:  The Shawshank Redemption 0.7000000
2: Star Wars IV - A New Hope 0.5333333
3:                 Gladiator 0.5000000
4:              Blade Runner 0.4444444
5:  The Silence of the Lambs 0.4375000

Here is a dplyr solution:

# count the ratings of 4 or higher per movie
dfAuxhigh <- filter(dfAux1, Rating >= 4) %>% group_by(movie) %>% summarize(percentHigh = n())
# count all ratings per movie
dfAux <- dfAux1 %>% group_by(movie) %>% summarize(percentAll = n())
# join the two counts and compute the share of high ratings
result <- merge(dfAuxhigh, dfAux, by = "movie") %>% mutate(percentage = percentHigh / percentAll)
# keep the five movies with the highest share (columns: movie, percentage)
result <- result[order(result$percentage, decreasing = TRUE)[1:5], c(1, 4)]

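One thing that may be worth checking with a filter-then-merge approach: merge() performs an inner join by default, so a movie with no rating of 4 or higher drops out of the result instead of appearing with a share of 0. That does not change a top 5, but it matters if the full table is needed. The sketch below illustrates the difference on made-up toy data; ratings, n_high, n_all, inner and left are hypothetical names, not part of the answer above.

```r
library(dplyr)

# Toy data for illustration only (NOT the OP's data set);
# movie "C" has no rating of 4 or higher
ratings <- data.frame(
  movie  = c(rep("A", 4), rep("B", 3), rep("C", 3)),
  Rating = c(5, 4, 2, 1,   5, 4, 3,   1, 2, 3),
  stringsAsFactors = FALSE
)

high <- ratings %>% filter(Rating >= 4) %>% count(movie, name = "n_high")
all  <- ratings %>% count(movie, name = "n_all")

# Inner join (the default of merge): movie "C" is missing from the result
inner <- merge(high, all, by = "movie")

# Left join from the full counts keeps "C"; its missing count becomes 0
left <- all %>%
  left_join(high, by = "movie") %>%
  mutate(n_high     = coalesce(n_high, 0L),
         percentage = n_high / n_all)

left
```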
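library(tidyverse)

df %>% 
  group_by(movie, Rating) %>% 
  summarise(n = n()) %>%           #< get freq of movies
  mutate(freq = n/sum(n)) %>%      #< find perc for each rating, by movie
  filter(Rating >=4) %>%           #< filter for desired rating (4 or above) 
  summarise(freq = sum(freq)) %>%  #< summarize again
  top_n(5) %>%                     
  arrange(desc(freq)) %>% 
  mutate(freq = paste0(round(freq*100, 2), "%"))

#>   movie                     freq  
#> 1 The Shawshank Redemption  70%  
#> 2 Star Wars IV - A New Hope 53.33%
#> 3 Gladiator                 50%   
#> 4 Blade Runner              44.44%
#> 5 The Silence of the Lambs  43.75%
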
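For readers on a newer dplyr: top_n() has been superseded by slice_max() since dplyr 1.0.0, and calling top_n(5) without a wt argument silently picks the last column. A rough equivalent of the pipeline above, assuming dplyr 1.0.0 or later and using made-up toy data (ratings and top2 are hypothetical names), might look like this:

```r
library(dplyr)

# Toy data for illustration only (NOT the OP's data set)
ratings <- data.frame(
  movie  = c(rep("A", 4), rep("B", 3), rep("C", 3)),
  Rating = c(5, 4, 2, 1,   5, 4, 3,   1, 2, 3),
  stringsAsFactors = FALSE
)

top2 <- ratings %>%
  group_by(movie) %>%
  summarise(freq = mean(Rating >= 4)) %>%   # share of ratings >= 4 per movie
  slice_max(freq, n = 2) %>%                # top rows, already sorted descending
  mutate(freq = paste0(round(freq * 100, 2), "%"))

top2
```

slice_max() keeps ties by default (with_ties = TRUE), so it can return more than n rows when several movies share the cutoff value.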
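Please dput the data frame containing the movie details and share it. You should use dput(dfAux1) and share the output; str does not help much.

I did, and it looks awful.

Please have a look at my one-line solution using data.table.

This isn't quite what I want to do. For example, Gladiator has 12 reviews, and 6 of those 12 are rated 4 or 5, so the number I'm looking for is 50%.

This works! Thanks, I just need to sort it to get the top 5.

I think this should do what you did (at least it's what I want). The only problem is that I need to count the occurrences rather than sum them. For example, Gladiator has 12 reviews, 6 of which are rated >= 4, so it should get 0.5.

Oh, you fixed it, thanks! It works; I couldn't figure out the length part... I really appreciate your help.

Thanks for sharing the desired output! I added some explanation above. Hope this helps!

Thanks for the explanation. Perhaps you could remove the definition of df, which is already included in the OP. That would make your post very clear and easy to follow.

Thanks @MKR! I didn't realize how ugly it looked, haha. Much cleaner now!

Thank you very much. I didn't know you could do it with just one line of code; I don't know whether to be happy or angry, hehe.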