R/ggplot: cumulative sum in a histogram (r, ggplot2)

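I have a dataset with user IDs and the number of objects each user created. I plotted a histogram with ggplot, and now I am trying to add the cumulative sum of the x-values as a line. The aim is to see how much each bin contributes to the total. I tried the following: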

ggplot(data=userStats,aes(x=Num_Tours)) + geom_histogram(binwidth = 0.2)+
   scale_x_log10(name = 'Number of planned tours',breaks=c(1,5,10,50,100,200))+
   geom_line(aes(x=Num_Tours, y=cumsum(Num_Tours)/sum(Num_Tours)*3500),color="red")+
   scale_y_continuous(name = 'Number of users', sec.axis = sec_axis(~./3500, name = "Cumulative percentage of routes [%]"))
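This does not work because I do not include any bins, which results in:

[Figure: resulting plot, not preserved]

Here the sum of the counts is considered. What I want is the sum of count * value for each bin, normalized so that both can be displayed in one plot. What I am aiming for is:

[Figure: sketch of the intended plot, not preserved]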

Any comments would be much appreciated. Thanks!

Edit: the following should work as test data:

userID <- c(1:100)
Num_Tours <- sample(1:100,100)
userStats <- data.frame(userID,Num_Tours)
userStats$cumulative <- cumsum(userStats$Num_Tours/sum(userStats$Num_Tours))

Here is an example that may help you:

set.seed(111)
userID <- 1:100
Num_Tours <- sample(1:100, 100, replace = TRUE)
userStats <- data.frame(userID, Num_Tours)

# Sort the x data so the cumulative sum runs from the smallest to the largest value
# (note: after sorting, Num_Tours no longer lines up with the original userID)
userStats$Num_Tours <- sort(userStats$Num_Tours)
userStats$cumulative <- cumsum(userStats$Num_Tours / sum(userStats$Num_Tours))

library(ggplot2)
# Manually fix the maximum value of the y-axis;
# it maps the 0-1 cumulative share onto the count axis
ymax <- 40
ggplot(data = userStats, aes(x = Num_Tours)) +
   geom_histogram(binwidth = 0.2, col = "white") +
   scale_x_log10(name = 'Number of planned tours', breaks = c(1, 5, 10, 50, 100, 200)) +
   geom_line(aes(x = Num_Tours, y = cumulative * ymax), col = "red", lwd = 1) +
   # Undo the ymax scaling and convert the share to percent to match the axis label
   scale_y_continuous(name = 'Number of users', sec.axis = sec_axis(~ . / ymax * 100,
    name = "Cumulative percentage of routes [%]"))
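
The example above accumulates over individual users. The question asks for the contribution per bin, and a minimal sketch of that variant might look like the following. It sums the tours inside each log10 bin before accumulating. The names `bin`, `binned`, `bin_mid` and `cum_share` are made up for illustration. It also assumes that bins aligned to multiples of `binwidth` on the log10 scale match what `geom_histogram(boundary = 0, closed = "left")` draws, and values sitting exactly on a bin edge may still land differently.

```r
set.seed(111)
userStats <- data.frame(userID = 1:100,
                        Num_Tours = sample(1:100, 100, replace = TRUE))

# Assign each user to a bin of width 0.2 on the log10 scale
binwidth <- 0.2
bin <- floor(log10(userStats$Num_Tours) / binwidth)

# Per bin: number of users, and total tours (the "count * value" contribution)
users_per_bin <- tapply(userStats$Num_Tours, bin, length)
tours_per_bin <- tapply(userStats$Num_Tours, bin, sum)

binned <- data.frame(
  # Bin midpoint, converted back from log10 to the data scale
  bin_mid = 10^((as.numeric(names(tours_per_bin)) + 0.5) * binwidth),
  users   = as.vector(users_per_bin),
  tours   = as.vector(tours_per_bin)
)
# Cumulative share of all tours, from the lowest bin upwards (ends at 1)
binned$cum_share <- cumsum(binned$tours) / sum(binned$tours)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ymax <- max(binned$users)  # scale the 0-1 share to the fullest bin
  p <- ggplot(userStats, aes(x = Num_Tours)) +
    geom_histogram(binwidth = binwidth, boundary = 0, closed = "left", col = "white") +
    geom_line(data = binned, aes(x = bin_mid, y = cum_share * ymax), col = "red") +
    scale_x_log10(name = "Number of planned tours", breaks = c(1, 5, 10, 50, 100, 200)) +
    scale_y_continuous(name = "Number of users",
                       sec.axis = sec_axis(~ . / ymax, name = "Cumulative share of tours"))
  print(p)
}
```

Taking `ymax` from the fullest bin instead of hard-coding it keeps the red line inside the height of the bars. Whether that is preferable to a fixed value depends on the data.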

Please provide example data for the user IDs.