R: random sampling to give an exact sum

r random

I want to sample 140 numbers between 1000 and 100000 such that the sum of these 140 numbers is around 2 million (2000000):

so that:

sum(sample(1000:100000,140)) = 2000000

Any pointers on how I can achieve this?

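Here are a few approaches that get me close to 2 million. Hopefully someone will post a more clever solution.

In this option, we use the prob argument of sample to make smaller values more likely, and choose the exponent by trial and error. This method is heavily biased toward the lower values in the range specified in the OP.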

x1 = sample(1000:100000,140, prob=(1e5:1e3)^5.5)
mean(replicate(100, sum(sample(1000:100000,140, prob=(1e5:1e3)^5.5))))

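In this option, we sample from a truncated normal distribution (truncated at the given bounds). We initially set the mean to 2e6/140 = 14285.71. However, if the standard deviation is large enough that a substantial share of values would fall near the lower bound, the truncation biases the mean upward, so we add a correction factor chosen by trial and error.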

library(truncnorm)
x2 = rtruncnorm(140, 1e3, 1e5, mean=0.82*2e6/140, sd=1e4)
mean(replicate(1000, sum(rtruncnorm(140, 1e3, 1e5, mean=0.82*2e6/140, sd=1e4))))

If a smaller standard deviation is set, no correction is necessary. However, this way you get fewer values far from the mean.

mean(replicate(1000, sum(rtruncnorm(140, 1e3, 1e5, mean=2e6/140, sd=0.5e4))))
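The upward bias from truncation can also be checked without simulation. The following is a minimal sketch, assuming the standard closed-form mean of a truncated normal; the helper name trunc_norm_mean is made up for illustration and is not part of the truncnorm package. It uses only base R.

```r
# Mean of a normal(mu, sd) truncated to [lower, upper], from the standard
# closed-form expression; uses only base R (dnorm, pnorm).
# trunc_norm_mean is a hypothetical helper, not a truncnorm function.
trunc_norm_mean <- function(mu, sd, lower = 1e3, upper = 1e5) {
  alpha <- (lower - mu) / sd
  beta  <- (upper - mu) / sd
  mu + sd * (dnorm(alpha) - dnorm(beta)) / (pnorm(beta) - pnorm(alpha))
}

target_mean <- 2e6 / 140

# Expected sum of 140 draws for the three parameter choices used above
expected_sums <- c(
  uncorrected_sd_1e4 = 140 * trunc_norm_mean(target_mean, 1e4),
  corrected_sd_1e4   = 140 * trunc_norm_mean(0.82 * target_mean, 1e4),
  uncorrected_sd_5e3 = 140 * trunc_norm_mean(target_mean, 0.5e4)
)
print(round(expected_sums))
```

With sd = 1e4 and no correction, the expected sum lands noticeably above 2 million, while the 0.82 factor or the smaller standard deviation brings it back close to the target.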

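In either case, the exponent for the sample method, or the correction for the truncated normal, could be chosen by an automated search, given a tolerance for the difference between the mean sum and 2 million.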
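As a rough sketch of such an automated search for the sample exponent, one could solve for the exponent with uniroot. This is only one possible way to do it: the helper expected_sum is hypothetical, and it assumes that drawing 140 values out of 99001 without replacement behaves almost like drawing with replacement, so the expected sum is 140 times the weighted mean.

```r
values <- 1000:100000

# Approximate expected sum of n draws for a given exponent.
# Assumption: with 140 draws out of 99001 values, sampling without
# replacement is treated as if it were with replacement.
expected_sum <- function(exponent, n = 140) {
  w <- (rev(values) / 1e5)^exponent  # same weights as (1e5:1e3)^exponent, rescaled
  n * sum(values * w) / sum(w)
}

# Find the exponent whose approximate expected sum equals the 2 million target
fit <- uniroot(function(p) expected_sum(p) - 2e6, interval = c(0, 20), tol = 1e-8)
exponent <- fit$root

set.seed(1)  # arbitrary seed, only for reproducibility
x1 <- sample(values, 140, prob = rev(values)^exponent)
print(c(exponent = exponent, expected = expected_sum(exponent), one_draw = sum(x1)))
```

The exponent found this way only matches the target on average; any single draw will still scatter around 2 million.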

Here are some typical output distributions (figure not reproduced):

Here is an attempt that varies the upper bound. The idea is to lower the upper bound as the running sum gets larger.

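sup <- 100000
tir <- vector(length = 140)
for (i in 1:140) {
  tir[i] <- sample(1000:sup, 1)
  # new upper bound: twice the average still needed per remaining draw,
  # never above the current bound and never below 1001
  sup <- max(1001, min(sup, abs(2000000 - sum(tir, na.rm = TRUE)) / (140 - i) * 2))
}
sum(tir)
[1] 2001751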

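Here is a hit-and-miss approach. The basic idea is that finding 140 numbers which sum to 2000000 is the same as breaking 1:2000000 into 140 pieces, which requires 139 cut points. Also, note that the minimum of 1000 is a bit annoying. Just subtract it from all of the problem data and add it back at the end: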
rand.nums <- function(a,b,n,k){
  #finds n random integers in range a:b which sum to k
  while(TRUE){
    x <- sample(1:(k - n*a),n-1, replace = TRUE) #cutpoints
    x <- sort(x)
    x <- c(x,k-n*a) - c(0,x)
    if(max(x) <= b-a) return(a+x)
  }
}
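To make the cut-point idea concrete, here is a single pass of the same construction written out step by step. It is only an illustrative sketch: the variable names shifted_total, cuts and pieces are not from the answer, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

a <- 1000; b <- 100000   # allowed range for each number
n <- 140                 # how many numbers
k <- 2000000             # required total

# Work on the shifted problem: n non-negative pieces summing to k - n*a
shifted_total <- k - n * a
cuts   <- sort(sample(1:shifted_total, n - 1, replace = TRUE))  # 139 cut points
pieces <- diff(c(0, cuts, shifted_total))                       # 140 gaps between cuts

x <- a + pieces          # shift back so every value is at least a
print(sum(x) == k)       # TRUE by construction
print(max(x) <= b)       # may be FALSE; rand.nums redraws in that case
```

The sum is exact by construction, so the only thing the while loop in rand.nums has to reject is a draw whose largest piece exceeds the upper bound.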

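There exists an algorithm for generating such random numbers.
It was originally created for MATLAB, and there is an R implementation of it (Surrogate::RandVec).

Quoting the MATLAB script comments: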
%   This generates an n by m array x, each of whose m columns
% contains n random values lying in the interval [a,b], but
% subject to the condition that their sum be equal to s.  The
% scalar value s must accordingly satisfy n*a <= s <= n*b.  The
% distribution of values is uniform in the sense that it has the
% conditional probability distribution of a uniform distribution
% over the whole n-cube, given that the sum of the x's is s.
%
%   The scalar v, if requested, returns with the total
% n-1 dimensional volume (content) of the subset satisfying
% this condition.  Consequently if v, considered as a function
% of s and divided by sqrt(n), is integrated with respect to s
% from s = a to s = b, the result would necessarily be the
% n-dimensional volume of the whole cube, namely (b-a)^n.
%
%   This algorithm does no "rejecting" on the sets of x's it
% obtains.  It is designed to generate only those that satisfy all
% the above conditions and to do so with a uniform distribution.
% It accomplishes this by decomposing the space of all possible x
% sets (columns) into n-1 dimensional simplexes.  (Line segments,
% triangles, and tetrahedra, are one-, two-, and three-dimensional
% examples of simplexes, respectively.)  It makes use of three
% different sets of 'rand' variables, one to locate values
% uniformly within each type of simplex, another to randomly
% select representatives of each different type of simplex in
% proportion to their volume, and a third to perform random
% permutations to provide an even distribution of simplex choices
% among like types.  For example, with n equal to 3 and s set at,
% say, 40% of the way from a towards b, there will be 2 different
% types of simplex, in this case triangles, each with its own
% area, and 6 different versions of each from permutations, for
% a total of 12 triangles, and these all fit together to form a
% particular planar non-regular hexagon in 3 dimensions, with v
% returned set equal to the hexagon's area.
%
% Roger Stafford - Jan. 19, 2006

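What distribution are you sampling from?

I don't have any particular distribution. It can be anything.

Around 2 million, or exactly? If it can be anything, how close do you need it to be?

Generate 70 numbers from 1 to 140 without replacement, set the draws at those positions to 2000000/70 = 28571.42857 and the rest to 0; the total will be exactly 2000000 and the distribution will be Bernoulli.

There is currently too much ambiguity in the problem description. a) Is an exact sum required? b) Is the sampling from the reals or the integers?

I like this idea. For some ranges n [...] the cut points are x

@martini I tweaked it a bit. I noticed that the minimum possible number in the original algorithm was 1001 rather than 1000. Allowing some of the cut points to coincide allows 0 to be picked before 1000 is added.

It might also be worth showing anyDuplicated(test$RandVecOutput) # 0 to illustrate that there are no duplicates / no replacement.

Is it possible to have duplicates, with replacement?

@Hardikgupta I mistakenly thought replacement was relevant to this answer. The random vector consists of real numbers, not integers from 1000:100000.

@Hardikgupta There is nothing in the algorithm that specifically restricts replacement, but getting duplicates among 140 numbers out of 99000 would be very unlucky.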
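# R implementation of the algorithm: RandVec() from the Surrogate package
# a, b: bounds for each value; s: required sum; n: number of values; m: number of vectors
test <- Surrogate::RandVec(a=1000, b=100000, s=2000000, n=140, m=1, Seed=sample(1:1000, size = 1))
sum(test$RandVecOutput)
# 2000000
hist(test$RandVecOutput)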