How to simulate values in R from lower and upper boundaries, assuming a uniform distribution?


I have the following data:

# A tibble: 1,100 x 3
   income       minimum       maximum
    <dbl>         <dbl>         <dbl>
 1     NA            NA            NA
 2      0             0            25
 3      0             0            25
 4     NA            NA            NA
 5      4           100           200
I would like to simulate a value between the minimum and the maximum, assuming a uniform distribution.

Do you know how this could be done?
The simulated value should appear in a column on the right, under the variable income.

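Try this approach using apply(). You can call runif() at the row level, using the lowerboundary and upperboundary variables to generate the values. For rows containing NA, you will get NaN (with a warning). Here is the code: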

#Code
df$Salary <- apply(df[,-1],1,function(x) {y <- runif(1,x[1],x[2]); y})
Some data used:

#Data
df <- structure(list(income = c(NA, 0L, 0L, NA, 4L, NA, NA, 4L, NA, 
12L), lowerboundary = c(NA, 0L, 0L, NA, 425L, NA, NA, 425L, NA, 
2400L), upperboundary = c(NA, 50L, 50L, NA, 600L, NA, NA, 600L, 
NA, 3000L)), row.names = c(NA, -10L), class = "data.frame")
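
If the NaN results and the accompanying warning are not wanted, one option is to draw values only for rows where both bounds are present. This is a minimal sketch, not part of the answer above; the data frame name bounds and its values are hypothetical, and only the column names are taken from the example data.

```r
# Hypothetical example data; column names mirror the example data above
bounds <- data.frame(
  lowerboundary = c(NA, 0, 425, 2400),
  upperboundary = c(NA, 50, 600, 3000)
)

set.seed(123)  # only to make the sketch reproducible

# Rows where both bounds are available
ok <- !is.na(bounds$lowerboundary) & !is.na(bounds$upperboundary)

# Start with NA everywhere, then fill only the complete rows
bounds$salary <- NA_real_
bounds$salary[ok] <- runif(sum(ok), bounds$lowerboundary[ok], bounds$upperboundary[ok])

bounds
```

Rows with missing bounds then hold NA instead of NaN, and runif() is never called with missing arguments.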

This may be what you are looking for:

df$salary <- runif(nrow(df)) * (df$upperboundary - df$lowerboundary) + df$lowerboundary
However, you can also pass the boundaries directly:

df$salary <- runif(nrow(df), df$lowerboundary, df$upperboundary)
Let's look at the first one, which applies the maximum and the minimum manually.

By default, runif(1) is equivalent to:

runif(1, min = 0, max = 1)
So it returns a random number between 0 and 1, drawn from a uniform distribution.

To return a random number between two different limits, for example min = 10 and max = 20, you can do the following:

runif(1, min = 10, max = 20)

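The manual version rescales a draw from the default range: runif(1) * (20 - 10) + 10. In the limiting case where the output of runif(1) is 1, this gives the upper limit: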

1 * (20 - 10) + 10
==> 20 - 10 + 10
==> 20
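
The arithmetic above suggests that rescaling a default draw and passing the limits to runif() should give the same values. A small sketch to check this, assuming the same seed is set before each call (the variable names manual and direct are made up for this illustration):

```r
set.seed(1)
manual <- runif(5) * (20 - 10) + 10      # rescale draws from the default range

set.seed(1)
direct <- runif(5, min = 10, max = 20)   # let runif() apply the limits itself

# Compare the two vectors up to floating-point tolerance
all.equal(manual, direct)
```

With identical seeds the comparison should come out TRUE, which is why the two vectorised solutions differ only in speed, not in the distribution they sample from.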

Here is another option, a dplyr solution:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(salary = runif(1, lowerboundary, upperboundary)) %>% 
  ungroup()

Here is a speed comparison. "maths" is the fastest:

microbenchmark::microbenchmark(
  apply  =  apply(df[-1],1, function(x) runif(1, x[1], x[2])),
  maths  =  runif(nrow(df)) * (df$upperboundary - df$lowerboundary) + df$lowerboundary,
  maths2 =  runif(nrow(df), df$lowerboundary, df$upperboundary),
  dplyr  =  df %>% rowwise() %>% mutate(runif = runif(1, lowerboundary, upperboundary)) %>% ungroup()
)
#> Unit: microseconds
#>    expr    min      lq     mean  median      uq    max neval
#>   apply  907.1  955.90 1175.188 1023.70 1280.90 4455.0   100
#>   maths   16.8   26.05   32.651   31.25   38.65   75.0   100
#>  maths2  117.8  128.00  156.533  136.60  175.15  336.7   100
#>   dplyr 1424.2 1496.60 1821.068 1661.15 1989.20 3952.7   100
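
The timings above come from a ten-row data frame, so they may not carry over to larger inputs. The following is a rough sketch for repeating the comparison of the two vectorised variants on more rows, using base system.time() instead of microbenchmark. The data frame big, its size, and its values are assumptions made for illustration; no timings are claimed here.

```r
set.seed(42)
n <- 1e5

# Hypothetical larger data set without missing values;
# column names follow the answers above
big <- data.frame(lowerboundary = runif(n, 0, 100))
big$upperboundary <- big$lowerboundary + runif(n, 1, 100)

# "maths": rescale default draws manually
t_maths <- system.time(
  s1 <- runif(n) * (big$upperboundary - big$lowerboundary) + big$lowerboundary
)

# "maths2": pass the bounds directly to runif()
t_maths2 <- system.time(
  s2 <- runif(n, big$lowerboundary, big$upperboundary)
)

# Elapsed seconds for each variant
c(maths = t_maths[["elapsed"]], maths2 = t_maths2[["elapsed"]])
```

A single system.time() call is noisy, so repeating the measurement several times (or using microbenchmark as above) gives a more reliable picture.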

We can use map2_dbl() from purrr:

library(purrr)
library(dplyr)
df %>%
   mutate(salary = map2_dbl(lowerboundary, upperboundary, ~ runif(1, .x, .y)))
Output:

#   income lowerboundary upperboundary      salary
#1      NA            NA            NA         NaN
#2       0             0            50   33.771312
#3       0             0            50    3.577857
#4      NA            NA            NA         NaN
#5       4           425           600  514.912989
#6      NA            NA            NA         NaN
#7      NA            NA            NA         NaN
#8       4           425           600  516.179313
#9      NA            NA            NA         NaN
#10     12          2400          3000 2815.442543
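
For readers who prefer to avoid extra packages, the same pairwise idea can be sketched in base R with mapply(). This is an illustration rather than part of the answer above; the data frame bounds and its values are hypothetical.

```r
# Hypothetical example data without missing values
bounds <- data.frame(
  lowerboundary = c(0, 425, 2400),
  upperboundary = c(50, 600, 3000)
)

set.seed(2024)  # only to make the sketch reproducible

# Draw one value per row, pairing each lower bound with its upper bound
bounds$salary <- mapply(
  function(lo, hi) runif(1, lo, hi),
  bounds$lowerboundary,
  bounds$upperboundary
)

bounds
```

Like map2_dbl(), this calls runif() once per row, so it is likely to be slower than the fully vectorised runif(nrow(df), ...) form on large data.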

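I think with(df, runif(nrow(df), lowerboundary, upperboundary)) would be enough.
Sorry, I made a mistake. You are right! I will edit my answer.
Thanks for the edit; I think that if the size of the data increases, maths and maths2 would perform similarly. Apparently, the presence of NAs slows down maths2.
Please check that your data frame is called df. Otherwise, replace df in the solution with the actual name of your data frame.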