
How to structure an easily extensible Monte Carlo model in R


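I have a simple model of a company with two farms, each growing two crops (apples and pears). The first step is simply to multiply the number of trees by the number of fruit on each tree.

The number of fruit per tree is simulated (across farms and crops).

I face at least three decisions when modelling this in R:

  • how to structure the variables
  • how to simulate
  • how to multiply the simulated variables by the non-simulated ones

I want it to keep working if I add another crop and/or farm, and ideally even if I add another dimension, e.g. crop variety (Granny Smith etc.). I also want to refer to farms and crops by name rather than by index number.

Here is the approach I came up with. It works, but it is hard to add another dimension, and it is a lot of code. Is there a neater way?

To structure the variables: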

farms <- c('Farm 1', 'Farm 2');
crops <- c('Pear', 'Apple');
params <- c('mean','sd');

numTrees <- array(0, dim=c(length(farms), length(crops)), dimnames=list(farms,crops));
fruitPerTree <- array(0, dim=c(length(farms), length(crops), length(params)), 
                      dimnames=list(farms,crops,params));

# input data e.g.
numTrees['Farm 1', 'Pear'] = 100
# and
fruitPerTree['Farm 1', 'Pear', 'mean'] = 50
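By contrast, in Python (which I know better than R), I can do all of these steps in a few lines using a dictionary indexed by tuples and two dictionary comprehensions, e.g.: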

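from numpy import random  # provides random.normal

num_sims = 1000                          # illustrative number of simulations
num_trees = {('Farm 1', 'Pear'): 100}    # illustrative tree count
fruit_per_tree = {}
fruit_per_tree[('Farm 1', 'Pear')]  = (50, 15) # normal params (mean, sd)
sim_fruit_per_tree = {key: random.normal(*params, size=num_sims) 
                      for key, params in fruit_per_tree.items() }
sim_total_fruit = {key: sim_fruit_per_tree[key]*num_trees[key] for key in num_trees }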

Is there a similarly simple way to do this in R? Thanks.

Here is how I would set up such a simulation:

#for reproducibility
set.seed(42)

#data
farms <- data.frame(farm=rep(1:2, each=2),
                    trees=sample(100, 4),
                    crop=rep(c("pear", "apple")),
                    mean=c(100, 200, 70, 120),
                    sd=c(30, 15, 25, 20))

#n
n <- 100

#simulation
fruits <- t(matrix(rnorm(n*nrow(farms), farms$mean, farms$sd), ncol=n))

#check 
colMeans(fruits)
#[1] 101.10215 200.06649  68.01185 120.05096

library(reshape2)
fruits <- melt(fruits, value.name="harvest_per_tree")
farms$i <- seq_len(nrow(farms))

farm_sim <- merge(farms, fruits, by.x="i", by.y="Var2", all=TRUE)
names(farm_sim)[7] <- "sim_i"

#multiply with number of trees
farm_sim$harvest_total <- farm_sim$harvest_per_tree * farm_sim$trees
head(farm_sim)
#   i farm trees crop mean sd sim_i harvest_per_tree harvest_total
# 1 1    1    92 pear  100 30     1        110.89385     10202.234
# 2 1    1    92 pear  100 30     2        145.34566     13371.801
# 3 1    1    92 pear  100 30     3        139.14609     12801.440
# 4 1    1    92 pear  100 30     4         96.00036      8832.033
# 5 1    1    92 pear  100 30     5         26.78599      2464.311
# 6 1    1    92 pear  100 30     6         94.84248      8725.508

library(ggplot2)
ggplot(farm_sim, aes(x=sim_i, y=harvest_total, colour=factor(farm))) +
  geom_line() +
  facet_wrap(~crop)
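
The same long-format layout can also be built in base R alone, and it extends to a further dimension by adding a column. The following is only a sketch under stated assumptions: the variety column, its levels, and all tree counts and parameters are hypothetical values chosen for illustration.

```r
set.seed(42)

# one row per farm/crop/variety combination; all values below are illustrative
farms <- expand.grid(farm    = c("Farm 1", "Farm 2"),
                     crop    = c("pear", "apple"),
                     variety = c("early", "late"),
                     stringsAsFactors = FALSE)
farms$trees <- c(90, 60, 30, 80, 50, 70, 40, 20)
farms$mean  <- c(100, 200, 70, 120, 110, 190, 80, 130)
farms$sd    <- c(30, 15, 25, 20, 30, 15, 25, 20)

n <- 100   # number of simulations

# repeat the input rows once per simulation and label each copy
farm_sim <- farms[rep(seq_len(nrow(farms)), times = n), ]
farm_sim$sim_i <- rep(seq_len(n), each = nrow(farms))

# a single vectorized rnorm() call; mean and sd are matched row by row
farm_sim$harvest_per_tree <- rnorm(nrow(farm_sim), farm_sim$mean, farm_sim$sd)
farm_sim$harvest_total    <- farm_sim$harvest_per_tree * farm_sim$trees

# select by name rather than by index, e.g. mean total harvest of pears on Farm 1
with(subset(farm_sim, farm == "Farm 1" & crop == "pear"), mean(harvest_total))

# or summarise every combination at once
aggregate(harvest_total ~ farm + crop + variety, data = farm_sim, FUN = mean)
```

Because each dimension is just a column, adding a farm, crop, or variety only adds rows to the input table; the simulation and multiplication lines stay unchanged.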
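If I understand correctly, you are modelling the total fruit output of n farms, each with k crops (here, n = k = 2). Each farm has some number of trees of each variety, and the productivity (fruit per tree) is a random variable distributed as N(μ, σ), where μ and σ depend on the farm and the variety.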

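So, for the inputs, we construct a data frame, params, with 5 columns: farms, crops, trees, mean, and sd. Each row then contains, for a given farm/crop combination, the number of trees, the mean productivity per tree, and the variability in productivity per tree. These are the inputs.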

If we model at the level of the individual tree, then the fruit output of each tree of a given variety on a given farm is:

rnorm(trees,mean,sd)
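That is, the output is a random sample of length = number of trees, with the mean and sd appropriate for the given variety and farm. The total output of all trees for this variety/farm is then the sum of the vector above, and the grand total is the sum over all farm/crop combinations.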
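
As a minimal sketch of that single step, assuming illustrative inputs (100 trees, mean 50, sd 10) for one farm/crop combination; the variable names are hypothetical:

```r
set.seed(1)                       # for reproducibility
n_trees <- 100                    # illustrative inputs for one farm/crop combination
mu      <- 50
sigma   <- 10

per_tree <- rnorm(n_trees, mu, sigma)   # one simulated yield per tree
length(per_tree)                        # 100, one value per tree
combo_total <- sum(per_tree)            # total output for this farm/crop combination
combo_total
```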

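All of this gives us one iteration of the Monte Carlo model. To determine the distribution of the total output, we have to repeat this process a number of times. Fortunately, in R this is fairly simple: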

set.seed(1)
farms  <- c('Farm 1', 'Farm 2')
crops  <- c('Pear', 'Apple')

params <- expand.grid(farms=farms,crops=crops)
params$trees<- 100
params$mean <- 50
params$sd   <- 10
n.iterations<- 1000

output <- function(i,p) {
  pp   <- p[3:5]   # trees, mean, sd for each farm/crop
  # fruit = total output for each farm/crop combination
  fruit <- colSums(apply(pp,1,function(x)rnorm(x[1],x[2],x[3])))
  return(sum(fruit))  # grand total output
}
dist   <- sapply(1:n.iterations,output,params)
print(c(mean=mean(dist),sd=sd(dist)),quote=FALSE,digits=4)
#    mean      sd 
# 19997.5   198.8 
hist(dist, main="Distribution of Total Output", 
     sub=paste(n.iterations,"Iterations"),xlab="Total Fruit Output")
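
One caveat with the output() function above: apply() returns a matrix only because every row has the same number of trees. With unequal tree counts it returns a list, and colSums() fails. The sketch below is one possible variant, not part of the original answer: the name output2 and the tree counts are hypothetical. It uses mapply() and refers to columns by name, and it compares the simulated spread with the value expected for a sum of independent normals, sqrt(sum(trees * sd^2)).

```r
set.seed(1)
params <- expand.grid(farms = c("Farm 1", "Farm 2"), crops = c("Pear", "Apple"))
params$trees <- c(100, 80, 120, 60)   # hypothetical, deliberately unequal
params$mean  <- 50
params$sd    <- 10

# one Monte Carlo iteration: simulate every tree, sum within each farm/crop, then overall
output2 <- function(i, p) {
  fruit <- mapply(function(n, m, s) sum(rnorm(n, m, s)), p$trees, p$mean, p$sd)
  sum(fruit)
}

dist2 <- sapply(1:1000, output2, params)

# simulated vs. theoretical mean and sd of the grand total
c(sim_mean = mean(dist2), theory_mean = sum(params$trees * params$mean))
c(sim_sd   = sd(dist2),   theory_sd   = sqrt(sum(params$trees * params$sd^2)))
```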


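Finally, I would urge you to consider that the output per tree is more likely to be Poisson distributed than normally distributed. If you rerun the simulation using rpois(…) instead of rnorm(…), the overall sd is somewhat lower (~150 rather than ~200).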
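
To make the comparison concrete, here is a sketch of the Poisson variant, assuming the Poisson rate is set to the per-tree mean of 50 used above (the names sim_total, draw_normal and draw_poisson are hypothetical). With 400 trees in total, the theoretical sd of the grand total is sqrt(400 * 50) ≈ 141 in the Poisson case and sqrt(400 * 10^2) = 200 in the normal case.

```r
set.seed(1)
params <- expand.grid(farms = c("Farm 1", "Farm 2"), crops = c("Pear", "Apple"))
params$trees <- 100
params$mean  <- 50
params$sd    <- 10

# grand total for one iteration; draw_fn(n, mean, sd) simulates n trees of one farm/crop
sim_total <- function(i, p, draw_fn) {
  sum(mapply(function(n, m, s) sum(draw_fn(n, m, s)), p$trees, p$mean, p$sd))
}

draw_normal  <- function(n, m, s) rnorm(n, m, s)
draw_poisson <- function(n, m, s) rpois(n, lambda = m)   # sd is not used by the Poisson

tot_norm <- sapply(1:1000, sim_total, params, draw_normal)
tot_pois <- sapply(1:1000, sim_total, params, draw_poisson)

c(normal_sd = sd(tot_norm), poisson_sd = sd(tot_pois))
```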

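Here is a general solution to my problem. I started from Roland's approach and updated it so that the distribution, the parameters, and the dimensions can all be changed easily.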

distSim <- function(nSims, simName, distFn, dimList, paramList, constList) {
    #
    # Simulate from a distribution across all the dimensions.
    #
    # Args:
    #   nSims:     integer, e.g. 10000
    #   simName:   name of the output column, e.g. 'harvestPerTree'
    #   distFn:    distribution function, e.g. rnorm
    #   dimList:   list of dimensions, 
    #              e.g. list(farms=c('farm A', 'farm B'), crops=c('apple', 'pear', 'durian'))
    #   paramList: list of parameters, each of length = product(length(d) for d in dimList),
    #              to be passed to the distribution function,
    #              e.g. list(mean=c(10,20,30,5,10,15), sd=c(2,4,6,1,2,3))
    #   constList: optional vector of length = product(length(d) for d in dimList)
    #              these are included in the output dataframe
    #              e.g. list(nTrees=c(10,20,30,1,2,3))
    #
    # Returns:
    #   a dataframe with one row per iteration x product(d in dimList)
    #

    # expand out the dimensions into a dataframe grid - one row per combination
    df <- do.call(expand.grid, dimList);
    nRows <- nrow(df);
    # add the parameters, and constants, if present
    df <- cbind(df, paramList);
    if (!missing(constList)) {
        df <- cbind(df, constList);
    }
    # repeat this dataframe for each iteration of the simulation
    df <- do.call("rbind",replicate(nSims, df, simplify=FALSE));
    # add a new column giving the iteration number ('simId')
    simId <- sort(rep(seq(1:nSims),nRows));
    df <- cbind(simId, df);
    # simulate from the distribution
    df[simName] <- do.call(distFn, c(list(n=nrow(df)), df[names(paramList)]))
    return(df);
}
Note also that you can define the input values in a nicely indexed way; e.g. if you define

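# assumed definition (tree_dimList is not defined in the original snippet)
tree_dimList <- list(farms=c('Farm A', 'Farm B'), crops=c('apple', 'pear', 'durian'));
numTrees2 <- array(0, dim=lengths(tree_dimList, use.names=FALSE), dimnames=tree_dimList);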
numTrees2['Farm A','apple'] = 200; 
# etc

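No loop is needed: rnorm is fully vectorized and accepts vectors for mean and sd. Also, I probably wouldn't use arrays here. A long-format data.frame or data.table should be easier to work with.

I can only repeat: no loop is needed. A single call to rnorm (or rpois) is sufficient.

Thank you, I like your approach, although Roland's is closer to what I had in mind. Many thanks!

Thanks Roland. I will modify your answer to avoid the reshape command.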
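
The point about vectorization can be seen in a small sketch (all values are illustrative): rnorm() recycles the mean and sd vectors across the draws, so with four parameter pairs every fourth draw shares the same mean and sd.

```r
set.seed(42)
mu    <- c(100, 200, 70, 120)   # one mean per farm/crop row (illustrative)
sigma <- c(30, 15, 25, 20)      # one sd per farm/crop row
n     <- 1000                   # simulations per row

# a single call draws all n * 4 values; mean and sd are recycled in order
draws <- rnorm(n * length(mu), mean = mu, sd = sigma)

# filling by row puts one simulation per row and one farm/crop per column
sims <- matrix(draws, ncol = length(mu), byrow = TRUE)
round(colMeans(sims))           # close to 100, 200, 70, 120
```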
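# Compare the distribution of the grand total for 100, 1000 and 10000 iterations
# (uses output() and params as defined in the per-tree simulation above)
gg     <- do.call(rbind,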
                  lapply(c(100,1000,10000),
                         function(n)cbind(n=n,total=sapply(1:n,output,params))))
gg     <- data.frame(gg)
library(ggplot2)
ggplot(gg)+
  geom_histogram(aes(x=total, y=..density.., fill=factor(n)))+
  scale_fill_discrete("Iterations")+
  facet_wrap(~n)

# Example call to distSim(), followed by its output:
dimList <- list(farms=c('farm A', 'farm B'), crops=c('apple', 'pear', 'durian'));
constList <- list(numTrees=c(10,20,30,1,2,3));
paramList <- list(mean=c(10,20,30,5,10,15), sd=c(2,4,6,1,2,3));
distSim(nSims=3, simName='harvestPerTree', distFn=rnorm, dimList=dimList, 
        paramList=paramList, constList=constList);
   simId  farms  crops mean sd numTrees harvestPerTree
1      1 farm A  apple   10  2       10       9.602849
2      1 farm B  apple   20  4       20      20.153225
3      1 farm A   pear   30  6       30      25.839825
4      1 farm B   pear    5  1        1       1.733120
5      1 farm A durian   10  2        2      13.506135
6      1 farm B durian   15  3        3      11.690910
7      2 farm A  apple   10  2       10       7.678611
8      2 farm B  apple   20  4       20      22.119560
9      2 farm A   pear   30  6       30      31.488002
10     2 farm B   pear    5  1        1       5.366725
11     2 farm A durian   10  2        2       6.333747
12     2 farm B durian   15  3        3      13.918085
13     3 farm A  apple   10  2       10      10.387194
14     3 farm B  apple   20  4       20      21.086388
15     3 farm A   pear   30  6       30      34.076926
16     3 farm B   pear    5  1        1       6.159415
17     3 farm A durian   10  2        2       8.322902
18     3 farm B durian   15  3        3       9.458085
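
With the result in this long format, multiplying the simulated column by the constant column and summarising by iteration takes only a few lines. The sketch below is self-contained, so it rebuilds the table with a condensed stand-in for distSim() rather than calling it; the column name harvestTotal and the choice of 1000 iterations are assumptions made for illustration.

```r
set.seed(1)
dimList   <- list(farms = c("farm A", "farm B"), crops = c("apple", "pear", "durian"))
paramList <- list(mean = c(10, 20, 30, 5, 10, 15), sd = c(2, 4, 6, 1, 2, 3))
constList <- list(numTrees = c(10, 20, 30, 1, 2, 3))
nSims     <- 1000

# condensed stand-in for distSim(): one row per iteration x farm/crop combination
grid <- cbind(do.call(expand.grid, dimList), paramList, constList)
sim  <- grid[rep(seq_len(nrow(grid)), times = nSims), ]
sim$simId <- rep(seq_len(nSims), each = nrow(grid))
sim$harvestPerTree <- rnorm(nrow(sim), sim$mean, sim$sd)

# multiply the simulated column by the constant column
sim$harvestTotal <- sim$harvestPerTree * sim$numTrees

# total per iteration, then summarise its distribution
totals <- aggregate(harvestTotal ~ simId, data = sim, FUN = sum)
c(mean = mean(totals$harvestTotal), sd = sd(totals$harvestTotal))

# mean harvest by farm and crop
aggregate(harvestTotal ~ farms + crops, data = sim, FUN = mean)
```

Note that this applies one simulated per-tree value to all trees of a farm/crop combination, as in the output table above, rather than simulating each tree separately.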