How to efficiently unpack a Monte Carlo simulation in Python using numba? (Solved)

python performance

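I am trying to build an efficient Monte Carlo simulation because, in my use case, I need to run it 70*10^6 times. I hope someone more experienced, especially on the performance side, can suggest ideas I could try. I have the following inputs: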

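Demand
- Each column is a product and each row is a month
- In a given month, the demand for some products is estimated by a triangular distribution tuple (min, mean, max). For these values I run 1000 Monte Carlo simulations
Stock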

The output I want is:

The median of the distribution of the total available products, np.median(np.sum(available_products)), taken over the 1000 simulations, where available_products = stock - demand.
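Written out as a formula, the target quantity looks roughly as follows. The symbols are introduced here only for illustration: s_{m,p} is the stock of product p in month m, and d^{(j)}_{m,p} is the demand drawn for that cell in simulation j. The clamp at zero follows the code further down, which sets negative availability to 0.

```latex
\[
\text{result} \;=\; \operatorname*{median}_{j = 1, \dots, 1000} \;
\sum_{m} \sum_{p} \max\!\bigl( s_{m,p} - d^{(j)}_{m,p},\; 0 \bigr)
\]
```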

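However, I have some issues:

Speed: my intuition is that there are smart ways to do this calculation with vectorized functions. However, I could not think of any, so I tried the usual loops. If you have any hints or a different approach that could be faster, let me know.
Fixed: unable to set values in the array. In my solution it was not possible to use demand_j[index_demand_not_0][k] = dict_demand_values_simulations[k][j]
- Solution: I only needed to access the demand_j position directly with demand_j[row, col]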
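A likely reason the chained form did not work: indexing a NumPy array with index arrays (such as the result of np.where) returns a copy, so a second assignment on top of it writes into the copy and not into the original. The sketch below reproduces this on a small array; the array sizes and the names idx, chained_changed and direct_changed are made up for illustration.

```python
import numpy as np

demand_j = np.zeros((3, 4, 5))
# Same structure as the tuple returned by np.where: (row indices, column indices)
idx = (np.array([1, 2, 2]), np.array([3, 0, 3]))

# Chained indexing: demand_j[idx] is a copy, so this write is lost
demand_j[idx][0] = 1.0
chained_changed = bool(demand_j.any())

# Direct indexing with [row, col] writes into the original array
demand_j[idx[0][0], idx[1][0]] = 1.0
direct_changed = bool(demand_j[1, 3].all())

print(chained_changed, direct_changed)
```

Only the direct form modifies demand_j, which matches the fix described above.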

Below is the code using a 3D array for demand, as suggested by @Glauco:

import numpy as np
from numba import jit


@jit(nopython=True, nogil=True, fastmath=True)
def calc_triangular_dist(demand_distribution, num_monte):
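    # Draws num_monte samples from a triangular distribution.
    # demand_distribution is a (left, mode, right) tuple: NumPy treats the middle value as the mode.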
    return np.random.triangular(demand_distribution[0], demand_distribution[1], demand_distribution[2], size=num_monte)


def demand3d():
    # Goal: find the median of the distribution of np.sum(available_products)
    # over the 1000 Monte Carlo simulations, where available_products = stock - demand.
    # Each demand is simulated 1000 times, so there are 1000 demand arrays and
    # therefore a distribution of 1000 totals of available products.
    # Input
    demand_triangular = np.array(
        [
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, (4.5, 5.5, 8.25)],
            [(2.1, 3.1, 4.65), 0.0, 0.0, (4.5, 5.5, 8.25)],
        ],
        dtype=object,  # rows mix scalars and tuples; recent NumPy versions reject this without dtype=object
    )  # Each column represents a product, each row a month. Tuples are for triangular distribution (min,mean,max)
    stock = np.array(
        [[30, 30, 30, 22], [30, 30, 30, 22], [30, 30, 30, 22]]
    )  # Stock of available products, Each column represents a product, each row a month.
    num_sim_monte_carlo = 1000

    # Problem 1) How to unpack effectively each array of demand from simulation? Given that in my real case I would have 70 tuples to perform the Monte Carlo simulation?

    row, col = demand_triangular.shape
    index_demand_not_0 = np.where(
        demand_triangular != 0
    )  # Indices of the non-zero entries, i.e. the triangular-distribution tuples

    demand_j = np.zeros(shape=(row, col, num_sim_monte_carlo), dtype=float)

    triangular_len = len(demand_triangular[index_demand_not_0])  # Number of cells to simulate
    for k in range(0, triangular_len):  # loop over the cells to simulate
        demand_j[index_demand_not_0[0][k], index_demand_not_0[1][k]] = calc_triangular_dist(
            demand_triangular[index_demand_not_0][k], num_sim_monte_carlo
        )

    sums_available_simulations = np.zeros(
        shape=num_sim_monte_carlo
    )  # Stores the total available products for each of the 1000 simulations

    for j in range(0, num_sim_monte_carlo):  # loop over the Monte Carlo simulations
        available = stock - demand_j[:, :, j]
        available[available < 0] = 0  # Clamp negative availability to zero
        sums_available_simulations[j] = np.sum(available)  # Total available for this simulation
    print("Median of distribution of available is: ", np.median(sums_available_simulations))

if __name__ == "__main__":
    demand3d()

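You can remove the inner loop by using array programming plus fancy indexing, which speeds up the assignment to demand_j.

Another point: you can generate demand_j once as a 3D array by adding a dimension (num_sim_montecarlo), so that inside the loop you only read values instead of creating them on every iteration.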
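One way this could look is sketched below. It assumes the non-zero demand cells are stored as flat arrays of positions and parameters (rows, cols, left, mode and right are hypothetical names, not from the original code) and uses NumPy's Generator API instead of np.random.triangular. Note that NumPy treats the middle parameter as the mode of the distribution.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

num_sim = 1000
n_rows, n_cols = 3, 4

# Hypothetical flat encoding of the non-zero cells: position plus (left, mode, right)
rows = np.array([1, 2, 2])
cols = np.array([3, 0, 3])
left = np.array([4.5, 2.1, 4.5])
mode = np.array([5.5, 3.1, 5.5])
right = np.array([8.25, 4.65, 8.25])

# A single call draws num_sim samples for every cell: result shape (3, num_sim)
draws = rng.triangular(
    left[:, None], mode[:, None], right[:, None], size=(len(rows), num_sim)
)

demand_j = np.zeros((n_rows, n_cols, num_sim))
demand_j[rows, cols] = draws  # fancy indexing fills all cells at once, no Python loop
```

The parameter arrays are reshaped to (3, 1) so that they broadcast against the requested (3, num_sim) output.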
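Once demand_j is a 3D array, the loop over simulations can in principle be dropped as well, by broadcasting stock along the simulation axis. The following is a minimal sketch under that assumption; the function name median_available and the small hand-written demand values are made up for illustration and are not real simulation output.

```python
import numpy as np


def median_available(stock, demand_j):
    """stock: (months, products); demand_j: (months, products, num_sim)."""
    # Broadcast stock over the simulation axis and clamp negatives to zero
    available = np.clip(stock[:, :, None] - demand_j, 0, None)
    # One total per simulation, then the median over all simulations
    sums = available.sum(axis=(0, 1))
    return np.median(sums)


stock = np.array([[30.0, 30.0, 30.0, 22.0]] * 3)

demand_j = np.zeros((3, 4, 5))
demand_j[1, 3] = [4.5, 5.0, 6.0, 7.0, 8.25]  # illustrative draws for one cell
demand_j[2, 0] = 40.0  # exceeds the stock of 30, so availability clamps to 0

result = median_available(stock, demand_j)
print(result)
```

This trades the per-simulation loop for one temporary array of shape (months, products, num_sim), so memory use grows with the number of simulations.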

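Thanks @Glauco, you gave me valuable insights! Regarding the suggestions: 1) given dict_demand_values_simulations, I did not find a way to use numpy indexing; instead I removed one loop by running the Monte Carlo draws in a single call rather than one by one in a loop. 2) I tested adding a dimension (num_sim_montecarlo) to demand. It improved performance a lot. Thank you very much.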

Baseline  0.4067141000000001
1) Monte Carlo per loop  0.035586100000000176
2) Demand 3D  0.017964299999999822
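Since the simulation has to be repeated about 70*10^6 times, a possible further step is to compile the whole per-simulation loop with numba, rather than wrapping a single NumPy call in @jit. The sketch below assumes the non-zero demand cells have been flattened into index and parameter arrays (rows, cols, left, mode, right and the function simulate_sums are hypothetical names), and it falls back to plain Python when numba is not installed. It has not been benchmarked against the timings above.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba installed
    def njit(func):
        return func


@njit
def simulate_sums(rows, cols, left, mode, right, stock, num_sim):
    # Total stock: what would be available if every demand were zero
    base = 0.0
    for r in range(stock.shape[0]):
        for c in range(stock.shape[1]):
            base += stock[r, c]

    sums = np.empty(num_sim)
    for j in range(num_sim):
        total = base
        for k in range(rows.shape[0]):
            s = stock[rows[k], cols[k]]
            d = np.random.triangular(left[k], mode[k], right[k])
            # Swap this cell's full stock for its clamped remainder
            total += max(s - d, 0.0) - s
        sums[j] = total
    return sums


# Hypothetical flat encoding of the three non-zero demand cells from the question
rows = np.array([1, 2, 2])
cols = np.array([3, 0, 3])
left = np.array([4.5, 2.1, 4.5])
mode = np.array([5.5, 3.1, 5.5])
right = np.array([8.25, 4.65, 8.25])
stock = np.array([[30.0, 30.0, 30.0, 22.0]] * 3)

sums = simulate_sums(rows, cols, left, mode, right, stock, 1000)
median_available = np.median(sums)
print("Median of distribution of available is:", median_available)
```

Only the cells with a non-zero demand are visited inside the loop, so no 3D demand array is allocated at all.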