Cython struct pointers for Monte Carlo minimization with GSL

Tags: python, pointers, struct, cython, gsl

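I'm stuck on this exercise and not skilled enough to solve it. Basically, I'm writing a Monte Carlo maximum likelihood algorithm for the Bernoulli distribution. The problem is that I have to pass the data as a parameter to the GSL (one-dimensional) minimization algorithm, and I also need to pass the size of the data (since the outer loop runs over the different sample sizes of the "observed" data). So I'm trying to pass these parameters as a struct. However, I'm running into seg faults, and I'm sure they come from the part of the code that deals with the struct and treats it as a pointer.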

[Edit: I have corrected the allocation of the struct and its components]

%%cython

#!python
#cython: boundscheck=False, wraparound=False, nonecheck=False, cdivision=True   

from libc.stdlib cimport rand, RAND_MAX, calloc, malloc, realloc, free, abort
from libc.math cimport log

#Use the CythonGSL package to get the low-level routines
from cython_gsl cimport *

######################### Define the Data Structure ############################

cdef struct Parameters:
    #Pointer for Y data array
    double* Y
    #size of the array
    int* Size

################ Support Functions for Monte-Carlo Function ##################

#Create a function that allocates the memory and verifies integrity
cdef void alloc_struct(Parameters* data, int N, unsigned int flag) nogil:

    #allocate the data array initially
    if flag==1:
        data.Y = <double*> malloc(N * sizeof(double))
    #reallocate the data array
    else:
        data.Y = <double*> realloc(data.Y, N * sizeof(double))

    #If the elements of the struct are not properly allocated, destroy it and return null
    if N!=0 and data.Y==NULL:
        destroy_struct(data)
        data = NULL     

#Create the destructor of the struct to return memory to system
cdef void destroy_struct(Parameters* data) nogil:
    free(data.Y)
    free(data)

#This function fills in the Y observed variable with discrete 0/1
cdef void Y_fill(Parameters* data, double p_true, int* N) nogil:

    cdef:
        Py_ssize_t i
        double y

    for i in range(N[0]):

        y = rand()/<double>RAND_MAX

        if y <= p_true:
            data.Y[i] = 1 
        else:
            data.Y[i] = 0

#Definition of the function to be maximized: LLF of Bernoulli
cdef double LLF(double p, void* data) nogil:

    cdef:
        #the sample structure (considered the parameter here)
        Parameters* sample

        #the total of the LLF
        double Sum = 0

        #the loop iterator
        Py_ssize_t i, n

    sample = <Parameters*> data

    n = sample.Size[0]

    for i in range(n):

        Sum += sample.Y[i]*log(p) + (1-sample.Y[i])*log(1-p)

    return (-(Sum/n))

########################## Monte-Carlo Function ##############################

def Monte_Carlo(int[::1] Samples, double[:,::1] p_hat, 
                Py_ssize_t Sims, double p_true):

    #Define variables and pointers
    cdef:
        #Data Structure
        Parameters* Data

        #iterators
        Py_ssize_t i, j
        int status, GSL_CONTINUE, Iter = 0, max_Iter = 100 

        #Variables
        int N = Samples.shape[0] 
        double start_val, a, b, tol = 1e-6

        #GSL objects and pointer
        const gsl_min_fminimizer_type* T
        gsl_min_fminimizer* s
        gsl_function F

    #Set the GSL function
    F.function = &LLF

    #Allocate the minimization routine
    T = gsl_min_fminimizer_brent
    s = gsl_min_fminimizer_alloc(T)

    #allocate the struct
    Data = <Parameters*> malloc(sizeof(Parameters))

    #verify memory integrity
    if Data==NULL: abort()

    #set the starting value
    start_val = rand()/<double>RAND_MAX

    try:

        for i in range(N):

            if i==0:
                #allocate memory to the data array
                alloc_struct(Data, Samples[i], 1)
            else:
                #reallocate the data array in the struct if 
                #we are past the first run of outer loop
                alloc_struct(Data, Samples[i], 2)

            #verify memory integrity
            if Data==NULL: abort()

            #pass the data size into the struct
            Data.Size = &Samples[i]

            for j in range(Sims):

                #fill in the struct
                Y_fill(Data, p_true, Data.Size)

                #set the parameters for the GSL function (the samples)
                F.params = <void*> Data
                a = tol
                b = 1

                #set the minimizer
                gsl_min_fminimizer_set(s, &F, start_val, a, b)

                #initialize conditions
                GSL_CONTINUE = -2
                status = -2

                while (status == GSL_CONTINUE and Iter < max_Iter):

                    Iter += 1
                    status = gsl_min_fminimizer_iterate(s)

                    start_val = gsl_min_fminimizer_x_minimum(s)
                    a = gsl_min_fminimizer_x_lower(s)
                    b = gsl_min_fminimizer_x_upper(s)

                    status = gsl_min_test_interval(a, b, tol, 0.0)

                    if (status == GSL_SUCCESS):
                        print ("Converged:\n")
                        p_hat[i,j] = start_val

    finally:
        destroy_struct(Data)
        gsl_min_fminimizer_free(s)

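I have tested the struct allocation separately and it works as it should. However, when I run Monte_Carlo, the kernel is killed by an abort call (according to the output on my Mac), and the Jupyter output on my console looks like this:

gsl: fsolver.c:39: ERROR: computed function value is infinite or NaN

Default GSL error handler invoked.

So now it seems that the solver is not working. I'm not familiar with the GSL package; I have only used it once, to generate random numbers from the Gumbel distribution (bypassing the scipy commands).

I would greatly appreciate any help with this! Thanks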

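[Edit: changed the lower bound a]

I redid the exercise with the exponential distribution, whose log-likelihood function contains only one logarithm, and tracked the problem down to gsl_min_fminimizer_set. It initially evaluated the function at the lower bound a = 0 and got a -INF result (before solving, it evaluates the problem to produce f(lower) and f(upper), where f is the function I am optimizing). When I set the lower bound to a nonzero but very small value (for example the tol variable that holds my tolerance), the solver works and produces the correct results.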
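The endpoint issue described in the edit above can be reproduced outside GSL. The following is a minimal Python sketch, not the asker's code: it assumes the exponential distribution is parameterized by its rate, and c_log and bracket_is_finite are hypothetical helpers introduced only for illustration (c_log mimics C's log() returning -inf at 0, where Python's math.log would raise).

```python
import math


def c_log(x):
    """Mimic C's log() at zero: return -inf instead of raising like math.log."""
    return math.log(x) if x > 0.0 else float("-inf")


def neg_mean_llf_exponential(rate, xs):
    """Negative mean log-likelihood of an exponential sample (rate parameterization assumed)."""
    mean_x = sum(xs) / len(xs)
    return -(c_log(rate) - rate * mean_x)


def bracket_is_finite(f, lower, upper):
    """Hypothetical check: a bracketing minimizer needs finite values at both endpoints."""
    return math.isfinite(f(lower)) and math.isfinite(f(upper))


xs = [0.5, 1.5, 1.0, 2.0]  # made-up sample, for illustration only
tol = 1e-6


def f(rate):
    return neg_mean_llf_exponential(rate, xs)


print(bracket_is_finite(f, 0.0, 10.0))  # lower bound 0 gives an infinite value
print(bracket_is_finite(f, tol, 10.0))  # a small positive lower bound is fine
```

For the Bernoulli log-likelihood the same reasoning would apply to the upper bound as well, since log(1-p) is infinite at p = 1.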

Many thanks to @DavidW for the hints that got me where I needed to go.

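This is a somewhat speculative answer, since I don't have GSL installed and so can't easily test it (apologies if it's wrong!).

I think the problem is the line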

Sum += sample.Y[i]*log(p) + (1-sample.Y[i])*log(1-p)

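It looks like Y[i] can be either 0 or 1. When p is at either end of the range 0 to 1, this gives 0*-inf = nan. In the case where all the Y values are the same, that endpoint is the minimum (so the solver will reliably end up at the invalid point). Fortunately, you should be able to rewrite the line to avoid getting nan: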

if sample.Y[i]:
   Sum += log(p)
else:
   Sum += log(1-p)

(The case that would generate nan is the one that is not executed.)
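The difference between the two forms can be checked in plain Python. This is only an illustrative sketch: c_log is a hypothetical helper that mimics C's log() returning -inf at 0 (Python's math.log raises instead), and naive_term and branched_term are names made up here for the two versions of the summand.

```python
import math


def c_log(x):
    """Mimic C's log() at zero: return -inf instead of raising like math.log."""
    return math.log(x) if x > 0.0 else float("-inf")


def naive_term(y, p):
    """Original summand: the unused branch still contributes 0 * -inf = nan."""
    return y * c_log(p) + (1.0 - y) * c_log(1.0 - p)


def branched_term(y, p):
    """Rewritten summand: only the branch that matters is evaluated."""
    return c_log(p) if y else c_log(1.0 - p)


print(naive_term(1.0, 1.0))     # nan
print(branched_term(1.0, 1.0))  # 0.0
```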

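I spotted a second, minor issue: in alloc_struct you set data = NULL if there is an error. This only affects the local pointer, so the test for NULL in Monte_Carlo is meaningless. You would be better off returning a true or false flag from alloc_struct and checking that. I doubt you are actually hitting this error, though.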
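The same pitfall exists in plain Python, which may make it easier to see: rebinding a parameter inside a function does not change the caller's variable. The sketch below is only an analogy to the C pointer case, with hypothetical function names, and it uses a negative size as a stand-in for an allocation failure.

```python
def alloc_rebind(buf, n):
    """Mirrors `data = NULL`: rebinds the local name only."""
    buf = None  # the caller's variable still refers to the old object


def alloc_with_flag(buf, n):
    """Resize `buf` in place and report success through the return value."""
    if n < 0:  # stand-in for a failed allocation (assumption for illustration)
        return False
    buf[:] = [0.0] * n
    return True


data = [0.0] * 3
alloc_rebind(data, 5)
print(data is None)  # False: the caller cannot see the rebinding

ok = alloc_with_flag(data, 5)
print(ok, len(data))  # True 5
```

In the Cython code the equivalent change would be to have alloc_struct return a status value that Monte_Carlo checks, instead of testing Data against NULL.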

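Edit: another, better option is to find the minimum analytically. The derivative of A log(p) + (1-A) log(1-p) with respect to p is A/p - (1-A)/(1-p). Average all the sample.Y values to find A. Finding where the derivative is 0 gives p = A. (You'll want to double-check my working!) With that, you can avoid using the GSL minimization routines.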
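As a rough check of the analytical result, the sample mean can be compared with a numerical minimizer. The sketch below is in plain Python and uses a simple golden-section search written for this example; it is a stand-in for, not a reproduction of, GSL's Brent minimizer. The seed, sample size, and bracket margin eps are arbitrary choices.

```python
import math
import random


def neg_mean_llf(p, ys):
    """Negative mean Bernoulli log-likelihood, using the branched form."""
    total = 0.0
    for y in ys:
        total += math.log(p) if y else math.log(1.0 - p)
    return -total / len(ys)


def golden_section_min(f, a, b, tol=1e-8):
    """Golden-section search for the minimum of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


random.seed(0)
p_true = 0.3
ys = [1 if random.random() <= p_true else 0 for _ in range(1000)]

p_analytic = sum(ys) / len(ys)  # p = A, the sample mean
eps = 1e-6                      # keep the bracket strictly inside (0, 1)
p_numeric = golden_section_min(lambda p: neg_mean_llf(p, ys), eps, 1.0 - eps)

print(p_analytic, p_numeric)
```

The two estimates should agree to within the search tolerance, which is the comparison the asker describes wanting to make against the GSL result.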

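Your basic problem is that in the function Monte_Carlo you never malloc Data, so you end up using a pointer that doesn't point to anything. I don't think it's too hard to fix, but it doesn't look easy for anyone else to set up and test...

@DavidW You are exactly right. I restructured the code to allocate the struct, and then implemented functions to allocate the struct's elements and to free the memory. The struct itself works (I have updated the code to reflect this). However, when I try to run the Cython code, the kernel dies, saying that abort was called (I use abort if the struct or any of its elements is not properly allocated). When I remove the memory allocation checks, the code kills the kernel with a GSL error about an infinite solution. Most likely this is again due to a problem with allocating the struct.

In this part of the code, "gsl_min_fminimizer_set(s, &F, start_val, a, b)", how do you make sure that 0 < p < 1?

I bound the interval with a=0 and b=1 (or at least, I think that is what it does). I will now look at this GSL part of the code separately and work through a much simpler exercise to better understand how these routines work.

It turns out that the abort call no longer comes from the struct allocation, since that is fixed, but from the GSL routine failing to find a bounded solution. I don't think this depends on a and b in the gsl_min_fminimizer_set routine. I will transform p so that it lies in the range between 0 and 1. Thanks for your reply!!

Yes, I believe the problem is essentially the likelihood at the bounds on p, as you pointed out earlier. I was thinking of transforming the variable so that it lies between 0 and 1.

Yes, your analytical derivative is correct; it is the maximum likelihood estimator for the Bernoulli distribution. My goal is to learn how to use the GSL solvers, which is why I am trying to solve this numerically and then compare with the analytical solution :-)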
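One way the variable transformation mentioned in the comments could look is a logistic reparameterization, p = 1/(1 + exp(-t)), so that any real t maps to a p strictly inside (0, 1). This is an assumption about what such a transformation might be, not something the thread specifies. The sketch below is plain Python with a small ternary search standing in for a real minimizer; all function names and the sample are made up for illustration.

```python
import math


def sigmoid(t):
    """Map any real t to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-t))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def neg_mean_llf_t(t, ys):
    """Negative mean Bernoulli log-likelihood in terms of t, where p = sigmoid(t).

    Uses log(p) = -softplus(-t) and log(1 - p) = -softplus(t).
    """
    total = 0.0
    for y in ys:
        total += softplus(-t) if y else softplus(t)
    return total / len(ys)


def ternary_min(f, lo, hi, iters=200):
    """Ternary search for the minimum of a convex f on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


ys = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # made-up sample with mean 0.3
t_hat = ternary_min(lambda t: neg_mean_llf_t(t, ys), -30.0, 30.0)
p_hat = sigmoid(t_hat)
print(p_hat)
```

With this parameterization the objective stays finite for every t in the bracket, so no special handling of the endpoints is needed, at the cost of never returning exactly 0 or 1 when all observations are identical.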
