Python: code needed to use scipy.optimize.fmin_bfgs correctly, compared with R code

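I am used to doing my statistics in R and using Python for all the peripheral tasks. Just for fun, I tried a BFGS optimization to compare it with the ordinary least squares (OLS) result, both in Python using scipy/numpy. But the results do not match, and I cannot see any error. I have also attached the equivalent code in R (which works). Can anyone correct my use of scipy.optimize.fmin_bfgs so that it matches the OLS or R results?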

import csv
import numpy as np
import scipy as sp
from scipy import optimize

class DataLine:
    def __init__(self,row):
        self.Y = row[0]
        self.X = [1.0] + row[2:len(row)]  
        # 'Intercept','Food','Decor', 'Service', 'Price' and remove the name
    def allDataLine(self):
        return self.X + list(self.Y) # return operator.add(self.X,list(self.Y))
    def xData(self):
        return np.array(self.X,dtype="float64")
    def yData(self):
        return np.array([self.Y],dtype="float64")
def fnRSS(vBeta, vY, mX):
  return np.sum((vY - np.dot(mX,vBeta))**2)
if __name__ == "__main__":
    urlSheatherData = "/Hans/workspace/optimsGLMs/MichelinNY.csv"
    # downloaded from "http://www.stat.tamu.edu/~sheather/book/docs/datasets/MichelinNY.csv"
    reader = csv.reader(open(urlSheatherData), delimiter=',', quotechar='"')
    headerTuple = tuple(reader.next())
    dataLines = map(DataLine, reader)
    Ys = map(DataLine.yData,dataLines)
    Xs = map(DataLine.xData,dataLines)
    # a check and an initial guess ...
    vBeta = np.array([-1.5, 0.06, 0.04,-0.01, 0.002]).reshape(5,1)
    print np.sum((Ys-np.dot(Xs,vBeta))**2)
    print fnRSS(vBeta,Ys,Xs)
    lsBetas = np.linalg.lstsq(Xs, Ys)
    print lsBetas[1]
    # prints the right numbers
    print lsBetas[0]
    optimizedBetas = sp.optimize.fmin_bfgs(fnRSS, x0=vBeta, args=(Ys,Xs))
    # completely off .. 
    print optimizedBetas
The result of the optimization is:

Optimization terminated successfully.
         Current function value: 6660.000006
         Iterations: 276
         Function evaluations: 448
[  4.51296549e-01  -5.64005114e-06  -3.36618459e-06   4.98821735e-06
   9.62197362e-08]
But it really should match the OLS results obtained from lsBetas = np.linalg.lstsq(Xs, Ys):

Here is the R code, in case it is useful (it also has the advantage of being able to read directly from the URL):


First, let's build arrays from the lists:

>>> Xs = np.vstack(Xs)
>>> Ys = np.vstack(Ys)
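Then, fnRSS was translated incorrectly: its argument beta arrives transposed. fmin_bfgs flattens x0 to a 1-D array before calling the objective, so np.dot(mX, vBeta) has shape (n,), and subtracting it from the (n, 1) array vY broadcasts to an n-by-n matrix instead of a vector of residuals. It can be fixed by transposing vY: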

>>> def fnRSS(vBeta, vY, vX):
...     return np.sum((vY.T - np.dot(vX, vBeta))**2)
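The shape problem can be reproduced without the data set. This is a minimal sketch, assuming a made-up 4-row design matrix (the numbers are purely illustrative and do not come from MichelinNY.csv) and a 1-D vBeta, which is what the optimizer hands to the objective:

```python
import numpy as np

# Hypothetical toy data, used only to show the shapes involved.
mX = np.array([[1.0, 2.0],
               [1.0, 3.0],
               [1.0, 5.0],
               [1.0, 7.0]])                     # shape (4, 2)
vY = np.array([[1.0], [2.0], [3.0], [5.0]])     # shape (4, 1), like np.vstack(Ys)
vBeta = np.array([0.5, 0.5])                    # 1-D, as the optimizer passes it

fitted = np.dot(mX, vBeta)      # shape (4,)
bad_resid = vY - fitted         # (4, 1) - (4,) broadcasts to (4, 4)
good_resid = vY.T - fitted      # (1, 4) - (4,) stays (1, 4)

print(bad_resid.shape, good_resid.shape)
```

Summing the squares of the 4-by-4 array adds up every pairing of an observation with a fitted value, which is why the original objective was minimized at a point unrelated to the OLS solution.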
Final result:

>>> sp.optimize.fmin_bfgs(fnRSS, x0=vBeta, args=(Ys,Xs))
Optimization terminated successfully.
         Current function value: 26.323906
         Iterations: 9
         Function evaluations: 98
         Gradient evaluations: 14
array([-1.49208546,  0.05773327,  0.04419307, -0.01117645,  0.00179791])
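
The agreement with OLS can also be checked without downloading the file. The following is a sketch on synthetic data; the seed, sample size, and coefficients are assumptions chosen for illustration and have nothing to do with the Michelin data. It keeps y one-dimensional, which avoids the transpose altogether, and passes an analytic gradient through fprime, which is optional but saves the finite-difference function evaluations:

```python
import numpy as np
from scipy import optimize

def rss(beta, y, X):
    """Residual sum of squares; beta and y are 1-D, X is (n, k)."""
    resid = y - X.dot(beta)
    return resid.dot(resid)

def rss_grad(beta, y, X):
    """Gradient of the RSS with respect to beta: -2 X'(y - X beta)."""
    return -2.0 * X.T.dot(y - X.dot(beta))

# Synthetic regression problem (illustrative values only).
rng = np.random.RandomState(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, -2.0, 0.5])
y = X.dot(true_beta) + 0.1 * rng.normal(size=n)

# Closed-form least squares versus numerical minimization of the RSS.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_bfgs = optimize.fmin_bfgs(rss, x0=np.zeros(3), fprime=rss_grad,
                               args=(y, X), disp=False)

print(beta_ols)
print(beta_bfgs)
```

The two printed vectors should agree to several decimal places, since the RSS is a convex quadratic in beta and its minimizer is the OLS solution.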

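As a side note, consider using the pandas parsers or NumPy's recfromcsv to read the CSV data into an array, rather than a custom-written parser. Reading from a URL is no problem either: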

>>> import pandas as pd
>>> urlSheatherData = "http://www.stat.tamu.edu/~sheather/book/docs/datasets/MichelinNY.csv"
>>> data = pd.read_csv(urlSheatherData)
>>> data[['Service','Decor', 'Food', 'Price']].head()
   Service  Decor  Food  Price
0       19     20    19     50
1       16     17    17     43
2       21     17    23     35
3       16     23    19     52
4       19     12    23     24

[5 rows x 4 columns]
>>> data['InMichelin'].head()
0    0
1    0
2    0
3    1
4    0
Name: InMichelin, dtype: int64
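
From the data frame, the design matrix and response used in the question can be built in a couple of lines. This is a sketch that embeds the five rows printed above so it runs offline; the restaurant-name column is left out, and the column order (intercept, Food, Decor, Service, Price) follows the comment in the question's code. With the real file one would call pd.read_csv(urlSheatherData) instead of reading from the string:

```python
import io
import numpy as np
import pandas as pd

# The five rows shown by head() above, embedded so the sketch runs offline.
csv_text = """InMichelin,Food,Decor,Service,Price
0,19,20,19,50
0,17,17,16,43
0,23,17,21,35
1,19,23,16,52
0,23,12,19,24
"""
data = pd.read_csv(io.StringIO(csv_text))

# Response vector (1-D) and design matrix with a leading column of ones.
y = data['InMichelin'].to_numpy(dtype='float64')
X = np.column_stack([
    np.ones(len(data)),
    data[['Food', 'Decor', 'Service', 'Price']].to_numpy(dtype='float64'),
])

print(X.shape, y.shape)
```

Arrays built this way can be passed straight to np.linalg.lstsq or to an objective for fmin_bfgs, with no per-row class and no np.vstack step.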

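By the way, what are your module versions and system architecture? I could not get the optimization to succeed with your code on any of the combinations available to me (quite a few, actually). All of them ended with Warning: Desired error not necessarily achieved due to precision loss.
I am using Enthought Canopy Python 2.7.3 | 64-bit on Mac OS X 10.6.8, with numpy 1.7.1 and scipy 0.12.0. This is related, and basically shows that the problem lies in the coercion of vBeta. Thanks for the CSV tip, it is useful.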