Python: correct usage of scipy.optimize.fmin_bfgs, with equivalent R code
I am used to doing all my statistics in R and all the peripheral tasks in Python. Just for fun, I tried a BFGS optimization to compare it against the ordinary least-squares (OLS) result, both in Python using scipy/numpy. But the results do not match, and I cannot see any error. I have also attached the equivalent code in R (which works). Can anyone correct my use of scipy.optimize.fmin_bfgs so that it matches the OLS or R results?
import csv
import numpy as np
import scipy as sp
from scipy import optimize

class DataLine:
    def __init__(self,row):
        self.Y = row[0]
        self.X = [1.0] + row[2:len(row)]
        # 'Intercept','Food','Decor', 'Service', 'Price' and remove the name
    def allDataLine(self):
        return self.X + list(self.Y) # return operator.add(self.X,list(self.Y))
    def xData(self):
        return np.array(self.X,dtype="float64")
    def yData(self):
        return np.array([self.Y],dtype="float64")

def fnRSS(vBeta, vY, mX):
    return np.sum((vY - np.dot(mX,vBeta))**2)

if __name__ == "__main__":
    urlSheatherData = "/Hans/workspace/optimsGLMs/MichelinNY.csv"
    # downloaded from "http://www.stat.tamu.edu/~sheather/book/docs/datasets/MichelinNY.csv"
    reader = csv.reader(open(urlSheatherData), delimiter=',', quotechar='"')
    headerTuple = tuple(reader.next())
    dataLines = map(DataLine, reader)
    Ys = map(DataLine.yData,dataLines)
    Xs = map(DataLine.xData,dataLines)
    # a check and an initial guess ...
    vBeta = np.array([-1.5, 0.06, 0.04,-0.01, 0.002]).reshape(5,1)
    print np.sum((Ys-np.dot(Xs,vBeta))**2)
    print fnRSS(vBeta,Ys,Xs)
    lsBetas = np.linalg.lstsq(Xs, Ys)
    print lsBetas[1]
    # prints the right numbers
    print lsBetas[0]
    optimizedBetas = sp.optimize.fmin_bfgs(fnRSS, x0=vBeta, args=(Ys,Xs))
    # completely off ..
    print optimizedBetas
The output of the optimization is:
Optimization terminated successfully.
Current function value: 6660.000006
Iterations: 276
Function evaluations: 448
[ 4.51296549e-01 -5.64005114e-06 -3.36618459e-06 4.98821735e-06
9.62197362e-08]
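But it really should match the OLS results obtained from lsBetas = np.linalg.lstsq(Xs, Ys):
Here is the R code in case it is useful (it also has the advantage of being able to read directly from the URL):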
First, let's make arrays out of the lists:
>>> Xs = np.vstack(Xs)
>>> Ys = np.vstack(Ys)
Then, fnRSS was translated incorrectly: its argument beta is passed in transposed. This can be fixed with
>>> def fnRSS(vBeta, vY, vX):
...     return np.sum((vY.T - np.dot(vX, vBeta))**2)
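To see why the transpose matters, here is a small sketch of the shape mismatch. It assumes (consistent with the 1-D result array printed by the optimizer) that the objective receives a 1-D parameter vector, so the prediction is 1-D while vY is a column. The toy arrays below are illustrative only and are not the Michelin data.

```python
import numpy as np

n = 4
vY = np.arange(n, dtype=float).reshape(n, 1)  # column vector, shape (n, 1)
pred = np.arange(n, dtype=float)              # 1-D predictions, shape (n,)

# Subtracting a 1-D array from a column broadcasts to an (n, n) matrix
# of all pairwise differences, not the n residuals we want.
bad = vY - pred

# Transposing (or flattening) the response lines the elements up again.
good = vY.T - pred               # shape (1, n)
also_good = vY.ravel() - pred    # shape (n,)

rss_bad = np.sum(bad ** 2)
rss_good = np.sum(good ** 2)
print(bad.shape, good.shape, also_good.shape)
print(rss_bad, rss_good)
```

Here the predictions equal the responses, so the correct residual sum of squares is zero, while the broadcast version sums all pairwise squared differences and is far from zero.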
Final result:
>>> sp.optimize.fmin_bfgs(fnRSS, x0=vBeta, args=(Ys,Xs))
Optimization terminated successfully.
Current function value: 26.323906
Iterations: 9
Function evaluations: 98
Gradient evaluations: 14
array([-1.49208546, 0.05773327, 0.04419307, -0.01117645, 0.00179791])
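The same comparison can be sketched end to end in Python 3 on synthetic data. This is only an illustration under stated assumptions: the data are randomly generated (not the Michelin set), the names rss, beta_true, beta_ols and beta_bfgs are made up for this example, and both the response and the starting point are kept 1-D so that no broadcasting surprises occur.

```python
import numpy as np
from scipy import optimize


def rss(beta, y, X):
    """Residual sum of squares; beta and y are both 1-D arrays."""
    resid = y - X @ beta
    return resid @ resid


# Synthetic regression problem (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Closed-form least squares versus numerical BFGS minimisation of the RSS.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_bfgs = optimize.fmin_bfgs(rss, x0=np.zeros(3), args=(y, X), disp=False)

print(beta_ols)
print(beta_bfgs)
```

With shapes kept consistent, the two estimates should agree up to the optimizer's tolerance.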
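As a side note, consider using the pandas parser, or NumPy's recfromcsv, to read the CSV data into an array instead of a custom-written parser. Reading from a URL is no problem either: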
>>> import pandas as pd
>>> urlSheatherData = "http://www.stat.tamu.edu/~sheather/book/docs/datasets/MichelinNY.csv"
>>> data = pd.read_csv(urlSheatherData)
>>> data[['Service','Decor', 'Food', 'Price']].head()
   Service  Decor  Food  Price
0       19     20    19     50
1       16     17    17     43
2       21     17    23     35
3       16     23    19     52
4       19     12    23     24

[5 rows x 4 columns]
>>> data['InMichelin'].head()
0    0
1    0
2    0
3    1
4    0
Name: InMichelin, dtype: int64
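Once the data are parsed, the design matrix and response can be assembled in a few lines. The sketch below uses np.genfromtxt (a related NumPy reader, swapped in here) on an in-memory string holding only the five rows shown above; the column order of this inline text and the omission of the restaurant-name column are assumptions made for the example, not the layout of the real file.

```python
import io

import numpy as np

# Only the five rows printed above, with the name column left out
# (an assumption for this sketch; the real CSV has more columns and rows).
csv_text = """InMichelin,Food,Decor,Service,Price
0,19,20,19,50
0,17,17,16,43
0,23,17,21,35
1,19,23,16,52
0,23,12,19,24
"""

# names=True reads the header and returns a structured array with named fields.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

predictors = ["Food", "Decor", "Service", "Price"]
# Prepend a column of ones for the intercept.
X = np.column_stack([np.ones(len(data))] + [data[name] for name in predictors])
y = data["InMichelin"]  # 1-D response, which avoids the broadcasting issue

print(X.shape, y.shape)
```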
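By the way, what are your module versions and system architecture? I had no success optimizing with your code on any of the combinations available to me (quite a few, actually). All of them ended with
Warning: Desired error not necessarily achieved due to precision loss.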
I am on Mac OS X 10.6.8 with Enthought Canopy Python 2.7.3 (64-bit), numpy 1.7.1 and scipy 0.12.0. A related question basically indicates that the problem lies in the casting of vBeta. Thanks for the csv tip, that is useful.