Python sklearn LinearRegression.predict() problem

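I'm trying to predict call volume at a call center based on various other factors. I have a fairly clean dataset. It's also fairly small, but big enough. I can train and test on the historical data and get scores, summaries, etc. For the life of me, though, I can't figure out how to predict future calls from the predictor data. My data looks like this: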

Date    DayNum  factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 factor9 VariableToPredict
9/17/2014   1   592 83686.46    0   0   250 15911.8 832 99598.26    177514  72
9/18/2014   2   1044    79030.09    0   0   203 23880.55    1238    102910.64   205064  274
9/19/2014   3   707 84207.27    0   0   180 8143.32 877 92350.59    156360  254
9/20/2014   4   707 97577.78    0   0   194 16688.95    891 114266.73   196526  208
9/21/2014   5   565 83084.57    0   0   153 13097.04    713 96181.61    143678  270
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.linear_model import LinearRegression
import pandas as pd

d = pd.read_csv("H://My Documents//Python Scripts//RawData//Q2917.csv", "r", delimiter=",")
e = pd.read_csv("H://My Documents//Python Scripts//RawData//FY16q2917Test.csv", "r", delimiter=",")
#print(d)
#b = pd.DataFrame.as_matrix(d)
#print(b)
x = d.as_matrix(['factor2', 'factor4', 'factor5', 'factor6'])    
y = d.as_matrix(['VariableToPredict'])
x1 = e.as_matrix(['factor2', 'factor4', 'factor5', 'factor6'])
y1 = e.as_matrix(['VariableToPredict'])
#print(len(train))
#print(target)
#use scaler
scalerX = StandardScaler()
train = scalerX.fit_transform(x1)
scalerY = StandardScaler()
target = scalerY.fit_transform(y1)

clf = LinearRegression(fit_intercept=True)
cv = KFold(len(train), 10, shuffle=True, random_state=33)


#decf = LinearRegression.decision_function(train, target)
test = LinearRegression.predict(train, target)
score = cross_val_score(clf,train, target,cv=cv )

print("Score: {}".format(score.mean()))
That is the code I have so far.

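This, of course, gives me an error saying there are null values in y, which is because y is empty: it's exactly what I'm trying to predict. The problem is that I'm not yet familiar enough with Python, and I'm fundamentally misunderstanding how this should be structured. Even if it ran this way it wouldn't be right, because it doesn't take the past data into account when building the model that predicts the future.
Do I need to put these in the same file? If so, how do I tell it to take these 3 columns from row A to row B, predict the dependent column for those same rows, and then apply that model to these three columns in the future data to predict future calls? I'm not expecting a full answer here, since this is my job, but any small hint would be greatly appreciated.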
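The script in the question also depends on two APIs that have since been removed: the sklearn.cross_validation module (removed in scikit-learn 0.20 in favor of sklearn.model_selection) and DataFrame.as_matrix (removed in pandas 1.0 in favor of to_numpy). Below is a minimal sketch of the same 10-fold cross-validation on current versions. The data is synthetic and hypothetical, generated only so the example runs; the column names are taken from the question. The scaler is placed inside a pipeline so that each fold fits it on that fold's training rows only.

```python
# Sketch: the question's cross-validation step with current library APIs.
# Assumption: synthetic, hypothetical data stands in for the real CSV files.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(33)
n_rows = 60
history = pd.DataFrame({
    "factor2": rng.normal(85000, 8000, n_rows),
    "factor4": rng.integers(0, 3, n_rows),
    "factor5": rng.normal(200, 30, n_rows),
    "factor6": rng.normal(15000, 4000, n_rows),
})
# Hypothetical target: a noisy linear function of two of the factors.
history["VariableToPredict"] = (
    0.002 * history["factor2"] + 0.5 * history["factor5"] + rng.normal(0, 10, n_rows)
)

features = ["factor2", "factor4", "factor5", "factor6"]
X = history[features].to_numpy()             # replaces d.as_matrix([...])
y = history["VariableToPredict"].to_numpy()

# The pipeline refits the scaler inside every fold, on that fold's training rows.
model = make_pipeline(StandardScaler(), LinearRegression())
cv = KFold(n_splits=10, shuffle=True, random_state=33)
scores = cross_val_score(model, X, y, cv=cv)  # default regressor scoring is R^2
print("Score: {}".format(scores.mean()))
```

This only scores the model on rows whose target is known; it does not produce any predictions for future rows.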

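To build a regression model, you need training data and training scores (the known target values). These let you fit a set of regression parameters for your problem.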

Then, to predict, you need prediction data, but not prediction scores, because you don't have those: they are what you're trying to predict.

For example, the following code will run:

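from sklearn.linear_model import LinearRegression
import numpy as np

# Training data: one row per observation, one column per feature
trainingData = np.array([ [2.3,4.3,2.5], [1.3,5.2,5.2], [3.3,2.9,0.8], [3.1,4.3,4.0]  ])
# Training scores: the known target value for each training row
trainingScores = np.array([3.4,7.5,4.5,1.6])

clf = LinearRegression(fit_intercept=True)
clf.fit(trainingData,trainingScores)  # fit the regression parameters

# Prediction data: the same three features, but no scores (those are what we predict)
predictionData = np.array([ [2.5,2.4,2.7], [2.7,3.2,1.2] ])
print(clf.predict(predictionData))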
It looks like you're passing the wrong number of arguments to your predict() call. Take a look at my snippet; you should be able to see how to change it.
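To make the argument mismatch concrete, here is a short sketch that reuses the toy arrays from the snippet above: fit() takes the features and the target, predict() takes only the features, and it is called on a fitted instance rather than on the LinearRegression class itself (the question calls LinearRegression.predict(train, target) on the class). The variable names are illustrative.

```python
# Sketch: LinearRegression.predict() accepts only the feature matrix.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy arrays copied from the answer's snippet.
training_data = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
training_scores = np.array([3.4, 7.5, 4.5, 1.6])
prediction_data = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Create an instance, fit it with features AND target...
clf = LinearRegression(fit_intercept=True)
clf.fit(training_data, training_scores)
# ...then predict from features only.
predictions = clf.predict(prediction_data)
print(predictions.shape)  # (2,): one prediction per row of prediction_data

# Passing the target as a second argument is rejected: predict() has no
# parameter for it, so Python raises a TypeError.
rejected = False
try:
    clf.predict(prediction_data, training_scores)
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```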


Out of interest, you can run the following line afterwards to see the parameters the regression fitted to the data:
print(repr(clf.coef_))
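Once fitted, the model's predictions are just the linear formula y_hat = X @ coef_ + intercept_, where coef_ holds one weight per feature column and intercept_ is the constant term (the trailing underscore marks attributes that exist only after fit()). A minimal sketch, reusing the answer's toy arrays, that checks this and recovers the same parameters with plain NumPy least squares, since LinearRegression is ordinary least squares:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

training_data = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
training_scores = np.array([3.4, 7.5, 4.5, 1.6])
prediction_data = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

clf = LinearRegression(fit_intercept=True).fit(training_data, training_scores)
print(repr(clf.coef_))       # one weight per feature column
print(repr(clf.intercept_))  # the fitted constant term

# predict() is the linear formula applied to each row.
manual = prediction_data @ clf.coef_ + clf.intercept_
print(np.allclose(manual, clf.predict(prediction_data)))

# Ordinary least squares directly: prepend a column of ones for the intercept.
design = np.column_stack([np.ones(len(training_data)), training_data])
params, *_ = np.linalg.lstsq(design, training_scores, rcond=None)
print(np.allclose(params[0], clf.intercept_), np.allclose(params[1:], clf.coef_))
```

With four rows and three features plus an intercept, this toy system has exactly as many equations as unknowns, so the fit passes through every training point; real data has many more rows than parameters.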

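I should clarify that the data above is only a snippet. I'm not trying to make predictions from a handful of rows.
For the training data, you need the variable you want to predict. Is that the problem?
Not exactly. For the past rows of data, I want to train the regression model, since I have actual data for both the independent and dependent variables. Then I want to use the independent variables in the remaining rows (forecasted future values) to predict the future dependent variable.
What is the problem? The predict function does not accept a target. There are several errors in your code; maybe that is your problem? Look at the examples. You need to instantiate the linear regression model, call clf.fit(train, target), and then call clf.predict(test).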
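A minimal sketch of that workflow, assuming the historical and future rows are combined into one table in which the future rows simply have an empty VariableToPredict. The five history rows are the ones shown in the question, restricted to the four factors the question's code uses; the two future rows and their values are made up for illustration. The scaler sits in a pipeline, so it is fitted on the history rows only and then applied unchanged to the future rows.

```python
# Sketch: train on rows whose target is known, predict rows whose target is blank.
import io

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical combined file: history rows from the question plus two
# made-up future rows whose VariableToPredict is empty.
csv_text = """Date,factor2,factor4,factor5,factor6,VariableToPredict
9/17/2014,83686.46,0,250,15911.8,72
9/18/2014,79030.09,0,203,23880.55,274
9/19/2014,84207.27,0,180,8143.32,254
9/20/2014,97577.78,0,194,16688.95,208
9/21/2014,83084.57,0,153,13097.04,270
9/22/2014,81000.00,0,210,14000.00,
9/23/2014,90000.00,0,190,12000.00,
"""
data = pd.read_csv(io.StringIO(csv_text))

features = ["factor2", "factor4", "factor5", "factor6"]
known = data["VariableToPredict"].notna()  # blank cells are read as NaN

history = data[known]    # rows A..B: features and target both known
future = data[~known]    # rows to forecast: features only

model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(history[features], history["VariableToPredict"])

# predict() receives only the feature columns of the future rows.
future = future.assign(Predicted=model.predict(future[features]))
print(future[["Date", "Predicted"]])
```

Five rows are far too few for a real model; with the full dataset the same split works unchanged. Keeping two separate files would also work, as long as the model is fitted on the historical file and predict() is called on the future file's feature columns.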