Python multiple prediction

python datetime scikit-learn


I have a df, and I need to predict the dependent variable (numeric) for each day of the next 7 days.

The train data looks like this:

df.head()
Date                   X1                X2             X3    Y
2004-11-20          453.0               654            989  716   # row 1
2004-11-21          716.0               878            886  605
2004-11-22          605.0               433            775  555
2004-11-23          555.0               453            564  680
2004-11-24          680.0               645            734  713

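Specifically, for the date 2004-11-20 in row 1, I need a Y prediction for each of the next 7 days, not only for today (the variable Y), bearing in mind that to predict the 5th day after 2004-11-20 I will not have the data for the 4 days that follow 2004-11-20.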

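I have been thinking of creating 7 more variables ("Y+1day", "Y+2day", and so on), but then I would need a separate training df for each day, because machine learning techniques return only one variable as output. Is there an easier way?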

I am using the scikit-learn library for modeling.

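You can absolutely train a model in sklearn to predict multiple outputs. pandas is very flexible. In the example below, I convert the Date column to a datetime index and then use the shift utility to get more Y values.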

import io
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Read from stackoverflow artifacts
s = """Date  X1  X2   X3   Y
2004-11-20          453.0               654            989  716  
2004-11-21          716.0               878            886  605
2004-11-22          605.0               433            775  555
2004-11-23          555.0               453            564  680
2004-11-24          680.0               645            734  713"""
text = io.StringIO(s)
df = pd.read_csv(text, sep=r'\s+')

# Datetime index (the dates use dashes, so the format string must too)
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
df = df.set_index("Date")

# Shifting for Y@Day+N: a negative shift pulls future values onto the current row
df['Y1'] = df['Y'].shift(-1) # One day later
df['Y2'] = df['Y'].shift(-2) # Two...

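When we use shift, we have to impute or drop the resulting NaNs. In a large dataset, this should only mean imputing or dropping data at the edges of the time range. For example, if you shift by 7 days, you will lose 7 days from the dataset, depending on how your data is structured and how you shift.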
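The same shifting idea can be extended to all seven horizons with a loop. The following is a minimal sketch, not the asker's actual pipeline: the 30 rows of data are made up, the column names Y+1 ... Y+7 and the HORIZON constant are illustrative, and it assumes exactly one row per calendar day with no gaps (shift moves by rows, not by dates).

```python
import numpy as np
import pandas as pd

# Hypothetical data: 30 consecutive days with made-up values (not the asker's data)
rng = np.random.default_rng(0)
dates = pd.date_range("2004-11-20", periods=30, freq="D")
df = pd.DataFrame(
    {
        "X1": rng.integers(400, 800, size=30).astype(float),
        "X2": rng.integers(400, 900, size=30),
        "X3": rng.integers(500, 1000, size=30),
        "Y": rng.integers(500, 800, size=30),
    },
    index=dates,
)

HORIZON = 7  # number of future days to predict (illustrative name)

# A negative shift pulls future values up onto the current row,
# so column Y+k on a given date holds the Y observed k rows (days) later.
for k in range(1, HORIZON + 1):
    df[f"Y+{k}"] = df["Y"].shift(-k)

target_cols = [f"Y+{k}" for k in range(1, HORIZON + 1)]

# The last HORIZON rows have an incomplete future, so they contain NaN
supervised = df.dropna()
```

Here supervised keeps only the rows for which all seven future values are known, which is why the tail of the time range is lost.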

df.dropna(inplace=True) # Drop two rows

train, test = train_test_split(df)
# Get two training rows
trainX = train.drop(["Y", "Y1", "Y2"], axis=1)
trainY = train.drop(["X1", "X2", "X3"], axis=1)

# Get the test row
X = test.drop(["Y", "Y1", "Y2"], axis=1)
Y = test.drop(["X1", "X2", "X3"], axis=1)
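One thing to be aware of: train_test_split shuffles rows by default, so with time-ordered data the test rows may come from earlier dates than the training rows. If that matters for your evaluation, one possible alternative is to pass shuffle=False so the split stays chronological. A small sketch on a made-up frame (the column names and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame: 20 daily rows, one feature and one target
dates = pd.date_range("2004-11-20", periods=20, freq="D")
df = pd.DataFrame({"X1": np.arange(20.0), "Y": np.arange(20.0) * 2}, index=dates)

# shuffle=False keeps row order: the earliest 75% become train, the rest test
train, test = train_test_split(df, test_size=0.25, shuffle=False)

print(train.index.max(), test.index.min())
```

With this setting every test date falls after every training date, which is closer to how the model would be used for forecasting.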

Now we can instantiate an estimator from sklearn (a regressor here, since Y is numeric) and make a prediction.

from sklearn.linear_model import LinearRegression

clf = LinearRegression()
model = clf.fit(trainX, trainY)
model.predict(X) # Array of three numbers
model.score(X, Y) # Predictably abysmal score
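To illustrate the multi-output behaviour on something larger than three rows, here is a rough sketch with synthetic data and seven target columns (standing in for Y+1 ... Y+7). The data, shapes, and variable names are assumptions. LinearRegression accepts a 2-D target directly; for estimators that only handle a single target, such as GradientBoostingRegressor, sklearn provides the MultiOutputRegressor wrapper, which fits one copy of the estimator per target column.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Hypothetical data: 50 samples, 3 features, 7 targets (e.g. Y+1 ... Y+7)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W = rng.normal(size=(3, 7))
Y = X @ W + rng.normal(scale=0.1, size=(50, 7))

# LinearRegression accepts a 2-D target and fits all columns at once
linear = LinearRegression().fit(X, Y)
linear_pred = linear.predict(X[:2])  # shape (2, 7): one column per target

# GradientBoostingRegressor expects a 1-D target, so the wrapper
# fits one independent copy of it per target column
boosted = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=20, random_state=0)
)
boosted.fit(X, Y)
boosted_pred = boosted.predict(X[:2])  # also shape (2, 7)
```

Either way there is a single fit call and a single predict call, so there is no need to maintain a separate training df per forecast day.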

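With sklearn version 0.20.1, all of this ran fine for me. Of course, I got an abysmal score from it, but the model did train, the predict method did return a prediction for each of the Y columns, and the score method returned a score.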

What library are you using for modeling? (e.g. sklearn, keras, statsmodels)

I am using sklearn (added in a later edit).