Python DataFrame splitting_Python_Pandas_Numpy


python pandas numpy


I want to split time-series data into X and y by shifting the data. A dummy DataFrame looks like:

i.e. if the time step equals 2, X and y look like: X=[3,0] -> y=[5]

X=[0,5] -> y=[7] (this should be applied to the whole sample, i.e. every row)

I wrote the function below, but when I pass the DataFrame to it, it returns empty matrices.

import numpy as np

def create_dataset(dataset, time_step=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset.iloc[:, i:(i + time_step)]
        dataX.append(a)
        dataY.append(dataset.iloc[:, i + time_step])
    return np.array(dataX), np.array(dataY)

Thanks for any solution.
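A likely cause of the empty result: `len(dataset)` on a DataFrame returns the number of rows, not columns, so with the two-row sample `range(len(dataset) - time_step - 1)` is `range(-1)`, which is empty. In addition, `dataset.iloc[:, i:i + time_step]` takes that column slice across all rows at once, and the extra `- 1` skips the last valid window. Below is a minimal sketch that instead slides the window along the columns of each row, assuming every row is an independent series ordered left to right; the name `create_dataset_by_columns` is hypothetical, and the sample values are the two rows used in the answer further down.

```python
import numpy as np
import pandas as pd


def create_dataset_by_columns(dataset: pd.DataFrame, time_step: int = 1):
    """Slide a window of `time_step` columns across every row.

    Returns X with shape (n_samples, time_step) and y with shape (n_samples,).
    """
    values = dataset.to_numpy()
    n_cols = values.shape[1]  # number of time steps, not number of rows
    data_x, data_y = [], []
    for row in values:
        # i runs up to n_cols - time_step - 1, so the target
        # index i + time_step always stays inside the row
        for i in range(n_cols - time_step):
            data_x.append(row[i:i + time_step])
            data_y.append(row[i + time_step])
    return np.array(data_x), np.array(data_y)


df = pd.DataFrame({'x-2': [3, 2], 'x-1': [0, 3],
                   'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})
X, y = create_dataset_by_columns(df, time_step=2)
print(X[:2], y[:2])  # first row gives [3, 0] -> 5 and [0, 5] -> 7
```

Windows are emitted row by row, so all windows of the first series come before those of the second.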

Do you mean something like this:

df = df.shift(periods=-2, axis='columns')

# you can also pass a fill_value parameter
df = df.shift(periods=-2, axis='columns', fill_value = 0)
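For reference, this is roughly what the shift does on the two sample rows used in the other answer: with `axis='columns'` every value moves two columns to the left, and the two vacated columns on the right become NaN (or the `fill_value`). A shift on its own only realigns the target; pairing it with the unshifted frame and a one-column shift gives the question's `time_step=2` pairs. The variable names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'x-2': [3, 2], 'x-1': [0, 3],
                   'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

# Move every value two columns to the left; the right edge becomes NaN
shifted_nan = df.shift(periods=-2, axis='columns')

# Same shift, but the vacated cells are filled with 0 instead of NaN
shifted_zero = df.shift(periods=-2, axis='columns', fill_value=0)

# For the window starting at column 'x-2': inputs are the values at 'x-2'
# and 'x-1', and the target is the value two columns later ('x0')
pairs = pd.DataFrame({'x_t': df['x-2'],
                      'x_t+1': df.shift(periods=-1, axis='columns')['x-2'],
                      'y': shifted_nan['x-2']})
print(pairs)
```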

Here is an example that reproduces your example, IIUC:

import pandas as pd

# function to process each row
def process_row(s):
    assert isinstance(s, pd.Series)
    return pd.concat([
        s.rename('timestep'),
        s.shift(-1).rename('x_1'),
        s.shift(-2).rename('x_2'),
        s.shift(-3).rename('y')
    ], axis=1).dropna(how='any', axis=0).astype(int)

# test case for the example
process_row( pd.Series([2, 3, 0, 5, 6]) )

# type in first two rows of the data frame
df = pd.DataFrame(
    {'x-2': [3, 2], 'x-1': [0, 3], 
     'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

# perform the transformation
ts = list()

for idx, row in df.iterrows():
    t = process_row(row)
    t.index = [idx] * t.index.size
    ts.append(t)
    
print(pd.concat(ts))

# results
   timestep  x_1  x_2  y
0         3    0    5  7
0         0    5    7  1
1         2    3    0  5   <-- first part of expected results
1         3    0    5  6   <-- second part
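The per-row loop above can also be vectorized. This sketch assumes NumPy 1.20 or later (for `numpy.lib.stride_tricks.sliding_window_view`) and a frame whose columns are already in time order; `make_windows` is a hypothetical helper name, and `time_step` is used in the question's sense (number of input values per target).

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(df: pd.DataFrame, time_step: int):
    """Return X of shape (rows, n_windows, time_step) and y of shape (rows, n_windows)."""
    values = df.to_numpy()
    # Windows of length time_step + 1 along the column (time) axis:
    # the first time_step values are inputs, the last one is the target
    win = sliding_window_view(values, window_shape=time_step + 1, axis=1)
    return win[..., :time_step], win[..., time_step]


df = pd.DataFrame({'x-2': [3, 2], 'x-1': [0, 3],
                   'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})
X, y = make_windows(df, time_step=2)
print(X.shape, y.shape)
```

The returned arrays are read-only views into the original data; call `.copy()` before modifying them, and `X.reshape(-1, time_step)` flattens the windows of all rows into one sample axis. With `time_step=3` the windows match the four-column output above.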

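Not exactly. My task is to predict the data sequence, so I decided to use an LSTM. Now, taking the order of the features (X-2, X-1, ...) into account, I want to split my data into X_train, y_train, X_test and y_test.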
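One way to keep the feature order (X-2, X-1, ...) intact is to cut along the time axis instead of shuffling: earlier windows go to training, later windows to testing. The sketch below assumes one feature per time step, the 3-D `(samples, timesteps, features)` input layout that Keras LSTM layers expect, and an arbitrary 80/20 cut; `chronological_split` and `train_frac` are hypothetical names, not part of any library.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def chronological_split(values: np.ndarray, time_step: int, train_frac: float = 0.8):
    """Window each row along the time axis, then split by time (no shuffling).

    values: array of shape (n_series, n_steps), columns in time order.
    Returns X_train, y_train, X_test, y_test with X shaped (samples, time_step, 1).
    """
    win = sliding_window_view(values, window_shape=time_step + 1, axis=1)
    X, y = win[..., :time_step], win[..., time_step]  # (n_series, n_windows, ...)
    n_train = int(X.shape[1] * train_frac)             # windows before the cut
    X_train = X[:, :n_train].reshape(-1, time_step, 1)
    y_train = y[:, :n_train].reshape(-1)
    X_test = X[:, n_train:].reshape(-1, time_step, 1)
    y_test = y[:, n_train:].reshape(-1)
    return X_train, y_train, X_test, y_test


df = pd.DataFrame({'x-2': [3, 2], 'x-1': [0, 3],
                   'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})
X_train, y_train, X_test, y_test = chronological_split(df.to_numpy(), time_step=2)
print(X_train.shape, X_test.shape)
```

Every test target lies later in time than every training target of the same row, so the model is never evaluated on values that precede what it was trained on. If the rows are independent series, splitting by row instead is another reasonable choice.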