Python: merging or appending multiple Keras TimeseriesGenerator objects into one


python tensorflow keras


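I'm trying to build an LSTM model. The data comes from a CSV file that contains values for multiple stocks.

I can't build sequences from all the rows as they appear in the file, because a sequence is only meaningful within the context of its own stock. So I need to select the rows for each stock and build the sequences from those.

I have something like this: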

for stock in stocks:

    stock_df = df.loc[(df['symbol'] == stock)].copy()
    target = stock_df.pop('price')

    x = np.array(stock_df.values)
    y = np.array(target.values)

    sequence = TimeseriesGenerator(x, y, length = 4, sampling_rate = 1, batch_size = 1)


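This works fine, but I want to merge these per-stock sequences into one larger sequence that contains the data for all stocks, and use that for training.

I can't use append or merge, because the function returns a generator object, not a numpy array.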

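For this scenario, you want to merge the sequences into one larger sequence that contains the data for all stocks and that will be used for training.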

You can append the TimeseriesGenerator objects you create to a Python list:

stock_timegenerators = []
for stock in stocks:
    # Each item in stocks is assumed to be a DataFrame holding one stock's rows.
    stock_df = stock.copy()
    features = stock_df.pop('symbol')  # drop the symbol column from the inputs
    target = stock_df.pop('price')

    x = np.array(stock_df.values)
    y = np.array(target.values)

    stock_timegenerators.append(TimeseriesGenerator(x, y, length = 4, sampling_rate = 1, batch_size = 1))

[<tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c699b0>, <tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c6eba8>, <tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c782e8>]
The result is a list of TimeseriesGenerator objects, which you can use by iterating over the list or by referencing an index.
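If a single object is needed rather than a list, one option is a thin wrapper that maps a global batch index onto the right per-stock generator. The sketch below is only an illustration: `ConcatSequence` is a hypothetical helper, not part of Keras, and plain lists stand in for the generators because the wrapper only relies on `__len__` and `__getitem__`. To pass something like this to `model.fit`, it would most likely have to subclass `tf.keras.utils.Sequence`; that part is an assumption and is not shown.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatSequence:
    """Hypothetical helper: presents several indexable batch sources as one.

    Each source only needs __len__ and __getitem__.
    """

    def __init__(self, sources):
        self.sources = list(sources)
        # Running totals of batch counts, e.g. lengths 3, 2, 1 -> [3, 5, 6].
        self.offsets = list(accumulate(len(s) for s in self.sources))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the source that owns this global batch index.
        which = bisect_right(self.offsets, index)
        start = self.offsets[which - 1] if which else 0
        return self.sources[which][index - start]


# Plain lists stand in for the per-stock generators here.
combined = ConcatSequence([["a0", "a1", "a2"], ["b0", "b1"], ["c0"]])
print(len(combined), combined[0], combined[3], combined[5])  # 6 a0 b0 c0
```

Batches still never cross stock boundaries, since every batch is produced by exactly one underlying source.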

Doing the same with one model per stock gives a list of models, which you can likewise reference conveniently by index:

[<tensorflow.python.keras.engine.sequential.Sequential at 0x7eff62c7b748>, <tensorflow.python.keras.engine.sequential.Sequential at 0x7eff6100e160>, <tensorflow.python.keras.engine.sequential.Sequential at 0x7eff63dc94a8>]

This way you can create multiple LSTM models, each with its own time series generator for a different stock.

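Hope this helps.

Edit: new answer:

What I ended up doing was to run all the preprocessing manually and save one .npy file per stock containing its preprocessed sequences. I then use a hand-written generator to produce batches like this: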


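import numpy as np
import tensorflow as tf

class seq_generator():

  def __init__(self, list_of_filepaths):
    # Maps each .npy path to the sequence indices already yielded from it.
    self.usedDict = dict()
    for path in list_of_filepaths:
      self.usedDict[path] = []

  def generate(self):
    # Note: once every sequence has been used, this loop never yields again.
    while True:
      path = np.random.choice(list(self.usedDict.keys()))
      stock_array = np.load(path)  # shape: (n_sequences, n_timesteps, n_features)
      random_sequence = np.random.randint(stock_array.shape[0])
      if random_sequence not in self.usedDict[path]:
        self.usedDict[path].append(random_sequence)
        yield stock_array[random_sequence, :, :]

train_generator = seq_generator(list_of_filepaths)

# from_generator expects a callable, so pass the bound method without calling it.
# Each yielded item is a single array, so a single dtype and shape are given.
train_dataset = tf.data.Dataset.from_generator(train_generator.generate,
                                               output_types=tf.float32,
                                               output_shapes=(n_timesteps, n_features))

train_dataset = train_dataset.batch(batch_size)
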
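where list_of_filepaths is simply a list of paths to the preprocessed .npy data.

This will:

Load the preprocessed .npy data of a randomly chosen stock

Pick a sequence at random

Check whether that sequence's index is already in usedDict

If it is not:

Append the sequence's index to usedDict to keep track of it, so the same data is never fed to the model twice

Yield the sequence

This means the generator feeds one unique sequence from a random stock on every "call", which lets me use the .from_generator() and .batch() methods of TensorFlow's Dataset type.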
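A variation on the steps above, offered only as a sketch: instead of drawing random indices and rejecting repeats, all (file, index) pairs can be listed once and shuffled, so every sequence is yielded exactly once and the generator stops when the data is exhausted. The function name `unique_sequences`, the toy arrays and the file names are made up for illustration; each file is assumed to hold an array of shape (n_sequences, n_timesteps, n_features).

```python
import os
import tempfile

import numpy as np


def unique_sequences(paths, seed=None):
    """Yield every stored sequence exactly once, in random order."""
    rng = np.random.default_rng(seed)
    # mmap_mode="r" maps each file instead of reading it all into memory.
    pairs = [(p, i) for p in paths
             for i in range(np.load(p, mmap_mode="r").shape[0])]
    for k in rng.permutation(len(pairs)):
        path, i = pairs[k]
        # Copy the selected sequence out of the memory map.
        yield np.array(np.load(path, mmap_mode="r")[i])


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    # Two toy "stocks" with 3 and 2 sequences of shape (4, 2).
    toy = {"AAA": np.arange(24, dtype=np.float32).reshape(3, 4, 2),
           "BBB": 100 + np.arange(16, dtype=np.float32).reshape(2, 4, 2)}
    for name, array in toy.items():
        path = os.path.join(tmp, name + ".npy")
        np.save(path, array)
        paths.append(path)
    seqs = list(unique_sequences(paths, seed=0))

print(len(seqs), seqs[0].shape)  # 5 (4, 2)
```

Because the generator terminates, one pass over it corresponds to one epoch; for several epochs it would simply be created again.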

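Original answer: I think @TF_Support's answer slightly misses the point. If I understand your question correctly, you don't want to train one model per stock; you want to train a single model on the entire dataset.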

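If you have enough memory, you can create the sequences manually and keep the entire dataset in memory. The problem I'm facing is similar, except that I can't hold everything in memory.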
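For the in-memory case, a minimal sketch with plain NumPy could look like the following. The helper `make_windows` and the toy arrays are hypothetical; the pairing of each window with the value that follows it mirrors what TimeseriesGenerator does with sampling_rate=1 and stride=1. Windows are built per stock and only then concatenated, so no window spans two stocks.

```python
import numpy as np


def make_windows(x, y, length):
    """Pair rows i..i+length-1 of x with the target y[i + length]."""
    n = len(x) - length
    if n <= 0:
        # Not enough rows for even one window.
        return (np.empty((0, length) + x.shape[1:], dtype=x.dtype),
                np.empty((0,) + y.shape[1:], dtype=y.dtype))
    windows = np.stack([x[i:i + length] for i in range(n)])
    return windows, y[length:]


# Toy data standing in for two stocks (hypothetical values).
per_stock = {
    "AAA": (np.arange(12, dtype=np.float32).reshape(6, 2),
            np.arange(6, dtype=np.float32)),
    "BBB": (np.arange(16, dtype=np.float32).reshape(8, 2),
            np.arange(8, dtype=np.float32)),
}

parts = [make_windows(x, y, length=4) for x, y in per_stock.values()]
X = np.concatenate([p[0] for p in parts])
Y = np.concatenate([p[1] for p in parts])
print(X.shape, Y.shape)  # (6, 4, 2) (6,)
```

The resulting `X` and `Y` can be passed to training directly as arrays, with shuffling and batching left to the training loop.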

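Instead, I'm exploring the option of preprocessing all the data for each stock separately, saving it as .npy files, and then using a generator that loads random samples from those .npy files to feed batches to the model. I'm not yet entirely sure how to implement this, though.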
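The preprocessing half of that idea might be sketched roughly as follows. Everything here is illustrative: the `symbols` and `features` arrays are toy stand-ins for the CSV columns, the file naming is arbitrary, and only the input windows are saved (targets would have to be stored alongside them in a real pipeline).

```python
import os
import tempfile

import numpy as np

length = 4  # timesteps per sequence

# Toy stand-in for the CSV: one symbol per row plus two feature columns.
symbols = np.array(["AAA"] * 6 + ["BBB"] * 8)
features = np.arange(28, dtype=np.float32).reshape(14, 2)

out_dir = tempfile.mkdtemp()
saved = {}
for symbol in np.unique(symbols):
    rows = features[symbols == symbol]  # rows of this stock only
    # All contiguous windows of `length` rows within this stock.
    windows = np.stack([rows[i:i + length]
                        for i in range(len(rows) - length + 1)])
    path = os.path.join(out_dir, f"{symbol}.npy")
    np.save(path, windows)
    saved[str(symbol)] = path

for symbol, path in saved.items():
    print(symbol, np.load(path).shape)
# AAA (3, 4, 2)
# BBB (5, 4, 2)
```

Each file then has the shape (n_sequences, n_timesteps, n_features) that a loading generator can index into.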
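This is similar to what I ended up doing, so I'm marking it as correct. I ended up writing my own generator, since there doesn't seem to be a way to use TimeseriesGenerator() for this use case. You're right, what I wanted was to train on the entire dataset.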