Python: merge or append multiple Keras TimeseriesGenerator objects into one


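I'm trying to build an LSTM model. The data comes from a CSV file that contains the values of several stocks.

I can't generate sequences from all the rows in the file as they appear, because each sequence is only meaningful in the context of its own stock. So I need to select the rows for each stock and generate the sequences from those.

I have something like this: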

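import numpy as np
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

for stock in stocks:

    # Only the rows that belong to this stock; drop the non-numeric symbol column
    stock_df = df.loc[df['symbol'] == stock].drop(columns='symbol')
    target = stock_df.pop('price')

    x = np.array(stock_df.values)
    y = np.array(target.values)

    # One generator per stock (note: `sequence` is overwritten on every iteration)
    sequence = TimeseriesGenerator(x, y, length = 4, sampling_rate = 1, batch_size = 1)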
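With the settings used here (sampling_rate=1 and the default stride=1, start_index=0), each sample TimeseriesGenerator produces is a window of `length` consecutive rows, and its target is the row right after that window. The minimal NumPy sketch below mirrors that windowing for one stock; `make_windows` is a hypothetical helper, not part of Keras. It shows why building the windows per stock keeps any window from spanning two symbols.

```python
import numpy as np

def make_windows(x, y, length=4):
    """Hypothetical helper mirroring TimeseriesGenerator(x, y, length=length)
    with sampling_rate=1, stride=1 and start_index=0, for a single stock."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x) - length  # number of (window, target) pairs
    if n <= 0:
        # Too few rows for even one window
        return np.empty((0, length) + x.shape[1:], dtype=x.dtype), y[:0]
    # Window j covers rows j .. j+length-1; its target is row j+length
    windows = np.stack([x[j:j + length] for j in range(n)])
    targets = y[length:length + n]
    return windows, targets

# Usage: 6 rows with 2 features give 2 windows of shape (4, 2)
x = np.arange(12, dtype=np.float32).reshape(6, 2)
y = np.arange(6, dtype=np.float32)
w, t = make_windows(x, y)
```

Here `w.shape` is `(2, 4, 2)` and `t` holds the prices at rows 4 and 5.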
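This works, but I want to merge all of these sequences into one larger sequence that contains the data of every stock and that I will use for training.

I can't use append or merge, because the function returns a generator object rather than a NumPy array.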

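For this scenario, where you want to combine these sequences into one larger set that contains the data of all stocks and is used for training, you can keep one TimeseriesGenerator per stock: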

import numpy as np
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

# Here `stocks` is assumed to be a list of per-stock DataFrames
stock_timegenerators = []
for stock in stocks:
    stock_df = stock.copy()
    features = stock_df.pop('symbol')  # remove the non-numeric symbol column
    target = stock_df.pop('price')

    x = np.array(stock_df.values)
    y = np.array(target.values)

    # Keep every stock's generator instead of overwriting a single variable
    stock_timegenerators.append(TimeseriesGenerator(x, y, length = 4, sampling_rate = 1, batch_size = 1))
[<tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c699b0>,
 <tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c6eba8>,
 <tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c782e8>]
In other words, you append each TimeseriesGenerator you create to a Python list.

The result is a list of TimeseriesGenerator objects, which you can use by iterating over the list or by referencing an index.
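If the goal is instead to feed every stock to a single model, one option is to walk that list and draw batches from each generator in turn. The sketch below relies only on `len()` and integer indexing, which TimeseriesGenerator supports as a Keras `Sequence`. `chain_batches` is a hypothetical name, and the sketch assumes every generator was built with the same window length and feature count.

```python
import random

def chain_batches(generators, shuffle=True, seed=None):
    """Hypothetical helper: yield every batch of every generator once per pass.

    Works with any objects supporting len() and integer indexing, such as a
    list of TimeseriesGenerator instances (one per stock)."""
    rng = random.Random(seed)
    # One (generator index, batch index) pair for each batch of each stock
    order = [(g, b) for g, gen in enumerate(generators) for b in range(len(gen))]
    if shuffle:
        # Interleave stocks; each batch still comes from a single stock
        rng.shuffle(order)
    for g, b in order:
        yield generators[g][b]
```

Because each batch is taken whole from one generator, no window mixes rows from two stocks; shuffling only changes the order in which the stocks' batches are visited.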

Appending models to a list in the same way gives a list of models that you can conveniently reference by index:

[<tensorflow.python.keras.engine.sequential.Sequential at 0x7eff62c7b748>,
 <tensorflow.python.keras.engine.sequential.Sequential at 0x7eff6100e160>,
 <tensorflow.python.keras.engine.sequential.Sequential at 0x7eff63dc94a8>]
This way you can create several LSTM models, each with its own TimeseriesGenerator for a different stock.

Hope this helps.

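EDIT: New answer:
What I ended up doing was performing all the preprocessing manually and saving one .npy file per stock containing its preprocessed sequences, then using a hand-written generator to produce batches like this: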

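import numpy as np
import tensorflow as tf

class seq_generator():

  def __init__(self, list_of_filepaths):
    # For each preprocessed .npy file: indices of the sequences already yielded
    self.usedDict = dict()
    # Number of sequences in each file (mmap_mode='r' avoids loading the data)
    self.lengths = dict()
    for path in list_of_filepaths:
      self.usedDict[path] = []
      self.lengths[path] = np.load(path, mmap_mode='r').shape[0]

  def generate(self):
    # Start every pass (epoch) with nothing marked as used
    for path in self.usedDict:
      self.usedDict[path] = []
    while True:
      # Stocks that still have sequences the model has not seen in this pass
      remaining = [p for p in self.usedDict
                   if len(self.usedDict[p]) < self.lengths[p]]
      if not remaining:
        return  # every sequence was yielded once: end the pass
      path = np.random.choice(remaining)
      stock_array = np.load(path, mmap_mode='r')
      random_sequence = np.random.randint(stock_array.shape[0])
      if random_sequence not in self.usedDict[path]:
        self.usedDict[path].append(random_sequence)
        # Shape (n_timesteps, n_features), copied out of the memory map
        yield np.array(stock_array[random_sequence, :, :], dtype=np.float32)

train_generator = seq_generator(list_of_filepaths)

# from_generator expects a callable, so pass the bound method, not its result.
# generate() yields a single array, so declare one type and one shape
# (yield an (x, y) tuple and declare two of each if targets are also needed).
train_dataset = tf.data.Dataset.from_generator(train_generator.generate,
                                               output_types=tf.float32,
                                               output_shapes=(n_timesteps, n_features))

train_dataset = train_dataset.batch(batch_size)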
Here, list_of_filepaths is simply a list of paths to the preprocessed .npy data.
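The answer does not show the preprocessing step that produces those files. A minimal sketch, assuming the per-stock `x` and `y` arrays have already been extracted (for example with the `df.loc[df['symbol'] == stock]` loop from the question); `save_stock_sequences`, `per_stock` and `out_dir` are hypothetical names. It saves the input windows to `<stock>_x.npy`, which is what the generator above reads, and the matching targets to a separate `<stock>_y.npy`.

```python
import os
import numpy as np

def save_stock_sequences(per_stock, out_dir, length=4):
    """Hypothetical preprocessing step: window each stock on its own and save
    the windows and their targets as .npy files.

    per_stock maps a stock symbol to an (x, y) pair of arrays."""
    os.makedirs(out_dir, exist_ok=True)
    list_of_filepaths = []
    for stock, (x, y) in per_stock.items():
        x = np.asarray(x, dtype=np.float32)
        y = np.asarray(y, dtype=np.float32)
        n = len(x) - length
        if n <= 0:
            continue  # too few rows for even one window
        # Shape (n, length, n_features): window j holds rows j .. j+length-1
        windows = np.stack([x[j:j + length] for j in range(n)])
        targets = y[length:]  # target of window j is row j+length
        x_path = os.path.join(out_dir, f"{stock}_x.npy")
        np.save(x_path, windows)
        np.save(os.path.join(out_dir, f"{stock}_y.npy"), targets)
        list_of_filepaths.append(x_path)
    return list_of_filepaths
```

The returned list can be passed straight to `seq_generator` as `list_of_filepaths`.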


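This will:

  • load the preprocessed .npy data of a random stock
  • pick a random sequence from it
  • check whether that sequence's index is already in usedDict
  • if it is not:
    • append the sequence's index to usedDict to keep track of it, so the model is never fed the same data twice
    • yield the sequence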
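The retry loop above slows down as a pass nears its end, because most random draws hit indices that were already yielded. An alternative with the same guarantee (each sequence yielded once per pass, stocks interleaved at random) is to shuffle all (file, index) pairs up front. This is a swapped-in technique, not the answer's own code; `shuffled_sequences` is a hypothetical name and, like the answer, it assumes each file stores an array of shape (n_sequences, n_timesteps, n_features).

```python
import numpy as np

def shuffled_sequences(list_of_filepaths, seed=None):
    """Hypothetical generator: yield every stored sequence exactly once,
    in a random order across all stocks."""
    rng = np.random.default_rng(seed)
    # mmap_mode='r' maps the files instead of reading them into memory
    arrays = [np.load(p, mmap_mode="r") for p in list_of_filepaths]
    # Every (file index, sequence index) pair
    pairs = [(f, i) for f, arr in enumerate(arrays) for i in range(arr.shape[0])]
    for k in rng.permutation(len(pairs)):
        f, i = pairs[k]
        # Copy the single sequence out of the memory map
        yield np.array(arrays[f][i], dtype=np.float32)
```

It can be wrapped the same way as the answer's generator, e.g. by passing `lambda: shuffled_sequences(list_of_filepaths)` as the callable to `tf.data.Dataset.from_generator`, so each new pass gets a fresh permutation.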
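This means that on every "call" the generator feeds a unique sequence from a random stock, which lets me use the .from_generator() and .batch() methods of TensorFlow's tf.data.Dataset type.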


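Original answer: I think @TF_Support's answer misses the point somewhat. If I understand your question correctly, you don't want to train one model per stock; you want to train a single model on the whole dataset.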

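If you have enough memory, you can create the sequences manually and keep the whole dataset in memory. I'm facing a similar problem, except that I can't keep everything in memory.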
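When the data does fit in memory, the manual approach can be as simple as windowing each stock on its own and concatenating the results, so that no window mixes two stocks. A minimal NumPy sketch; `build_dataset` and `per_stock` are hypothetical names, and the windowing follows TimeseriesGenerator's behaviour with sampling_rate=1 (each window's target is the row right after it).

```python
import numpy as np

def build_dataset(per_stock, length=4):
    """Hypothetical helper: window every stock separately, then stack all
    windows into one array. per_stock is a list of (x, y) array pairs."""
    xs, ys = [], []
    for x, y in per_stock:
        x = np.asarray(x, dtype=np.float32)
        y = np.asarray(y, dtype=np.float32)
        n = len(x) - length
        if n <= 0:
            continue  # stock too short for a single window
        xs.append(np.stack([x[j:j + length] for j in range(n)]))
        ys.append(y[length:])  # target of window j is row j+length
    # Shapes: (total_windows, length, n_features) and (total_windows,)
    return np.concatenate(xs), np.concatenate(ys)
```

The two returned arrays can then be passed to `model.fit` like any other in-memory dataset; `shuffle=True` in `fit` mixes windows from different stocks without ever splitting one.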


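Instead, I'm exploring preprocessing all of the data for each stock separately, saving it as .npy files, and then using a generator that loads random samples from those .npy files to feed batches to the model, but I'm not yet sure exactly how to implement this.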

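This is similar to what I ended up doing, so I'm marking it as accepted. I ended up writing my own generator, since there didn't seem to be a way to use TimeseriesGenerator() for this use case. You're right: what I wanted was to train on the whole dataset.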