Python: Merge or append multiple Keras TimeseriesGenerator objects into one
I'm trying to build an LSTM model. The data comes from a CSV file that contains values for multiple stocks. I can't generate sequences from all the rows as they appear in the file, because each sequence is only relevant in the context of its own stock, so I need to select the rows for each stock and generate sequences from those. I have something like this:
    for stock in stocks:
        stock_df = df.loc[df['symbol'] == stock].copy()
        target = stock_df.pop('price')
        x = np.array(stock_df.values)
        y = np.array(target.values)
        sequence = TimeseriesGenerator(x, y, length=4, sampling_rate=1, batch_size=1)
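This works fine, but I want to merge the per-stock sequences into one bigger sequence that contains the data for all stocks and that I will use for training. I can't use append or merge, because the function returns a generator object, not a numpy array.

For that scenario, you want to merge those sequences into one bigger sequence that contains the data for all stocks and is used for training.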
You can append the TimeseriesGenerators you create to a Python list:
    stock_timegenerators = []
    # In this snippet each element of `stocks` is a DataFrame for one stock.
    for stock in stocks:
        stock_df = stock.copy()
        features = stock_df.pop('symbol')
        target = stock_df.pop('price')
        x = np.array(stock_df.values)
        y = np.array(target.values)
        stock_timegenerators.append(TimeseriesGenerator(x, y, length=4, sampling_rate=1, batch_size=1))

    [<tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c699b0>,
     <tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c6eba8>,
     <tensorflow.python.keras.preprocessing.sequence.TimeseriesGenerator at 0x7eff62c782e8>]
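The result, printed above, is a Python list of TimeseriesGenerator objects, one per stock, which you can use by iterating over the list or by indexing into it.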
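A list still leaves the generators separate. If the goal is a single object that walks through every stock's batches, one option is a thin wrapper that maps a global batch index to the right generator. The sketch below is an illustration under assumptions, not part of the answer above: `ConcatenatedBatches` is a hypothetical helper name, and plain lists stand in for the TimeseriesGenerator objects (the wrapper relies only on `len()` and integer indexing, which TimeseriesGenerator supports).

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedBatches:
    """Present several indexable batch sources as one long sequence of batches.

    Hypothetical helper: works with any objects that support len() and
    integer indexing.
    """

    def __init__(self, sources):
        self.sources = list(sources)
        # offsets[k] = total number of batches in sources[0..k]
        self.offsets = list(accumulate(len(s) for s in self.sources))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first source whose cumulative batch count exceeds index.
        k = bisect_right(self.offsets, index)
        start = self.offsets[k - 1] if k > 0 else 0
        return self.sources[k][index - start]


# Stand-ins for per-stock generators: plain lists of "batches".
merged = ConcatenatedBatches([["a0", "a1"], ["b0"], ["c0", "c1", "c2"]])
print(len(merged), merged[2], merged[5])
```

With real data you would pass `stock_timegenerators` instead of the lists. To hand the result to `model.fit`, it would probably need to subclass `tf.keras.utils.Sequence`; that part is untested here.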
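Models can be appended to a list in the same way; this outputs the list of appended models, which can then be referenced conveniently by index: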
[<tensorflow.python.keras.engine.sequential.Sequential at 0x7eff62c7b748>,
<tensorflow.python.keras.engine.sequential.Sequential at 0x7eff6100e160>,
<tensorflow.python.keras.engine.sequential.Sequential at 0x7eff63dc94a8>]
This way you can create multiple LSTM models, each with its own time series generator for a different stock.
Hope this helps.

EDIT: NEW ANSWER:
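What I ended up doing was to perform all the preprocessing manually and save one .npy file per stock containing that stock's preprocessed sequences. I then produce batches with a hand-written generator like this: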
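    import numpy as np
    import tensorflow as tf

    class seq_generator():
        def __init__(self, list_of_filepaths):
            # Maps each .npy path to the sequence indices already yielded from it.
            self.usedDict = dict()
            for path in list_of_filepaths:
                self.usedDict[path] = []

        def generate(self):
            # Note: this loops forever; once every index has been used it yields nothing more.
            while True:
                path = np.random.choice(list(self.usedDict.keys()))
                stock_array = np.load(path)
                random_sequence = np.random.randint(stock_array.shape[0])
                if random_sequence not in self.usedDict[path]:
                    self.usedDict[path].append(random_sequence)
                    yield stock_array[random_sequence, :, :]

    # list_of_filepaths, n_timesteps, n_features and batch_size are defined elsewhere.
    train_generator = seq_generator(list_of_filepaths)
    # from_generator expects a callable, so pass the bound method without calling it.
    # Each yielded item is a single array, hence a single output type.
    train_dataset = tf.data.Dataset.from_generator(train_generator.generate,
                                                   output_types=tf.float32,
                                                   output_shapes=(n_timesteps, n_features))
    train_dataset = train_dataset.batch(batch_size)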
Here, list_of_filepaths is simply a list of paths to the preprocessed .npy data.
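This will:
- Load the preprocessed .npy data of a random stock
- Pick a sequence at random
- Check whether the index of that sequence is already in usedDict
- If not:
  - Append the index of that sequence to usedDict to keep track of it, so that the same data is not fed to the model twice
  - Yield the sequence

This makes it possible to use the .from_generator() and .batch() methods.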
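One property of the steps above is that the loop never ends: once every index has been used, it keeps drawing random indices without yielding anything. A possible variant, sketched below as an assumption rather than what was actually used, shuffles every (file, sequence) pair up front and stops after one pass. The name `iter_sequences_once`, the `seed` parameter, and the toy files are all hypothetical, and each file is assumed to hold an array of shape (n_sequences, n_timesteps, n_features).

```python
import os
import tempfile

import numpy as np


def iter_sequences_once(paths, seed=None):
    """Yield every sequence from every .npy file exactly once, in random order."""
    rng = np.random.default_rng(seed)
    # mmap_mode="r" reads array data lazily instead of loading whole files.
    arrays = [np.load(p, mmap_mode="r") for p in paths]
    # Every (file index, sequence index) pair; visiting each once avoids repeats.
    pairs = [(f, s) for f, arr in enumerate(arrays) for s in range(arr.shape[0])]
    for i in rng.permutation(len(pairs)):
        f, s = pairs[i]
        yield np.asarray(arrays[f][s], dtype=np.float32)


# Demo with two small hypothetical files written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
demo_paths = []
for name, n_sequences in [("AAA", 3), ("BBB", 2)]:
    path = os.path.join(tmp_dir, f"{name}.npy")
    np.save(path, np.random.rand(n_sequences, 4, 2))
    demo_paths.append(path)

demo_sequences = list(iter_sequences_once(demo_paths, seed=0))
print(len(demo_sequences), demo_sequences[0].shape)  # 5 (4, 2)
```

Because `tf.data.Dataset.from_generator` takes a callable, this would be passed as something like `lambda: iter_sequences_once(paths)`, which also gives a fresh shuffle each time the dataset is iterated.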
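ORIGINAL ANSWER: I think the answer from @TF_Support slightly misses the point. If I understand your question, it's not that you want to train one model per stock; you want one model trained on the entire dataset.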
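If you have enough memory, you can create the sequences manually and keep the entire dataset in memory. The problem I'm facing is similar, but I simply can't keep everything in memory.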
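For the case where memory is sufficient, the in-memory approach might look like the following sketch. It uses only NumPy; `make_windows` is a hypothetical helper, the toy data is made up, and the windowing (features from rows i to i+length-1, target from row i+length) is meant to mirror what TimeseriesGenerator produces with sampling_rate=1, as far as I understand it.

```python
import numpy as np


def make_windows(features, targets, length):
    """Return (X, y) with X[i] = features[i:i+length] and y[i] = targets[i+length]."""
    features = np.asarray(features, dtype=np.float32)
    targets = np.asarray(targets, dtype=np.float32)
    n_samples = len(features) - length
    if n_samples <= 0:
        # Too few rows for even one window.
        return (np.empty((0, length) + features.shape[1:], np.float32),
                np.empty((0,), np.float32))
    X = np.stack([features[i:i + length] for i in range(n_samples)])
    y = targets[length:]
    return X, y


# Hypothetical toy data: two "stocks" with 6 and 5 rows, 2 features each.
per_stock = {
    "AAA": (np.arange(12).reshape(6, 2), np.arange(6)),
    "BBB": (np.arange(10).reshape(5, 2), np.arange(5)),
}
# Window each stock separately so no window spans two stocks, then concatenate.
windows = [make_windows(x, y, length=4) for x, y in per_stock.values()]
X_all = np.concatenate([X for X, _ in windows])
y_all = np.concatenate([y for _, y in windows])
print(X_all.shape, y_all.shape)  # (3, 4, 2) (3,)
```

The resulting arrays can be shuffled and passed to training as a whole, since every row of `X_all` comes from exactly one stock.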
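Instead, I'm exploring the possibility of preprocessing all the data for each stock separately, saving it as .npy files, and then using a generator to load random samples from those .npy files to batch data to the model, though I'm not yet entirely sure how to implement this.

This is similar to what I ended up doing, so I'll mark it as correct. I ended up writing my own generator, because there didn't seem to be a way to use TimeseriesGenerator() for this use case. You're right, what I wanted was to use the entire dataset for training.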