Python: pitfalls in preparing time-series data for an LSTM
Tags: python, python-3.x, csv, machine-learning, tensorflow
I was trying to replicate Chevalier's algorithm when I realized that my approach did not match it, and I ran into a problem. As a follow-up, I was able to produce a result for load_X with the following approach:
In [0]:
def load_X(X_signals_paths):
    X_signals = []
    for signal_type_path in X_signals_paths:
        with open(signal_type_path, 'r') as csvfile:
            reader = csv.reader(csvfile)
            next(reader)
            for serie in [row[1:2] for row in reader]:
                #X_signals.append([np.array([row[1:2] for row in reader],dtype=np.float32) for row in reader])
                X_signals.append(np.array(serie, dtype=np.int32))
        file.close()
    return (np.transpose(np.transpose(X_signals), (1, 0)))

X_train_signals_paths = [
    DATASET_PATH + TRAIN + signal + "_train.csv" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + signal + "_test.csv" for signal in INPUT_SIGNAL_TYPES
]
X_train = load_X(X_train_signals_paths)
X_test = load_X(X_test_signals_paths)
print(X_train)

Out[0]:
[[ 6]
 [ 6]
 ...,
 [13]
 [13]
 [13]]
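However, I looked more closely at Chevalier's approach, and I found something interesting when I evaluated len(X_train[0]) and len(X_train[0][0]). The way I format my x values appears to differ considerably from the way Chevalier's x values are formatted. My original CSV file and Chevalier's raw txt file for X_train are both available for comparison. Here is Chevalier's code, to compare against mine: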
def load_X(X_signals_paths):
    X_signals = []

    for signal_type_path in X_signals_paths:
        file = open(signal_type_path, 'r')
        # Read dataset from disk, dealing with text files' syntax
        X_signals.append(
            [np.array(serie, dtype=np.float32) for serie in [
                row.replace('  ', ' ').strip().split(' ') for row in file
            ]]
        )
        file.close()

    return np.transpose(np.array(X_signals), (1, 2, 0))

X_train_signals_paths = [
    DATASET_PATH + TRAIN + "Inertial Signals/" + signal + "train.txt" for signal in INPUT_SIGNAL_TYPES
]
X_test_signals_paths = [
    DATASET_PATH + TEST + "Inertial Signals/" + signal + "test.txt" for signal in INPUT_SIGNAL_TYPES
]
X_train = load_X(X_train_signals_paths)
X_test = load_X(X_test_signals_paths)
The following excerpt, from the "Additional Parameters" section of Chevalier's code, is the main source of my confusion:
training_data_count = len(X_train) # 7352 training series (with 50% overlap between each serie)
test_data_count = len(X_test) # 2947 testing series
n_steps = len(X_train[0]) # 128 timesteps per series
n_input = len(X_train[0][0]) # 9 input parameters per timestep
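What I have observed is that the 50% overlap means that the separately evaluated time intervals overlap, for example 0-64, 32-96, 64-128, 96-, and so on. One thing I know for a fact is that 7352 is the number of rows in X_train.txt. As I understand it, [0] and [0][0] select the 0th column of the X_train array, and the 0th column and 0th row of X_train, respectively. What my code currently does is transpose each of my data points individually. That is why I get 1 when I evaluate len(X_train[0]), and why I get an error when I evaluate len(X_train[0][0]):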
TypeError Traceback (most recent call last)
<ipython-input-255-14523e544e49> in <module>()
2 test_data_count = len(list(X_test))
3 n_steps = len(X_train[0])
----> 4 n_input = len(list(X_train)[0][0])
5 print(training_data_count, test_data_count, n_steps, n_input)
TypeError: object of type 'numpy.int32' has no len()
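The error follows from the shape of the two arrays. The sketch below uses zero-filled stand-in arrays, not the real data: the (7352, 128, 9) shape is taken from the comments in Chevalier's code, the single-column shape from the output printed earlier, and the row count of 893 is only an assumption for illustration.

```python
import numpy as np

# Stand-in for Chevalier's X_train: (series, timesteps, signals).
X_chevalier = np.zeros((7352, 128, 9), dtype=np.float32)
print(len(X_chevalier))        # number of series
print(len(X_chevalier[0]))     # timesteps in the first series
print(len(X_chevalier[0][0]))  # input signals at the first timestep

# Stand-in for the CSV-based result: one integer per row, shape (n, 1).
# The row count 893 is an assumed value, not read from any file.
X_mine = np.zeros((893, 1), dtype=np.int32)
print(len(X_mine[0]))          # a single value per row

try:
    len(X_mine[0][0])          # X_mine[0][0] is a scalar, not a sequence
except TypeError as err:
    print("TypeError:", err)
```

Indexing a NumPy array with [0] selects along the first axis, so X_train[0] is one whole series (a 128 x 9 block in the three-dimensional case) and X_train[0][0] is one timestep of that series. With a two-dimensional array there is no third axis left, so the second index already yields a scalar.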
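How should I reformat my data to match the format that Chevalier's txt files are expected to have? What do the numbers in the "Additional Parameters" section of Chevalier's git repository mean, and how do I adjust them for my current model?

Comments:

It looks like Chevalier's data files have one event (561 measurement points) on each line, whereas your data looks like it is all a single event (893 points). Is that a correct observation? Also, yours is monotonically increasing, whereas theirs seem to be relative measurements. We don't know what is in your data, so we can't really guess how to reformat it. Your Python code also has some stylistic problems. You should not call close on a file handle that was opened with with, because the with context manager takes care of that for you. If you want to skip the first line of the CSV, you can tell the reader so by passing in a parameter. Reading and then transposing seems inefficient. Instead, simply append the data points to the inner list.

@Triplee Your observation is correct. I think it would be better if you wrote this up as a full answer.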
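One possible way to get from a single long column to the three-dimensional layout is to cut the series into fixed-length windows with 50% overlap. The sketch below is only an illustration under several assumptions: the helper names read_column and make_windows are hypothetical, the value of interest is assumed to sit in column 1 of the CSV, and the demo data is synthetic. Whether windowing is appropriate for the real data depends on what the measurements represent, which the question does not say.

```python
import csv
import io

import numpy as np


def read_column(csvfile, column=1):
    """Read one numeric column from an open CSV file, skipping the header row."""
    reader = csv.reader(csvfile)
    next(reader)  # skip the header line
    return np.array([row[column] for row in reader], dtype=np.float32)


def make_windows(series, n_steps, overlap=0.5):
    """Cut a 1-D series into overlapping windows of length n_steps.

    Returns an array of shape (n_windows, n_steps, 1), i.e. one input
    signal per timestep.
    """
    if len(series) < n_steps:
        raise ValueError("series is shorter than one window")
    step = max(1, int(n_steps * (1 - overlap)))  # hop between window starts
    starts = range(0, len(series) - n_steps + 1, step)
    windows = np.stack([series[s:s + n_steps] for s in starts])
    return windows[:, :, np.newaxis]


# Synthetic stand-in for a CSV file: a header plus 20 rows of "index,value".
demo_csv = io.StringIO("t,value\n" + "\n".join(f"{i},{i}" for i in range(20)))
series = read_column(demo_csv)

X = make_windows(series, n_steps=8)
print(X.shape)                       # (4, 8, 1)
print(len(X), len(X[0]), len(X[0][0]))
```

With real files, read_column would be called inside a with open(...) block, which closes the file on exit, so no explicit close is needed. Several signals of equal length could be windowed separately and joined along the last axis with np.concatenate(..., axis=2) to give n_input greater than 1. Any trailing samples that do not fill a complete window are dropped in this sketch.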