Data loading for a neural network in Python

python neural-network

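I have to process two text files that contain hotel reviews. Next to each review is a value indicating whether the review is genuine or deceptive. To load the training and test sets, I have this code: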

import csv
x_train = list()
y_train = list()
with open('TRAINING_ALL.txt', encoding='utf-8') as infile:
    reader = csv.reader(infile, delimiter='\t')
    for row in reader:
        x_train.append(row[0])
        y_train.append(int(row[1]))



x_test = list()
y_test = list()
with open('TEST_ALL.txt', encoding='utf-8') as infile:
    reader = csv.reader(infile, delimiter='\t')
    for row in reader:
        x_test.append(row[0])
        y_test.append(int(row[1]))

Then I have to classify the reviews with a neural network. However, I run into a problem in the data-loading part:

print('Loading data...')
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

I get:

Loading data...
480 train sequences
320 test sequences
Pad sequences (samples x time)

So far so good: it reads the correct number of sequences. Then comes the error:

ValueError: invalid literal for int() with base 10: "ould take a quick dip in the pool. I toured the hotel as my niece is planning her wedding and just so happens to live close to the hotel. The ' Chagall Ballroom ', was elegant enough for such an occa

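What is the correct input for this code?

Note that the code originally worked fine as follows (with the dataset taken from imdb):

Perhaps x_train and x_test are not in the correct format?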
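The quoted error text starts mid-word ("ould take ..."), which suggests a review string was sliced before an integer conversion was attempted. A likely cause, then, is that pad_sequences received raw review strings, whereas the Keras imdb dataset already stores each review as a list of integer word indices. The following is only a minimal pure-Python sketch of that conversion. The helper names build_vocab, encode and pad are hypothetical, the whitespace tokenisation is a simplifying assumption, and a library tokenizer would normally replace these helpers.

```python
from collections import Counter


def build_vocab(texts, num_words=None):
    """Map each word to an integer index, most frequent first (0 is kept for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common(num_words))}


def encode(texts, vocab):
    """Turn each text into a list of integer word indices, dropping unknown words."""
    return [[vocab[w] for w in text.lower().split() if w in vocab] for text in texts]


def pad(seqs, maxlen):
    """Left-pad with 0 and keep only the last maxlen items (assumes maxlen > 0)."""
    return [[0] * (maxlen - len(s[-maxlen:])) + s[-maxlen:] for s in seqs]


# Hypothetical sample reviews standing in for x_train
reviews = ["great hotel great pool", "terrible room"]
vocab = build_vocab(reviews)
encoded = encode(reviews, vocab)   # lists of ints, not strings
padded = pad(encoded, maxlen=3)    # every row now has length 3
print(padded)
```

With integer lists like `encoded` in place of the raw strings, the padding step no longer has to convert review text to numbers.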

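When you load data from a CSV file, you are also loading the first row, which contains the column names. You can easily check this by looking at the first element of x_train and x_test. If that is the case, you can skip the first row like this: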

import csv
x_train = list()
y_train = list()
with open('TRAINING_ALL.txt', encoding='utf-8') as infile:
    reader = csv.reader(infile, delimiter='\t')
    next(reader)
    for row in reader:
        x_train.append(row[0])
        y_train.append(int(row[1]))



x_test = list()
y_test = list()
with open('TEST_ALL.txt', encoding='utf-8') as infile:
    reader = csv.reader(infile, delimiter='\t')
    next(reader)
    for row in reader:
        x_test.append(row[0])
        y_test.append(int(row[1]))
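If it is unclear whether a file has a header row, an unconditional next(reader) could silently drop a real review. A hedged alternative is to skip the first row only when its label column is not an integer. The function name read_reviews and the in-memory sample data below are hypothetical, and the sketch assumes every row has exactly two tab-separated columns.

```python
import csv
import io


def read_reviews(lines, delimiter="\t"):
    """Read (text, label) rows; skip the first row only if its label is not an integer."""
    texts, labels = [], []
    for i, row in enumerate(csv.reader(lines, delimiter=delimiter)):
        if i == 0 and not row[1].strip().isdigit():
            continue  # first row looks like a header, e.g. "review<TAB>label"
        texts.append(row[0])
        labels.append(int(row[1]))
    return texts, labels


# Hypothetical in-memory samples standing in for TRAINING_ALL.txt
with_header = io.StringIO("review\tlabel\nNice stay\t1\nNot real\t0\n")
without_header = io.StringIO("Nice stay\t1\nNot real\t0\n")

result_with = read_reviews(with_header)
result_without = read_reviews(without_header)
print(result_with == result_without)
```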

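Can you show the first few lines of a sample input file? Do the reviews or other columns contain newline characters?

The original input file is laid out as follows (the x column first, with the y column separated from it). x is then the list containing all the reviews, i.e. the first column.

Can you also provide the full stack trace? Which line of code raises the error?

You should use a debugger or print() statements to inspect the values of row[0] and row[1] inside the for loop. They are not what you expect.

It still doesn't work. There are no column names in the file; it consists only of one column with the reviews and another column containing 0 or 1.

The error message tells you that you are trying to convert the text beginning "ould take a quick dip in the pool" into a number. Try searching for that string in your CSV file, or make sure you are pointing at the right file (if the file does not contain that string).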
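The inspection suggested in the comments can be automated. The sketch below lists every row that does not have exactly two columns or whose label is not 0 or 1. The function name find_bad_rows and the sample data are hypothetical; the tab delimiter and the 0/1 labels follow the question.

```python
import csv
import io


def find_bad_rows(lines, delimiter="\t"):
    """Return (row_number, row) pairs whose shape or label the loader would not expect."""
    bad = []
    for number, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        if len(row) != 2 or row[1].strip() not in {"0", "1"}:
            bad.append((number, row))
    return bad


# Hypothetical sample with one row missing its label and one with a non-numeric label
sample = io.StringIO("Nice stay\t1\nmissing label\nNot real\t0\nodd label\tyes\n")
problems = find_bad_rows(sample)
for number, row in problems:
    print(number, repr(row))
```

To check a real file, the same function can be given an open file object, for example one opened with open('TRAINING_ALL.txt', encoding='utf-8', newline=''). An empty result would indicate that the rows themselves are well formed and that the problem lies in a later step.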
