
Python: How to feed my own dataset to keras image_ocr


I know about the keras image_ocr model. It uses an image generator to generate images; however, I am facing some difficulty because I am trying to feed my own dataset to the model for training.

The repo link is:

I have created two arrays, x and y. The paths to my images and their corresponding ground truths are stored in a CSV file.

x holds the image data, with dimensions like this: [nb_samples, w, h, c]

y is a string, i.e. the ground truth (gt).

Here is the code I am using for preprocessing:

import cv2
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# read_file is the CSV of image paths and ground truths (columns: path, gt)
read_file = pd.read_csv('labels.csv')  # hypothetical filename

x, y = [], []
for i in range(len(read_file)):
    path = read_file['path'][i].strip('\n')
    label = read_file['gt'][i]
    img = cv2.imread(path, 0)      # 0 -> load as a single-channel grayscale image
    # resize every image to a fixed size (width=128, height=64) so that the
    # list can later be stacked into one numpy array
    img = cv2.resize(img, (128, 64))
    h, w = img.shape               # grayscale, so there is no channel axis
    size = img.size
    x.append(img)
    y.append(label)

print "H: ", h
print "W: ", w
print "S: ", size

x = np.array(x).astype(np.float32)
y = np.array(y)

# train_test_split already returns numpy arrays, so no further conversion is needed
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

print "Printing the shapes. \n"
print "X_train shape: ", x_train.shape
print "Y_train shape: ", y_train.shape
print "X_test shape: ", x_test.shape
print "Y_test shape: ", y_test.shape
print "\n"
Then comes the keras image_ocr code. The full code is:

The error I get when running this is:

Traceback (most recent call last):
  File "preprocess.py", line 323, in <module>
    train(run_name, 0, 20, w)
  File "preprocess.py", line 314, in train
    model.fit(next_train(x_train), y_train, batch_size=7, epochs=20, verbose=1, validation_split=0.1, shuffle=True, initial_epoch=0)
  File "/home/kamranjanjua/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1358, in fit
    batch_size=batch_size)
  File "/home/kamranjanjua/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1234, in _standardize_user_data
    exception_prefix='input')
  File "/home/kamranjanjua/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 100, in _standardize_input_data
    'Found: ' + str(data)[:200] + '...')
TypeError: Error when checking model input: data should be a Numpy array, or list/dict of Numpy arrays. Found: <generator object next_train at 0x7f8752671640>...

Any help would be appreciated.

If you look at the code carefully, you will see that the model expects a dictionary as its input:

inputs = {'the_input': X_data,
          'the_labels': labels,
          'input_length': input_length,
          'label_length': label_length,
          'source_str': source_str}

outputs = {'ctc': np.zeros([size])}  # dummy data for the dummy loss function
For the inputs:
1) X_data holds the training examples
2) labels holds the label of each training example
3) label_length is the length of each label
4) input_length is the length of the input sequence fed to the CTC loss
5) source_str is not mandatory; it is only used for decoding

The outputs are just dummy data for the dummy CTC loss function. A sketch of how the remaining input arrays can be prepared is shown below.
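To make those inputs concrete, here is a minimal sketch of how labels, input_length, and label_length could be built. The alphabet, img_w, downsample_factor, and max_string_len values are assumptions modelled on the conventions of the image_ocr example, and text_to_labels is a hypothetical re-creation of the encoder used there:

import numpy as np

alphabet = 'abcdefghijklmnopqrstuvwxyz '  # assumed character set

def text_to_labels(text):
    # encode each character as its index in the alphabet,
    # mirroring text_to_labels() in the image_ocr example
    return [alphabet.find(ch) for ch in text]

img_w = 128             # image width after resizing
downsample_factor = 4   # assumed: pooling shrinks the width by this factor
max_string_len = 16     # assumed longest ground-truth string

def make_ctc_arrays(y_strings):
    n = len(y_strings)
    labels = np.ones((n, max_string_len)) * -1   # -1 pads short labels
    label_length = np.zeros((n, 1))
    # the CTC input length is the width of the downsampled feature
    # sequence that comes out of the convolutional stack
    input_length = np.ones((n, 1)) * (img_w // downsample_factor - 2)
    for i, text in enumerate(y_strings):
        encoded = text_to_labels(text)
        labels[i, :len(encoded)] = encoded
        label_length[i] = len(encoded)
    return labels, input_length, label_length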

Now, in your code you are only generating x_train and y_train; the other inputs are missing. You need to prepare your dataset according to the model's expected inputs and outputs, for example as in the sketch below.
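Putting it together, here is a hedged sketch of feeding the prepared arrays to the model, assuming the model was built as in the image_ocr example (named inputs and a 'ctc' output) and using the hypothetical make_ctc_arrays helper sketched above; source_str is omitted since it is not mandatory. Note also that the TypeError in your traceback comes from passing a generator to model.fit: in this Keras version, plain numpy dicts can go straight to fit(), while a generator such as next_train() would have to go through model.fit_generator() instead.

labels, input_length, label_length = make_ctc_arrays(y_train)

# reshape grayscale images to (n, w, h, 1); whether your model wants the
# width or the height axis first depends on how it was built (assumption:
# the image_ocr example's TensorFlow layout, (img_w, img_h, 1))
x_in = x_train.transpose(0, 2, 1)[..., np.newaxis]

inputs = {'the_input': x_in,
          'the_labels': labels,
          'input_length': input_length,
          'label_length': label_length}
outputs = {'ctc': np.zeros([len(x_in)])}  # dummy targets for the CTC lambda

model.fit(inputs, outputs, batch_size=7, epochs=20,
          verbose=1, validation_split=0.1, shuffle=True)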