
Python: How to feed my own dataset to keras image_ocr


I know about the keras image_ocr model. It uses an image generator to generate images; however, I am facing some difficulty because I am trying to feed my own dataset to the model for training.

The repo link is:

I have created two arrays, x and y. The paths to my images and their corresponding ground truths are stored in a CSV file.

x holds the image data, with dimensions like this: [nb_samples, w, h, c]

y is a string, i.e. the ground truth (gt).

Here is the code I am using for preprocessing:

import cv2
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# read_file is the CSV of image paths and ground truths (columns: path, gt)
read_file = pd.read_csv('labels.csv')  # hypothetical filename

x, y = [], []
for i in range(len(read_file)):
    path = read_file['path'][i].strip('\n')
    label = read_file['gt'][i]
    img = cv2.imread(path, 0)      # 0 -> load as a single-channel grayscale image
    # resize every image to a fixed size (width=128, height=64) so that the
    # list can later be stacked into one numpy array
    img = cv2.resize(img, (128, 64))
    h, w = img.shape               # grayscale, so there is no channel axis
    size = img.size
    x.append(img)
    y.append(label)

print "H: ", h
print "W: ", w
print "S: ", size

x = np.array(x).astype(np.float32)
y = np.array(y)

# train_test_split already returns numpy arrays, so no further conversion is needed
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

print "Printing the shapes. \n"
print "X_train shape: ", x_train.shape
print "Y_train shape: ", y_train.shape
print "X_test shape: ", x_test.shape
print "Y_test shape: ", y_test.shape
print "\n"
Then comes the keras image_ocr code. The full code is:

The error I get when running this is:

Traceback (most recent call last):
  File "preprocess.py", line 323, in <module>
    train(run_name, 0, 20, w)
  File "preprocess.py", line 314, in train
    model.fit(next_train(x_train), y_train, batch_size=7, epochs=20, verbose=1, validation_split=0.1, shuffle=True, initial_epoch=0)
  File "/home/kamranjanjua/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1358, in fit
    batch_size=batch_size)
  File "/home/kamranjanjua/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1234, in _standardize_user_data
    exception_prefix='input')
  File "/home/kamranjanjua/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 100, in _standardize_input_data
    'Found: ' + str(data)[:200] + '...')
TypeError: Error when checking model input: data should be a Numpy array, or list/dict of Numpy arrays. Found: <generator object next_train at 0x7f8752671640>...

Any help would be appreciated.

If you look at the code carefully, you will see that the model expects a dictionary as its input:

inputs = {'the_input': X_data,
          'the_labels': labels,
          'input_length': input_length,
          'label_length': label_length,
          'source_str': source_str}

outputs = {'ctc': np.zeros([size])}  # dummy data for the dummy loss function
For the inputs:
1) X_data holds the training examples
2) labels holds the label of each training example
3) label_length is the length of each label
4) input_length is the length of the input sequence fed to the CTC loss
5) source_str is not mandatory; it is only used for decoding

The outputs are just dummy data for the dummy CTC loss function. A sketch of how the remaining input arrays can be prepared is shown below.
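To make those inputs concrete, here is a minimal sketch of how labels, input_length, and label_length could be built. The alphabet, img_w, downsample_factor, and max_string_len values are assumptions modelled on the conventions of the image_ocr example, and text_to_labels is a hypothetical re-creation of the encoder used there:

import numpy as np

alphabet = 'abcdefghijklmnopqrstuvwxyz '  # assumed character set

def text_to_labels(text):
    # encode each character as its index in the alphabet,
    # mirroring text_to_labels() in the image_ocr example
    return [alphabet.find(ch) for ch in text]

img_w = 128             # image width after resizing
downsample_factor = 4   # assumed: pooling shrinks the width by this factor
max_string_len = 16     # assumed longest ground-truth string

def make_ctc_arrays(y_strings):
    n = len(y_strings)
    labels = np.ones((n, max_string_len)) * -1   # -1 pads short labels
    label_length = np.zeros((n, 1))
    # the CTC input length is the width of the downsampled feature
    # sequence that comes out of the convolutional stack
    input_length = np.ones((n, 1)) * (img_w // downsample_factor - 2)
    for i, text in enumerate(y_strings):
        encoded = text_to_labels(text)
        labels[i, :len(encoded)] = encoded
        label_length[i] = len(encoded)
    return labels, input_length, label_length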

Now, in your code you are only generating x_train and y_train; the other inputs are missing. You need to prepare your dataset according to the model's expected inputs and outputs, for example as in the sketch below.
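Putting it together, here is a hedged sketch of feeding the prepared arrays to the model, assuming the model was built as in the image_ocr example (named inputs and a 'ctc' output) and using the hypothetical make_ctc_arrays helper sketched above; source_str is omitted since it is not mandatory. Note also that the TypeError in your traceback comes from passing a generator to model.fit: in this Keras version, plain numpy dicts can go straight to fit(), while a generator such as next_train() would have to go through model.fit_generator() instead.

labels, input_length, label_length = make_ctc_arrays(y_train)

# reshape grayscale images to (n, w, h, 1); whether your model wants the
# width or the height axis first depends on how it was built (assumption:
# the image_ocr example's TensorFlow layout, (img_w, img_h, 1))
x_in = x_train.transpose(0, 2, 1)[..., np.newaxis]

inputs = {'the_input': x_in,
          'the_labels': labels,
          'input_length': input_length,
          'label_length': label_length}
outputs = {'ctc': np.zeros([len(x_in)])}  # dummy targets for the CTC lambda

model.fit(inputs, outputs, batch_size=7, epochs=20,
          verbose=1, validation_split=0.1, shuffle=True)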