Python: How to convert OCR data into a DataFrame (Python, Python 3.x, Pandas, Dataframe) - Fatal编程技术网



I have the bounding boxes of the table cells:

[[23, 19, 1346, 63],
 [23, 67, 137, 110],
 [141, 67, 344, 110],
 [348, 67, 635, 110],
 [639, 67, 1346, 110],
 [23, 114, 137, 287],
 [141, 114, 344, 287],
 [348, 114, 635, 287],
 [639, 114, 1346, 287],
 [23, 291, 137, 507],
 [141, 291, 344, 507],
 [348, 291, 635, 507],
 [639, 291, 1346, 507]]
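The cell boxes appear to be corner coordinates [x_min, y_min, x_max, y_max] (the third value is always larger than the first, e.g. 1346 > 23), and cells in the same row share the same top edge. Under that assumption, a minimal sketch can recover each cell's (row, col) position by ranking the distinct top and left edges. `grid_positions` is a hypothetical helper, and exact equality only works because these boxes share identical edges; noisier boxes would need a tolerance.

```python
# Cell boxes from the question, assumed to be [x_min, y_min, x_max, y_max]
cells = [
    [23, 19, 1346, 63],
    [23, 67, 137, 110], [141, 67, 344, 110], [348, 67, 635, 110], [639, 67, 1346, 110],
    [23, 114, 137, 287], [141, 114, 344, 287], [348, 114, 635, 287], [639, 114, 1346, 287],
    [23, 291, 137, 507], [141, 291, 344, 507], [348, 291, 635, 507], [639, 291, 1346, 507],
]


def grid_positions(boxes):
    """Return a (row, col) index for each box, ranked by its top and left edges.

    Assumes cells in one row share an identical top edge and cells in one
    column share an identical left edge (true for the boxes above).
    """
    row_starts = sorted({b[1] for b in boxes})
    col_starts = sorted({b[0] for b in boxes})
    return [(row_starts.index(b[1]), col_starts.index(b[0])) for b in boxes]


positions = grid_positions(cells)
for box, pos in zip(cells, positions):
    print(pos, box)
```

The first box is a merged title cell: it reports position (0, 0) but spans all four columns, which could be detected the same way by ranking the x_max values.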
I have also run OCR on the image; its output is:

[([604, 28, 764, 58], '4th Quarter'),
 ([42, 78, 118, 103], 'Sr No'),
 ([217, 78, 266, 103], 'PID'),
 ([439, 78, 543, 104], 'PName'),
 ([849, 76, 1133, 107], 'Product Description'),
 ([69, 126, 90, 151], '1'),
 ([152, 124, 331, 151], 'IDXY100234'),
 ([386, 123, 595, 151], 'SQRT-XUIP-34'),
 ([655, 122, 1332, 155], 'si Jandarmeriei in scopul prevenirii delincventei'),
 ([655, 165, 1289, 197], 'realizarii unei orientari vocationale adecvate'),
 ([653, 209, 1189, 241], 'contactele cu diverse institutii pentru'),
 ([68, 302, 90, 328], '2'),
 ([155, 300, 335, 329], 'IDXY100346'),
 ([364, 301, 615, 328], 'MAPK-QKGAP-09'),
 ([651, 299, 1279, 330], 'introducerea elevilor in mediul comunitar si'),
 ([650, 343, 1267, 375], 'semestrial-comisia de prevenire a violentei'),
 ([654, 387, 1276, 418], 'Reprezentativ al Parintilor, suplimentate de'),
 ([653, 429, 1127, 462], 'consultatii individuale cu parintii;')]
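Each OCR fragment can be assigned to the cell that contains the centre of its box; description cells receive several line fragments, which then need to be joined top to bottom. A sketch in plain Python, assuming the OCR boxes use the same corner format as the cells; `centroid`, `contains` and `cell_texts` are illustrative names, not library functions.

```python
# Cell boxes and OCR output from the question; boxes assumed [x_min, y_min, x_max, y_max]
cells = [
    [23, 19, 1346, 63],
    [23, 67, 137, 110], [141, 67, 344, 110], [348, 67, 635, 110], [639, 67, 1346, 110],
    [23, 114, 137, 287], [141, 114, 344, 287], [348, 114, 635, 287], [639, 114, 1346, 287],
    [23, 291, 137, 507], [141, 291, 344, 507], [348, 291, 635, 507], [639, 291, 1346, 507],
]
ocr = [
    ([604, 28, 764, 58], '4th Quarter'),
    ([42, 78, 118, 103], 'Sr No'),
    ([217, 78, 266, 103], 'PID'),
    ([439, 78, 543, 104], 'PName'),
    ([849, 76, 1133, 107], 'Product Description'),
    ([69, 126, 90, 151], '1'),
    ([152, 124, 331, 151], 'IDXY100234'),
    ([386, 123, 595, 151], 'SQRT-XUIP-34'),
    ([655, 122, 1332, 155], 'si Jandarmeriei in scopul prevenirii delincventei'),
    ([655, 165, 1289, 197], 'realizarii unei orientari vocationale adecvate'),
    ([653, 209, 1189, 241], 'contactele cu diverse institutii pentru'),
    ([68, 302, 90, 328], '2'),
    ([155, 300, 335, 329], 'IDXY100346'),
    ([364, 301, 615, 328], 'MAPK-QKGAP-09'),
    ([651, 299, 1279, 330], 'introducerea elevilor in mediul comunitar si'),
    ([650, 343, 1267, 375], 'semestrial-comisia de prevenire a violentei'),
    ([654, 387, 1276, 418], 'Reprezentativ al Parintilor, suplimentate de'),
    ([653, 429, 1127, 462], 'consultatii individuale cu parintii;'),
]


def centroid(box):
    """Centre point of a box given by its opposite corners."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2


def contains(cell, point):
    """True if the point lies inside (or on the border of) the cell."""
    x, y = point
    return cell[0] <= x <= cell[2] and cell[1] <= y <= cell[3]


def cell_texts(cells, ocr):
    """Text of each cell: the fragments centred in it, joined top to bottom."""
    texts = []
    for cell in cells:
        inside = [(box, text) for box, text in ocr if contains(cell, centroid(box))]
        inside.sort(key=lambda item: (item[0][1], item[0][0]))  # top edge, then left edge
        texts.append(" ".join(text for _, text in inside))
    return texts


texts = cell_texts(cells, ocr)
for text in texts:
    print(repr(text))
```

Fragments whose centre falls in no cell are silently dropped here; with these boxes every fragment lands in exactly one cell.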
I want to convert this OCR data into a proper DataFrame, like the one shown in the image below.

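The output DataFrame should look the same as that table. I have managed to turn each cell into a column, but I don't know how to go on from there. My code: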

import pandas as pd

# extracted_ocr_data: list of ([x1, y1, x2, y2], text) tuples (the OCR output above)
# extracted_cells_bb: list of [x1, y1, x2, y2] cell boxes (the cell list above)
df = pd.DataFrame(extracted_ocr_data)

# Finding centroid: midpoint of the box's opposite corners
df[2] = df[0].apply(lambda b: ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2))

# Converting cells into columns: each column holds the OCR text whose
# centroid lies inside the corresponding cell
columns = []
for bbox in extracted_cells_bb:
    inside = df[2].apply(lambda c: bbox[0] <= c[0] <= bbox[2] and bbox[1] <= c[1] <= bbox[3])
    columns.append(df.loc[inside, 1].reset_index(drop=True))
col_df = pd.concat(columns, axis=1, ignore_index=True)
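One possible way to continue is to skip the one-column-per-cell DataFrame and instead build a long table of (row, col, text), then pivot it into the grid. The sketch below assumes pandas 1.2 or later (for `merge(how="cross")`), corner-format boxes, and a layout inferred from the cell boxes: grid row 0 is the merged "4th Quarter" title and row 1 holds the column headers. Column names such as `mx`/`my` are arbitrary.

```python
import pandas as pd

cells = [
    [23, 19, 1346, 63],
    [23, 67, 137, 110], [141, 67, 344, 110], [348, 67, 635, 110], [639, 67, 1346, 110],
    [23, 114, 137, 287], [141, 114, 344, 287], [348, 114, 635, 287], [639, 114, 1346, 287],
    [23, 291, 137, 507], [141, 291, 344, 507], [348, 291, 635, 507], [639, 291, 1346, 507],
]
ocr = [
    ([604, 28, 764, 58], '4th Quarter'),
    ([42, 78, 118, 103], 'Sr No'),
    ([217, 78, 266, 103], 'PID'),
    ([439, 78, 543, 104], 'PName'),
    ([849, 76, 1133, 107], 'Product Description'),
    ([69, 126, 90, 151], '1'),
    ([152, 124, 331, 151], 'IDXY100234'),
    ([386, 123, 595, 151], 'SQRT-XUIP-34'),
    ([655, 122, 1332, 155], 'si Jandarmeriei in scopul prevenirii delincventei'),
    ([655, 165, 1289, 197], 'realizarii unei orientari vocationale adecvate'),
    ([653, 209, 1189, 241], 'contactele cu diverse institutii pentru'),
    ([68, 302, 90, 328], '2'),
    ([155, 300, 335, 329], 'IDXY100346'),
    ([364, 301, 615, 328], 'MAPK-QKGAP-09'),
    ([651, 299, 1279, 330], 'introducerea elevilor in mediul comunitar si'),
    ([650, 343, 1267, 375], 'semestrial-comisia de prevenire a violentei'),
    ([654, 387, 1276, 418], 'Reprezentativ al Parintilor, suplimentate de'),
    ([653, 429, 1127, 462], 'consultatii individuale cu parintii;'),
]

# One row per cell; grid position = dense rank of its top and left edges
cell_df = pd.DataFrame(cells, columns=["cx1", "cy1", "cx2", "cy2"])
cell_df["row"] = cell_df["cy1"].rank(method="dense").astype(int) - 1
cell_df["col"] = cell_df["cx1"].rank(method="dense").astype(int) - 1

# One row per OCR fragment, plus the centre of its box
ocr_df = pd.DataFrame([box + [text] for box, text in ocr],
                      columns=["x1", "y1", "x2", "y2", "text"])
ocr_df["mx"] = (ocr_df["x1"] + ocr_df["x2"]) / 2
ocr_df["my"] = (ocr_df["y1"] + ocr_df["y2"]) / 2

# Pair every fragment with every cell, keep pairs whose centre lies inside the cell
pairs = ocr_df.merge(cell_df, how="cross")
pairs = pairs[pairs["mx"].between(pairs["cx1"], pairs["cx2"])
              & pairs["my"].between(pairs["cy1"], pairs["cy2"])]

# Join multi-line fragments top to bottom, then spread the cells into a grid
grid = (pairs.sort_values(["y1", "x1"])
             .groupby(["row", "col"])["text"]
             .agg(" ".join)
             .unstack(fill_value=""))

# Assumed layout: row 0 = merged title, row 1 = headers, remaining rows = data
title = grid.iloc[0, 0]
table = grid.iloc[2:].reset_index(drop=True)
table.columns = grid.iloc[1].tolist()

print(title)
print(table.to_string())
```

The cross join pairs every fragment with every cell, which is cheap for one table like this but grows as fragments × cells. "Sr No" stays as text; `table["Sr No"].astype(int)` converts it if needed. To keep the merged title attached, `pd.MultiIndex.from_product([[title], table.columns])` can serve as a two-level header.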