Python Camelot: table_areas and table_regions not working as expected
For several days I've been trying to get Camelot to work on specific areas of a PDF page, but it keeps confusing me. I've looked at and tried the documentation's suggestions and several bug reports, without success, so I need some help.

I took an example from the documentation because it has multiple tables, and modified the original command so that it extracts only one of the two tables. I changed it from:
tables = camelot.read_pdf('12s0324.pdf', flavor='stream', strip_text=' .\n')
to:
tables = camelot.read_pdf('12s0324.pdf', flavor='stream', strip_text='\n', table_area=['33,297,386,65'], pages='1')
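Camelot's docs describe each area string as x1,y1,x2,y2. In that format (x1, y1) is the left-top corner and (x2, y2) is the right-bottom corner, in PDF coordinate space, where the origin is the bottom-left of the page. Before blaming the keyword argument, it can help to sanity-check the strings themselves. Below is a minimal sketch of such a check. parse_area is a hypothetical helper, not part of Camelot.

```python
from typing import Tuple


def parse_area(area: str) -> Tuple[float, float, float, float]:
    """Parse an 'x1,y1,x2,y2' area string (hypothetical helper, not Camelot API).

    Assumes the documented convention: (x1, y1) is left-top and (x2, y2) is
    right-bottom in PDF coordinates (origin at the bottom-left of the page),
    so x1 < x2 and y1 > y2.
    """
    parts = [p.strip() for p in area.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}: {area!r}")
    x1, y1, x2, y2 = (float(p) for p in parts)
    if not x1 < x2:
        raise ValueError(f"x1 must be left of x2: {area!r}")
    if not y1 > y2:
        raise ValueError(f"y1 (top) must be above y2 (bottom): {area!r}")
    return x1, y1, x2, y2


# The two areas used in this question
print(parse_area("33,297,386,65"))   # bottom table -> (33.0, 297.0, 386.0, 65.0)
print(parse_area("35,591,385,343"))  # top table -> (35.0, 591.0, 385.0, 343.0)
```

A string with a missing comma, or with the y values swapped, raises a ValueError here instead of silently producing an odd area.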
Note that:
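- I changed the strip_text value, because the original ' .\n' was also removing the spaces between words.
- I used table_area instead of the documentation's table_areas, because the former runs without complaint while the latter raises an error (the error has been explained elsewhere, and the docs still seem to be wrong).
- I tried extracting both tables and checked their respective areas with Camelot's plotting function, so the coordinates should be correct.
- I also tried table_regions, which at least pulls out one table instead of two, but the result is still fairly inaccurate (see the notes below).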
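The Camelot docs describe strip_text as the characters to strip from a string before it is assigned to a cell. My understanding is that this value acts as a set of characters (effectively a regex character class), not as a pattern or substring. That would explain why ' .\n' also removed the spaces between words. The sketch below reproduces that behaviour with the standard re module. strip_chars is a hypothetical stand-in, not Camelot's own function.

```python
import re


def strip_chars(text: str, strip: str = "") -> str:
    """Remove every character listed in `strip` from `text`.

    Hypothetical stand-in for how strip_text appears to behave: each character
    in `strip` is escaped and placed in a character class, then all matches
    are deleted.
    """
    if not strip:
        return text
    pattern = "[" + "".join(re.escape(c) for c in strip) + "]"
    return re.sub(pattern, "", text)


cell = "Violent crime . . . .\n"
print(repr(strip_chars(cell, " .\n")))  # 'Violentcrime' (spaces removed too)
print(repr(strip_chars(cell, "\n")))    # 'Violent crime . . . .'
```

Dropping the space from the character set keeps the words apart. The trade-off is that the dot leaders then stay in the cell text.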
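Using table_regions on the '35,591,385,343' PDF area (top table): only one table, but apparently with the same problem, i.e. unwanted text from outside the selected area.

Using table_regions on the '33,297,386,65' PDF area (bottom table): better, but it picks up unwanted text as above.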
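As I understand the Camelot docs, the two arguments differ in how the boundary is treated. table_regions marks page regions that may contain tables and leaves the exact table boundaries to detection. table_areas is used as the table boundary itself. If it helps to think in terms of "keep only the text that lies inside this rectangle", the sketch below shows that clipping logic on made-up text boxes.

TextBox, inside_area and clip_to_area are hypothetical names. They are not part of Camelot, and the box coordinates are invented for illustration.

```python
from typing import List, NamedTuple, Tuple

Area = Tuple[float, float, float, float]  # (x1, y1, x2, y2) = left, top, right, bottom


class TextBox(NamedTuple):
    """A positioned piece of text (hypothetical stand-in for parser output)."""
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top
    text: str


def inside_area(box: TextBox, area: Area) -> bool:
    """True if the box lies entirely within the area (PDF coords, origin bottom-left)."""
    left, top, right, bottom = area
    return left <= box.x0 and box.x1 <= right and bottom <= box.y0 and box.y1 <= top


def clip_to_area(boxes: List[TextBox], area: Area) -> List[str]:
    """Return the text of the boxes that fall fully inside the area."""
    return [b.text for b in boxes if inside_area(b, area)]


bottom_table: Area = (33.0, 297.0, 386.0, 65.0)
boxes = [
    TextBox(40, 310, 200, 320, "Table 325. Arrests by Race: 2009"),  # above the area
    TextBox(40, 280, 120, 290, "Offense charged"),
    TextBox(40, 70, 150, 80, "Runaways"),
]
print(clip_to_area(boxes, bottom_table))  # ['Offense charged', 'Runaways']
```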
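I'd really appreciate any advice or suggestions. Thanks in advance.

The table_areas keyword argument (rather than table_area) works fine and should be used (I'm using Camelot 0.7.3). It returns: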
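That seems right. Now I'm even more confused: I've used that command many times (I just checked, and I still have an instance in my Python console), but it never worked. Now it does :P OK, thanks for confirming that table_areas is correct, contrary to the link I referred to. I wonder why table_area still shows up all over the place.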
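In the console session below, table_area=[...] returns <TableList n=2>, i.e. both tables. This suggests the misspelled keyword is silently ignored rather than rejected. If that is what happens (my assumption is that the Stream parser accepts extra **kwargs), a small guard can catch such typos before camelot.read_pdf is called.

check_kwargs below is a hypothetical helper. The STREAM_KWARGS and READ_PDF_KWARGS sets are my assumption for Camelot 0.7.x, not an official list, so check them against the docs for your installed version.

```python
import difflib

# Assumed keyword arguments of Camelot's Stream parser in 0.7.x (verify for your version)
STREAM_KWARGS = {
    "table_regions", "table_areas", "columns", "split_text", "flag_size",
    "strip_text", "edge_tol", "row_tol", "column_tol",
}
# Assumed arguments handled by read_pdf itself rather than by the parser
READ_PDF_KWARGS = {"pages", "password", "flavor", "suppress_stdout", "layout_kwargs"}


def check_kwargs(kwargs: dict) -> None:
    """Raise TypeError for any keyword not in the known sets, suggesting the closest match."""
    known = STREAM_KWARGS | READ_PDF_KWARGS
    for key in kwargs:
        if key not in known:
            hint = difflib.get_close_matches(key, known, n=1)
            suggestion = f" (did you mean {hint[0]!r}?)" if hint else ""
            raise TypeError(f"unknown keyword {key!r}{suggestion}")


try:
    check_kwargs({"flavor": "stream", "table_area": ["33,297,386,65"], "pages": "1"})
except TypeError as exc:
    print(exc)  # unknown keyword 'table_area' (did you mean 'table_areas'?)
```

A typical use would be to build the keyword dict once, call check_kwargs(kw), and then call camelot.read_pdf('12s0324.pdf', **kw).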
>>> tables = camelot.read_pdf('12s0324.pdf', flavor='stream', strip_text='\n', table_area=['35,591,385,343'], pages = '1')
>>> tables
<TableList n=2>
>>> tables[0].df
0 1 2 3 4 5 6 7 8 9
0 Program. Represents arrests reported (not char...
1 by the FBI. Some persons may be arrested more ...
2 could represent multiple arrests of the same p...
3 Total Male Female
4 Offense charged Under 18 18 years Under 18 18 years Under 18 18 years
5 Total years and over Total years and over Total years and over
6 Total . . . . . . . . . . . . . . . ... 11,062 .6 1,540 .0 9,522 .6 8,263 .3 1,071 .6 7,191 .7 2,799 .2 468 .3 2,330 .9
7 Violent crime . . . . . . . . . . . ... 467 .9 69 .1 398 .8 380 .2 56 .5 323 .7 87 .7 12 .6 75 .2
8 Murder and nonnegligent
9 manslaughter . . . . . . . .. .. .. .. .. 10.0 0.9 9.1 9.0 0.9 8.1 1.1 – 1.0
10 Forcible rape . . . . . . . .. .. .. .. .. . 17.5 2.6 14.9 17.2 2.5 14.7 – – –
11 Robbery . . . .. .. . .. . ... . ... . ... 102.1 25.5 76.6 90.0 22.9 67.1 12.1 2.5 9.5
....
34 Disorderly conduct . .. . . . . . .. .. .. . 529.5 136.1 393.3 387.1 90.8 296.2 142.4 45.3 97.1
35 Vagrancy . . . .. . . . ... .... .... ... 26.6 2.2 24.4 20.9 1.6 19.3 5.7 0.6 5.1
36 All other offenses (except traffic) . . .. 306.1 263.4 2,800.8 2,337.1 194.2 2,142.9 727.0 69.2 657.9
37 Suspicion . . . .. . . .. .. .. .. .. .. . .. 1.6 – 1.4 1.2 – 1.0 – – –
38 Curfew and loitering law violations .. 91.0 91.0 (X) 63.1 63.1 (X) 28.0 28.0 (X)
39 Runaways . . . . . . . .. .. .. .. .. .... 75.8 75.8 (X) 34.0 34.0 (X) 41.8 41.8 (X)
40 – Represents zero. X Not applicable. 1 Buying,...
>>> tables = camelot.read_pdf('12s0324.pdf', flavor='stream', strip_text='\n', table_regions=['35,591,385,343'], pages = '1')
>>> tables
<TableList n=1>
>>> tables[0].df
0 1 2 3 4 5 6 7 8 9
0 Program. Represents arrests reported (not char...
1 by the FBI. Some persons may be arrested more ...
2 could represent multiple arrests of the same p...
3 Total Male Female
4 Offense charged Under 18 18 years Under 18 18 years Under 18 18 years
5 Total years and over Total years and over Total years and over
6 Total . . . . . . . . . . . . . . . ... 11,062 .6 1,540 .0 9,522 .6 8,263 .3 1,071 .6 7,191 .7 2,799 .2 468 .3 2,330 .9
7 Violent crime . . . . . . . . . . . ... 467 .9 69 .1 398 .8 380 .2 56 .5 323 .7 87 .7 12 .6 75 .2
8 Murder and nonnegligent
9 manslaughter . . . . . . . .. .. .. .. .. 10.0 0.9 9.1 9.0 0.9 8.1 1.1 – 1.0
10 Forcible rape . . . . . . . .. .. .. .. .. . 17.5 2.6 14.9 17.2 2.5 14.7 – – –
11 Robbery . . . .. .. . .. . ... . ... . ... 102.1 25.5 76.6 90.0 22.9 67.1 12.1 2.5 9.5
....
34 Disorderly conduct . .. . . . . . .. .. .. . 529.5 136.1 393.3 387.1 90.8 296.2 142.4 45.3 97.1
35 Vagrancy . . . .. . . . ... .... .... ... 26.6 2.2 24.4 20.9 1.6 19.3 5.7 0.6 5.1
36 All other offenses (except traffic) . . .. 306.1 263.4 2,800.8 2,337.1 194.2 2,142.9 727.0 69.2 657.9
37 Suspicion . . . .. . . .. .. .. .. .. .. . .. 1.6 – 1.4 1.2 – 1.0 – – –
38 Curfew and loitering law violations .. 91.0 91.0 (X) 63.1 63.1 (X) 28.0 28.0 (X)
39 Runaways . . . . . . . .. .. .. .. .. .... 75.8 75.8 (X) 34.0 34.0 (X) 41.8 41.8 (X)
40 – Represents zero. X Not applicable. 1 Buying,...
>>> tables = camelot.read_pdf('12s0324.pdf', flavor='stream', strip_text='\n', table_area=['33,297,386,65'], pages = '1')
>>> tables
<TableList n=2>
>>> tables[0].df
0 1 2 3 4 5 6 7 8 9
0 Program. Represents arrests reported (not char...
1 by the FBI. Some persons may be arrested more ...
2 could represent multiple arrests of the same p...
3 Total Male Female
4 Offense charged Under 18 18 years Under 18 18 years Under 18 18 years
5 Total years and over Total years and over Total years and over
6 Total . . . . . . . . . . . . . . . ... 11,062 .6 1,540 .0 9,522 .6 8,263 .3 1,071 .6 7,191 .7 2,799 .2 468 .3 2,330 .9
7 Violent crime . . . . . . . . . . . ... 467 .9 69 .1 398 .8 380 .2 56 .5 323 .7 87 .7 12 .6 75 .2
8 Murder and nonnegligent
9 manslaughter . . . . . . . .. .. .. .. .. 10.0 0.9 9.1 9.0 0.9 8.1 1.1 – 1.0
10 Forcible rape . . . . . . . .. .. .. .. .. . 17.5 2.6 14.9 17.2 2.5 14.7 – – –
11 Robbery . . . .. .. . .. . ... . ... . ... 102.1 25.5 76.6 90.0 22.9 67.1 12.1 2.5 9.5
....
34 Disorderly conduct . .. . . . . . .. .. .. . 529.5 136.1 393.3 387.1 90.8 296.2 142.4 45.3 97.1
35 Vagrancy . . . .. . . . ... .... .... ... 26.6 2.2 24.4 20.9 1.6 19.3 5.7 0.6 5.1
36 All other offenses (except traffic) . . .. 306.1 263.4 2,800.8 2,337.1 194.2 2,142.9 727.0 69.2 657.9
37 Suspicion . . . .. . . .. .. .. .. .. .. . .. 1.6 – 1.4 1.2 – 1.0 – – –
38 Curfew and loitering law violations .. 91.0 91.0 (X) 63.1 63.1 (X) 28.0 28.0 (X)
39 Runaways . . . . . . . .. .. .. .. .. .... 75.8 75.8 (X) 34.0 34.0 (X) 41.8 41.8 (X)
40 – Represents zero. X Not applicable. 1 Buying,...
>>> tables = camelot.read_pdf('12s0324.pdf', flavor='stream', strip_text='\n', table_regions=['33,297,386,65'], pages = '1')
>>> tables
<TableList n=1>
>>> tables[0].df
0 1 2 3 4 5
0 Table 325. Arrests by Race: 2009
1 [Based on Uniform Crime Reporting (UCR) Progra...
2 with a total population of 239,839,971 as esti...
3 American
4 Offense charged Indian/Alaskan Asian Pacific
5 Total White Black Native Islander
6 Total . . . . . . . . . . . . . . . . ... 10,690,561 7,389,208 3,027,153 150,544 123,656
7 Violent crime . . . . . . . . . . . ... 456,965 268,346 177,766 5,608 5,245
8 Murder and nonnegligent manslaughter . .. ... . 9,739 4,741 4,801 100 97
9 Forcible rape . . . . . . . .. .. .. .. .... .... 16,362 10,644 5,319 169 230
10 Robbery . . . . .. . . . ... . ... . .... ....... 100,496 43,039 55,742 726 989
11 Aggravated assault . . . . . . . .. .. ......... 330,368 209,922 111,904 4,613 3,929
....
34 All other offenses (except traffic) . .. .. ..... 2,929,217 1,937,221 911,670 43,880 36,446
35 Suspicion . . .. . . . .. .. .. .. .. .. .. ..... 1,513 677 828 1 7
36 Curfew and loitering law violations . .. ... ... 89,578 54,439 33,207 872 1,060
37 Runaways . . . . . . . .. .. .. .. .. .. ....... 73,616 48,343 19,670 1,653 3,950
38 1 Except forcible rape and prostitution.
tables = camelot.read_pdf('12s0324.pdf', flavor='stream', strip_text='\n', table_areas=['35,591,385,343'], pages = '1')
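Even with a tighter area, a table can come back with caption rows at the top and a footnote row at the bottom, like the "Program. Represents arrests..." and "Represents zero. X Not applicable." rows in the outputs above. One option is to trim such rows afterwards.

The sketch below is a hypothetical heuristic, not a Camelot feature. It drops leading and trailing rows that have text in at most one cell, assuming captions and footnotes land in a single cell. It works on a plain list of rows, such as tables[0].df.values.tolist(). The sample rows are illustrative.

```python
from typing import List

Row = List[str]


def filled_cells(row: Row) -> int:
    """Count cells that contain non-whitespace text."""
    return sum(1 for cell in row if cell and cell.strip())


def trim_single_cell_rows(rows: List[Row]) -> List[Row]:
    """Drop leading and trailing rows with text in at most one cell.

    Hypothetical heuristic: single-cell rows in the middle of the table
    (e.g. a label wrapped over two lines) are kept.
    """
    start, end = 0, len(rows)
    while start < end and filled_cells(rows[start]) <= 1:
        start += 1
    while end > start and filled_cells(rows[end - 1]) <= 1:
        end -= 1
    return rows[start:end]


rows = [
    ["Program. Represents arrests reported ...", "", ""],  # caption
    ["Offense", "Male", "Female"],
    ["Murder and nonnegligent", "", ""],                  # wrapped label, kept
    ["manslaughter", "9.0", "1.1"],
    ["Represents zero. X Not applicable.", "", ""],       # footnote
]
for row in trim_single_cell_rows(rows):
    print(row)
```

The trimmed rows can be turned back into a DataFrame with pandas.DataFrame(...) if needed.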