Python: extracting text at given coordinates with pdfminer


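I am trying to extract text with pdfminer by supplying coordinates as input. I have searched the internet but could not find any documentation or code for this. So far I have only found code that extracts text and outputs its coordinates: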

LTTextBoxHorizontal
(317.564, 91.32756, 580.93228, 116.24235999999999)
SHOULD ANY OF THE ABOVE DESCRIBED POLICIES BE CANCELLED BEFORE
THE    EXPIRATION   DATE    THEREOF,    NOTICE   WILL   BE   DELIVERED   IN
ACCORDANCE   WITH   THE   POLICY   PROVISIONS.
This is one of the coordinate and text outputs I get. I also tried pdfquery, but I get a lot of errors:

File "C:\Python27\lib\site-packages\pyquery-1.2.11-py2.7.egg\pyquery\pyquery.py", line 268, in __call__
    result = self._copy(*args, parent=self, **kwargs)
  File "C:\Python27\lib\site-packages\pyquery-1.2.11-py2.7.egg\pyquery\pyquery.py", line 253, in _copy
    return self.__class__(*args, **kwargs)
  File "C:\Python27\lib\site-packages\pyquery-1.2.11-py2.7.egg\pyquery\pyquery.py", line 239, in __init__
    xpath = self._css_to_xpath(selector)
  File "C:\Python27\lib\site-packages\pyquery-1.2.11-py2.7.egg\pyquery\pyquery.py", line 249, in _css_to_xpath
    return self._translator.css_to_xpath(selector, prefix)
  File "build\bdist.win32\egg\cssselect\xpath.py", line 192, in css_to_xpath
  File "build\bdist.win32\egg\cssselect\parser.py", line 355, in parse
  File "build\bdist.win32\egg\cssselect\parser.py", line 370, in parse_selector_group
  File "build\bdist.win32\egg\cssselect\parser.py", line 378, in parse_selector
  File "build\bdist.win32\egg\cssselect\parser.py", line 437, in parse_simple_selector
  File "build\bdist.win32\egg\cssselect\parser.py", line 535, in parse_attrib
cssselect.parser.SelectorSyntaxError: Expected string or ident, got <NUMBER '1' at 14> 

Can anyone help me?

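The SelectorSyntaxError happens when the pageid value in the selector is not quoted. After the = sign cssselect expects a string or an identifier, and a bare 1 is neither. Inside a single-quoted Python string the quotes have to be escaped.

Try: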

LTPage[pageid=\'1\']
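An equivalent way to write the selector is to use double quotes around the value, which need no backslashes inside a single-quoted Python string. The helper name page_selector below is hypothetical and only for illustration. With pdfquery the resulting string would then be passed to pdf.pq(...), assuming pdf is an already loaded pdfquery.PDFQuery object.

```python
def page_selector(page_id):
    """Build a selector for one page with the attribute value quoted."""
    # Double quotes inside a single-quoted Python string need no escaping.
    return 'LTPage[pageid="%s"]' % page_id


# Hypothetical usage: prints the selector string for page 1.
print(page_selector(1))
```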

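On the coordinate part of the question, one possible approach is to collect the (bbox, text) pairs that the extraction code already prints and keep only those whose box falls inside the region of interest. The following is a minimal sketch under that assumption, not pdfminer's own API: the function names inside and text_in_region, the query region, and the second sample entry are all made up for illustration. With pdfminer, such pairs could be gathered from each text box's bbox attribute and get_text() method.

```python
def inside(bbox, region):
    """True if bbox lies entirely within region; both are (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    rx0, ry0, rx1, ry1 = region
    return rx0 <= x0 and ry0 <= y0 and x1 <= rx1 and y1 <= ry1


def text_in_region(items, region):
    """Return the text of every (bbox, text) pair whose box fits in region."""
    return [text for bbox, text in items if inside(bbox, region)]


# Sample data shaped like the output in the question.
# The second entry is hypothetical and only there to show filtering.
items = [
    ((317.564, 91.32756, 580.93228, 116.24235999999999),
     "SHOULD ANY OF THE ABOVE DESCRIBED POLICIES BE CANCELLED BEFORE"),
    ((30.0, 700.0, 200.0, 720.0), "HYPOTHETICAL HEADER"),
]

# Region to query, in PDF points with the origin at the bottom-left corner.
region = (300, 90, 600, 120)
print(text_in_region(items, region))
```

A strict containment test is used here; if boxes that only partly overlap the region should also match, the comparison in inside would need to be relaxed to an overlap test.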