Python: Use XPath to get elements whose style value is greater than a threshold

python python-3.x pdf

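So, in short, given the following HTML (the extra asterisk markers are my own addition):

<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; **left:66px;** top:1892px; width:91px; height:10px;">
    <span style="font-family: Times-Roman; font-size:10px">FOO
    <br>
    </span>
</div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; **left:514px;** top:1892px; width:20px; height:10px;">
    <span style="font-family: Times-Roman; font-size:10px">BAR
    <br>
    </span>
</div>

I would like to use XPath to get all nodes whose left property (inside the style attribute) is less than a given threshold, with something like:

/div[@style("left") < 300]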

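Looking around, this does not seem possible. The closest thing I have managed to find works along the lines of regular-expression matching, but I would like to avoid using regular expressions to match numeric data, because the threshold can vary.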

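I am trying to extract this information with Python (the lxml module). Basically, I have a PDF with a left and a right column, and I want to split each page into two parts: everything on the left on its own, and everything on the right on its own.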

Try the following:

import lxml.html
foo = """
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:66px; top:1892px; width:91px; height:10px;">
    <span style="font-family: Times-Roman; font-size:10px">FOO  
    <br>
    </span>
</div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:514px; top:1892px; width:20px; height:10px;">
    <span style="font-family: Times-Roman; font-size:10px">BAR
    <br>
    </span>
</div> """

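# Parse the HTML fragment into an element tree.
doc = lxml.html.fromstring(foo)
# Take the text between 'left:' and 'px;' in @style, convert it to a number,
# and keep only the <div> elements where that number is below 300.
doc.xpath("//div[number(substring-before(substring-after(@style, 'left:'),'px;')) < 300]")[0]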

This selects the first div.
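Since the question mentions that the threshold can vary and that the goal is to separate a left column from a right column, the same expression can be reused with the threshold passed in as an XPath variable. The following is only a minimal sketch under a few assumptions: `LEFT_PX` and `split_columns` are hypothetical names introduced here, the sample markup is a trimmed copy of the two div elements above, and every style attribute is assumed to contain `left:` immediately followed by a number and `px;`.

```python
import lxml.html

# Trimmed copy of the sample markup (assumption: same layout as in the question).
foo = """
<div style="position:absolute; left:66px; top:1892px; width:91px; height:10px;">
    <span style="font-family: Times-Roman; font-size:10px">FOO
    <br>
    </span>
</div>
<div style="position:absolute; left:514px; top:1892px; width:20px; height:10px;">
    <span style="font-family: Times-Roman; font-size:10px">BAR
    <br>
    </span>
</div> """

# XPath 1.0 expression that extracts the numeric value of `left` from @style.
LEFT_PX = "number(substring-before(substring-after(@style, 'left:'), 'px;'))"


def split_columns(doc, threshold):
    """Return (left, right) lists of <div> elements split at `threshold` pixels.

    Hypothetical helper: the threshold is bound to the XPath variable
    $threshold through a keyword argument of lxml's xpath() method.
    """
    left = doc.xpath(f"//div[{LEFT_PX} < $threshold]", threshold=threshold)
    right = doc.xpath(f"//div[{LEFT_PX} >= $threshold]", threshold=threshold)
    return left, right


doc = lxml.html.fromstring(foo)
left_col, right_col = split_columns(doc, 300)
left_text = [div.text_content().strip() for div in left_col]
right_text = [div.text_content().strip() for div in right_col]
print(left_text, right_text)
```

Two caveats with this sketch: `substring-after(@style, 'left:')` matches the first occurrence of `left:`, so a style that also contains `margin-left:` or `padding-left:` earlier in the string would be misread, and a div whose style cannot be parsed yields NaN, which fails both comparisons and therefore lands in neither column.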

"Looking around, this does not seem possible." I'm confused; what exactly is your question?
