Python: accessing a new <tr> while in another <tr> with BeautifulSoup4

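I'm trying to collect some data by scraping a local HTML file with BeautifulSoup4. The problem is that the information I want sits in different rows whose cells share the same class, and I don't know how to access them. The HTML screenshot below contains the two rows I'm accessing, with the data I need highlighted (sensitive information is scribbled out).

The code I currently have is: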

from bs4 import BeautifulSoup as bs

def find_data(fileName):
    with open(fileName) as html_file:
        soup = bs(html_file, "lxml")
    # First cell with class "headerTableEntry", then its siblings in the same row
    hline1 = soup.find("td", class_="headerTableEntry")
    hline2 = hline1.find_next_sibling("td")
    hline3 = hline2.find_next_sibling("td")
    # This is the line that fails
    hline4 = hline3.find_next_sibling("td", class_="headerTableEntry")

    line1 = hline1.text
    line2 = hline2.text
    line3 = hline3.text
    # Nothing yet for lines 4, 5, 6

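The first three lines work fine and give the expected values 13, 39, and 33.3 % respectively. But for line 4 (which should be the first <td> with class=headerTableEntry in the second <tr>), I get the error "'NoneType' object is not callable".

My question is whether there is another way to access all 6 data cells, or whether there is a way to fix how my fourth line works. Thanks for the help, much appreciated.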

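The <td> tag is not inside the other <tr> tag. As you can see, the first <tr> tag is closed with </tr>, so the next <td> is not a sibling of the previous one, and that is why the lookup returns None. It is in the next <tr> tag.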
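If you would rather stay with BeautifulSoup, here is a minimal sketch of three ways to reach cells across row boundaries. The markup is a trimmed-down stand-in for the report table (an assumption, since the real file was only posted as a screenshot), and it uses the built-in "html.parser" so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the report table (assumed markup).
html = """
<table>
<tr>
<td class="headerName">Lines:</td>
<td class="headerTableEntry">13</td>
<td class="headerTableEntry">39</td>
<td class="headerTableEntry">33.3 %</td>
</tr>
<tr>
<td class="headerName">Branches:</td>
<td class="headerTableEntry">10</td>
<td class="headerTableEntry">12</td>
<td class="headerTableEntry">83.3 %</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: find_all ignores row boundaries and returns every matching cell.
all_cells = [td.get_text(strip=True)
             for td in soup.find_all("td", class_="headerTableEntry")]

# Option 2: group the matching cells by their parent row.
rows = [[td.get_text(strip=True)
         for td in tr.find_all("td", class_="headerTableEntry")]
        for tr in soup.find_all("tr")]

# Option 3: find_next walks forward in document order, so unlike
# find_next_sibling it can cross from one <tr> into the next.
last_in_row1 = soup.find_all("td", class_="headerTableEntry")[2]
first_in_row2 = last_in_row1.find_next("td", class_="headerTableEntry")

print(all_cells)
print(rows)
print(first_in_row2.get_text(strip=True))
```

In the original function, swapping find_next_sibling for find_next on the fourth lookup would probably be the smallest change, while find_all is the simplest way to get all six values at once.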

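Pandas is a good package for parsing HTML <table> tags. It can use BeautifulSoup under the hood (by default it tries lxml first and falls back to BeautifulSoup with html5lib). Just grab the full table and slice it down to the columns you need: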

html_file = '''<table>
<tr>
<td class="headerName">File:</td>
<td class="HeaderValue">Some Value</td>
<td></td>
<td class="headerName">Lines:</td>
<td class="headerTableEntry">13</td>
<td class="headerTableEntry">39</td>
<td class="headerTableEntry" style="back-ground-color:LightPink">33.3 %</td>
</tr>
<tr>
<td class="headerName">Date:</td>
<td class="HeaderValue">2020-06-18 11:15:19</td>
<td></td>
<td class="headerName">Branches:</td>
<td class="headerTableEntry">10</td>
<td class="headerTableEntry">12</td>
<td class="headerTableEntry" style="back-ground-color:#FFFF55">83.3 %</td>
</tr>
</table>'''

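from io import StringIO

import pandas as pd

# read_html returns a list of DataFrames, one per <table>; take the first.
# Wrapping the string in StringIO avoids the deprecation warning that newer
# pandas versions raise for literal HTML strings.
df = pd.read_html(StringIO(html_file))[0]
# Keep columns 3 onward: the label plus the three headerTableEntry cells.
df = df.iloc[:, 3:]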

The same thing inside your function:

def find_data(fileName):
    with open(fileName) as html_file:
        df = pd.read_html(html_file)[0].iloc[:, 3:]
        print(df)

Output:

print(df)
           3   4   5       6
0     Lines:  13  39  33.3 %
1  Branches:  10  12  83.3 %

Please edit your question to include the actual HTML rather than a picture.