Python: how do I extract an HTML table and export it as JSON?


I have an HTML page made up of tables, and I want to export its contents as JSON. I am using BeautifulSoup to parse the HTML file. The HTML file:

<tbody>
<td align="center" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>
Pod 0</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Response Score</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Retries</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Clear Retries</b></font></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 1</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>2.21</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr0" value="1"></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 2</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>2.01</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr1" value="1"></td>
</tr>

<td align="center" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>
Pod 1</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Response Score</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Retries</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Clear Retries</b></font></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 1</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>1.89</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr16" value="1"></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 2</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>1.00</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr17" value="1"></td>
</tr>

<td align="center" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>
Pod 2</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Response Score</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Retries</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Clear Retries</b></font></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 1</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>2.08</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr32" value="1"></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 2</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>2.15</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr33" value="1"></td>
</tr>
</td>
</tr>
</tbody></table><br>

I believe this is what you are looking for:

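import json
from collections import defaultdict

from bs4 import BeautifulSoup as bs

# Hypothetical file name: any string holding the HTML from the question works here
your_html_above = open("table.html", encoding="utf-8").read()

soup = bs(your_html_above, 'html.parser')
items = soup.find_all("b")        # all visible cell text is wrapped in <b>
target_dict = defaultdict(dict)
pod = None                        # the "Pod N" section currently being read

for item in items:
    text = item.text.strip()
    if 'Pod' in text:
        pod = text                # start a new pod section
    elif 'Disk' in text and pod is not None:
        bas = item.find_next('b')  # the next <b> after "Disk N" holds its Response Score
        target_dict[pod][text] = bas.text.strip()

dict_json = json.dumps(target_dict)
print(dict_json)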
Output:

{"Pod 0": {"Disk 1": "2.21", "Disk 2": "2.01"}, "Pod 1": {"Disk 1": "1.89", "Disk 2": "1.00"}, "Pod 2": {"Disk 1": "2.08", "Disk 2": "2.15"}}
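If you also need the Retries column, and numbers rather than strings, the same idea can be pushed one step further. This is a minimal sketch, assuming the markup always follows the pattern in the question: every visible cell value sits in a `<b>` tag, and each "Disk N" cell is followed by its Response Score and then its Retries. `HTML` is a trimmed copy of the question's markup (font attributes removed), and `pods_to_dict`, `response_score` and `retries` are illustrative names of my own, not anything from the question.

```python
import json

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (font attributes removed). It keeps the
# same irregular structure: the pod header rows have no opening <tr>.
HTML = """
<table><tbody>
<td><b>
Pod 0</b></td><td><b>Response Score</b></td><td><b>Retries</b></td><td><b>Clear Retries</b></td>
</tr>
<tr>
<td><b>Disk 1</b></td><td><b>2.21</b></td><td><b>0</b></td><td><input type="checkbox" name="clr0" value="1"></td>
</tr>
<tr>
<td><b>Disk 2</b></td><td><b>2.01</b></td><td><b>0</b></td><td><input type="checkbox" name="clr1" value="1"></td>
</tr>
<td><b>
Pod 1</b></td><td><b>Response Score</b></td><td><b>Retries</b></td><td><b>Clear Retries</b></td>
</tr>
<tr>
<td><b>Disk 1</b></td><td><b>1.89</b></td><td><b>0</b></td><td><input type="checkbox" name="clr16" value="1"></td>
</tr>
</tbody></table>
"""


def pods_to_dict(html):
    """Map each "Pod N" to {"Disk N": {"response_score": float, "retries": int}}."""
    soup = BeautifulSoup(html, "html.parser")
    # Flat list of every bold cell's text, in document order, whitespace stripped.
    texts = [b.get_text(strip=True) for b in soup.find_all("b")]
    result = {}
    pod = None
    for i, text in enumerate(texts):
        if text.startswith("Pod"):
            pod = text
            result[pod] = {}
        elif text.startswith("Disk") and pod is not None and i + 2 < len(texts):
            # The two bold cells after "Disk N" are its score and its retry count.
            result[pod][text] = {
                "response_score": float(texts[i + 1]),
                "retries": int(texts[i + 2]),
            }
    return result


if __name__ == "__main__":
    print(json.dumps(pods_to_dict(HTML), indent=2))
```

Working from a flat list of the `<b>` texts avoids depending on the `<tr>`/`<td>` nesting, which is broken in the original markup. To export to a file rather than print, pass the result to `json.dump(data, f, indent=2)` with `f` opened via `open(path, "w", encoding="utf-8")`.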

Please share the code you have tried so far.