Parsing table tags in Python


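I am trying to use Python to extract data from an HTML file. Specifically, I want to extract the contents of a table from the file.

Here is the HTML for the table: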

<table class="radiobutton" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay" onclick="return false;">
   <tbody>
      <tr>
         <td>
            <input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="1" />
            <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0">Fitting</label>
         </td>
      </tr>
      <tr>
         <td>
            <input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_1" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="2" />
            <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_1">Material</label>
         </td>
      </tr>
      <tr>
         <td>
            <input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_2" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="4" />
            <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_2">Appliance</label>
         </td>
      </tr>
      <tr>
         <td>
            <input checked="checked" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="8" />
            <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3">Apparatus</label>
         </td>
      </tr>
      <tr>
         <td>
            <input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_4" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="16" />
            <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_4">Other procedures</label>
         </td>
      </tr>
      <tr>
         <td>
            <input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_5" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="32" />
            <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_5">Alternative fuel oils</label>
         </td>
      </tr>
      <tr>
         <td>
            <input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_6" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="64" />
            <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_6">Other compliance method:</label>
         </td>
      </tr>
   </tbody>
</table>

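How can I print each radio button's label together with its checked attribute?

Example: for the label below it should print Fitting, and "checked" for the input tag shown below: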

<label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0">Fitting</label>

<input checked="checked" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3" name="ctl00$bodyPlaceHolder$ctl00$Reg$rblTypeDisplay" type="radio" value="8"/>


The following code works, but I would like a better solution:

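from bs4 import BeautifulSoup
from pyparsing import makeHTMLTags

# Raw string so the backslash in the Windows path is not read as an escape
with open(r'.\ABC.html', 'r') as read_file:
    data = read_file.read()

soup = BeautifulSoup(data, 'html.parser')
table = soup.find("table", attrs={"id": "ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay"})

# pyparsing expression that matches <input ...> tags and exposes their attributes
spotterTag, spotterEndTag = makeHTMLTags("input")

# Hand pyparsing the table's HTML as a string to scan
for spotter in spotterTag.searchString(str(table)):
    if spotter.checked == 'checked':
        label = soup.find("label", attrs={"for": spotter.id})
        # Slice out the text between the label's opening and closing tags
        print(str(label)[str(label).find('>')+1:str(label).find('<',2)])
        print(spotter.checked)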

I am not sure I understand you correctly, but do you want to pair each input with its label? If so, you can use the zip() function. For example (data is the HTML string):

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

print('{:^25} {:^15} {:^15}'.format('Text', 'Value', 'Checked'))
for inp, lbl in zip(soup.select('table#ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay input'),
                    soup.select('table#ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay label')):
    print('{:<25} {:^15} {:^15}'.format(lbl.text, inp['value'], 'checked' if 'checked' in inp.attrs else '-'))

Prints:

          Text                 Value          Checked
Fitting                          1               -       
Material                         2               -       
Appliance                        4               -       
Apparatus                        8            checked    
Other procedures                16               -       
Alternative fuel oils           32               -       
Other compliance method:        64               -
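
The zip() approach above pairs inputs and labels by position, so it relies on both lists having the same length and order. If only the checked entries are needed, as in the question, a possible variation is to look each label up through its "for" attribute instead. The following is only a rough sketch under some assumptions: the helper name checked_labels and the shortened two-row HTML sample are made up for illustration and do not come from the original posts.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample of the table from the question (two rows only).
html = """
<table class="radiobutton" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay">
  <tr><td>
    <input id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0" type="radio" value="1" />
    <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_0">Fitting</label>
  </td></tr>
  <tr><td>
    <input checked="checked" id="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3" type="radio" value="8" />
    <label for="ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay_3">Apparatus</label>
  </td></tr>
</table>
"""


def checked_labels(markup, table_id):
    """Return a list of (label text, input value) for each checked input.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table", id=table_id)
    results = []
    if table is None:
        return results
    for inp in table.find_all("input"):
        # has_attr() does not depend on the attribute's value ("checked", "", ...)
        if not inp.has_attr("checked"):
            continue
        # Match the label through its "for" attribute rather than by position
        inp_id = inp.get("id")
        label = table.find("label", attrs={"for": inp_id}) if inp_id else None
        text = label.get_text(strip=True) if label else ""
        results.append((text, inp.get("value")))
    return results


for text, value in checked_labels(html, "ctl00_bodyPlaceHolder_ctl00_Reg_rblTypeDisplay"):
    print(text, value, "checked")
```

With the sample above this prints "Apparatus 8 checked". Using label.get_text() also avoids slicing str(label) by hand, as the code in the question does.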