Python: unable to extract a table from HTML code

Tags: python, beautifulsoup, html-table, html-parser

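I am trying to parse the HTML table shown below (it is part of the full HTML page), but my code does not work. Can someone help me? I get an error saying that the table has no attribute "findall". The code is: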

import re
import HTMLParser
from urllib2 import urlopen
import urllib2
from bs4 import BeautifulSoup

url = 'http://164.100.47.132/LssNew/Members/Biography.aspx?mpsno=4064'


url_data = urlopen(url).read()
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)
title = soup.title
final_tit = title.string

table = soup.find('table',id = "ctl00_ContPlaceHolderMain_Bioprofile1_Datagrid1")

tr = table.findall('tr')
for tr in table:
  cols = tr.findAll('td')
  for td in cols:
      text = ''.join(td.find(text=True))
      print text+"|",
  print



<table style="WIDTH: 565px">
        <tr>
            <td vAlign="top" align="left"><img id="ctl00_ContPlaceHolderMain_Bioprofile1_Image1" src="http://164.100.47.132/mpimage/photo/4064.jpg" style="height:140px;border-width:0px;" /></td>
            <td vAlign="top"><table cellspacing="0" rules="all" border="2" id="ctl00_ContPlaceHolderMain_Bioprofile1_Datagrid1" style="border-color:#FAE3C3;border-width:2px;border-style:Solid;width:433px;border-collapse:collapse;">
        <tr>
            <td>
                                <table align="center" height="30px">
                                    <tr valign="top">
                                        <td align="center" valign="top" class="gridheader1">Aaroon Rasheed,Shri J.M.</td>
                                    </tr>
                                </table>
                                <table height="110px">
                                    <tr>
                                        <td align="left" class="darkerb" width="133px" valign="top">Constituency&nbsp;&nbsp;&nbsp;:</td>
                                        <td align="left" valign="top" class="griditem2" width="300px">Theni      (Tamil Nadu                                                  )</td>
                                    </tr>
                                    <tr>
                                        <td align="left" width="133px" class="darkerb" valign="top">
                                            Party Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;:</td>
                                        <td align="left" width="300px" valign="top" class="griditem2">Indian National Congress(INC)</td>
                                    </tr>
                                    <tr>
                                        <td align="left" class="darkerb" valign="top" width="133px">
                                            Email Address :
                                        </td>
                                        <td align="left" valign="top" class="griditem2" width="300px">jm.aaronrasheed@sansad.nic.in</td>
                                    </tr>
                                </table>
                            </td>
        </tr>
    </table></td>
        </tr>
    </table>


Call the find_all method instead of findall:

tr = table.find_all('tr')
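
Beyond the method name, the loop in the question iterates with "for tr in table", which walks the table's direct children instead of the rows that were just found. The following is a minimal sketch of one way to put the pieces together, not a definitive fix. It assumes Python 3 and an installed bs4 package, parses a trimmed copy of the markup from the question instead of fetching the URL, and uses a hypothetical helper name, extract_rows, that is not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, inlined so the sketch runs offline.
HTML = """
<table style="WIDTH: 565px"><tr><td>
<table id="ctl00_ContPlaceHolderMain_Bioprofile1_Datagrid1"><tr><td>
  <table><tr><td class="gridheader1">Aaroon Rasheed,Shri J.M.</td></tr></table>
  <table>
    <tr><td class="darkerb">Constituency&nbsp;&nbsp;&nbsp;:</td>
        <td class="griditem2">Theni      (Tamil Nadu      )</td></tr>
    <tr><td class="darkerb">Party Name&nbsp;:</td>
        <td class="griditem2">Indian National Congress(INC)</td></tr>
    <tr><td class="darkerb">Email Address :</td>
        <td class="griditem2">jm.aaronrasheed@sansad.nic.in</td></tr>
  </table>
</td></tr></table>
</td></tr></table>
"""


def extract_rows(html, table_id):
    """Hypothetical helper: return the text of each leaf row as a list of cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:  # the id was not found in the page
        return []
    rows = []
    # Iterate over the result of find_all, not over the table itself.
    for tr in table.find_all("tr"):
        # recursive=False keeps only the cells that belong to this row.
        cells = tr.find_all("td", recursive=False)
        # Skip wrapper rows whose cells only hold nested tables.
        if any(td.find("table") for td in cells):
            continue
        # Collapse runs of whitespace, including non-breaking spaces.
        rows.append([" ".join(td.get_text().split()) for td in cells])
    return rows


rows = extract_rows(HTML, "ctl00_ContPlaceHolderMain_Bioprofile1_Datagrid1")
for row in rows:
    print(" | ".join(row))
```

Skipping the wrapper row is a design choice for this particular page, where the target table nests two further tables inside a single cell; without it, the outer row would repeat all of the inner text in one line.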