Python Beautiful Soup: getting the href from a <p> tag

python web-scraping

My HTML page looks like this:

data = """<section class="otln" itemscope="" itemtype="http://microformats.org/wiki/hCard">
<header>
<h3 class="org">Website:</h3>
</header>
<p><a href="http://www.abilityone.gov">U.S. AbilityOne Commission </a></p> </section>
<section class="otln" itemscope="" itemtype="http://microformats.org/wiki/hCard">
<header>
<h3 itemprop="name">Main Address:</h3>
</header>
<p class="spk street-address">1401 S. Clark Street<br/>Suite 715<br/><span class="locality">Arlington</span>, <span class="region">VA</span> <span class="postal-code">22202-3259</span></p> </section>
<section class="otln" itemscope="" itemtype="http://microformats.org/wiki/hCard">
<header>
<h3 itemprop="name">Phone Number:</h3>
</header>
<p>1-703-603-7740</p> </section>
<section class="otln" itemscope="" itemtype="http://microformats.org/wiki/hCard">
<header>
<h3 class="org">Government branch:</h3>
</header>
<p>Executive Department Sub-Office/Agency/Bureau</p>
</section>"""

My attempt to get the "href" throws:

TypeError: 'NoneType' object is not subscriptable

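I already have a working solution for getting the main address, phone number and government branch. It would be even better if I could also get the website's "href", i.e. "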

Show what you have tried... Show your code and try this solution.

The solution in the link you provided gives me every href inside an anchor tag, but I only want the specific href, as shown in the edited version.

Are you aware that this is "microdata" format rather than plain HTML?
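To pick out only the website link rather than every href on the page, one option is to locate the <h3> whose text is exactly "Website:" and then search only inside its enclosing <section>. The sketch below assumes that header text is stable; `href_for_label` is a hypothetical helper name, the HTML is a trimmed copy of the question's data, and it uses the stdlib "html.parser" backend instead of "lxml" only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML (assumption: labels are exact strings like "Website:").
data = """
<section class="otln"><header><h3 class="org">Website:</h3></header>
<p><a href="http://www.abilityone.gov">U.S. AbilityOne Commission </a></p></section>
<section class="otln"><header><h3 itemprop="name">Phone Number:</h3></header>
<p>1-703-603-7740</p></section>
"""


def href_for_label(soup, label):
    """Hypothetical helper: href of the first link in the section headed by `label`, else None."""
    h3 = soup.find("h3", string=label)  # exact match on the header text
    if h3 is None:
        return None
    section = h3.find_parent("section")  # scope the search to this one section
    a = section.find("a", href=True) if section is not None else None
    return a["href"] if a is not None else None


soup = BeautifulSoup(data, "html.parser")
print(href_for_label(soup, "Website:"))       # http://www.abilityone.gov
print(href_for_label(soup, "Phone Number:"))  # None
```

Scoping to `find_parent("section")` keeps the lookup from drifting into a neighbouring section's link if the "Website:" section ever has no <a>.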

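from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
website = []
website.append([l.find('a')['href'] for l in soup.find_all('section', class_='otln')])  # TypeError: find('a') is None in sections without a link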
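The TypeError comes from the sections that contain no link (address, phone number, government branch): for those, `l.find('a')` returns `None`, and `None['href']` is not allowed. A minimal sketch of the same loop that simply skips those sections might look like this. It assumes a trimmed copy of the question's HTML, uses "html.parser" in place of "lxml" (the result is the same for this markup), and appends each href string directly instead of one nested list.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML: the first section has a link, the second does not.
data = """
<section class="otln"><header><h3 class="org">Website:</h3></header>
<p><a href="http://www.abilityone.gov">U.S. AbilityOne Commission </a></p></section>
<section class="otln"><header><h3 itemprop="name">Phone Number:</h3></header>
<p>1-703-603-7740</p></section>
"""

soup = BeautifulSoup(data, "html.parser")

website = []
for section in soup.find_all("section", class_="otln"):
    a = section.find("a", href=True)  # None when the section has no <a href=...>
    if a is not None:
        website.append(a["href"])

print(website)  # ['http://www.abilityone.gov']
```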

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')
# Pair each <h3> header with the <p> paragraph that follows it
for h, p in zip(soup.find_all('h3'), soup.find_all('p')):
    a = p.find('a')  # is this paragraph the website link?
    print('%-20s\t%s' % (h.text, a['href'] if a is not None else p.text))
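The `zip` approach relies on every section holding exactly one <h3> and one <p>, and `p.text` runs the `<br/>`-separated address lines together ("1401 S. Clark StreetSuite 715Arlington, VA 22202-3259"). A variant that reads each <section> on its own and collects the results into a dict is sketched below; `parse_sections` is a hypothetical name, and turning each `<br/>` into ", " is an assumed formatting choice (note that it modifies the parsed tree in place).

```python
from bs4 import BeautifulSoup

# Copy of the question's HTML (trailing commas from the printed list removed).
data = """
<section class="otln"><header><h3 class="org">Website:</h3></header>
<p><a href="http://www.abilityone.gov">U.S. AbilityOne Commission </a></p></section>
<section class="otln"><header><h3 itemprop="name">Main Address:</h3></header>
<p class="spk street-address">1401 S. Clark Street<br/>Suite 715<br/><span class="locality">Arlington</span>, <span class="region">VA</span> <span class="postal-code">22202-3259</span></p></section>
<section class="otln"><header><h3 itemprop="name">Phone Number:</h3></header>
<p>1-703-603-7740</p></section>
<section class="otln"><header><h3 class="org">Government branch:</h3></header>
<p>Executive Department Sub-Office/Agency/Bureau</p></section>
"""


def parse_sections(soup):
    """Hypothetical helper: map each section's header (minus ':') to its href or text."""
    result = {}
    for section in soup.find_all("section", class_="otln"):
        h3, p = section.find("h3"), section.find("p")
        if h3 is None or p is None:
            continue  # skip sections that do not follow the header + paragraph pattern
        label = h3.get_text(strip=True).rstrip(":")
        a = p.find("a", href=True)
        if a is not None:
            result[label] = a["href"]  # the website section: keep the link target
        else:
            for br in p.find_all("br"):
                br.replace_with(", ")  # assumed separator for multi-line addresses
            result[label] = p.get_text().strip()
    return result


fields = parse_sections(BeautifulSoup(data, "html.parser"))
for label, value in fields.items():
    print(f"{label:<20}\t{value}")
```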