Python: extracting the unordered list of a specific <div> with BeautifulSoup

I am scraping a page that I need for my Android app. What I want to do is extract the countries from the href attributes.

Here is my code:

from bs4 import BeautifulSoup
import urllib2
import re

html_page = urllib2.urlopen("http://www.howtocallabroad.com/a.html")
soup = BeautifulSoup(html_page)
li = soup.select("ul > li > a")
for link in li:
    print link.get('href')

The problem I am having is that the result returns all of the matching tags, including those from other divs:

afghanistan/
albania/
algeria/
american-samoa/
andorra/
angola/
anguilla/
antigua/
argentina/
armenia/
aruba/
ascension/
australia/
austria/
azerbaijan/
codes.html  # not needed
nanp.html   # not needed
qa/         # not needed
forums/     # not needed

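I would like to know which function I need to get this done. I only want to filter the hrefs inside <div id="content">. I could not find much information about this.

Sorry, this is my first time writing Python.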

Try

li = soup.select("#content ul > li > a")

instead of

li = soup.select("ul > li > a")
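To see why the #content prefix helps, here is a minimal sketch that runs both selectors against a small piece of markup. The HTML below is made up for illustration (the real page's structure may differ, and the footer id is an assumption), and it uses Python 3 with the built-in html.parser backend.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: one list inside
# <div id="content"> and one navigation list in another div.
SAMPLE_HTML = """
<div id="content">
  <ul>
    <li><a href="afghanistan/">Afghanistan</a></li>
    <li><a href="albania/">Albania</a></li>
  </ul>
</div>
<div id="footer">
  <ul>
    <li><a href="codes.html">Codes</a></li>
    <li><a href="forums/">Forums</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Unscoped selector: matches the links in every list on the page.
unscoped = [a.get("href") for a in soup.select("ul > li > a")]

# Scoped selector: only links that are descendants of the element with id="content".
scoped = [a.get("href") for a in soup.select("#content ul > li > a")]

print(unscoped)
print(scoped)
```

The only difference between the two calls is the leading #content, which limits the match to descendants of the element with that id.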

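Use find() together with findAll().

soup.find('div', {'id': 'content'}) does what it says: it finds the div tag that has an id of content (it would match <div id="content">).

.findAll() finds all matching tags. 'a' is passed as the argument so that it finds all a tags, and it returns a list of every such tag.

Then I just print the href of each a tag.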
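The code for this answer did not survive on the page, so the following is only a sketch of what the description above amounts to, not the answerer's original snippet. It assumes Python 3 and uses find_all(), the current spelling of findAll(). The helper name content_hrefs and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure may differ.
SAMPLE_HTML = """
<div id="content">
  <ul>
    <li><a href="afghanistan/">Afghanistan</a></li>
    <li><a href="albania/">Albania</a></li>
  </ul>
</div>
<div id="footer">
  <a href="codes.html">Codes</a>
</div>
"""


def content_hrefs(html):
    """Return the href of every <a> inside <div id="content">."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first matching tag, or None if there is none.
    content = soup.find("div", {"id": "content"})
    if content is None:
        return []
    # find_all() returns a list of matching tags; href=True skips
    # anchors that have no href attribute.
    return [a["href"] for a in content.find_all("a", href=True)]


for href in content_hrefs(SAMPLE_HTML):
    print(href)
```

Checking for None before calling find_all() avoids an AttributeError on pages that have no such div.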