Python web scraper won't iterate
This code:
from bs4 import BeautifulSoup
from urllib2 import urlopen
f = urlopen("http://www.groupon.co.uk/").read()
bs = BeautifulSoup(f)
for tag in bs.find_all('ul', {'id': 'jCitiesSelectBox'}):
    print tag.li['onclick']
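prints only the first tag rather than every tag inside jCitiesSelectBox, and I can't see why.
Your selector is probably backwards. Only one tag per document is allowed to have a given id. What you are specifying is: "find the tag with id="jCitiesSelectBox", but only if it is a <ul> tag". That matches at most one element, so the loop body runs once, and tag.li returns only the first <li> inside it.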
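To see this without hitting the network, here is a minimal sketch on an inline HTML fragment. The markup and the onclick values are made up for illustration, and it is written in Python 3 syntax (the question uses Python 2).

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page; the onclick values are hypothetical.
html = """
<ul id="jCitiesSelectBox">
  <li onclick="go('london')">London</li>
  <li onclick="go('leeds')">Leeds</li>
  <li onclick="go('york')">York</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Only one element carries this id, so the result list holds a single <ul>.
matches = soup.find_all("ul", {"id": "jCitiesSelectBox"})

# tag.li is shorthand for tag.find("li"): it returns the first <li> only.
first_only = [tag.li["onclick"] for tag in matches]

# find_all("li") returns every <li> under the <ul>.
all_items = [li["onclick"] for li in matches[0].find_all("li")]

print(first_only)
print(all_items)
```

The outer loop in the question iterates over the matching <ul> elements, not over the list items, which is why only one value is printed.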
Edit:
You probably want to find all of the <li> tags inside the <ul> tag with id="jCitiesSelectBox", something like:
cities_list = bs.find('ul', {'id': 'jCitiesSelectBox'})
for tag in cities_list.find_all('li'):
    print tag['onclick']
(untested)
Tested, works for me.
This returns nothing. Basically, jCitiesSelectBox is the id of the div that contains the whole list I want to scrape.
#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from urllib2 import urlopen
f = urlopen("http://www.groupon.co.uk/").read()
bs = BeautifulSoup(f)
# Find the <ul> by id, then walk every <li> inside it
tags = bs.findAll('ul', attrs={'id' : 'jCitiesSelectBox'})
for tag in tags:
    lip = tag.findAll('li')
    for li in lip:
        print li['onclick']
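If, as the comment above suggests, the id sits on a <div> rather than a <ul>, one option is to look the container up by id alone and not name the tag at all. The following is a rough sketch under that assumption, in Python 3 syntax; the helper name onclick_values and the sample markup are hypothetical, not taken from the real page.

```python
from bs4 import BeautifulSoup


def onclick_values(html, container_id="jCitiesSelectBox"):
    """Return the onclick attribute of every <li> under the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    # Searching by id only matches the container whatever its tag name is.
    container = soup.find(id=container_id)
    if container is None:
        return []
    # onclick=True keeps only the <li> tags that actually carry the attribute,
    # which avoids a KeyError on items without a handler.
    return [li["onclick"] for li in container.find_all("li", onclick=True)]


# Hypothetical markup: here the id is on a <div> wrapping the list.
sample = """
<div id="jCitiesSelectBox">
  <ul>
    <li onclick="go('london')">London</li>
    <li>no handler</li>
    <li onclick="go('leeds')">Leeds</li>
  </ul>
</div>
"""

print(onclick_values(sample))
```

To use it on the live page, pass in the HTML fetched with urlopen as in the snippets above. Whether the list is present in the raw HTML at all depends on how the site builds the page.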