Scraped span returns None from get_text() (Python)


I have scraped the links to the cars and now want to follow those links and scrape some data about each car, but my code returns an empty array (or None if I print the items individually). Is there a way to fix this?

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('http://www.25thstauto.com/inventory.aspx?cursort=asc&pagesize=500').read()
soup = bs.BeautifulSoup(source, 'lxml')

car = soup.select('a[id*=ctl00_cphBody_inv1_rptInventoryNew]')         
for a in car:
    source2 = urllib.request.urlopen('http://www.25thstauto.com/'+a.get('href')).read()
    price.append(soup.find('span', {'id': 'ctl00_cphBody_inv1_lblPrice'}))
    print(price)
Output:

[$2995]
[$2,995, $2,995]
[$2,995, $2,995, $2,995]

Two things. First, have you printed out the `source` variable to confirm that you are receiving the actual page? (I have tried to scrape a page many times only to find I was not getting the correct HTML response. That can usually be fixed by including a user agent so the request better mimics a browser.) Next, have you confirmed that the `lxml` HTML parser is correctly installed and configured? (See.) Happy scraping! Do you have GitHub? — @Sean Kelly: No, I'm not active on GitHub. Please accept the answer.
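The user-agent suggestion in the comment above can be sketched like this. The header value `Mozilla/5.0` is a minimal browser-like placeholder, not something from the original post; a real scraper would usually copy a full browser user-agent string.

```python
import urllib.request

url = 'http://www.25thstauto.com/inventory.aspx?cursort=asc&pagesize=500'
# Attach a browser-like User-Agent so the server is less likely to
# serve a blocked or stripped-down page to the script.
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
# source = urllib.request.urlopen(req).read()  # then fetch as before
```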
import bs4 as bs
import urllib.request

source = urllib.request.urlopen('http://www.25thstauto.com/inventory.aspx?cursort=asc&pagesize=500').read()
soup = bs.BeautifulSoup(source, 'lxml')
price = []
car = soup.select('a[id*=ctl00_cphBody_inv1_rptInventoryNew]')         
for a in car:
    source2 = urllib.request.urlopen('http://www.25thstauto.com/'+a.get('href')).read()
    # make a new soup based on the linked page; do not reuse the old soup
    soup2 = bs.BeautifulSoup(source2, 'lxml')
    price.append(soup2.find('span', {'id': 'ctl00_cphBody_inv1_lblPrice'}))
    print(price)
[<span id="ctl00_cphBody_inv1_lblPrice">$2,995</span>]
[<span id="ctl00_cphBody_inv1_lblPrice">$2,995</span>, <span id="ctl00_cphBody_inv1_lblPrice">$2,995</span>]
[<span id="ctl00_cphBody_inv1_lblPrice">$2,995</span>, <span id="ctl00_cphBody_inv1_lblPrice">$2,995</span>, <span id="ctl00_cphBody_inv1_lblPrice">$2,995</span>]
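The list above still holds whole `<span>` tags. To get just the text (the `get_text()` behavior the title asks about), call `get_text()` on each tag after parsing the linked page, not on a `None` result from searching the wrong soup. A minimal sketch, using a hard-coded snippet matching the output above and the stdlib `html.parser` so it runs without lxml:

```python
import bs4 as bs

# Snippet mirroring one price span from the output above.
snippet = '<span id="ctl00_cphBody_inv1_lblPrice">$2,995</span>'
span = bs.BeautifulSoup(snippet, 'html.parser').find(
    'span', {'id': 'ctl00_cphBody_inv1_lblPrice'})
text = span.get_text()                                  # '$2,995'
# Strip the currency formatting to get a number.
value = float(text.replace('$', '').replace(',', ''))   # 2995.0
```

In the loop, that would mean appending `soup2.find(...).get_text()` instead of the tag itself (guarding against `find` returning None for pages without a price span).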