Python web scraping: find() does not move on to the next item

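The goal of this script is to list all the different items for sale on Kijiji and pair each one with its price. However, I can't find any way to advance what Beautiful Soup finds for the class price, so I'm stuck on the first price. find_all() doesn't work either, because it just prints out the whole blob instead of grouping each price with its item.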

from bs4 import BeautifulSoup
import requests


def kijiji():
    source = requests.get('https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274').text
    soup = BeautifulSoup(source,'lxml')
    b = soup.find('div', class_='price')
    for link in soup.find_all('a',class_ = 'title'):
        a = link.get('href')
        fulllink = 'http://kijiji.ca'+a
        print(fulllink)
        b = soup.find('div', class_='price')
        print(b.prettify())
kijiji()
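
The behavior described in the question can be reproduced offline. The sketch below is only an illustration: the HTML snippet and the function name prices_via_repeated_find are made up for this example and are not Kijiji's real markup. It shows that calling soup.find() inside the loop restarts the search from the top of the document each time, so every iteration gets the first price.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not Kijiji's real page structure.
SAMPLE_HTML = """
<div class="info-container"><div class="price">$10.00</div><a class="title" href="/v/item-a">Item A</a></div>
<div class="info-container"><div class="price">$20.00</div><a class="title" href="/v/item-b">Item B</a></div>
<div class="info-container"><div class="price">$30.00</div><a class="title" href="/v/item-c">Item C</a></div>
"""


def prices_via_repeated_find(html):
    """Mimic the loop in the question: call soup.find() once per link."""
    soup = BeautifulSoup(html, 'html.parser')
    prices = []
    for _link in soup.find_all('a', class_='title'):
        # soup.find() searches from the top of the document on every call,
        # so it returns the first matching <div class="price"> each time.
        prices.append(soup.find('div', class_='price').get_text(strip=True))
    return prices


if __name__ == '__main__':
    print(prices_via_repeated_find(SAMPLE_HTML))  # ['$10.00', '$10.00', '$10.00']
```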

I was stuck on this for an hour, and right after posting it on Stack Overflow an idea came to me. The code is messy, but it works:

from bs4 import BeautifulSoup
import requests


def kijiji():
    source = requests.get('https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274').text
    soup = BeautifulSoup(source,'lxml')
    # Start from the first price on the page
    b = soup.find('div', class_='price')
    for link in soup.find_all('a',class_ = 'title'):
        a = link.get('href')
        fulllink = 'http://kijiji.ca'+a
        print(fulllink)
        print(b.prettify())
        # Advance to the next price in document order
        b = b.find_next('div', class_='price')
kijiji()

If you have Beautiful Soup 4.7.1 or later, you can use the CSS selector method select(), which is much faster.

Code:

import requests
from bs4 import BeautifulSoup

res=requests.get("https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274").text
soup=BeautifulSoup(res,'html.parser')
for item in soup.select('.info-container'):
    fulllink = 'http://kijiji.ca' + item.find_next('a', class_='title')['href']
    print(fulllink)
    price=item.select_one('.price').text.strip()
    print(price)

Or, to use find_all(), use the code block below:

import requests
from bs4 import BeautifulSoup

res=requests.get("https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274").text
soup=BeautifulSoup(res,'html.parser')
for item in soup.find_all('div',class_='info-container'):
    fulllink = 'http://kijiji.ca' + item.find_next('a', class_='title')['href']
    print(fulllink)
    price=item.find_next(class_='price').text.strip()
    print(price)

Congratulations on finding the answer. Here is another solution, just for reference:

import requests
from simplified_scrapy.simplified_doc import SimplifiedDoc
def kijiji():
  url = 'https://www.kijiji.ca/b-mens-shoes/markham-york-region/c15117001l1700274'
  source = requests.get(url).text
  doc = SimplifiedDoc(source)
  infos = doc.getElements('div',attr='class',value='info-container')
  for info in infos:
    price = info.select('div.price>text()')
    a = info.select('a.title')
    link = doc.absoluteUrl(url,a.href)
    title = a.text
    print(price)
    print(link)
    print(title)
kijiji()

Result:

$310.00
https://www.kijiji.ca/v-mens-shoes/markham-york-region/jordan-4-oreo-2015/1485391828
Jordan 4 Oreo (2015)
$560.00
https://www.kijiji.ca/v-mens-shoes/markham-york-region/yeezy-boost-350-yecheil-reflectives/1486296645
Yeezy Boost 350 Yecheil Reflectives
...

Here are more examples:

I didn't know you could use select(). That's much more convenient!
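
The answers in this thread share one idea: search inside each listing container, so that a price can only be paired with the link in the same container. Below is a minimal offline sketch of that idea. It assumes markup shaped like the selectors used in this thread (div.info-container, a.title, div.price). The sample HTML, the function name pair_listings, and the choice to skip incomplete listings are assumptions made for illustration, not something taken from the live page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.kijiji.ca'  # used only to resolve relative hrefs

# Hypothetical markup for illustration; the second listing has no price.
SAMPLE_HTML = """
<div class="info-container"><div class="price"> $10.00 </div><a class="title" href="/v/item-a">Item A</a></div>
<div class="info-container"><a class="title" href="/v/item-b">Item B</a></div>
<div class="info-container"><div class="price">$30.00</div><a class="title" href="/v/item-c">Item C</a></div>
"""


def pair_listings(html, base_url=BASE_URL):
    """Return one dict per listing that has both a link and a price."""
    soup = BeautifulSoup(html, 'html.parser')
    listings = []
    for container in soup.select('div.info-container'):
        # select_one() on the container searches only its descendants,
        # so a price from a neighbouring listing is never picked up.
        link = container.select_one('a.title[href]')
        price = container.select_one('div.price')
        if link is None or price is None:
            continue  # assumption: skip incomplete listings instead of failing
        listings.append({
            'title': link.get_text(strip=True),
            'url': urljoin(base_url, link['href']),
            'price': price.get_text(strip=True),
        })
    return listings


if __name__ == '__main__':
    for listing in pair_listings(SAMPLE_HTML):
        print(listing['price'], listing['url'], listing['title'])
```

Unlike find_next(), which keeps walking forward through the rest of the document, a search scoped to the container returns None when a listing has no price, which makes missing data easy to detect.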