How do I get prices from the next page in Python?

python web-scraping

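I'm new to Python and web scraping. I've written some code using requests and BeautifulSoup. One script scrapes the price, name and link of each product, and it works fine, as shown below: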

from bs4 import BeautifulSoup
import requests

urls = "https://www.meisamatr.com/fa/product/cat/2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html#/pagesize-24/order-new/stock-1/page-1"
source = requests.get(urls).text
soup = BeautifulSoup(source, 'lxml')
for figcaption in soup.find_all('figcaption'):
    price = figcaption.div.text
    name = figcaption.find('a', class_='title').text
    link = figcaption.find('a', class_='title')['href']
    print(price)
    print(name)
    print(link)

A second script builds the other URLs I need to get this information from. When I print them, the URLs are also correct:

x = 0
counter = 1
for x in range(0, 70):
    urls = "https://www.meisamatr.com/fa/product/cat/2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html#/pagesize-24/order-new/stock-1/page-" + str(counter)
    counter += 1
    x += 1
    print(urls)

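However, when I combine the two so that the code scrapes one page, switches to the next URL and scrapes that, I only get the information from the first page, 70 times over. Can you help me with this? The whole code is below: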

from bs4 import BeautifulSoup
import requests

x = 0
counter = 1
for x in range(0, 70):
    urls = "https://www.meisamatr.com/fa/product/cat/2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html#/pagesize-24/order-new/stock-1/page-" + str(counter)
    source = requests.get(urls).text
    soup = BeautifulSoup(source, 'lxml')
    counter += 1
    x += 1
    print(urls)

    for figcaption in soup.find_all('figcaption'):
        price = figcaption.div.text
        name = figcaption.find('a', class_='title').text
        link = figcaption.find('a', class_='title')['href']

        print(price)
        print()
        print(name)
        print()
        print(link)

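Setting x = 0 and then incrementing it by 1 is redundant, because the loop already iterates over range(0, 70). I also don't see why you have a counter, since you don't need that either. Here's how you would do it: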
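Below is a minimal sketch of that simplified loop. BASE_URL and page_urls are hypothetical names used for illustration. The key change is that range(1, n_pages + 1) already yields the page numbers 1..70, so neither a separate counter nor a manual x += 1 is needed. This only tidies the loop. It still builds the same hash-fragment URLs, so by itself it will not fix the repeated results.

```python
# Hypothetical name: the listing URL from the question, minus the page number.
BASE_URL = ("https://www.meisamatr.com/fa/product/cat/"
            "2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html"
            "#/pagesize-24/order-new/stock-1/page-")


def page_urls(n_pages=70):
    # range(1, n_pages + 1) produces 1..n_pages directly, so the loop
    # variable is the page number: no counter and no "x += 1".
    return [BASE_URL + str(page) for page in range(1, n_pages + 1)]


if __name__ == "__main__":
    for url in page_urls():
        print(url)
```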
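However, I believe the problem is not the iteration or the loop but the URL itself. If you open the two pages below manually, the content doesn't change: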
https://www.meisamatr.com/fa/product/cat/2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html#/pagesize-24/order-new/stock-1/page-1

and
https://www.meisamatr.com/fa/product/cat/2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html#/pagesize-24/order-new/stock-1/page-2
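One concrete reason the two URLs above look identical to requests is that everything after the # is a URL fragment, and HTTP clients do not send the fragment to the server. The sketch below uses the standard library's urllib.parse.urldefrag to split both URLs. It shows that the part the server actually receives is identical. The page number only lives in the fragment, which the site's own JavaScript presumably reads in the browser. That inference is consistent with the answer's point that the site is dynamic, but it is an assumption.

```python
from urllib.parse import urldefrag

page_1 = ("https://www.meisamatr.com/fa/product/cat/2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html"
          "#/pagesize-24/order-new/stock-1/page-1")
page_2 = ("https://www.meisamatr.com/fa/product/cat/2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html"
          "#/pagesize-24/order-new/stock-1/page-2")

# urldefrag returns (url_without_fragment, fragment).
url_1, frag_1 = urldefrag(page_1)
url_2, frag_2 = urldefrag(page_2)

print(url_1 == url_2)  # True: the request sent to the server is the same
print(frag_1)          # /pagesize-24/order-new/stock-1/page-1
print(frag_2)          # /pagesize-24/order-new/stock-1/page-2
```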

Since the site is dynamic, you need either a different way to iterate page by page or the exact URL the page loads its data from. So try:
from bs4 import BeautifulSoup
import requests

for x in range(0, 70):
    try:
        urls = 'https://www.meisamatr.com/fa/product/cat/2-%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C.html&pagesize[]=24&order[]=new&stock[]=1&page[]=' +str(x+1) + '&ajax=ok?_=1561559181560'
        source = requests.get(urls).text
        soup = BeautifulSoup(source, 'lxml')

        print('Page: %s' %(x+1))

        for figcaption in soup.find_all('figcaption'):

            price = figcaption.find('span', {'class':'new_price'}).text.strip()
            name = figcaption.find('a', class_='title').text
            link = figcaption.find('a', class_='title')['href']

            print('%s\n%s\n%s' %(price, name, link))
    except Exception:
        # Stop at the first page that can't be fetched or parsed (e.g. past the last page)
        break
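The loop above relies on a broad exception to stop. A variation is to stop explicitly at the first page that returns no products. The sketch below does that. parse_products, scrape_pages and fetch_html are hypothetical names. The markup it expects (a figcaption containing span.new_price and a.title) is taken from the answer's code. It also assumes that a page past the end returns no product cards, which I have not confirmed for this site. It uses the built-in 'html.parser' instead of 'lxml' only so it runs without the lxml package. Fetching is passed in as a function, which keeps the parsing testable without network access.

```python
from bs4 import BeautifulSoup


def parse_products(html):
    """Return (price, name, link) tuples for the product cards in one page of HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    products = []
    for figcaption in soup.find_all('figcaption'):
        price_tag = figcaption.find('span', class_='new_price')
        title_tag = figcaption.find('a', class_='title')
        if price_tag is None or title_tag is None:
            continue  # skip figcaptions that aren't product cards
        products.append((price_tag.text.strip(),
                         title_tag.text.strip(),
                         title_tag.get('href')))
    return products


def scrape_pages(fetch_html, max_pages=70):
    """Call fetch_html(page) for page = 1, 2, ... and stop at the first empty page."""
    results = []
    for page in range(1, max_pages + 1):
        products = parse_products(fetch_html(page))
        if not products:
            break  # assumed to mean we are past the last page
        results.extend(products)
    return results
```

In real use, fetch_html would wrap requests.get(...).text on the AJAX URL from the answer, with the page number substituted.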

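You can find that link by visiting the site and opening the developer tools (Ctrl+Shift+I, or right-click -> Inspect) -> Network -> XHR.
When I did that and then clicked through to the next page, I could see how the data was being loaded and found the reference URL.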
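Second, you weren't getting the other pages because you didn't have it in your for loop.
I believe I do have them in my for loop. The way code is displayed on the site is confusing. I'll try to make it better...
There's no need to increment x, since it's the loop variable. Also, counter can be removed entirely; just write urls = "..." + str(x+1)
Great code! Thanks, it works like a charm. I don't understand the idea behind the "&ajax=ok?_=1561559181560" part.
Yes, good question. When I get a chance tomorrow I'll answer it and explain where it comes from.
@Noshad70, OK, I've added how to find that URL to the answer. Be sure to accept the answer as the solution.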
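On the "_=1561559181560" part asked about above: it looks like a cache-busting timestamp. jQuery's $.ajax, for example, appends a "_={timestamp}" parameter to GET requests when its cache option is false. I haven't confirmed that this site uses jQuery, and the server probably ignores the value, but that is also an assumption. The sketch below only checks that the number reads sensibly as milliseconds since the Unix epoch, and shows how a fresh value of the same shape could be generated.

```python
from datetime import datetime, timezone
import time

# The value from the answer's URL, read as milliseconds since the Unix epoch.
stamp_ms = 1561559181560
dt = datetime.fromtimestamp(stamp_ms // 1000, tz=timezone.utc)
print(dt)  # 2019-06-26 14:26:21+00:00

# A fresh value of the same shape (current time in milliseconds).
fresh_ms = int(time.time() * 1000)
print(fresh_ms)
```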