Can't parse different product links from a webpage

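I've created a script in Python to fetch different product links from a webpage. I know the content of that site is dynamic, but I tried the conventional approach anyway to show that I've made an attempt. I looked for an API in the dev tools but could not find one. Is there any way to get those links using requests?

This is what I've written so far: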

import requests
from bs4 import BeautifulSoup

link = "https://www.amazon.com/stores/node/10699640011"

def fetch_product_links(url):
    res = requests.get(url,headers={"User-Agent":"Mozilla/5.0"})
    soup = BeautifulSoup(res.text,"lxml")
    for item_link in soup.select("[id^='ProductGrid-'] li[class^='style__itemOuter__'] > a"):
        print(item_link.get("href"))

if __name__ == '__main__':
    fetch_product_links(link)

How can I get the different product links from that site using requests?

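I think you only need the ASINs, which you can collect from another URL structure visible in the network tab; this also means the final URLs can be shortened significantly. You do, however, need to make a request to the original URL first to pick up an identifier that is used in the second URL. This returns 146 links.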

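import requests, re, json

node = '10699640011'

with requests.Session() as s:
    # First request: the store page, which embeds the list of slot identifiers
    r = s.get(f'https://www.amazon.com/stores/node/{node}')
    # The group repeats three times, so findall returns its last repetition:
    # the entry just before "share", still carrying its trailing comma
    p = re.compile(r'var slotsStr = "\[(.*?,){3} share\]";')
    identifier = p.findall(r.text)[0]
    # Drop surrounding whitespace and the trailing comma
    identifier = identifier.strip()[:-1]
    # Second request: the slot endpoint, which embeds the product data as JSON
    r = s.get(f'https://www.amazon.com/stores/slot/{identifier}?node={node}')
    p = re.compile(r'var config = (.*?);')
    data = json.loads(p.findall(r.text)[0])
    asins = data['content']['ASINList']
    # Build the short product URL for each ASIN
    links = [f'https://www.amazon.com/dp/{asin}' for asin in asins]
    print(links)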
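
The parsing step above can be tried offline, without hitting the site. The following is only a sketch: the helper name extract_asin_links and the sample HTML are made up for illustration, and it assumes the slot response embeds the data as `var config = {...};` as the snippet above implies. It swaps the non-greedy regex for `json.JSONDecoder.raw_decode`, which reads one JSON value starting at a given index, so a semicolon inside a JSON string would not cut the match short.

```python
import json
import re


def extract_asin_links(html):
    """Build product links from the ASINList found in a `var config = {...};` script.

    Hypothetical helper; the page layout is an assumption taken from the answer.
    """
    # Consume the assignment and any whitespace so the match ends right at the JSON value
    match = re.search(r'var config\s*=\s*', html)
    if match is None:
        return []
    # raw_decode parses a single JSON value starting at the index and ignores what follows it
    config, _ = json.JSONDecoder().raw_decode(html, match.end())
    asins = config.get('content', {}).get('ASINList', [])
    return [f'https://www.amazon.com/dp/{asin}' for asin in asins]


# Made-up sample, not real page content; note the semicolon inside a string value
sample_html = (
    '<script>var config = {"content": {"ASINList": ["B000000001", "B000000002"],'
    ' "title": "a;b"}};</script>'
)
links = extract_asin_links(sample_html)
print(links)
```

With the real site, the argument would be `r.text` from the second request.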

Edit:

With the two given nodes:

import requests, re, json
from bs4 import BeautifulSoup as bs

nodes = ['3039806011','10699640011']

with requests.Session() as s:
    for node in nodes:
        # First request: the store page for this node
        r = s.get(f'https://www.amazon.com/stores/node/{node}')
        soup = bs(r.content, 'lxml')
        # Take the id of the last widget that is neither the share nor a recommendation slot
        identifier = soup.select('.stores-widget-btf:not([id=share],[id*=RECOMMENDATION])')[-1]['id']
        # Second request: the slot endpoint, which embeds the product data as JSON
        r = s.get(f'https://www.amazon.com/stores/slot/{identifier}?node={node}')
        p = re.compile(r'var config = (.*?);')
        data = json.loads(p.findall(r.text)[0])
        asins = data['content']['ASINList']
        # Build the short product URL for each ASIN
        links = [f'https://www.amazon.com/dp/{asin}' for asin in asins]
        print(links)

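Simply awesome, @QHarr. Thanks a lot. I hope it works across links.

The first regex can be improved.

An optional request, if you have time, @QHarr: how about this node, 3039806011?

I'll have a look later today if I can. I need to amend the first regex.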
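
One possible way to loosen the first regex, sketched under assumptions: the page is taken to contain `var slotsStr = "[a, b, c, share]";` with any number of entries, and the wanted identifier is taken to be the last entry that is neither share nor a RECOMMENDATION slot, mirroring the CSS selector used in the edit. The helper name pick_slot_identifier and the sample strings are hypothetical, and this has not been checked against the live site.

```python
import re


def pick_slot_identifier(html):
    """Return the last slot id that is not 'share' and not a RECOMMENDATION slot.

    Hypothetical helper; the slotsStr layout is an assumption based on the answer's regex.
    """
    # Capture everything between the brackets instead of counting entries
    match = re.search(r'var slotsStr = "\[(.*?)\]";', html)
    if match is None:
        return None
    # Split the comma-separated list and drop empty pieces
    slots = [s.strip() for s in match.group(1).split(',') if s.strip()]
    candidates = [s for s in slots if s != 'share' and 'RECOMMENDATION' not in s]
    return candidates[-1] if candidates else None


# Made-up samples, not real page content
sample = 'var slotsStr = "[slot-a, slot-b, slot-c, share]";'
identifier = pick_slot_identifier(sample)
print(identifier)
```

Because the list is split rather than matched with a fixed repetition count, the same helper would apply to a node whose page has more or fewer slots than three before share.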