
Python BS4: finding a p tag by class


I'm trying to get this code to scrape a product name from a website by finding a p tag with a specific class. The code is below; when I run it, it only prints None. The element I'm trying to scrape is shown commented out:

#<p class="product-name">Yaesu FT-DX101D HF/50MHz 100W SDR</p>


import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = "https://www.gigaparts.com/products/radios-amps-and-repeaters#/?Category1=Radios&Category2=Radios%2C+Amps+and+Repeaters&Category3=Radio+Transceivers&search_return=all&Category4=Base+Stations"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for i in range(10):
    a_tags = soup.find("p", {"class": "product-name"})
    print(a_tags)
    time.sleep(2)
Desired output:

Yaesu FT-DX101D HF/50MHz 100W SDR

You can mimic the jQuery XHR request the page makes, stripping out everything extraneous that looks likely to change over time:

import requests, re, ast
from bs4 import BeautifulSoup

r = requests.get('https://gigaparts-v2.ecomm-nav.com/nav.js?initial_url=https%3A%2F%2Fwww.gigaparts.com%2Fproducts%2Fradios-amps-and-repeaters%23%2F%3FCategory1%3DRadios%26Category2%3DRadios%252C%2BAmps%2Band%2BRepeaters%26Category3%3DRadio%2BTransceivers%26search_return%3Dall%26Category4%3DBase%2BStations&nxt_custom_options=formKey%3D%26groupId%3DNOT%2BLOGGED%2BIN&Category1=Radios&Category2=Radios%2C+Amps+and+Repeaters&Category3=Radio+Transceivers&search_return=all&Category4=Base+Stations&callback=jQuery0')
# strip the JSONP callback wrapper and evaluate the payload into a dict
p = re.compile(r'jQuery0\((.*)\);')
d = ast.literal_eval(p.findall(r.text)[0])
# the rendered product markup is in the 'content' field of the payload
soup = BeautifulSoup(d['content'], 'lxml')
product_names = [i.text for i in soup.select('.product-name')]
print(product_names)
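Since that callback wrapper is just JSONP around a JSON object, json.loads is a more robust way to parse the payload than ast.literal_eval, which fails on JSON-only literals like true, false, and null. A minimal sketch, using an inline string as a stand-in for the live response body:

```python
import json
import re

# Inline stand-in for r.text: a JSONP response wrapping a JSON object
jsonp_body = 'jQuery0({"content": "<p class=\\"product-name\\">Yaesu FT-DX101D</p>", "total": 1});'

# Capture everything between the callback's parentheses
match = re.search(r'jQuery0\((.*)\);', jsonp_body, re.DOTALL)

# json.loads handles true/false/null, which ast.literal_eval would reject
payload = json.loads(match.group(1))

print(payload["content"])  # the HTML fragment to feed into BeautifulSoup
print(payload["total"])
```

The stand-in string and its fields are illustrative; against the live endpoint you would apply the same pattern to r.text.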


If the purpose of the loop and time.sleep is to "wait" until all elements are visible, it will not work. You parse the response and build the soup object exactly once, and it never changes after that. If the page loads its content dynamically with JS, then BeautifulSoup is not the right tool; you would need selenium, mechanize, or some other headless solution. In any case, try print(response.text) and search for the element there. If it is not present, the page really does depend on JS and you need one of the other tools mentioned above.
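The point that the soup object is built once and never changes can be sketched without any network call, using inline HTML strings as stand-ins for response.text from a server-rendered page versus a JS-driven shell:

```python
from bs4 import BeautifulSoup

# Stand-ins for response.text: one server-rendered page, one JS-driven shell
rendered_html = '<div><p class="product-name">Yaesu FT-DX101D HF/50MHz 100W SDR</p></div>'
js_shell_html = '<div id="app"></div><script src="/nav.js"></script>'

# soup is parsed once from whatever HTML the server sent; looping over
# find() with time.sleep() cannot make new elements appear afterwards
found = BeautifulSoup(rendered_html, 'html.parser').find('p', {'class': 'product-name'})
missing = BeautifulSoup(js_shell_html, 'html.parser').find('p', {'class': 'product-name'})

print(found.text)  # Yaesu FT-DX101D HF/50MHz 100W SDR
print(missing)     # None: the product markup was never in the HTML
```

This is the same diagnostic as searching print(response.text) by hand: if the markup is absent from the raw response, only a JS-capable tool can see it.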