Python 3.x: Scraping dynamic content with bs4 (tags: python-3.x, selenium-webdriver, beautifulsoup)

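I am scraping some information from a mobile phone website, but its content appears to be dynamic. I tried using Selenium to scrape the dynamic content, but that does not give me the expected output either.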

from bs4 import BeautifulSoup as bs
from selenium import webdriver
path = r'C:\Users\Goku\Downloads\Compressed\chromedriver'  # raw string, so single backslashes

driver = webdriver.Chrome(path)

driver.get('https://versus.com/en')

res = driver.execute_script("return document.documentElement.outerHTML")

soup = bs(res, 'lxml')
box = soup.find('div', {'class':'CarouList__carouList___2WspW CarouList__isLandingPage___rPe4J'})

print(box)

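You can find the data in the HTML source under a <script> tag. Find that text, convert the string into valid JSON, and then read it with json.loads(). From there you can look around the structure and pull out what you want. The image URLs can be retrieved as follows: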

import requests
from bs4 import BeautifulSoup as soup
import json

my_url = 'https://versus.com/en'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

# opening up connection, grabbing the page
response = requests.get(my_url, headers=headers)

#html parsing
page_soup = soup(response.text, "html.parser")

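scripts = page_soup.find_all('script')
for script in scripts:
    # the page state is embedded as a JavaScript assignment: window.__data={...}
    if 'window.__data=' in script.text:
        jsonStr = script.text
        # keep only the text after the assignment so it can be parsed as JSON
        jsonStr = jsonStr.split('window.__data=')[-1]

        jsonData = json.loads(jsonStr)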

phones = jsonData['landing']['trendings']['phone']['list']
for each in phones:
    root_url = 'https://versus.dadi.network'
    popImage = root_url + each['popImage']
    rivalImage = root_url + each['rivalImage']

    print ('%s\n%s' %(popImage, rivalImage))

Output:

https://versus.dadi.network/samsung-galaxy-a9-2018/front/front-1539337417084.variety.jpg
https://versus.dadi.network/samsung-galaxy-a50/front/front-1551183669492.variety.jpg
https://versus.dadi.network/samsung-galaxy-s10-plus/front/front-1550699605210.variety.jpg
https://versus.dadi.network/apple-iphone-xs-max/front/front-1536781345067.variety.jpg
https://versus.dadi.network/samsung-galaxy-a50/front/front-1551183669492.variety.jpg
https://versus.dadi.network/huawei-p30-lite/front/front-1555000229505.variety.jpg
https://versus.dadi.network/xiaomi-redmi-note-7/front/front-1550507767671.variety.jpg
https://versus.dadi.network/xiaomi-mi-8-lite/front/front-1537824165879.variety.jpg
https://versus.dadi.network/samsung-galaxy-s8/front/front-1490950798404.variety.jpg
https://versus.dadi.network/samsung-galaxy-a50/front/front-1551183669492.variety.jpg
https://versus.dadi.network/huawei-p20-lite/front/front-1521538430205.variety.jpg
https://versus.dadi.network/huawei-p-smart-2019/front/front-1547733931933.variety.jpg
https://versus.dadi.network/samsung-galaxy-a50/front/front-1551183669492.variety.jpg
https://versus.dadi.network/samsung-galaxy-a30/front/front-1551187893794.variety.jpg
https://versus.dadi.network/samsung-galaxy-m20/front/front-1550059143173.variety.jpg
https://versus.dadi.network/samsung-galaxy-a30/front/front-1551187893794.variety.jpg
https://versus.dadi.network/oneplus-6t/front/front-1540985964061.variety.jpg
https://versus.dadi.network/google-pixel-3/front/front-1539114763774.variety.jpg
https://versus.dadi.network/samsung-galaxy-a40/front/front-1555086727000.variety.jpg
https://versus.dadi.network/huawei-p20-lite/front/front-1521538430205.variety.jpg
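
The split-then-json.loads() step above works only when nothing follows the JSON in the script text. A minimal sketch of a more tolerant variant, assuming the data still appears as a `window.__data=` assignment; the helper name `extract_embedded_json` and the sample string are hypothetical and not from the site. It uses only the standard library, so `script.text` from the loop above could be passed in directly:

```python
import json


def extract_embedded_json(script_text, marker="window.__data="):
    """Return the JSON value that follows `marker`, or None if the marker is absent."""
    pos = script_text.find(marker)
    if pos == -1:
        return None
    start = pos + len(marker)
    # raw_decode does not skip leading whitespace, so skip it by hand
    while start < len(script_text) and script_text[start].isspace():
        start += 1
    # raw_decode parses exactly one JSON value and ignores any trailing text,
    # such as a semicolon or further JavaScript statements
    value, _end = json.JSONDecoder().raw_decode(script_text, start)
    return value


# Hypothetical sample standing in for the real script contents
sample = 'window.__data= {"landing": {"trendings": {"phone": {"list": []}}}};'
data = extract_embedded_json(sample)
print(data["landing"]["trendings"]["phone"]["list"])  # prints []
```

If the embedded text is not strictly valid JSON (for example, it contains `undefined`), this still raises json.JSONDecodeError, so some cleanup of the string may be needed first.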

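The output above repeats several URLs, because the same phone shows up in more than one comparison. A small sketch of building the URLs with urllib.parse.urljoin and dropping repeats while keeping the original order; the helper name `image_urls` and the sample records are hypothetical, and the keys `popImage` and `rivalImage` are taken from the code above:

```python
from urllib.parse import urljoin


def image_urls(phones, root_url="https://versus.dadi.network"):
    """Build absolute image URLs from the phone records, without duplicates."""
    seen = set()
    urls = []
    for phone in phones:
        for key in ("popImage", "rivalImage"):
            path = phone.get(key)
            if not path:
                # skip records where the key is missing or empty
                continue
            url = urljoin(root_url, path)
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Hypothetical records shaped like jsonData['landing']['trendings']['phone']['list']
sample_phones = [
    {"popImage": "/phone-a/front.jpg", "rivalImage": "/phone-b/front.jpg"},
    {"popImage": "/phone-c/front.jpg", "rivalImage": "/phone-b/front.jpg"},
]
for url in image_urls(sample_phones):
    print(url)
```

With the real data you would call image_urls(phones) in place of the final loop.
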
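What are you trying to get from that URL?

I want to scrape some data.

Your question is really broad. Wanting to scrape "some information" or "smartphone comparison data" is not specific at all. You need to edit your question and limit it to one thing you are trying to accomplish. First describe it in words so that we understand the scenario. Then post the code you wrote to accomplish it. Since it is not working but we cannot see it run, you need to post any error messages you receive, or describe how the result is incorrect. You really should read up on how to ask and use those tips to clarify your question.

If the solution is what you needed, be sure to accept the solution above by clicking the check mark.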