Python: Finding a hidden element in HTML source code
I am trying to get the URL below. It is present in the page source, but it is hidden inside a script tag:
<script>
window.runParams = {"descriptionModule":{"descriptionUrl":"https://aeproductsourcesite.alicdn.com/product/description/pc/v2/en_US/desc.htm?productId=32212764152&key=HTB1GwO_aVY7gK0jSZKzM7OikpXac.zip&token=f32528ddd34e37aecddda1c7778d5f0c"} .... </script>
I have obtained the page source, but I am not sure how to extract the URL as an object:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
import time
import re

CHROMEDRIVER_PATH = '/Users/reezalaq/PycharmProjects/wholesale/driver/chromedriver'

# Build the options once so the arguments below are not overwritten.
options = Options()
options.add_argument('--ignore-certificate-errors')
options.add_argument("--test-type")
options.headless = False

driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)
driver.get('https://www.aliexpress.com/item/32212764152.html')
html = driver.page_source

def run_script():
    # Scroll to the bottom of the page, then send one PAGE_UP key press.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    body = driver.find_element_by_css_selector('body')
    body.send_keys(Keys.PAGE_UP)

count = 0
while count < 3:  #13
    run_script()
    count += 1
    time.sleep(5)

# startswith() only tests the very beginning of the string, so this prints False.
x = html.startswith('https://aeproductsourcesite.alicdn.com')
print(x)
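How can I filter out everything else in the source code and end up with a single object x that holds the URL?

You can use a regular expression to extract the value: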
import re
#..
url = re.compile(r'"descriptionUrl":"([^"]*)"').search(html).group(1)
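The one-liner above raises an AttributeError if the pattern is not found, because search() then returns None. A minimal sketch of a more defensive variant follows. The helper name extract_description_url and the sample_html string are hypothetical; sample_html stands in for driver.page_source so the example runs without a browser.

```python
import re
from typing import Optional

# Captures everything between the quotes that follow "descriptionUrl":
# Optional whitespace around the colon is tolerated (an assumption about the markup).
DESCRIPTION_URL_RE = re.compile(r'"descriptionUrl"\s*:\s*"([^"]*)"')


def extract_description_url(html: str) -> Optional[str]:
    """Return the descriptionUrl value found in the page source, or None."""
    match = DESCRIPTION_URL_RE.search(html)
    return match.group(1) if match else None


# Hypothetical stand-in for driver.page_source.
sample_html = (
    '<script>window.runParams = {"descriptionModule":'
    '{"descriptionUrl":"https://example.com/desc.htm?productId=1&key=abc"}};'
    '</script>'
)

url = extract_description_url(sample_html)
print(url)
```

With the real page you would pass driver.page_source instead of sample_html and check for None before using the result. If the captured value turns out to contain JSON escape sequences, it may need decoding (for example with json.loads) before use.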