Python: How to get_attribute('innerHTML') from a list of URLs with Selenium? (tags: python, loops, selenium)


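I am using Selenium in Python for web scraping, and I use XPath to extract part of a page's content.

I would like to know how to loop over a list of URLs, extract the same content from each one, and save the results to a dictionary.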

mylist_URLs = ['https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001560258',
               'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0000034088',
               'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0001048911']

My code below only works for one URL:

from selenium import webdriver

driver = webdriver.Chrome(r'xxx\chromedriver.exe')
driver.get('https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0000104169')

driver.find_elements_by_xpath('/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td')[0].get_attribute('innerHTML')

Thanks for your help.

You can use a simple for loop together with WebDriverWait, which makes sure the table has loaded before you read its innerHTML.

Add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Script:

from selenium import webdriver

mylist_URLs = ['https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001560258',
               'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0000034088',
               'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=0001048911']
# open the browser
driver = webdriver.Chrome(r'xxx\chromedriver.exe')
# iterate through all the urls
for url in mylist_URLs:
    print(url)
    driver.get(url)
    # wait until the table cell is present in the DOM (up to 30 seconds)
    element = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]")))
    # now get the element innerHTML
    print(element.get_attribute('innerHTML'))
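
The question also asks for the results to be saved in a dictionary, which the loop above does not do since it only prints. A minimal sketch of one way to do that follows, keyed by URL. The helper names `fetch_inner_html` and `collect_inner_html` are hypothetical (they are not part of Selenium), and storing `None` for a URL that fails is an assumption about the desired behavior rather than something the original thread specifies.

```python
def fetch_inner_html(driver, url, xpath, timeout=30):
    """Load url and return the innerHTML of the first element matching xpath."""
    # Imported here so collect_inner_html stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # Splitting the call over several lines makes the parentheses easy to balance.
    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, xpath))
    )
    return element.get_attribute('innerHTML')


def collect_inner_html(urls, fetch):
    """Return {url: result of fetch(url)}; store None when fetch raises."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:  # e.g. Selenium's TimeoutException
            print(f'failed for {url}: {exc}')
            results[url] = None
    return results


# Usage with the driver and URL list from the script above:
# xpath = "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]"
# data = collect_inner_html(mylist_URLs,
#                           lambda u: fetch_inner_html(driver, u, xpath))
# data[mylist_URLs[0]] then holds the innerHTML scraped from the first URL.
```

Passing the fetch step in as a callable keeps the dictionary-building loop independent of the browser, so it can be tried out with a stand-in function before pointing it at the real pages.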

I get "SyntaxError: unexpected EOF while parsing" at #element.get_attribute('innerHTML'). I also get "SyntaxError: unexpected EOF while parsing" at element = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]"))

Updated the answer with the missing parenthesis at the end of the line.

Still getting "SyntaxError: invalid syntax" at print(element.get_attribute('innerHTML')).

Solved! There should be one more ")" after element = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, "(//table[1]/tbody/tr[2]/td/table/tbody/tr[1]/td)[1]"))