Python Selenium: scraping the same content from multiple URLs when the XPath differs slightly

python html loops selenium xpath

I am using Selenium to scrape the same table from multiple URLs, but the XPath of the table differs slightly from page to page.

Here is my code:

my_urls = ["https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001548760",
"https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001366010",
"https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001164390"]

driver = webdriver.Chrome()
for url in my_urls:
    driver.get(url)    
    export_table=driver.find_elements_by_xpath('')[0]
    export_table.text

xpath1:

/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody

xpath2:

/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[2]/td/table/tbody

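How can I extract the content from all of these URLs with a single XPath, and export all of the results to a dictionary?

Thanks for your help.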

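If you want the text from each XPath, try the approach below. If you want a single path per URL instead, use a dictionary that maps each URL to its XPath; you can then iterate over that dictionary to do whatever you need.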

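import json
from selenium import webdriver

my_urls = ["https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001548760",
"https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001366010",
"https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001164390"]
xpath1 = """/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody"""
xpath2 = """/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[2]/td/table/tbody"""

def getpath(elements):
    # Return the text of the first matched element, or None if nothing matched.
    try:
        return elements[0].text
    except IndexError:
        return None

export_table = {}
# Note: newer Selenium 4 releases removed find_elements_by_xpath; there, use
# driver.find_elements(By.XPATH, path) with
# "from selenium.webdriver.common.by import By".
driver = webdriver.Chrome("chromedriver.exe")
for url in my_urls:
    driver.get(url)
    # Try both XPaths on every page and record the result under each path.
    export_table[url] = {path: getpath(driver.find_elements_by_xpath(path)) for path in [xpath1, xpath2]}
driver.close()
json.dumps(export_table)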

Output

{
  "https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001548760": {
    "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody": "Issuer Filings Transaction Date Type of Owner\\nFacebook Inc 0001326801 2019-04-26 director, 10 percent owner, officer: COB and CEO",
    "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[2]/td/table/tbody": "Mailing Address\\nC/O FACEBOOK, INC.\\n1601 WILLOW ROAD\\nMENLO PARK CA 94025"
  },
  "https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001366010": {
    "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody": "Issuer Filings Transaction Date Type of Owner\\nFacebook Inc 0001326801 2019-07-08 director, officer: Chief Operating Officer\\nSVMK Inc. 0001739936 2019-02-21 director\\nWALT DISNEY CO/\\nCurrent Name:TWDC Enterprises 18 Corp. 0001001039 2017-11-22 director\\nSTARBUCKS CORP 0000829224 2011-11-14 director\\neHealth, Inc. 0001333493 2008-06-10 director",
    "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[2]/td/table/tbody": "Mailing Address\\n1 FACEBOOK WAY\\nMENLO PARK CA 94025"
  },
  "https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK=0001164390": {
    "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody": null,
    "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[2]/td/table/tbody": "Issuer Filings Transaction Date Type of Owner\\nACE LTD\\nCurrent Name:Chubb Ltd 0000896159 2019-06-06 officer: Executive Vice President*"
  }
}
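The other option mentioned in the answer, one XPath per URL held in a dictionary, could look roughly like the sketch below. The mapping is read off the output above (the "Issuer" table sits under `tr[3]` for the first two CIKs and under `tr[2]` for the third). The names `BASE`, `XPATH_FOR_URL`, `scrape_tables` and `fetch_text` are hypothetical, and the browser call is passed in as a function so the sketch runs without Selenium.

```python
# Sketch only: BASE, XPATH_FOR_URL, scrape_tables and fetch_text are
# illustrative names, not part of the original answer.
BASE = "https://www.sec.gov/cgi-bin/own-disp?action=getowner&CIK="
XPATH1 = "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody"
XPATH2 = "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[2]/td/table/tbody"

# Which XPath held the "Issuer" table for each URL, according to the output above.
XPATH_FOR_URL = {
    BASE + "0001548760": XPATH1,
    BASE + "0001366010": XPATH1,
    BASE + "0001164390": XPATH2,
}


def scrape_tables(fetch_text, xpath_for_url):
    """Call fetch_text(url, xpath) for every pair and return {url: text}."""
    return {url: fetch_text(url, xpath) for url, xpath in xpath_for_url.items()}


# With Selenium, fetch_text could be written like this (not executed here;
# it reuses driver and getpath from the answer's code):
#
#     def fetch_text(url, xpath):
#         driver.get(url)
#         return getpath(driver.find_elements_by_xpath(xpath))

if __name__ == "__main__":
    # Stand-in for the browser: report which row index the XPath points at.
    def fake_fetch_text(url, xpath):
        return "table under " + xpath.split("/")[-4]

    for url, text in scrape_tables(fake_fetch_text, XPATH_FOR_URL).items():
        print(url, "->", text)
```

The drawback of a hand-written mapping is that it has to be maintained per URL, so it only suits a short, stable list of pages.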

Does one XPath work with a particular URL but not with another? If so, you should wrap each XPath lookup in a try: ... except: ... finally expression.

Try this XPath: //tbody/tr[2]/td//td/table/tbody

Thank you so much for your help. I'm glad I learned something from writing the function and got the job done!
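The try/except suggestion from the comments can be sketched as a helper that tries each XPath in order and returns the first hit. This is a minimal sketch under assumptions: `first_table_text`, `NoMatch`, `FakeElement` and `make_finder` are hypothetical names, and `NoMatch` stands in for Selenium's `NoSuchElementException` so the example runs without a browser.

```python
# Sketch only: all names below are illustrative, not from the original thread.
XPATH1 = "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody"
XPATH2 = "/html/body/div/table[1]/tbody/tr[2]/td/table/tbody/tr[2]/td/table/tbody"


class NoMatch(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""


def first_table_text(find_element, xpaths, missing=NoMatch):
    """Return the text of the first XPath that matches, or None if none do."""
    for xpath in xpaths:
        try:
            return find_element(xpath).text
        except missing:
            continue  # this XPath does not exist on the page; try the next one
    return None


class FakeElement:
    """Minimal object with a .text attribute, like a Selenium WebElement."""

    def __init__(self, text):
        self.text = text


def make_finder(page):
    """Build a find_element-like function over a {xpath: text} dict."""
    def find(xpath):
        if xpath not in page:
            raise NoMatch(xpath)
        return FakeElement(page[xpath])
    return find


if __name__ == "__main__":
    # A page where only the second XPath exists, like the third URL in the question.
    finder = make_finder({XPATH2: "Issuer Filings Transaction Date Type of Owner"})
    print(first_table_text(finder, [XPATH1, XPATH2]))
```

With a real driver the call might be `first_table_text(lambda xp: driver.find_element(By.XPATH, xp), [xpath1, xpath2], missing=NoSuchElementException)`. Note that on pages where both XPaths exist, the order of the list decides which table is returned.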