Get text from a page with Selenium and Python when the website changes the position of the text

Tags: python, selenium, selenium-webdriver, web-scraping, xpath

I'm trying to get the latest results from the specific table called "Last matches".

For example, the XPaths for the results of the first meeting look like this:

int1 = driver.find_element_by_xpath("//*[@id=\"sr-container\"]/div/div/div[3]/div/div/div/div[7]/div[2]/div/div/div/div/div[1]/table/tbody/tr[1]/td[5]/div/div[2]").get_attribute("innerText")

int2 = driver.find_element_by_xpath("//*[@id=\"sr-container\"]/div/div/div[3]/div/div/div/div[7]/div[2]/div/div/div/div/div[1]/table/tbody/tr[2]/td[5]/div/div[2]").get_attribute("innerText")

For the second meeting they look like this:

int1 = driver.find_element_by_xpath("//*[@id=\"sr-container\"]/div/div/div[3]/div/div/div/div[7]/div[2]/div/div/div/div/div[1]/table/tbody/tr[1]/td[5]/div/div[2]").get_attribute("innerText")

int2 = driver.find_element_by_xpath("//*[@id=\"sr-container\"]/div/div/div[3]/div/div/div/div[7]/div[2]/div/div/div/div/div[1]/table/tbody/tr[2]/td[5]/div/div[2]").get_attribute("innerText")

As long as the page has the same layout as the ones I posted, I can extract every result from that table. My problem is that for a meeting like this one, the XPath is not the same and nothing works.

Is there a better way to locate this "Last matches" table and extract the data even when the page layout is different?

Thanks, everyone, for your help.

Yes, you will probably need to introduce XPath axes:

XPATH:

//strong[text()='Last matches']/ancestor::div[contains(@class,'component-header no-margin')]/../following-sibling::div[1]/descendant::table/descendant::td[5]/div/child::div[2]/div

Read more about XPath axes.
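To make the axes in the expression above easier to follow, the sketch below rebuilds the same XPath one step at a time. The names STEPS and build_xpath are illustrative only and are not part of Selenium. The comments describe standard XPath 1.0 behaviour; whether each step matches anything depends on the page's current markup, and the remark about the innermost div holding the score is an assumption.

```python
# Each tuple is (XPath step, what the step does).
STEPS = [
    ("//strong[text()='Last matches']", "find the heading by its visible text"),
    ("ancestor::div[contains(@class,'component-header no-margin')]",
     "climb to the enclosing header div"),
    ("..", "move to that div's parent"),
    ("following-sibling::div[1]", "first div sibling that follows the parent"),
    ("descendant::table", "any table below that div"),
    ("descendant::td[5]", "fifth td below the table, counted along the descendant axis"),
    ("div", "a div child of that cell"),
    ("child::div[2]", "its second div child"),
    ("div", "innermost div (assumed to hold the score)"),
]


def build_xpath(steps):
    """Join the individual steps with '/' to get the full expression."""
    return "/".join(step for step, _ in steps)


xpath = build_xpath(STEPS)
print(xpath)
for step, meaning in STEPS:
    print(f"{step:<65} # {meaning}")
```

Because the expression starts from the heading text rather than from the page root, it does not depend on how many wrapper div elements sit above the table.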

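This was tested against both of your links.

The problem is that the HTML contains two separate tables (left and right) for Last matches. To get all the results you need to iterate over both of them. I made the XPath dynamic below, because the XPaths of the two tables are identical except for a single number between the brackets [].

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://s5.sir.sportradar.com/sports4africa/en/1/season/82128/headtohead/613958/33714/match/27197856")

tables = [1, 2]  # the first two matching tables are the "Last matches" tables
results = []
for table in tables:
    # Same XPath for both tables; only the index in the brackets changes.
    last_match_table = f"(//table[@class='table'])[{table}]//tbody/tr"
    # Wait up to 10 seconds for the rows of this table to be present.
    scores = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, last_match_table)))
    for score in scores:
        results.append(score.get_attribute("innerText"))

for row in results:
    # Drop the first four whitespace-separated tokens of each row and keep the rest.
    text_split = row.split()
    final = ' '.join(text_split[4:])
    print(final)

Note that I also used a more generic XPath, which is not affected when the DOM changes the way you saw. The path //table[@class='table'] matches 4 tables on that page, 2 for Last matches and 2 for Next matches. We only want the first 2, hence the iteration over the list tables = [1, 2] to fill in the XPath.

Result: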

Bolivar 2:0 CD Real Tomayapo
CD Real Tomayapo 2:1 Blooming
Guabira 0:2 CD Real Tomayapo
CD Real Tomayapo 0:0 Real Potosi
Royal Pari 4:2 CD Real Tomayapo
CD Real Tomayapo 1:0 Always Ready
Aurora 3:0 Independiente Petrolero
Aurora 1:1 Bolivar
Blooming 1:0 Aurora
Aurora 2:1 Guabira
Real Potosi 1:1 Aurora
Aurora 0:8 Royal Pari
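
The printed lines still mix team names and scores in one string. As a rough sketch, a regular expression keyed on the digits:digits score can split each line into home team, goals and away team. parse_result and MatchResult are hypothetical helper names that do not appear in the answer, and the pattern assumes a team name never contains a score-like token.

```python
import re
from typing import NamedTuple, Optional


class MatchResult(NamedTuple):
    home: str
    home_goals: int
    away_goals: int
    away: str


# Lazy home-team group, then the score, then everything else as the away team.
SCORE_LINE = re.compile(r"^(?P<home>.+?)\s+(?P<hg>\d+):(?P<ag>\d+)\s+(?P<away>.+)$")


def parse_result(line: str) -> Optional[MatchResult]:
    """Return the parsed result, or None when the line has no score."""
    m = SCORE_LINE.match(line.strip())
    if m is None:
        return None
    return MatchResult(m["home"], int(m["hg"]), int(m["ag"]), m["away"])


# Sample lines copied from the result listing above.
lines = [
    "Bolivar 2:0 CD Real Tomayapo",
    "CD Real Tomayapo 2:1 Blooming",
    "Aurora 0:8 Royal Pari",
]
for line in lines:
    print(parse_result(line))
```

Returning None for unparseable lines keeps the loop from failing if a row such as a postponed match has no score.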

Another good option is to use ancestor in the XPath. I tied the main locator to the table name, which makes it more reliable.

With it you can find the other locators and their text; just put them in a loop with the correct path. In the child XPath, the leading . makes .//td relative to the main locator (the row), so it matches td elements inside that row.

My solution:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = 'https://s5.sir.sportradar.com/sports4africa/en/1/season/80526/headtohead/334075/340986/match/27195664'
# executable_path works on older Selenium releases; newer ones expect a Service object instead.
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.get(url)
driver.implicitly_wait(10)
# Anchor on the "Last matches" heading, climb to the sixth enclosing div, then take every table row.
rows_xpath = "//strong[text()='Last matches']/ancestor::div[6]//tbody/tr"
WebDriverWait(driver, 15).until(EC.presence_of_all_elements_located((By.XPATH, rows_xpath)))
rows = driver.find_elements(By.XPATH, rows_xpath)
output = []
for res in rows:
    # The score sits in the fifth cell of each row.
    score = res.find_element(By.XPATH, ".//td[5]//div[@class=' no-wrap']").get_attribute("innerText")
    output.append(score)
print(output)

Output:

First: ['0:4','3:4','2:2','0:1','3:0','2:2','0:4','1:0','2:1','1:1','1:2','2:4']

Second: ['2:0','2:1','0:2','0:0','4:2','1:0','3:0','1:1','1:0','2:1','1:1','0:8']

Update: The fastest way I could find to swap the scores is to get the two scores separately, put them in separate lists, and then swap them with zip.

first_score = []
second_score = []
for res in rows:
    # div[1] and div[3] hold the two halves of the score.
    first = res.find_element(By.XPATH, ".//td[5]//div[@class=' no-wrap']/div[1]").get_attribute("innerText")
    first_score.append(first)
    second = res.find_element(By.XPATH, ".//td[5]//div[@class=' no-wrap']/div[3]").get_attribute("innerText")
    second_score.append(second)
first_list = list(zip(first_score, second_score))
second_list = list(zip(second_score, first_score))
print(first_list)
print(second_list)

The result is two lists of tuples:

[('0', '4'), ('3', '4'), ('2', '2'), ('0', '1'), ('3', '0'), ('2', '2'), ('0', '4'), ('1', '0'), ('2', '1'), ('1', '1'), ('1', '2'), ('2', '4')]
[('4', '0'), ('4', '3'), ('2', '2'), ('1', '0'), ('0', '3'), ('2', '2'), ('4', '0'), ('0', '1'), ('1', '2'), ('1', '1'), ('2', '1'), ('4', '2')]

There are more efficient ways to do this, but I'd suggest asking a separate question.
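
One possible shortcut, sketched here rather than taken from the answer, is to work on the '0:4' strings already collected in output instead of querying each half of the score again. The list below copies the first four values from the first output shown earlier; split_score is a hypothetical helper name.

```python
def split_score(score):
    """Turn a string such as '0:4' into the tuple ('0', '4')."""
    first, second = score.split(":")
    return first.strip(), second.strip()


# First four values of the first link's output, copied from the answer.
output = ['0:4', '3:4', '2:2', '0:1']

first_list = [split_score(s) for s in output]
second_list = [(b, a) for a, b in first_list]  # same matches, scores swapped
print(first_list)
print(second_list)
```

This avoids two extra element lookups per row, at the cost of relying on the score text always containing exactly one colon.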

Thank you for your answer. How can I get the results of the two teams separately?

To get the names of both teams and the score, use this locator for the score:
//td[5]//div[@class='row flex items xs middle']
You need to split the result to get rid of the \n, though. Without splitting, the result looks like this:
['CF America\n0:4\nTigres UANL', 'Guadalajara\n3:4\nTigres UANL', 'Tigres UANL\n2:2\nMazatlan', 'Tijuana\n0:1\nTigres UANL', 'Tigres UANL\n3:0\nCruz Azul', 'Monterrey\n2:2\nTigres UANL', 'CF America\n0:4\nTigres UANL', 'Puebla FC\n1:0\nCF America', 'CF America\n2:1\nCruz Azul', ...]

What output do you expect? Please read up on how to split a list to get rid of the \n and how to print the results. It should be simple. If you can't get it to work, write back.

I mean the last results of the home team, the last results of the away team, and the results of each team in a list.

Let's take a look.
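
The splitting mentioned in these comments could look like the sketch below, which breaks each home/score/away string on the newline and then collects every match under both team names. The sample strings are taken from the list quoted above; results_by_team is an assumed name, not something from the thread.

```python
from collections import defaultdict

# Sample entries copied from the unsplit list in the comments.
raw = [
    'CF America\n0:4\nTigres UANL',
    'Guadalajara\n3:4\nTigres UANL',
    'Tigres UANL\n2:2\nMazatlan',
    'Puebla FC\n1:0\nCF America',
]


def results_by_team(rows):
    """Map each team name to the list of (home, score, away) matches it played."""
    by_team = defaultdict(list)
    for row in rows:
        parts = [p.strip() for p in row.split("\n")]
        if len(parts) != 3:
            continue  # skip rows that do not have the expected shape
        home, score, away = parts
        by_team[home].append((home, score, away))
        by_team[away].append((home, score, away))
    return dict(by_team)


grouped = results_by_team(raw)
for team, matches in grouped.items():
    print(team, matches)
```

Looking up grouped['Tigres UANL'] or grouped['CF America'] then gives one list per team, which is close to what the asker describes.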