Scraping URL links with Python
Here is my code:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Firefox()
url = 'https://www.coteur.com/cotes-foot.php'
driver.get(url)
links = driver.find_elements_by_xpath('//a[contains(@href, "match/cotes-")]')
driver.close()
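I want to scrape all the URL links related to football matches from the website in the code above.
I always end up scraping all the <a> elements, including the ones for football matches. But how do I extract the URLs that link to these matches?
Try the following: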
import urllib.request, urllib.error, urllib.parse  # Import required modules
from bs4 import BeautifulSoup
import ssl
# Disable certificate verification. This is insecure and only needed for
# websites whose certificates fail validation; skip it otherwise.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
userInput = input("Enter URL: ")
url = userInput if len(userInput) != 0 else "https://www.coteur.com/cotes-foot.php"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")
tags = soup("a")  # Find all HTML "a" tags (the tag used to create links)
for tag in tags:  # Print the href of each one
    print(tag.get("href", None))
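This program prints every link it finds on the page.
If you only need the football-related links, you can replace the body of the loop with:
    href = tag.get("href", "")  # default to "" so the `in` test never receives None
    if 'soccer' in href:
        print(href)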
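On this particular site, the XPath in the question suggests that match links contain "match/cotes-" in their href rather than "soccer". As a minimal sketch of that filter, the block below uses only the standard library's html.parser in place of BeautifulSoup and resolves relative hrefs against the page URL. The names MatchLinkParser and extract_match_links, and the sample HTML, are hypothetical and only for illustration. If the page builds its links with JavaScript, a plain HTTP fetch may not see them at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MatchLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose href contains a given fragment."""

    def __init__(self, base_url, fragment="match/cotes-"):
        super().__init__()
        self.base_url = base_url
        self.fragment = fragment
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and links that do not match
        if href and self.fragment in href:
            # Turn relative paths into absolute URLs
            self.links.append(urljoin(self.base_url, href))


def extract_match_links(html, base_url, fragment="match/cotes-"):
    parser = MatchLinkParser(base_url, fragment)
    parser.feed(html)
    return parser.links


# Hypothetical sample markup, not copied from the real page
SAMPLE_HTML = """
<a href="/match/cotes-team-a-team-b">Team A - Team B</a>
<a href="/contact.php">Contact</a>
<a name="anchor-without-href">No link</a>
"""

MATCH_LINKS = extract_match_links(
    SAMPLE_HTML, "https://www.coteur.com/cotes-foot.php"
)
print(MATCH_LINKS)
```

To run it against the live page, pass the decoded result of urlopen(...).read() as the html argument.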
I tried this:
n = 0
while n < len(links):
    links[n] = links[n].text
    n = n + 1
print(links)
driver.findElement(By.linkText(link[1])).click()
Here is the output:
hao@hao-ThinkPad-T420:~$ ./coteur2.py
['FS METTA/LU - JPFS/FK Spartaks', 'Fc Ararat Erevan - Lori Vanadzor', 'Stabaek If - Mjondalen', 'Viking - FK BODO/GLIMT', 'Aalesund - Molde', 'Odd Ballklubb Grenland - Sandefjord', 'Rosenborg - Kristiansund Bk', 'Ac Horsens - Esbjerg Fb', 'Fk Sutjeska Niksic - Ofk Titograd', 'FK Tukums 2000/Tss - Fk Jelgava', 'Borrusia Monch. - Vfl Wolfsburg', 'Fc Admira Wacker Modling - Scr Altach', 'Skn St. Pölten - Sv Mattersburg', 'Sv Wehen Wiesbaden - 1. Fc Nuremberg', 'Hambourg Sv - Vfl Osnabrück', 'Greuther Furth - 1. Fc Heidenheim 1846', 'Tallinna Jk Legion - Jk Tulevik Viljandi', 'Hnk Hajduk Split - Nk Varteks Varazdin', 'Nk Istra 1961 Pula - Inter Zapresic', 'Sepsi Osk Sfantu Gheorghe - Voluntari', 'Varda Se - Zalaegerszeg Te', 'Nd Mura 05 - Nk Celje', 'Ab Argir - EB/STREYMUR', 'Villarreal - Real Majorque', 'Getafe - Espanyol']
Traceback (most recent call last):
  File "./coteur2.py", line 23, in <module>
    driver.findElement(By.linkText(link[1])).click()
AttributeError: 'WebDriver' object has no attribute 'findElement'
find_elements_by_xpath returns WebElement objects, not URLs; you need to read the href attribute from each of them:
from selenium import webdriver
driver = webdriver.Firefox()
url = 'https://www.coteur.com/cotes-foot.php'
driver.get(url)
links = []
for i in driver.find_elements_by_xpath('//a[contains(@href, "match/cotes-")]'):
    links.append(i.get_attribute('href'))
print(links)
driver.close()
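Newer Selenium releases (4.x) dropped the find_elements_by_xpath helpers in favour of find_elements(by, value), and the Python bindings never had the Java-style findElement that caused the AttributeError in the question. A minimal sketch of the same idea for that API follows. The function name collect_match_urls is hypothetical, and it takes an already created driver so that it does not start a browser by itself.

```python
def collect_match_urls(driver, fragment="match/cotes-"):
    """Return the href of every <a> whose href contains `fragment`."""
    xpath = f'//a[contains(@href, "{fragment}")]'
    # "xpath" is the string value of selenium's By.XPATH constant;
    # with `from selenium.webdriver.common.by import By` you can pass By.XPATH
    elements = driver.find_elements("xpath", xpath)
    hrefs = [element.get_attribute("href") for element in elements]
    # Drop empty values and duplicates while keeping the page order
    return list(dict.fromkeys(href for href in hrefs if href))
```

Usage would look like driver = webdriver.Firefox(), driver.get(url), then print(collect_match_urls(driver)) before closing the driver. To click a link by its visible text instead, the Python equivalent of the failing line is driver.find_element(By.LINK_TEXT, text).click().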
What have you tried so far?