Python: How to exclude BeautifulSoup results that I don't want?
I'm having trouble excluding some of the results my Beautiful Soup program returns. Here is my code:
from bs4 import BeautifulSoup
import requests
URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
for link in soup.find_all('a'):
    print(link.get('href'))
I don't want results that start with "#", for example: #cite_ref-18
I tried using a for loop, but I get this error message:
KeyError: 0
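The loop that raised the error isn't shown, so this is an assumption, but one likely cause of KeyError: 0 is indexing a Tag with an integer. In Beautiful Soup, tag[...] looks up an HTML attribute by name rather than a position, so tag[0] asks for an attribute called 0, which doesn't exist. A minimal reproduction on an inline snippet (no network request needed):

```python
from bs4 import BeautifulSoup

# Small inline document so the example runs without fetching Wikipedia
html = '<a href="#cite_ref-18">[18]</a><a href="/wiki/Python">Python</a>'
soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('a')

# Tag[...] looks up an attribute by name, not by position
print(tag['href'])  # → #cite_ref-18

try:
    tag[0]  # there is no attribute named 0
except KeyError as exc:
    print(repr(exc))  # → KeyError(0)
```

Reading the attribute with tag['href'] or tag.get('href') avoids this; .get() returns None instead of raising when the attribute is missing.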
You can use the str.startswith() method:
from bs4 import BeautifulSoup
import requests
URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
for tag in soup.find_all('a'):
    link = tag.get('href')
    # Skip <a> tags without an href (link is None) and in-page "#" anchors
    if link is not None and not link.startswith('#'):
        print(link)
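To check the filter without hitting the network, the same str.startswith() test can be wrapped in a function and run on an HTML string. This is a minimal sketch; the helper name non_fragment_links and the sample markup are hypothetical, and it also skips <a> tags that have no href at all.

```python
from bs4 import BeautifulSoup


def non_fragment_links(html):
    """Hypothetical helper: return href values that exist and don't start with '#'."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for tag in soup.find_all('a'):
        href = tag.get('href')  # None when the <a> has no href attribute
        if href is not None and not href.startswith('#'):
            links.append(href)
    return links


# Made-up sample covering a fragment link, a relative link,
# an <a> without href, and an absolute URL
sample = (
    '<a href="#cite_ref-18">[18]</a>'
    '<a href="/wiki/Wikipedia">Wikipedia</a>'
    '<a name="anchor-only">no href</a>'
    '<a href="https://example.org/">external</a>'
)
print(non_fragment_links(sample))
# → ['/wiki/Wikipedia', 'https://example.org/']
```

For the real page, pass page.content from the requests call instead of the sample string.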
You can also use the CSS selector a[href]:not([href^="#"]). It selects every <a> tag that has an href attribute, except those whose href begins with the "#" character:
import requests
from bs4 import BeautifulSoup
URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
for link in soup.select('a[href]:not([href^="#"])'):
    print(link['href'])
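If you'd rather not use CSS, find_all() accepts a function as an attribute filter; Beautiful Soup calls it with that attribute's value (the lambda below also guards against None for tags lacking href). The sketch below runs both approaches on a small made-up snippet to show they select the same links.

```python
from bs4 import BeautifulSoup

# Made-up sample: a fragment link, a normal link, and an <a> without href
sample = (
    '<a href="#cite_ref-18">[18]</a>'
    '<a href="/wiki/Wikipedia">Wikipedia</a>'
    '<a>no href</a>'
)
soup = BeautifulSoup(sample, 'html.parser')

# CSS selector: <a> tags with an href that does not begin with '#'
via_css = [a['href'] for a in soup.select('a[href]:not([href^="#"])')]

# find_all() with a function filter on the href attribute
via_find_all = [
    a['href']
    for a in soup.find_all('a', href=lambda h: h is not None and not h.startswith('#'))
]

print(via_css)       # → ['/wiki/Wikipedia']
print(via_find_all)  # → ['/wiki/Wikipedia']
```

Both forms exclude tags without an href, so indexing with a['href'] is safe in either loop; which one to use is largely a matter of preference.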