Python: How can I exclude certain BeautifulSoup results that I don't want?

I'm having trouble excluding some of the results that my Beautiful Soup program returns. Here is my code:

from bs4 import BeautifulSoup
import requests

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

for link in soup.find_all('a'):
    print(link.get('href'))

I don't want the results that start with "#", for example: #cite_ref-18

I tried filtering them out with a for loop, but I got the following error message:

KeyError: 0

You can use the str.startswith() method:

from bs4 import BeautifulSoup
import requests

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

for tag in soup.find_all('a'):
    link = tag.get('href')
    if not str(link).startswith('#'):
        print(link)
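
One thing to watch for in the loop above: tag.get('href') returns None for an anchor that has no href attribute, and str(None) is the string 'None', which does not start with '#', so such anchors are printed as None. A minimal sketch of a variant that skips them as well is shown below. It runs on a small inline HTML string instead of the live page, and the names SAMPLE_HTML and non_fragment_links are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<a href="#cite_ref-18">^</a>
<a href="/wiki/Wikipedia">Wikipedia</a>
<a name="top">anchor without href</a>
<a href="https://example.org/">external</a>
"""


def non_fragment_links(html):
    """Return the href values that exist and do not start with '#'."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for tag in soup.find_all('a'):
        href = tag.get('href')  # None when the attribute is missing
        if href is None or href.startswith('#'):
            continue  # skip anchors without href and in-page fragments
        links.append(href)
    return links


print(non_fragment_links(SAMPLE_HTML))
```

With the real page, the same function could be called as non_fragment_links(page.content).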

You can use the CSS selector a[href]:not([href^="#"]). This selects all <a> tags that have an href= attribute, but not those whose href value starts with the # character:

import requests
from bs4 import BeautifulSoup

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

for link in soup.select('a[href]:not([href^="#"])'):
    print(link['href'])
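
To see what the selector matches without fetching the page, here is a small sketch that applies it to an inline HTML string. The sample markup and the name select_non_fragment_hrefs are assumptions made for this illustration only. Because the a[href] part already requires the attribute, indexing with link['href'] should not raise a KeyError here.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; not taken from the Wikipedia page.
SAMPLE_HTML = """
<a href="#cite_note-1">[1]</a>
<a href="/wiki/Android">Android</a>
<a id="no-href">anchor without href</a>
<a href="//example.org/page#section">fragment at the end</a>
"""


def select_non_fragment_hrefs(html):
    """Collect href values via the CSS selector from the answer."""
    soup = BeautifulSoup(html, 'html.parser')
    # a[href]          -> <a> tags that have an href attribute
    # :not([href^="#"]) -> drop those whose href starts with '#'
    selected = soup.select('a[href]:not([href^="#"])')
    return [link['href'] for link in selected]


print(select_non_fragment_hrefs(SAMPLE_HTML))
```

Note that ^= only tests the start of the attribute value, so a link with a '#' later in the URL is still kept.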