Python: How do I exclude certain BeautifulSoup results that I don't want?


I'm having trouble excluding some of the results that my Beautiful Soup program returns. Here is my code:

from bs4 import BeautifulSoup
import requests

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

for link in soup.find_all('a'):
    print(link.get('href'))
I don't want the results that start with "#", for example: #cite_ref-18


I tried filtering them out with a for loop, but I got the following error message:
KeyError: 0
You can use the str.startswith() method:

from bs4 import BeautifulSoup
import requests

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

for tag in soup.find_all('a'):
    link = tag.get('href')
    if not str(link).startswith('#'):
        print(link)
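
One detail worth noting about the loop above: tag.get('href') returns None for an <a> tag that has no href attribute, and str(None) is the string 'None', which does not start with '#', so those tags print as None. The following is a minimal sketch of a stricter filter that skips them as well. The helper name drop_fragment_links and the sample list are hypothetical, made up for illustration; the list stands in for the values that tag.get('href') would return.

```python
from typing import Iterable, List, Optional


def drop_fragment_links(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Keep only real href values that do not start with '#'.

    None entries (tags without an href attribute) are skipped too.
    """
    return [h for h in hrefs if h is not None and not h.startswith('#')]


# Hypothetical sample of values, as tag.get('href') might return them.
sample = ['#cite_ref-18', '/wiki/Wikipedia', None, 'https://example.org/']
print(drop_fragment_links(sample))
```

With the real page, the same helper could be fed with (tag.get('href') for tag in soup.find_all('a')).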

You can use the CSS selector a[href]:not([href^="#"]). This selects all <a> tags that have an href attribute, but not those whose href starts with the # character:

import requests
from bs4 import BeautifulSoup

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

for link in soup.select('a[href]:not([href^="#"])'):
    print(link['href'])
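
To see what the selector keeps and drops without making a network request, here is a small sketch run against an inline HTML snippet. The snippet and its links are invented for illustration and are not taken from the Wikipedia page. It assumes a Beautiful Soup version with CSS selector support through soupsieve (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML sample: one fragment link, one relative link,
# one anchor without href, and one absolute link.
html = """
<p>
  <a href="#cite_ref-18">^</a>
  <a href="/wiki/Wikipedia">Wikipedia</a>
  <a name="top">no href here</a>
  <a href="https://example.org/">external</a>
</p>
"""

soup = BeautifulSoup(html, 'html.parser')

# a[href] requires the attribute to exist; :not([href^="#"]) then
# drops every tag whose href value begins with '#'.
hrefs = [a['href'] for a in soup.select('a[href]:not([href^="#"])')]
print(hrefs)
```

Because a[href] already guarantees the attribute is present, indexing with a['href'] is safe here and no None check is needed.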