Python 3.x: Retrieving anchor tags when building a web crawler in Python

python-3.x web-crawler pycharm

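I am building a web crawler and trying to run a program in PyCharm that retrieves the anchor tags from a URL. However, the output I get is exactly the same as the URL I entered. The code is as follows: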

    import urllib.request,urllib.parse,urllib.error
    from bs4 import BeautifulSoup
    import ssl
    ctx=ssl.create_default_context()
    ctx.check_hostname=False
    ctx.verify_mode=ssl.CERT_NONE

    url=input("https://en.wikipedia.org/wiki/Apple_Inc.")
    html=urllib.request.urlopen(url, context=ctx).read()
    soup=BeautifulSoup(html, 'html.parser')

    tags=soup("a")
    for tag in tags:
        print(tag.get("href",None))

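One thing to note here: of the urllib imports, only urllib.error is shown as a used statement, while urllib.request and urllib.parse are both shown as unused, and I don't understand why.

The output of this program is:

I am using Python 3.5.1 and PyCharm Community Edition.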

You really should use the requests package. It is very useful for crawling. Check it out.

Here is your code, converted to use it:

    import requests
    from bs4 import BeautifulSoup

    # Download the page and take the response body as text.
    html = requests.get("https://en.wikipedia.org/wiki/Apple_Inc.").text
    soup = BeautifulSoup(html, "html.parser")

    # Keep only the <a> tags that actually carry an href attribute.
    for a in soup.find_all("a", href=True):
        print(a["href"])
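As for why the original program only echoed the URL: the string passed to input() is just the prompt text. Python prints it and then waits for you to type something, so nothing is fetched until a URL is actually entered. If you would rather stay with urllib, here is a minimal sketch of the question's code with that fixed. The names DEFAULT_URL, extract_hrefs, fetch and main are my own and purely illustrative, and the SSL settings simply mirror the question (they turn certificate checking off, which you may not want outside of experiments).

```python
import ssl
import urllib.request

from bs4 import BeautifulSoup

# Hypothetical default, used only when the user just presses Enter.
DEFAULT_URL = "https://en.wikipedia.org/wiki/Apple_Inc."


def extract_hrefs(html):
    """Return the href value of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["href"] for tag in soup.find_all("a", href=True)]


def fetch(url):
    """Download a page with the same SSL settings as in the question."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # as in the question: no hostname check
    ctx.verify_mode = ssl.CERT_NONE  # as in the question: no certificate check
    with urllib.request.urlopen(url, context=ctx) as response:
        return response.read()


def main():
    # The argument of input() is only the prompt shown to the user;
    # the URL is whatever gets typed in response.
    url = input("Enter URL: ").strip() or DEFAULT_URL
    for href in extract_hrefs(fetch(url)):
        print(href)
```

Call main() to run it interactively. Keeping the parsing in extract_hrefs, separate from the download, also lets you try it on a small HTML string without touching the network.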
