Extracting titles with full URLs using BeautifulSoup in Python


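I'm a beginner Python programmer. As practice, I'm trying to get a list of article titles and their URLs from a web page. So far I've come up with the following code: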

import requests
from bs4 import BeautifulSoup as BS

BASE_URL = 'https://0xdf.gitlab.io'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0'}

with requests.Session() as session:
    # verify=False disables TLS certificate verification (kept from the original attempt)
    r = session.get(BASE_URL, verify=False, headers=headers)
    soup = BS(r.text, 'html.parser')

for tag in soup.find_all('a'):
    link = tag.get('href')
    if not link or link.startswith('#'):
        continue  # skip <a> tags without an href and in-page fragment links
    if link.startswith('/'):
        print(BASE_URL + link)  # root-relative link: prepend the site root
    else:
        print(link)  # already an absolute URL

However, this doesn't extract what I'm interested in. I want the article titles together with their full URLs.

Thanks.

You can use this example to extract the titles and their URLs from that page:

import requests
from bs4 import BeautifulSoup

url = "https://0xdf.gitlab.io/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Each article link on the page carries the "post-link" class
for link in soup.select(".post-link"):
    print(
        "{:<40} {}".format(
            link.get_text(strip=True), "https://0xdf.gitlab.io" + link["href"]
        )
    )
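The string concatenation above works because the post links on that page are root-relative. As a hedged alternative, the standard library's `urllib.parse.urljoin` also copes with hrefs that are already absolute or relative to the current page. The href values below are made up for illustration and are not taken from the real site:

```python
from urllib.parse import urljoin

BASE_URL = "https://0xdf.gitlab.io/"

# Hypothetical href values, chosen only to show the different cases
hrefs = [
    "/2021/01/01/example-post.html",  # root-relative
    "about.html",                     # relative to the current page
    "https://example.com/other",      # already absolute
    "#top",                           # fragment only
]

# urljoin resolves each href against the base URL
full_urls = [urljoin(BASE_URL, href) for href in hrefs]

for full_url in full_urls:
    print(full_url)
```

In the answer's loop, this would mean replacing `"https://0xdf.gitlab.io" + link["href"]` with `urljoin(url, link["href"])`.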

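In general, it's better to post a snippet of the relevant HTML. That way we don't need to visit the URL. If by "title" you mean the element's text, you can get it with

title = tag.text

Thanks for your help. This is what I was looking for.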
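For reference, here is a minimal offline sketch of the `tag.text` suggestion from the comment above. The HTML snippet is hypothetical (it only borrows the `post-link` class from the answer), so no request to the site is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, standing in for a snippet of the real page
html = """
<ul>
  <li><a class="post-link" href="/posts/first.html">  First post  </a></li>
  <li><a class="post-link" href="/posts/second.html">Second post</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a", class_="post-link")

# .text returns the element's text exactly as written, whitespace included
raw_title = tags[0].text

# get_text(strip=True) trims the surrounding whitespace
titles = [tag.get_text(strip=True) for tag in tags]

print(repr(raw_title))
print(titles)
```

`tag.text` is enough when the markup is clean; `get_text(strip=True)`, as used in the answer, is a reasonable default when link text is padded with whitespace.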