Python beautifulsoup4: how to print both text and href?


python parsing


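I have read more than ten posts about printing either the href or the text, but I could not find one that prints both the text and the href.

The website is the one requested in the code below.

I want to scrape the post titles and their URLs.

Here is my code: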

from urllib.request import urlopen
from bs4 import BeautifulSoup

page = urlopen("https://cyware.com/cyber-security-news-articles")
soup = BeautifulSoup(page, 'html5lib')

questions = soup.find_all('h2',{"class":"post post-v2 format-image news-card get-id"})

for h2 in soup.find_all('h2'):
    print(h2.text)
    print(h2.href)

But the href result is None. I would like to know why

print(h2.href)

does not print the link.

questions contains href="~".

find_all('h2') finds all <h2> heading elements, not the <a> elements that hold the links.
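
To illustrate why h2.href comes back as None, here is a minimal sketch on a hand-written snippet. The markup is hypothetical and only mimics a heading that wraps a link; it is not copied from the site. Attribute-style access on a tag (h2.href) looks for a child tag named href, and the href value lives on the nested <a> rather than on the <h2>.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a heading that wraps a link (not taken from the real site).
html = '<h2 class="post-title"><a href="/news/example-1">Example headline</a></h2>'
soup = BeautifulSoup(html, "html.parser")

h2 = soup.find("h2")

# Dot access searches for a child *tag* called <href>, which does not exist.
print(h2.href)
# The <h2> element itself carries no href attribute either.
print(h2.get("href"))
# The link is an attribute of the nested <a> tag.
print(h2.a["href"])
# Text and href together, on one line.
print(h2.get_text(strip=True), h2.a.get("href"))
```

On real pages some headings may have no nested link at all, in which case h2.a is None and should be checked before indexing.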

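If you want to print both, you can print each article's title and its associated href in a single pass over the HTML. To get the href, you need to search for the "a" tag inside each heading.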
import requests
import bs4 as bs

# Send a browser-like User-Agent with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) '
                         'Gecko/20100101 Firefox/20.0'}
req = requests.get('https://cyware.com/cyber-security-news-articles',
                   headers=headers)

html = bs.BeautifulSoup(req.text, "lxml")

# Match the headings by their exact class string, then read the nested <a> tags.
for i in html.find_all('h2', attrs={'class': "post-title post-v2-title text-image"}):
    print(i.text)
    for url in i.find_all('a'):
        print(url.get('href'))


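In my opinion, it is better to use CSS selectors. Note that if you target all h2 elements by the exact post-title post-v2-title text-image class string, your code is vulnerable to changes on the website. If the maintainers reorder the classes or remove one of them from the h2 headings, your code will stop working. In my view, this is a leaner and more readable version of the code: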
import requests
from bs4 import BeautifulSoup

req = requests.get('https://cyware.com/cyber-security-news-articles')

soup = BeautifulSoup(req.text, 'lxml')

for a in soup.select('.post h2[class*="title"] a'):
    print(a.text, a['href'])

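'.post h2[class*="title"] a' selects every a element that is a descendant of an h2 whose class attribute contains title, where that h2 is itself a descendant of an element with the class post.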
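
The difference between the exact class string and the substring selector can be checked offline. The sketch below uses made-up markup (the class names follow the ones quoted in this thread, the hrefs and titles are invented) in which the second heading has its classes reordered and one removed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second heading simulates a site change.
html = """
<div class="post post-v2">
  <h2 class="post-title post-v2-title text-image"><a href="/news/a">First</a></h2>
</div>
<div class="post post-v2">
  <h2 class="text-image post-v2-title"><a href="/news/b">Second</a></h2>
</div>
<div class="sidebar">
  <h2 class="widget-title"><a href="/about">About</a></h2>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact class string: only the heading whose class attribute matches verbatim.
exact = soup.find_all("h2", attrs={"class": "post-title post-v2-title text-image"})

# CSS selector: any <a> under an <h2> whose class attribute contains "title",
# as long as that <h2> sits inside an element with class "post".
links = soup.select('.post h2[class*="title"] a')
pairs = [(a.get_text(strip=True), a["href"]) for a in links]

print(len(exact))
for text, href in pairs:
    print(text, href)
```

The sidebar heading is skipped because it has no ancestor with the class post, while the reordered heading is still matched by the selector.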
I solved it based on your idea, with just 3 lines:

for a in soup.find_all("a", {'rel': True}, "action_url"):
    print(a.text)
    print(a.get('href'))

No problem. If this helped you solve your problem, please don't forget to accept it as the solution.
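
A note on the three-line variant: as far as the Beautiful Soup signature goes, the third positional parameter of find_all is recursive, so a string passed there does not filter by class. The sketch below assumes "action_url" was meant as a class name, which is a guess, and runs on invented markup to show how the rel and class filters can be written as keyword arguments.

```python
from bs4 import BeautifulSoup

# Hypothetical anchors; "action_url" is assumed to be a class name.
html = """
<a rel="nofollow" class="action_url" href="https://example.com/1">One</a>
<a class="action_url" href="https://example.com/2">Two</a>
<a rel="nofollow" class="other" href="https://example.com/3">Three</a>
"""
soup = BeautifulSoup(html, "html.parser")

# rel=True keeps only anchors that have a rel attribute, whatever its value.
with_rel = soup.find_all("a", rel=True)

# Positional form from the comment: the string lands in `recursive`,
# so the result is the same as filtering on rel alone.
positional = soup.find_all("a", {"rel": True}, "action_url")

# Keyword form: filter on both the rel attribute and the class.
both = soup.find_all("a", rel=True, class_="action_url")

for a in both:
    print(a.text, a.get("href"))
```

If the positional form already returns only the wanted links on the real page, the rel filter alone is doing the work.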