Python BeautifulSoup: get_text() returns an empty string from a bs4 tag

I am trying to extract information from a web page.

First, I parse the page:

import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.theguardian.com/politics/2019/oct/20/boris-johnson-could-be-held-in-contempt-of-court-over-brexit-letter")
soup = BeautifulSoup(page.content, 'html.parser')
Then I start with the title:

title = soup.find('meta', property="og:title")
If I print it, I get:

<meta content="Boris Johnson could be held in contempt of court over Brexit letter" property="og:title"/>

However, when I run title.get_text(), the result is an empty string:

''

Where is my mistake?

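That is because the tag does not actually contain any text. A meta tag has no child text nodes, so get_text() has nothing to return. In this case, the "text" you are looking for is stored in the tag's content attribute, so you need to extract the value of content: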

import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.theguardian.com/politics/2019/oct/20/boris-johnson-could-be-held-in-contempt-of-court-over-brexit-letter")
soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find('meta', property="og:title")['content']
Output:

print(title)
Boris Johnson could be held in contempt of court over Brexit letter
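To make the difference concrete, here is a minimal offline sketch. The HTML string is a hypothetical stand-in for the downloaded page (it is not the real Guardian markup), and the variable names are only for illustration. It contrasts a tag that wraps text with a meta tag that only carries attributes:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the example runs offline.
html = """
<html><head>
<title>Example page</title>
<meta property="og:title" content="Boris Johnson could be held in contempt of court over Brexit letter"/>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

meta = soup.find("meta", property="og:title")
title_tag = soup.find("title")

# <meta> has no child nodes, so there is no text for get_text() to collect.
meta_text = meta.get_text()

# <title> wraps a text node, so get_text() returns that text.
title_text = title_tag.get_text()

# The headline is stored in the "content" attribute, not in the tag body.
og_title = meta["content"]

print(repr(meta_text))
print(title_text)
print(og_title)
```

If the tag might be missing from a page, soup.find() returns None, so it is probably worth checking for that before indexing with ['content'].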
You can use .attrs to get all attributes and their values. It returns a dictionary (key: value pairs) of the attributes of the given tag:

title = soup.find('meta', property="og:title")

print(title.attrs)
Output:

{'property': 'og:title', 'content': 'Boris Johnson could be held in contempt of court over Brexit letter'}
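As a small sketch of how the attribute dictionary can be used, the snippet below works on a hypothetical one-line HTML string (the headline text and the "description" key are made up for illustration). It shows that indexing the tag and indexing .attrs give the same value, and that tag.get() can be used when an attribute may be absent:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the attribute layout matters here.
html = '<meta property="og:title" content="Example headline"/>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("meta", property="og:title")

# .attrs is an ordinary dict of attribute names to values.
attrs = tag.attrs
for name, value in attrs.items():
    print(f"{name} = {value}")

# tag["content"] is shorthand for tag.attrs["content"].
same_value = tag["content"] == tag.attrs["content"]

# tag.get() returns None (or a supplied default) instead of raising KeyError.
missing = tag.get("description")
fallback = tag.get("description", "n/a")
```

Using tag.get() rather than tag[...] is a design choice: it trades an exception for a None check, which tends to be more convenient when scraping pages whose markup varies.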

Thanks, that works well. Is it possible to get the whole list of elements? Not just ['content'], but to see all of them?
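One possible way to do this, assuming "the whole list" means every meta tag on the page, is to use find_all() and read .attrs from each result. The HTML below is a hypothetical sample, not the real page, and the names all_meta and og are only illustrative:

```python
from bs4 import BeautifulSoup

# Hypothetical sample of a page head with several meta tags.
html = """
<head>
<meta name="viewport" content="width=device-width"/>
<meta property="og:title" content="Example headline"/>
<meta property="og:type" content="article"/>
<meta name="description" content="Short summary"/>
</head>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute dictionaries of every <meta> tag, in document order.
all_meta = [tag.attrs for tag in soup.find_all("meta")]

# Only the tags that have both "property" and "content" attributes,
# collected into a {property: content} mapping.
og = {
    tag["property"]: tag["content"]
    for tag in soup.find_all("meta", attrs={"property": True, "content": True})
}

for entry in all_meta:
    print(entry)
print(og)
```

Passing True as an attribute value asks find_all() for tags that have that attribute at all, whatever its value, which avoids a KeyError on meta tags that use name instead of property.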