When using BeautifulSoup4 in Python, why does printing the <p> element result in 'None'?


My code is:

html_doc = "file:///C:/Users/Me/Desktop/Convert%20URL%20to%20HTML%20Link.html"
soup = BeautifulSoup(html_doc, "html.parser")


print(soup.p)

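Using other attributes such as soup.a, soup.p, or soup.title doesn't produce any results either, even though I am sure these elements exist in the HTML document.

This is the actual URL of the HTML document: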

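Assuming the HTML is a file in a local directory that you downloaded, you must first open the file, then read it and scrape its contents.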

For example:

from bs4 import BeautifulSoup

file_dir = "C:/Users/Me/Desktop/Convert URL to HTML Link.html"

# Open the file and read its contents into a string
with open(file_dir, "r") as files_f:
    content = files_f.read()

# Parse the HTML text itself, not the path to it
soup = BeautifulSoup(content, "html.parser")
selections_p = soup.find_all("p")
print(selections_p)
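Why the file has to be opened first can be checked directly: BeautifulSoup parses whatever string it is given as markup, so a path or URL string becomes a document that contains no tags at all. The following is a minimal sketch; `html_text` is made-up markup standing in for the contents of the real file, and the names `path_soup` and `markup_soup` are only illustrative. Note that `soup.p` is shorthand for `soup.find("p")`, which returns `None` when no `<p>` tag is present.

```python
from bs4 import BeautifulSoup

# The string from the question: a location, not HTML markup.
html_doc = "file:///C:/Users/Me/Desktop/Convert%20URL%20to%20HTML%20Link.html"
# BeautifulSoup may warn here that the input looks like a locator rather than markup.
path_soup = BeautifulSoup(html_doc, "html.parser")

# Hypothetical markup standing in for the real file's contents.
html_text = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
markup_soup = BeautifulSoup(html_text, "html.parser")

print(path_soup.p)                # None
print(path_soup.find_all("p"))    # []
print(markup_soup.p)              # <p>Hello</p>
print(markup_soup.find_all("p"))  # [<p>Hello</p>]
```

The first two results match what the question reports (`None`, and later `[]`), which suggests the parser only ever saw the location string.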
If you are scraping from a website, you should request the page first, then scrape it:

import requests
from bs4 import BeautifulSoup

url_to_scrape = "https://www.textfixer.com/html/convert-url-to-html-link.php"

# Request the page over HTTP
with requests.Session() as s_request:
    request_page = s_request.get(url_to_scrape)

# Parse the body of the response
soup = BeautifulSoup(request_page.content, "html.parser")
selections_p = soup.find_all("p")
print(selections_p)
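If the text inside the tags is wanted rather than the tag objects themselves, one possible approach is to call `get_text()` on each result of `find_all("p")`. This is only a sketch: the helper name `paragraph_texts` and the `sample_html` string are assumptions made for illustration, and in practice the argument would be `content` from the file example or `request_page.content` from the request example.

```python
from bs4 import BeautifulSoup


def paragraph_texts(html):
    """Return the whitespace-trimmed text of every <p> element in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text().strip() for p in soup.find_all("p")]


# Hypothetical markup used only to demonstrate the helper.
sample_html = "<body><p>First paragraph.</p><div><p> Second <b>one</b>. </p></div></body>"
print(paragraph_texts(sample_html))  # ['First paragraph.', 'Second one.']
```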


Welcome to SO! What data exactly are you trying to get? soup is the whole HTML document, so it doesn't have a p attribute. You need find or find_all to traverse the HTML tree and locate elements with specific properties (such as being a <p> tag), for example soup.find_all("p").

Thanks for the welcome, @ggorlen. I'm trying to get all the text in the <p> tags. I tried what you said and changed print(soup.p) to print(soup.find_all("p")). Now it just gives [].

You're not finding anything, it looks like. I didn't look closely at your code, but you never open the file; you just pass the path into BS. Open the file and read it into a string, then feed that to BS.

I wouldn't have thought of that, thank you. However, when highlighting with open(file_dir, "r") as files_f:, a popup states Errno 22, Invalid argument. Reading more about it, this error apparently means I am trying to open a directory as a file. How should I open the file and read it?

Please check your file path, including the slashes and the location of the HTML file. Directory separators must be either a single forward slash "/" or a double backslash "\\".
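The point about slashes can be sketched with the standard library alone. `open()` expects a filesystem path rather than a `file://` URL, so the string from the question would need converting first. The helper name `file_url_to_windows_path` is hypothetical, and the code only manipulates strings; it does not touch the disk, so it cannot confirm that the file actually exists at that location.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlparse


def file_url_to_windows_path(url):
    """Turn a file:// URL into a Windows-style path (string handling only)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    # parsed.path looks like '/C:/Users/...'; decode %20 and drop the leading slash.
    return PureWindowsPath(unquote(parsed.path).lstrip("/"))


url = "file:///C:/Users/Me/Desktop/Convert%20URL%20to%20HTML%20Link.html"
path = file_url_to_windows_path(url)
print(path.as_posix())  # C:/Users/Me/Desktop/Convert URL to HTML Link.html

# Forward slashes, doubled backslashes, and raw strings all spell the same path.
same = [
    PureWindowsPath("C:/Users/Me/Desktop/Convert URL to HTML Link.html"),
    PureWindowsPath("C:\\Users\\Me\\Desktop\\Convert URL to HTML Link.html"),
    PureWindowsPath(r"C:\Users\Me\Desktop\Convert URL to HTML Link.html"),
]
print(all(p == path for p in same))  # True
```

On Windows, the string from `path.as_posix()` (or `str(path)`) could then be passed to `open()` in place of the `file://` URL.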