HTML: Extracting the text between a paragraph's opening tag and line breaks with BeautifulSoup


html python-3.x


I have the following HTML document:

<p>
  "Year: 1932"
   <br>
   <br>
  "Total Share : 0.5 Lakhs (Pure Estimate)"
  <br>
  <br>
  "Verdict"
</p>


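I am currently using BeautifulSoup to get other elements from the HTML, but I cannot find a way to get these lines individually. I only get them run together on a single line.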

Try closing the br tags.
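A small check of what closing the br tags changes, in case it helps. This is only a sketch: the two markup strings are made-up samples, and it uses the built-in html.parser rather than any parser mentioned elsewhere in the thread. As far as I can tell, BeautifulSoup treats `<br>` as a void element either way, so both spellings should give the same tree and the page itself would not need to change.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the same paragraph with an unclosed
# and a self-closed br tag.
unclosed = BeautifulSoup("<p>first<br>second</p>", features="html.parser")
closed = BeautifulSoup("<p>first<br/>second</p>", features="html.parser")

# Compare the serialized trees of the two variants.
same_tree = str(unclosed) == str(closed)
print(same_tree)

# In both cases the text on either side of the br is a separate text node.
print(list(unclosed.p.stripped_strings))
print(list(closed.p.stripped_strings))
```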

Try this:

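from bs4 import BeautifulSoup

# Placeholder: put the HTML you fetched from the page here.
response_data = """<p>
  "Year: 1932"
  <br>
  <br>
  "Total Share : 0.5 Lakhs (Pure Estimate)"
  <br>
  <br>
  "Verdict"
</p>"""

# html5lib is a separate package and must be installed first.
soup_data = BeautifulSoup(response_data, features="html5lib")
# Treat each newline as a separator, drop the double quotes, then split.
string_data = soup_data.find('p').text.strip().replace("\n", ",").replace("\"", "").split(',')
data_list = []
for strng in string_data:
    if strng.strip():
        data_list.append(strng.strip())

print(data_list)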
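One caveat with splitting on commas: a line that itself contains a comma would be cut in two. A possible alternative, offered as a sketch and not as the answerer's method, is to iterate over the paragraph's `stripped_strings`, which yields each text node with surrounding whitespace removed and skips whitespace-only nodes. The variable names below are my own, and the built-in html.parser is assumed so that nothing extra has to be installed.

```python
from bs4 import BeautifulSoup

# The paragraph from the question, stored as a string for illustration.
html = """<p>
  "Year: 1932"
   <br>
   <br>
  "Total Share : 0.5 Lakhs (Pure Estimate)"
  <br>
  <br>
  "Verdict"
</p>"""

soup = BeautifulSoup(html, features="html.parser")
paragraph = soup.find("p")

# Each text node between the br tags becomes one entry; the literal
# double quotes around each entry are stripped afterwards.
lines = [text.strip('"') for text in paragraph.stripped_strings]

print(lines)
# Join with ", " to get the single-line form asked for in the comments.
print(", ".join(lines))
```

Because the entries come from separate text nodes, commas inside a line are left alone.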

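Can you give an example of the output you want?

Year: 1932, Total Share : 0.5 Lakhs (Pure Estimate), Verdict. This is the ideal output I am looking for.

I added a solution, please check. response_data should contain the HTML document.

Actually, the HTML mentioned is part of a website I am trying to scrape with BeautifulSoup. I cannot change the structure of the web page.

@krishna I get an error when trying to create the BeautifulSoup object. FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?

Yes, you need to install that library.

Can't we use html.parser instead of html5lib?

@Krishna You can use it.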
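To illustrate the parser discussion above, here is a hedged sketch that tries html5lib first and falls back to the built-in html.parser when BeautifulSoup raises FeatureNotFound. The helper name `make_soup` and the sample string are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: prefer html5lib, fall back to html.parser."""
    try:
        return BeautifulSoup(markup, features="html5lib")
    except FeatureNotFound:
        # html5lib is not installed; html.parser ships with Python.
        return BeautifulSoup(markup, features="html.parser")


# Sample markup standing in for the scraped page.
sample = '<p>"Year: 1932"<br><br>"Verdict"</p>'
soup = make_soup(sample)

# Collect the text nodes of the paragraph without the literal quotes.
texts = [text.strip('"') for text in soup.find("p").stripped_strings]
print(texts)
```

For a simple paragraph like this one, both parsers should give the same text nodes, so the fallback only matters when html5lib is missing.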