Python: split by bs4 tags / get the text between two tags


I am currently trying to read the text between two tags on a web page.

This is my code so far:

soup = BeautifulSoup(r.text, 'lxml')
text = soup.text

tag_one = soup.select_one('div.first-header')
tag_two = soup.select_one('div.second-header')

text = text.split(tag_one)[1]
text = text.split(tag_two)[0]

print(text)

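Basically, I am trying to get the text between the first and second headers by identifying their tags. I plan to do this by splitting on the first tag and then on the second tag. Is this possible? Is there a smarter way?

For example, if you look at:

I would like to find a way to extract the text under History by identifying the tags for History and for Features and philosophy, and splitting on those tags.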

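You can't do it the way you are hoping to, because BS4 works on the DOM, which is a tree structure, not a linear one.

Taking your wiki example, what you really want is: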

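1. Find id=History, which is a span, and navigate up to the H2 element. Remember this as the start point.
2. Find id=Features_and_philosophy, which is another span, and navigate up to the nearest H2 element. Remember this as the end point.
3. Note that the two H2 elements are siblings: they have the same parent. So what you want is every sibling between the start H2 and the end H2, and for each of those siblings, its full text.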

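This is not hard, but it is a loop in which you compare each sibling until you reach the end one. Not as simple as you were hoping.

In the more general case this gets quite difficult, or at least tedious, because you may need to walk up and down the DOM tree to find the matching elements.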
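The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the inline `HTML` string is a hypothetical stand-in for the real page, the helper name `text_between` is made up for this example, and it assumes each id sits on a span nested inside an H2 and that both H2 elements share a parent.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the real page markup.
HTML = """
<div>
  <h2><span id="History">History</span></h2>
  <p>Python was conceived in the late 1980s.</p>
  <p>Python 2.0 was released in 2000.</p>
  <h2><span id="Features_and_philosophy">Features and philosophy</span></h2>
  <p>Python is a multi-paradigm language.</p>
</div>
"""


def text_between(soup, start_id, end_id):
    """Return the text of every sibling tag between two H2 headings."""
    # Find each span by id, then climb to its enclosing <h2>.
    start = soup.find(id=start_id).find_parent("h2")
    end = soup.find(id=end_id).find_parent("h2")

    parts = []
    # Walk the siblings that follow the start heading.
    for sibling in start.find_next_siblings():
        # Identity check: stop at the end heading itself.
        if sibling is end:
            break
        parts.append(sibling.get_text(strip=True))
    return parts


soup = BeautifulSoup(HTML, "html.parser")
paragraphs = text_between(soup, "History", "Features_and_philosophy")
print(paragraphs)
```

The comparison uses `is` rather than `==` because BeautifulSoup compares tags by content, and two different headings with identical markup would otherwise be treated as the same element.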

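With BeautifulSoup 4.7+, CSS selector support has improved a great deal. This task can be done with the CSS Level 4 :has() selector, which is now supported in BeautifulSoup:

import requests
from bs4 import BeautifulSoup

website_url = requests.get("https://en.wikipedia.org/wiki/Python_(programming_language)").text
soup = BeautifulSoup(website_url, "lxml")

# Siblings that follow the History <h2> and are themselves followed by the
# Features_and_philosophy <h2>, i.e. everything between the two headings.
els = soup.select('h2:has(span#History) ~ *:has(~ h2:has(span#Features_and_philosophy))')

# Print each element's text and also save it to text.txt.
with open("text.txt", "w", encoding="utf-8") as f:
    for el in els:
        print(el.get_text())
        f.write(el.get_text())

Output: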

 Guido van Rossum at OSCON 2006.Main article: History of PythonPython was conceived in the late 1980s[31] by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands as a successor to the ABC language (itself inspired by SETL)[32], capable of exception handling and interfacing with the Amoeba operating system.[7] Its implementation began in December 1989.[33] Van Rossum's long influence on Python is reflected in the title given to him by the Python community: Benevolent Dictator For Life (BDFL) –  a post from which he gave himself permanent vacation on July 12, 2018.[34]
Python 2.0 was released on 16 October 2000 with many major new features, including a cycle-detecting garbage collector and support for Unicode.[35]
Python 3.0 was released on 3 December 2008. It was a major revision of the language that is not completely backward-compatible.[36] Many of its major features were backported to Python 2.6.x[37] and 2.7.x version series.  Releases of Python 3 include the 2to3 utility, which automates (at least partially) the translation of Python 2 code to Python 3.[38]
Python 2.7's end-of-life date was initially set at 2015 then postponed to 2020 out of concern that a large body of existing code could not easily be forward-ported to Python 3.[39][40] In January 2017, Google announced work on a Python 2.7 to Go transcompiler to improve performance under concurrent workloads.[41]
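To try the selector without a network request, here is a small sketch on a hypothetical inline document. The `HTML` string and the names `SELECTOR` and `texts` are assumptions made for this illustration; the live page's markup may differ from it.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; not the real Wikipedia page.
HTML = """
<div>
  <h2><span id="History">History</span></h2>
  <p>First history paragraph.</p>
  <p>Second history paragraph.</p>
  <h2><span id="Features_and_philosophy">Features and philosophy</span></h2>
  <p>A paragraph from the next section.</p>
</div>
"""

# Left part: any later sibling of the <h2> containing span#History.
# Right part: keep only elements that still have the end <h2> after them.
SELECTOR = (
    "h2:has(span#History) ~ "
    "*:has(~ h2:has(span#Features_and_philosophy))"
)

soup = BeautifulSoup(HTML, "html.parser")
texts = [el.get_text(strip=True) for el in soup.select(SELECTOR)]
print(texts)
```

The end heading itself is excluded because an element is never its own sibling, so it fails the inner `:has(~ ...)` test.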

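Could you edit your question to include test input and the expected output? It's not entirely clear to me what you are trying to do.

@Cody, I have tried to do that now.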