Python BeautifulSoup: get the content without the next tag

I have some HTML like this:

<p><span class="map-sub-title">abc</span>123</p>

I used BeautifulSoup, and here is my code:

from bs4 import BeautifulSoup

html = '<p><span class="map-sub-title">abc</span>123</p>'
soup1 = BeautifulSoup(html, "lxml")
p = soup1.text  # joins every string in the document, including the span's

The result I get is 'abc123'.

But the result I want is '123', not 'abc123'.

You can use the decompose() function to remove the span tag and then get the text you want:

from bs4 import BeautifulSoup

html = '<p><span class="map-sub-title">abc</span>123</p>'
soup = BeautifulSoup(html, "lxml")

for span in soup.find_all("span", {'class':'map-sub-title'}):
    span.decompose()

print(soup.text)

If a tag contains more than one thing, you can still look at just the strings. Use the .strings generator:

>>> from bs4 import BeautifulSoup
>>> html = '<p><span class="map-sub-title">abc</span>123</p>'
>>> soup1 = BeautifulSoup(html,"lxml")
>>> soup1.p.strings
<generator object _all_strings at 0x00000008768C50>
>>> list(soup1.strings)
['abc', '123']
>>> list(soup1.strings)[1]
'123'
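
One caveat when indexing into .strings: real pages usually contain line breaks and indentation, and those whitespace-only text nodes show up in the generator too, so position [1] may not be the string you expect. The sketch below is not part of the original answer. It uses the related stripped_strings generator to skip such nodes, assumes the built-in "html.parser" instead of "lxml" so that it runs without an extra dependency, and uses made-up multi-line HTML for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption): same structure as the question,
# but spread over several lines the way real pages often are.
html = '<p>\n  <span class="map-sub-title">abc</span>\n  123\n</p>'
soup = BeautifulSoup(html, "html.parser")

# .strings keeps whitespace-only text nodes and surrounding whitespace.
raw_strings = list(soup.p.strings)

# .stripped_strings trims each string and drops the empty ones.
clean_strings = list(soup.p.stripped_strings)

print(raw_strings)
print(clean_strings)
```

With stripped_strings, the wanted value is simply the last item of the list, regardless of how the markup is indented.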

You can also use extract() to remove the unwanted tag before getting the text from the parent tag, as shown below:

from bs4 import BeautifulSoup

html = '<p><span class="map-sub-title">abc</span>123</p>'
soup1 = BeautifulSoup(html,"lxml")
soup1.p.span.extract()

print(soup1.text)
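
For comparison with decompose(), here is a small sketch of what extract() gives back. extract() detaches the tag from the tree and returns it, so the removed label is still available afterwards, whereas decompose() destroys the tag. The variable names are illustrative, and "html.parser" is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

html = '<p><span class="map-sub-title">abc</span>123</p>'
soup = BeautifulSoup(html, "html.parser")

# extract() removes the <span> from the tree and returns the removed tag.
removed = soup.p.span.extract()

label = removed.get_text()   # text of the detached <span>
value = soup.p.get_text()    # text left in <p> after the removal

print(label)
print(value)
```

This is handy when the span holds a label you still need, for example to build a label/value pair from each paragraph.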

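One of the many ways is to use contents on the parent tag (in this case <p>).

If you know the position of the string, you can index it directly: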

>>> from bs4 import BeautifulSoup, NavigableString
>>> soup = BeautifulSoup('<p><span class="map-sub-title">abc</span>123</p>', 'lxml')
>>> # check the contents
... soup.find('p').contents
[<span class="map-sub-title">abc</span>, '123']
>>> soup.find('p').contents[1]
'123'
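With the second approach (keeping only the NavigableString items in contents), you can get all of the text that is a direct child of the <p> tag. For completeness, here is one more example: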

>>> html = '''
... <p>
...     I want
...     <span class="map-sub-title">abc</span>
...     foo
...     <span class="map-sub-title">abc2</span>
...     text
...     <span class="map-sub-title">abc3</span>
...     only
... </p>
... '''
>>> soup = BeautifulSoup(html, 'lxml')
>>> ' '.join([x.strip() for x in soup.find('p').contents if isinstance(x, NavigableString)])
'I want foo text only'
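
The same filtering can be wrapped in a small helper. This is a sketch rather than part of the original answer: direct_text is a hypothetical name, and it relies on find_all(string=True, recursive=False), which returns only the text nodes that are direct children of the tag. "html.parser" is assumed so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Join the text nodes that are direct children of tag (hypothetical helper)."""
    # string=True matches text nodes rather than tags;
    # recursive=False limits the search to direct children.
    pieces = (s.strip() for s in tag.find_all(string=True, recursive=False))
    # Skip pieces that were whitespace only.
    return " ".join(piece for piece in pieces if piece)


html = '''
<p>
    I want
    <span class="map-sub-title">abc</span>
    foo
    <span class="map-sub-title">abc2</span>
    text
    <span class="map-sub-title">abc3</span>
    only
</p>
'''
soup = BeautifulSoup(html, "html.parser")
result = direct_text(soup.find("p"))
print(result)
```

Unlike the list comprehension, this version needs no NavigableString import and drops empty pieces, so tags that sit next to each other do not leave double spaces in the result.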

Although every answer in this thread seems acceptable, I'll point out another way to solve this problem:

soup.find("span", {'class':'map-sub-title'}).next_sibling

You can use next_sibling to navigate between elements that share the same parent, in this case the p tag.
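
Applied to the HTML from the question, that looks roughly like the sketch below. next_sibling returns None when nothing follows the tag, so the code guards against that case. The guard and the variable names are additions for illustration, and "html.parser" is assumed in place of "lxml" to keep the snippet dependency-free.

```python
from bs4 import BeautifulSoup

html = '<p><span class="map-sub-title">abc</span>123</p>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span", {"class": "map-sub-title"})

# The node right after the <span> inside the same parent; here it is the
# text node '123'. It would be None if the <span> were the last child.
text_after = span.next_sibling
result = str(text_after).strip() if text_after is not None else ""

print(result)
```

The strip() call matters on indented markup, where the sibling string carries the surrounding line breaks and spaces.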
