Python: How do I split and recombine text based on nested tags with BeautifulSoup?


In the HTML below, I need to read all the text in order and assemble a separate sentence for each span class.

<label for="01">"The traveler, with his powerful "
    <span class ="Wizard">"Storm"</span>
    <span class ="Warrior">"Whirlwind"</span>
    <span class ="Monk">"Prayer"</span>", took down the dark forces of evil. The "
    <span class ="Wizard">"wizard"</span>
    <span class ="Warrior">"warrior"</span>
    <span class ="Monk">"monk"</span>" was exhausted afterwards and needed to take a rest."
</label>
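I don't know how to approach this, and I couldn't find anything online, probably because I don't know how to phrase my question (if you have a suggestion for phrasing it better, please leave a comment and I will update it).

Thank you in advance.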


Edit: I tried
find(text=True)
find_all(text=True)
but I couldn't work out how to get from there to the result I need.

You can use itertools.groupby:

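import bs4
from bs4 import BeautifulSoup as soup
from itertools import groupby

# `content` is assumed to hold the HTML string from the question.
# Group consecutive children of <label> into runs of text nodes (True) and runs of tags (False).
d = [(a, list(b)) for a, b in groupby(list(filter(lambda x:x != '\n', soup(content, 'html.parser').label.contents)), key=lambda x:isinstance(x, bs4.element.NavigableString))]
# Transpose the span runs so each tuple holds one class's spans; collect the shared text runs.
users, _text = list(zip(*[b for a, b in d if not a])), [b for a, b in d if a]
# For each class, interleave the shared text with that class's spans, slicing off the literal quote characters.
result = [[a[0]['class'][0], (lambda x:''.join(f'{j[1:-1]} {next(x).text[1:-1]}' if l < len(_text) - 1 else j[1:-2] for l, [j] in enumerate(_text)))(iter(a))] for a in users]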
Output:

[['Wizard', 'The traveler, with his powerful Storm, took down the dark forces of evil. The wizard was exhausted afterwards and needed to take a rest.'], 
 ['Warrior', 'The traveler, with his powerful Whirlwind, took down the dark forces of evil. The warrior was exhausted afterwards and needed to take a rest.'], 
 ['Monk', 'The traveler, with his powerful Prayer, took down the dark forces of evil. The monk was exhausted afterwards and needed to take a rest.']]
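For readers who find the one-liners above hard to follow, here is a more explicit sketch of the same idea. It is not the answerer's code: it swaps groupby for a plain loop, and the function name combine_by_class and the html variable are made up for illustration. It assumes the double quotes are literal characters in the markup (as shown in the question) and that each span carries exactly one class.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# HTML copied from the question; the quotes are assumed to be literal text.
html = '''<label for="01">"The traveler, with his powerful "
    <span class ="Wizard">"Storm"</span>
    <span class ="Warrior">"Whirlwind"</span>
    <span class ="Monk">"Prayer"</span>", took down the dark forces of evil. The "
    <span class ="Wizard">"wizard"</span>
    <span class ="Warrior">"warrior"</span>
    <span class ="Monk">"monk"</span>" was exhausted afterwards and needed to take a rest."
</label>'''


def unquote(text):
    """Drop surrounding whitespace, then one pair of literal double quotes."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1]
    return text


def combine_by_class(markup):
    """Return {class name: sentence} for the first <label> in the markup."""
    label = BeautifulSoup(markup, 'html.parser').label
    # Collect the span classes in document order, without duplicates.
    classes = list(dict.fromkeys(s['class'][0] for s in label.find_all('span')))
    sentences = {c: '' for c in classes}
    for node in label.children:
        if isinstance(node, Tag):
            # A span contributes only to the sentence of its own class.
            targets, raw = [node['class'][0]], node.get_text()
        else:
            # Bare text is shared by every sentence.
            targets, raw = classes, str(node)
        if not raw.strip():
            continue  # skip the whitespace-only nodes between spans
        piece = unquote(raw)
        for c in targets:
            sentences[c] += piece
    return sentences


sentences = combine_by_class(html)
for name, sentence in sentences.items():
    print(name, '->', sentence)
```

Walking label.children once keeps the document order intact, so the shared text and the class-specific words end up interleaved exactly as they appear in the markup.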

Is this the complete code? I can't get it to run: I get an error at bs4.element, "AttributeError: type object 'BeautifulSoup' has no attribute 'element'".

Regarding my previous comment: I may have forgotten to include import bs4. I will try it out and see how it works in 5 minutes.
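
A note on that error message: it says the name bs4 was bound to the BeautifulSoup class rather than to the bs4 module at the point where bs4.element was evaluated. One way this could happen (an assumption, since the commenter's actual imports are not shown) is an aliasing import like the one in this small sketch:

```python
# Hypothetical import that reuses the module's name for the class.
from bs4 import BeautifulSoup as bs4

try:
    bs4.element  # looks up `element` on the class, not on the module
except AttributeError as exc:
    message = str(exc)
    print(message)
```

With a plain import bs4, as in the answer, bs4.element.NavigableString resolves as expected.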