Python 美化组-避免考虑包含<；br>；使用findAll作为不同的元素_Python_Html_Beautifulsoup

Python 美化组-避免考虑包含<；br>；使用findAll作为不同的元素

python html

Python 美化组-避免考虑包含<；br>；使用findAll作为不同的元素,python,html,beautifulsoup,Python,Html,Beautifulsoup,我有以下html代码段： <div id="targetdown" class="content"> <div class="alertbox"> <div class="ym-wrapper"> <div class="ym-wbox"> </div> </div> </div> <div class

我有以下html代码段：

<div id="targetdown" class="content">
    <div class="alertbox">
        <div class="ym-wrapper">
            <div class="ym-wbox">

            </div>
        </div>
    </div>
    <div class="ym-wrapper">
        <div class="ym-wbox">
            <p style="text-align: center;">EXCEL Physical Therapy has been keeping our patients moving forward<br />
for nearly 30 years. In the process, we have built an unparalleled<br />
reputation&nbsp;by combining the highest quality of physical therapy<br />
with exceptional&nbsp;customer service to provide a genuinely<br />
&ldquo;patient first&rdquo; approach.&nbsp;It is this philosophy&nbsp;that has established<br />
EXCEL&nbsp;as&nbsp;a premier physical therapy provider in Northern New Jersey.</p>
        </div>
    </div>
</div>
<section class="parallaxone parallax">
    <div class="ym-wrapper">
        <div class="ym-wbox">
            <h2>Helping you navigate the road to recovery</h2>


        </div>
    </div>
</section>

如何避免在换行符标记中出现这种拆分，以使文本

近30年来，EXCEL理疗一直在帮助我们的患者不断进步。在这个过程中，我们建立了一个通过结合最高质量的物理治疗
除客户服务外提供真正的
“患者优先” 方法。正是这种哲学确立了
EXCELasa是纽约北部地区的顶级理疗提供商泽西岛

作为列表中的单个元素返回？

您可以这样做：

soup.find_all("div", class_="ym-wbox")[1].find("p").text

我希望它是一般性的，而不仅仅是针对这种情况。出于我的特殊目的，我尝试使用soup.findAll（text=True）提取页面中的所有可见文本，然后对其进行筛选，以查找script、head等元素。这很好，除了bs会将页面中的元素打断之外。页面中的所有文本都只是

汤。text

我还需要将来自不同元素的文本放入不同的列表元素中，因为我想分别对每个元素进行文本处理。不同的元素是什么意思？您的预期输出是通过使用我发布的选项之一实现的……如果不是，请编辑您的问题以解释预期输出是什么。

In [20]: texts
Out[20]:
['EXCEL Physical Therapy has been keeping our patients moving forward',
 'for nearly 30 years. In the process, we have built an unparalleled',
 ' reputation\xa0by combining the highest quality of physical therapy',
 ' with exceptional\xa0customer service to provide a genuinely',
 ' “patient first” approach.\xa0It is this philosophy\xa0that has established',
 ' EXCEL\xa0as\xa0a premier physical therapy provider in Northern New Jersey.',
 'Helping you navigate the road to recovery',
 ' ']

soup.find_all("div", class_="ym-wbox")[1].find("p").text