Python BeautifulSoup: findAll('table') returns all the tables, but also the text between them


python web-scraping


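I am trying to isolate one part of a web page. Unfortunately, that part is not wrapped in any element I can pull out directly.

The closest I can get is to grab the whole body of the page and then try to remove the tables, which are the only part I don't want.

The code I am using: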

storyText = soup.body                   # the whole <body> element
toRemove = storyText.findAll('table')   # every <table> inside it
for each in toRemove:
    print(each)

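The problem right now is that the toRemove line returns the tables and also the text that sits between them, even though that text is not inside any table.

So I get: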

<body>
<table>
    table stuff
</table>
    Text, not in tags </br> #This is what I want.
<table>
    table stuff
</table>
</body>


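Your code works fine on my Mac.
Which version are you using? I used Beautiful Soup 4 (the bs4 package).
(Beautiful Soup 3 is not recommended, because it is no longer being developed.)
Here is my code: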
from bs4 import BeautifulSoup

contents = '''<body>
<table>
     table stuff1
</table>
     Text, not in tags </br> #This is what I want.
<table>
     table stuff2
</table>
</body>'''

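soup = BeautifulSoup(contents)  # no parser named, so bs4 picks the best one installed

storyText = soup.body
toRemove = storyText.findAll('table')
for each in toRemove:
    print(each)     # show the table that is about to be removed
    each.extract()  # detach the table from the tree

print('----result-------------')
print(soup)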


This produces the following result:
<table>
    table stuff1
</table>
<table>
    table stuff2
</table>
----result-------------
<body>

    Text, not in tags  #This is what I want.

</body>
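
If only the remaining text is needed, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name `text_outside_tables` and the sample markup are hypothetical, and it names the built-in `html.parser` explicitly so the result does not depend on which parsers happen to be installed. It uses `decompose()` instead of `extract()` because the removed tables are not needed afterwards.

```python
from bs4 import BeautifulSoup


def text_outside_tables(html, parser="html.parser"):
    """Return the text of `html` with every <table> removed (hypothetical helper)."""
    soup = BeautifulSoup(html, parser)
    # html.parser only provides soup.body when the markup has a <body> tag
    root = soup.body or soup
    for table in root.find_all("table"):
        table.decompose()  # drop the table and everything inside it
    # join the remaining strings with spaces and trim surrounding whitespace
    return root.get_text(" ", strip=True)


sample = """<body>
<table><tr><td>table stuff1</td></tr></table>
Text, not in tags
<table><tr><td>table stuff2</td></tr></table>
</body>"""

print(text_outside_tables(sample))
```

On a real page the markup is rarely this clean, so the result depends on how the chosen parser repairs the HTML.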


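So on the example page, you are trying to get all the text?

To start with, yes. I am using BS4 and I have tried the code that works for you, but it does not do what I need: it removes both the tables and the text I want from the soup.

Hmm, that may be caused by the default HTML parser. Change line 13 of my code as follows: `soup = BeautifulSoup(contents, 'html5lib')`. You can install html5lib with one of the following commands:
$ apt-get install python-html5lib
$ easy_install html5lib
$ pip install html5lib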
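
One way to check whether the parser is responsible is to parse the same markup with each available parser and look at what `find_all('table')` returns. The sketch below is an illustration under assumptions: the helper name `tables_per_parser` and the `snippet` string are made up for this example, and parsers that are not installed are reported as `None` rather than raising. What `lxml` and `html5lib` return for malformed markup such as `</br>` is deliberately not stated here; run it against the real page to see.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical test markup: loose text and a stray </br> between two tables.
snippet = (
    "<body><table><tr><td>cell</td></tr></table>"
    "loose text </br> more text"
    "<table></table></body>"
)


def tables_per_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Map each parser name to the tables it finds, or None if it is not installed."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            results[name] = None  # parser library is missing on this machine
            continue
        results[name] = [str(table) for table in soup.find_all("table")]
    return results


for name, tables in tables_per_parser(snippet).items():
    print(name, tables)
```

If one parser shows the loose text inside a table and another does not, passing the second parser's name to `BeautifulSoup` explicitly is the simplest fix.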