Python: get .text but exclude tables completely

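I'm using Beautiful Soup to load XML. All I need is the text, ignoring the tags, and the .text attribute works nicely.

However, I want to completely exclude anything inside <table> tags. I thought of using a regex to replace everything in between, but I'm wondering whether there is a cleaner solution, partly because of …. For example: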

from bs4 import BeautifulSoup as Soup

s =""" <content><p>Hasselt ( ) is a <link target="Belgium">Belgian</link> <link target="city">city</link> and <link target="Municipalities in Belgium">municipality</link>. 
<table><cell>Passenger growth
<cell>Year</cell><cell>Passengers</cell><cell>Percentage </cell></cell>
<cell>1996</cell><cell>360 000</cell><cell>100%</cell>
<cell>1997</cell><cell>1 498 088</cell><cell>428%</cell>
</table>"""
clean = Soup(s, "xml")
print(clean.text)
whereas all I want is:

Hasselt ( ) is a Belgian city and municipality.

You can find the content element, remove all table elements from it, and then get the text:

from bs4 import BeautifulSoup

s =""" <content><p>Hasselt ( ) is a <link target="Belgium">Belgian</link> <link target="city">city</link> and <link target="Municipalities in Belgium">municipality</link>.
<table><cell>Passenger growth
<cell>Year</cell><cell>Passengers</cell><cell>Percentage </cell></cell>
<cell>1996</cell><cell>360 000</cell><cell>100%</cell>
<cell>1997</cell><cell>1 498 088</cell><cell>428%</cell>
</table>"""
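soup = BeautifulSoup(s, "xml")  # the "xml" parser requires lxml to be installed

# soup.content is shorthand for soup.find("content")
content = soup.content
# content("table") is shorthand for content.find_all("table");
# extract() removes each table, and everything nested in it, from the tree
for table in content("table"):
    table.extract()

print(content.get_text().strip())
# Output: Hasselt ( ) is a Belgian city and municipality.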
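If you'd rather not depend on Beautiful Soup and lxml, the same skip-the-tables idea can be sketched with the standard library's html.parser.HTMLParser, which is lenient enough to accept the unclosed <content> and <p> tags in this sample. This is a swapped-in technique, not part of the answer, and the names TableSkippingTextExtractor and text_without_tables are hypothetical. It assumes tables are always marked by a literal <table> tag, and unlike the bs4 version it does not look up the <content> element first, so it keeps all text that sits outside tables.

```python
from html.parser import HTMLParser


class TableSkippingTextExtractor(HTMLParser):
    """Hypothetical helper: collect text, skipping anything inside <table>...</table>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.table_depth = 0  # > 0 while inside one or more <table> elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names lowercased, so <TABLE> matches too
        if tag == "table":
            self.table_depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1

    def handle_data(self, data):
        # Keep text only when we're not inside any table
        if self.table_depth == 0:
            self.chunks.append(data)


def text_without_tables(markup):
    parser = TableSkippingTextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of the input
    return "".join(parser.chunks).strip()


s = """ <content><p>Hasselt ( ) is a <link target="Belgium">Belgian</link> <link target="city">city</link> and <link target="Municipalities in Belgium">municipality</link>.
<table><cell>Passenger growth
<cell>Year</cell><cell>Passengers</cell><cell>Percentage </cell></cell>
<cell>1996</cell><cell>360 000</cell><cell>100%</cell>
<cell>1997</cell><cell>1 498 088</cell><cell>428%</cell>
</table>"""

print(text_without_tables(s))  # → Hasselt ( ) is a Belgian city and municipality.
```

A depth counter rather than a boolean flag keeps nested tables excluded until the outermost </table> closes.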

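You must have started writing the code before the ink on the question had dried ;)
@ Actually, we keep handy bs4 snippets at the ready. We take this sport seriously here!
Thanks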