Python: How do I remove the content of nested tags with BeautifulSoup?


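How can I use BeautifulSoup to remove the content inside nested tags? Existing posts show the reverse, how to retrieve the content inside nested tags.

I tried .text, but it only strips the tags and keeps their content: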

>>> from bs4 import BeautifulSoup as bs
>>> html = "<foo>Something something <bar> blah blah</bar> something else</foo>"
>>> bs(html, "html.parser").find_all('foo')[0]
<foo>Something something <bar> blah blah</bar> something else</foo>
>>> bs(html, "html.parser").find_all('foo')[0].text
'Something something  blah blah something else'


Desired output:

Something something something else

You can check whether each child is a bs4.element.NavigableString and keep only those:

from bs4 import BeautifulSoup as bs
import bs4

html = "<foo>Something something <bar> blah blah</bar> something <bar2>GONE!</bar2> else</foo>"

def get_only_text(elem):
    # Yield only the text nodes that are direct children of elem,
    # skipping nested tags such as <bar> and <bar2>.
    for item in elem.children:
        if isinstance(item, bs4.element.NavigableString):
            yield item

print(''.join(get_only_text(bs(html, "html.parser").find_all('foo')[0])))
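
As a variation on the answer above, here is a minimal sketch of two other ways to get the same result. One collects the direct text nodes with find_all(string=True, recursive=False); the other removes the nested tags from the tree with decompose(). The helper names text_without_nested and strip_nested_in_place are made up for this illustration and are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = ("<foo>Something something <bar> blah blah</bar> something "
        "<bar2>GONE!</bar2> else</foo>")


def text_without_nested(tag):
    """Return only the text that sits directly inside `tag` (hypothetical helper)."""
    # string=True matches text nodes; recursive=False restricts the
    # search to direct children, so text inside <bar>/<bar2> is skipped.
    return "".join(tag.find_all(string=True, recursive=False))


def strip_nested_in_place(tag):
    """Delete every direct child tag from the tree (hypothetical helper)."""
    # find_all(True) matches any tag; decompose() removes it and its content.
    for child in tag.find_all(True, recursive=False):
        child.decompose()
    return tag


foo = BeautifulSoup(html, "html.parser").find("foo")
direct_text = text_without_nested(foo)   # leaves the tree untouched
stripped = strip_nested_in_place(foo)    # modifies the tree
print(direct_text)
print(stripped)
```

The first helper is the better fit if you still need the original tree afterwards; the second changes the parsed document itself, which matters if you plan to write it back out. Note that the spaces surrounding the removed tags stay in the text, so you may want to normalize whitespace.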

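Here is my simple approach: soup.body.clear() or soup.tag.clear().

Suppose you want to clear the contents of <table></table> and add a new DataFrame; later you can use this clear method to easily update the table in your web page's HTML file, instead of using flask/django: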

    import pandas as pd
    import bs4

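I want to convert a 1.2-million-row .csv into a DataFrame, then into an HTML table, and then add it to my web page's HTML. Later I want to be able to refresh the data from the csv easily, just by switching one variable.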

    bizcsv = pd.read_csv("business.csv")
    dframe = pd.DataFrame(bizcsv)
    dfhtml = dframe.to_html()  # convert DataFrame to table, HTML format
    # keep only the markup between <table ...> and </table>; str.strip()
    # removes a set of characters, not a substring, so slice instead
    dfhtml_update = dfhtml[dfhtml.index(">") + 1:dfhtml.rindex("</table>")]
    # use dfhtml_update later to update your table without the <table> tags,
    # the <table> is easy for BS to select & clear!

    #A small function to unescape (&lt; to <) the tags back into HTML format
    def unescape(s):
        s = s.replace("&lt;", "<")
        s = s.replace("&gt;", ">")
        # this has to be last:
        s = s.replace("&amp;", "&")
        return s

    with open("page.html") as page:  #return to here when updating
        txt = page.read()
        soup = bs4.BeautifulSoup(txt, features="lxml")
        soup.body.append(dfhtml) #adds table to <body>
        with open("page.html", "w") as outf:
            outf.write(unescape(str(soup))) #writes to page.html

    # let's say you want to make seamless table updates to your
    # webpage instead of using flask or django x_x; return to the with open block above
    soup.table.clear()  #clears everything in <table></table>
    soup.table.append(dfhtml_update)
    with open("page.html", "w") as outf:
        outf.write(unescape(str(soup)))
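
The unescape step above is needed because appending a plain string to a tag makes BeautifulSoup escape it as text. As a rough sketch of an alternative, the new table markup can be parsed first and its nodes moved into the cleared table, so nothing gets escaped. The sample strings below stand in for page.html and for the DataFrame.to_html() output, and replace_table_rows is a hypothetical helper name, not something from the answer.

```python
from bs4 import BeautifulSoup

# Stand-in for the contents of page.html (assumed sample data).
page_html = """<html><body>
<h1>Report</h1>
<table id="data"><tr><td>old</td></tr></table>
</body></html>"""

# Stand-in for what DataFrame.to_html() would return (assumed sample data).
new_table_html = """<table border="1" class="dataframe">
<tr><th>name</th></tr>
<tr><td>new</td></tr>
</table>"""


def replace_table_rows(page, table_markup):
    """Swap the contents of the first <table> in `page` (hypothetical helper)."""
    soup = BeautifulSoup(page, "html.parser")
    fragment = BeautifulSoup(table_markup, "html.parser")
    target = soup.find("table")
    target.clear()  # drop everything inside <table></table>
    # Copy the child list first: append() moves each node out of `fragment`,
    # which would otherwise change the list while we iterate over it.
    for child in list(fragment.table.contents):
        target.append(child)
    return soup


updated = replace_table_rows(page_html, new_table_html)
result = str(updated)
print(result)
```

Because the nodes are appended as parsed tags instead of a string, the output can be written to the file directly, and any entities that legitimately appear in the page text are left alone.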

(Comment) In this example, do you want to remove the contents of bar?