Python BeautifulSoup .get_text() is not specific enough for my HTML parsing

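Given the HTML below, I only want to output the text of the h1 itself, not "Details about", which is the text of the span nested inside the h1.

My current output is:

Details about   New Men's Genuine Leather Bifold ID Credit Card Money Holder Wallet Black

What I want:

New Men's Genuine Leather Bifold ID Credit Card Money Holder Wallet Black

This is the HTML I am using: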

<h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about  &nbsp;</span>New Men&#039;s Genuine Leather Bifold ID Credit Card Money Holder Wallet Black</h1>
Note: I don't want to simply truncate the string, because I'd like this code to be reusable.
Ideally, the code would strip out any text enclosed in a span.

One solution is to check whether each child node contains HTML and skip it if it does:

from bs4 import BeautifulSoup

html = """<h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about  &nbsp;</span>New Men&#039;s Genuine Leather Bifold ID Credit Card Money Holder Wallet Black</h1>"""
soup = BeautifulSoup(html, 'html.parser')

for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
    for content in line.contents:
        if bool(BeautifulSoup(str(content), "html.parser").find()):
            continue

        print(content)
You can remove all span tags with extract():

for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
    # Detach every nested <span> so its text is no longer part of the h1
    for s in line('span'):
        s.extract()
    print(line.get_text())
# => New Men's Genuine Leather Bifold ID Credit Card Money Holder Wallet Black

A variant of the first solution checks the node type directly with isinstance:

import bs4

html = """<h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about  &nbsp;</span>New Men&#039;s Genuine Leather Bifold ID Credit Card Money Holder Wallet Black</h1>"""
soup = bs4.BeautifulSoup(html, 'html.parser')

for line in soup.find_all('h1', attrs={'itemprop': 'name'}):
    for content in line.contents:
        if isinstance(content, bs4.element.Tag):
            continue

        print(content)
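
Since the question asks for something reusable, the approaches above could also be wrapped in a small helper. The following is only a sketch: the function name `own_text` is made up for illustration, and it assumes a Beautiful Soup version that accepts the `string` argument to `find_all` (older releases call it `text`). Unlike `extract()`, it leaves the parsed tree unmodified.

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Return only the text that belongs directly to `tag`.

    `own_text` is a hypothetical helper name. Text inside nested
    tags (such as the <span>) is ignored, and the tree is not changed.
    """
    # string=True matches text nodes; recursive=False limits the
    # search to direct children of `tag`.
    parts = tag.find_all(string=True, recursive=False)
    return "".join(parts).strip()


html = """<h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about  &nbsp;</span>New Men&#039;s Genuine Leather Bifold ID Credit Card Money Holder Wallet Black</h1>"""
soup = BeautifulSoup(html, "html.parser")

title = own_text(soup.find("h1", attrs={"itemprop": "name"}))
print(title)
```

Because the span is skipped rather than removed, `soup` can still be queried for the span's content afterwards if needed.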