Problem removing HTML tags when using Beautiful Soup (Python 2.7, BeautifulSoup)


I am using BeautifulSoup to scrape some data from a website, but when I print the data I cannot get rid of the HTML tags around it. The code is:

import csv
import urllib2
import sys  
from bs4 import BeautifulSoup

page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
    print anchor1
for anchor2 in soup.findAll('div', {"class": "gridPrice"}):
    print anchor2
for anchor3 in soup.findAll('div', {"class": "gridMultiDevicePrice"}):
    print anchor3

The output I get from this is:

<div class="listGrid-price"> 
                                $99.99 
            </div>
<div class="listGrid-price"> 
                                $0.01 
            </div>
<div class="listGrid-price"> 
                                $0.01 
            </div>


$99.99 
$0.01 
$0.01

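I only want the prices in the output, as in the second listing above, without any HTML tags around them. Please excuse my ignorance; I am new to programming.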

You are printing the tags that were found. To print only the text they contain, use the .string attribute:

print anchor1.string

The .string value is a NavigableString; to use it like a normal unicode object, convert it first. You can then use strip() to remove the extra whitespace:

print unicode(anchor1.string).strip()
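
To make the conversion step concrete, here is a minimal sketch run against an inline HTML snippet instead of the live page. It is written for Python 3, where `str` plays the role that `unicode` plays in the Python 2 code above; the `html` sample and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical sample markup standing in for the real page.
html = '<div class="listGrid-price">\n        $99.99\n    </div>'

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "listGrid-price"})

raw = div.string            # a NavigableString, surrounding whitespace included
price = str(raw).strip()    # converted to a plain str, then whitespace removed

print(type(raw).__name__)
print(repr(price))
```

Converting first is mostly about ending up with an ordinary string object that no longer refers back to the parse tree.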

Adjusting this slightly to allow for empty values:

for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
    if anchor1.string:
        print unicode(anchor1.string).strip()

This gives me:

$99.99
$0.99
$0.99
$299.99
$199.99
$49.99
$49.99
$99.99
$0.99
$99.99
$0.01
$0.01
$0.01
$0.01
$0.01
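
The guard against empty values matters because `.string` is `None` when a tag has no children or more than one child, for example when part of the price is wrapped in nested markup. The following is a hedged sketch of that behaviour on made-up inline HTML (Python 3, not the original page), with `get_text(strip=True)` shown as one possible alternative that also copes with nested tags:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div has two children (text plus a <span>),
# and the third div is empty.
html = """
<div class="listGrid-price"> $99.99 </div>
<div class="listGrid-price">$0.<span>01</span></div>
<div class="listGrid-price"></div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", {"class": "listGrid-price"})

# .string is None for the second and third div, so they are skipped here.
via_string = [str(d.string).strip() for d in divs if d.string]

# get_text() joins all descendant text, so the nested <span> is included.
via_get_text = [d.get_text(strip=True) for d in divs if d.get_text(strip=True)]

print(via_string)
print(via_get_text)
```

Which variant fits depends on the page: `.string` is stricter and silently skips anything with nested markup, while `get_text()` flattens whatever text is inside the tag.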

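I tried using anchor1.string earlier, but it gave this error: "RuntimeError: maximum recursion depth exceeded". After that I also tried raising the recursion limit from 1000 to 1500, but even then it did not work.

@user1915050: Then add that to your question, together with the traceback. In that case, anchor1 may not be what you think it is.

@user1915050: I just tried your code with anchor1.string and it works fine.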