Python: counting the number of images using "len()"

I need to count the number of images (in this case, 1 image), apparently by using "len()".

Here is the HTML:

<div class="detail-headline">
    Fotogal&#233;ria
        </div>
<div class="detail-indent">
    <table id="ctl00_ctl00_ctl00_containerHolder_mainContentHolder_innnerContentHolder_ZakazkaControl_ZakazkaObrazky1_ObrazkyDataList" cellspacing="0" border="0" style="width:100%;border-collapse:collapse;">
    <tr>
        <td align="center" style="width:25%;">
            <div id="ctl00_ctl00_ctl00_containerHolder_mainContentHolder_innnerContentHolder_ZakazkaControl_ZakazkaObrazky1_ObrazkyDataList_ctl02_PictureContainer">
                <a title="1-izb. Kaspická" class="highslide detail-img-link" onclick="return hs.expand(this);" href="/imgcache/cache231/3186-000393~8621457~640x480.jpg"><img src="/imgcache/cache231/3186-000393~8621457~120x120.jpg" class="detail-img" width="89" height="120" alt="1-izb. Kaspická" /></a>
            </div>
        </td><td></td>
    </tr>
</table>
</div>

Then (checking the start tag), something like this:

if tag == 'div' and len(attrs) > 0 and attrs[0][0] == 'class' and attrs[0][1] == 'detail-headline' \
      and self.srcData[self.getpos()[0]].strip() == 'Fotogal&#233;ria':
      self.status = 3

Is that OK? And what else is needed? Thanks.
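The check above reads attributes by position (`attrs[0][0] == 'class'`), which breaks as soon as a tag lists its attributes in a different order. As a minimal sketch, assuming Python 3's standard `html.parser` module and assuming every gallery thumbnail is an `<img>` carrying the class `detail-img` as in the sample, the images can be collected in a list and counted with `len()`. The names `ImageCounter` and `sample` are hypothetical and not part of the original code.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Collect the src of every <img class="detail-img"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; turning it into a dict
        # makes the lookup independent of the attribute order.
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "img" and "detail-img" in classes:
            self.images.append(attributes.get("src"))


# Shortened copy of the gallery markup from the question.
sample = (
    '<div class="detail-indent"><table><tr><td>'
    '<a class="highslide detail-img-link" '
    'href="/imgcache/cache231/3186-000393~8621457~640x480.jpg">'
    '<img src="/imgcache/cache231/3186-000393~8621457~120x120.jpg" '
    'class="detail-img" width="89" height="120" />'
    '</a></td><td></td></tr></table></div>'
)

counter = ImageCounter()
counter.feed(sample)
counter.close()
print(len(counter.images))  # prints 1
```

Self-closing tags such as `<img ... />` still reach `handle_starttag`, because the default `handle_startendtag` forwards to it.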

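import urllib
import urllib2
import HTMLParser
import codecs
import time
from BeautifulSoup import BeautifulSoup

# decode numeric character references such as &#233;
def decode(istr):
    ostr = u""
    idx = 0
    # The loop header and the '&' test are reconstructed; the text between
    # '<' and '>' was lost in the posted code.
    while idx < len(istr):
        add = True
        if istr[idx] == '&' and len(istr) > idx + 1 and istr[idx + 1] == '#':
            iend = istr.find(";", idx)
            if iend > idx:
                ostr += unichr(int(istr[idx + 2:iend]))
                idx = iend
                add = False
        if add:
            ostr += istr[idx]
        idx += 1
    return ostr

# parser 1
class FlatDetailParser(HTMLParser.HTMLParser):
    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)

    def loadDetails(self, link):
        self.record = (len(self.characters) + 1) * ['']
        self.status = 0
        self.index = -1
        self.reset()
        request = urllib2.Request(link)
        data = urllib2.urlopen(request)  # the URL comes from the next class
        self.srcData = []
        for line in data:
            line = line.decode('utf8')
            self.srcData.append(line)
        for line in self.srcData:
            self.feed(line)
        self.close()
        return self.record

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and len(attrs) > 1 and attrs[1][0] == 'class' and attrs[1][1] == 'detail-headline' \
                and self.srcData[self.getpos()[0]].strip() == u'Realitn&#225; kancel&#225;ria':
            self.status = 2
        if self.status == 2 and tag == 'div' and len(attrs) > 0 and attrs[0][0] == 'class' \
                and attrs[0][1] == 'name':
            self.record[-1] = decode(self.srcData[self.getpos()[0]].strip())
            self.status = 0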
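For comparison, the hand-written decoding helper above has a standard-library counterpart in newer Python versions. This is a minimal sketch, assuming Python 3.4 or later (the question's code targets Python 2); `decode_entities` is a hypothetical wrapper name used only for illustration.

```python
import html


def decode_entities(text):
    """Replace character references such as &#233; with the characters they denote."""
    return html.unescape(text)


headline = decode_entities("Fotogal&#233;ria")
print(headline)
```

`html.unescape` also handles named references such as `&amp;`, which the loop in the question does not attempt.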

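…and the next parser class, which adds the data to a txt file.

When I use BeautifulSoup, what goes into soup = BeautifulSoup(???)? How do I add it to srcData?

Can these be combined? How?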
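On the `BeautifulSoup(???)` part: the constructor takes the HTML markup as a string, so the decoded lines kept in `srcData` can be joined back into one string and passed to it or to any other parser. The sketch below shows that step with the standard library only, assuming Python 3. `TagCounter`, `count_images`, the stand-in `src_data` list, and the `page-1` label are hypothetical, and `io.StringIO` stands in for an open txt/csv file.

```python
import csv
import io
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name (hypothetical helper)."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            self.count += 1


def count_images(src_data):
    """src_data is a list of decoded lines, like self.srcData in the question."""
    html_text = "".join(src_data)  # this string is what a parser expects
    counter = TagCounter("img")
    counter.feed(html_text)
    counter.close()
    return counter.count


# Stand-in for the lines read from urlopen(); real code would use self.srcData.
src_data = [
    '<div class="detail-indent">\n',
    '<a href="/imgcache/cache231/3186-000393~8621457~640x480.jpg">\n',
    '<img src="/imgcache/cache231/3186-000393~8621457~120x120.jpg" class="detail-img" />\n',
    '</a></div>\n',
]

# One row per page; "page-1" is a placeholder label, not a real URL.
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["link", "image_count"])
writer.writerow(["page-1", count_images(src_data)])
print(output.getvalue())
```

Keeping the count as one more field of the record means the existing export step only has to write an extra column.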

If you use BeautifulSoup, maybe something like this:

from BeautifulSoup import BeautifulSoup

def count_images(htmltext):
    soup = BeautifulSoup(htmltext)
    # counts the <div class="detail-indent"> containers
    return len(soup.findAll('div', {'class': 'detail-indent'}))

Or using lxml:

from lxml.html.soupparser import fromstring

def count_images(htmltext):
    # './/div' searches the whole tree; a bare 'div' only matches
    # direct children of the root element
    return len([e for e in fromstring(htmltext).findall('.//div')
                if e.attrib.get('class') == 'detail-indent'])
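Note that both snippets count the `detail-indent` containers, which happens to give 1 for the sample, rather than the images inside them. A hedged sketch of counting only the `<img>` tags nested inside that container follows, assuming Python 3's standard `html.parser` and well-nested `<div>` tags. `GalleryImageCounter`, the sample string, and the `/logo.jpg` image outside the gallery are hypothetical.

```python
from html.parser import HTMLParser


class GalleryImageCounter(HTMLParser):
    """Count <img> tags nested inside <div class="detail-indent"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # greater than 0 while inside the gallery container
        self.image_count = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1    # nested div inside the gallery
            elif "detail-indent" in (attributes.get("class") or "").split():
                self.div_depth = 1     # entering the gallery container
        elif tag == "img" and self.div_depth > 0:
            self.image_count += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


html_text = (
    '<div class="detail-headline">Fotogal&#233;ria</div>'
    '<img src="/logo.jpg" />'  # hypothetical image outside the gallery
    '<div class="detail-indent"><table><tr><td><div id="container">'
    '<a href="/imgcache/cache231/3186-000393~8621457~640x480.jpg">'
    '<img src="/imgcache/cache231/3186-000393~8621457~120x120.jpg" class="detail-img" />'
    '</a></div></td><td></td></tr></table></div>'
)

parser = GalleryImageCounter()
parser.feed(html_text)
parser.close()
print(parser.image_count)  # prints 1
```

Tracking the `<div>` nesting depth keeps images elsewhere on the page, such as a logo, out of the count.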

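Just for fun, I tried a pyparsing approach. Pyparsing includes helper methods for building HTML tag matching patterns that handle attribute matching, unexpected whitespace, single or double quotes, and other hard-to-predict HTML tag pitfalls. Here is a pyparsing solution (assuming your HTML source has been read into a string variable "html"):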

Prints:

2
/imgcache/cache231/3186-000393~8621457~640x480.jpg
/imgcache/cache231/3186-000393~8621457~120x120.jpg

Could you add a second image to the example so we can better understand the structure of the document?

BeautifulSoup's development has stopped. It makes more sense to recommend lxml, which is still up-to-date, supported, and improved software.

As a beginner, I know a bit of BeautifulSoup, but I don't know how to combine it with HTMLParser (and use "len()"). That is necessary, because the existing code (200 lines) is written with it (POST method, etc.).

Nice solutions. BeautifulSoup, lxml, pyparsing. Once again: what is html in my case? I don't have any URL of the form "http://www……com/……html". And how do I add it to srcData (for the later export to a txt/csv file)? I like these approaches, but I can't manage to use them in a harder case.

Edited to note that "html" is a string variable containing the HTML source in which you are searching for image references. I don't understand your other questions.