Counting HTML images with Python

python operating-system

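After extracting the images from an HTML page, I would like some feedback on how to count them with Python 3.01. Maybe my regular expression is not being used correctly.

Here is my code: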

import re, os
import urllib.request
def get_image(url):
  url = 'http://www.google.com'
  total = 0
  try:
    f = urllib.request.urlopen(url)
    for line in f.readline():
      line = re.compile('<img.*?src="(.*?)">')
      if total > 0:
        x = line.count(total)
        total += x
        print('Images total:', total)

  except:
    pass

Use beautifulsoup4 (an HTML parser) instead of regular expressions:

import urllib.request

import bs4  # beautifulsoup4

html = urllib.request.urlopen('http://www.imgur.com/').read()
# Name the parser explicitly so bs4 does not have to guess one.
soup = bs4.BeautifulSoup(html, 'html.parser')
images = soup.find_all('img')
print(len(images))
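
If installing beautifulsoup4 is not an option, the same count can probably be obtained with the standard library's html.parser module. This is only a sketch: the names ImageCounter and count_images are made up for illustration, and the sample string stands in for a downloaded page.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Counts <img> tags while the parser walks the document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so 'IMG' and 'img' both match.
        # Self-closing tags such as <img ... /> also end up here by default.
        if tag == 'img':
            self.count += 1


def count_images(html_text):
    """Return the number of <img> tags found in html_text (a str)."""
    counter = ImageCounter()
    counter.feed(html_text)
    counter.close()
    return counter.count


# Illustrative input only; a real page would come from urlopen(...).read().decode().
sample = '<p><img src="a.png"><IMG alt="b" src="b.png"/></p>'
print(count_images(sample))
```

Because this is a real parser rather than a pattern match, it does not care about attribute order or about a tag being split across several lines.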

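A few points about your code:

- It is much easier to parse the page with a dedicated HTML parsing library (that is the Pythonic way). I personally prefer …
- You are overwriting the line variable inside your loop.
- total is always 0 in the current logic.
- There is no need to compile your RE, because …
- You are discarding your exceptions, so you have no clue about what is going on in the code.
- An <img> tag may have other attributes as well, so your regex is a bit basic. Also, use the re.findall() method to catch multiple instances on the same line.

Changing your code a bit, I get: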

import re
from urllib.request import urlopen

def get_image(url):

    total  = 0
    page   = urlopen(url).readlines()

    for line in page:

        # urlopen() yields bytes; decode each line before matching.
        hit   = re.findall(r'<img.*?>', line.decode('utf-8', errors='replace'))
        total += len(hit)

    print('{0} Images total: {1}'.format(url, total))

get_image("http://google.com")
get_image("http://flickr.com")
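
To illustrate the points about extra attributes and re.findall(), here is a small offline sketch that runs the regex over the whole page text instead of line by line, so a tag that wraps onto a second line is still counted. The function names and the sample string are assumptions for the example, and a regex remains a rough tool for HTML (for instance, it would also pick up a data-src attribute).

```python
import re


def count_img_tags(html_text):
    """Count <img ...> tags, whatever their case or attributes."""
    # [^>]* also matches newlines, so multi-line tags are found.
    return len(re.findall(r'<img\b[^>]*>', html_text, flags=re.IGNORECASE))


def image_sources(html_text):
    """Return the quoted src values of the <img> tags that have one."""
    pattern = r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']'
    return re.findall(pattern, html_text, flags=re.IGNORECASE)


# Illustrative input only, standing in for a decoded page.
sample = (
    '<img class="x" src="a.png" alt="A"><IMG SRC=\'b.jpg\'>\n'
    '<p>no image</p><img alt="no source">'
)
print(count_img_tags(sample))
print(image_sources(sample))
```

The first two tags sit on the same line and are both found, which is the case a single match per line would miss.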

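Don't swallow the exception. What does it raise?

Well, I could catch one exception: urllib.error.HTTPError, for when no such URL is found.

You are hiding the error it throws.

Well, I can't get BeautifulSoup4 to work in IDLE; I get a traceback error.

Thank you very much!! This code is better. I will take note of your comments and try to install BeautifulSoup!

No problem. If the answer helped, don't forget to accept it! Also take a look at Corey's answer, it is a good example of how simple these tasks can be!

Perfect. Where on this site is the accept option? I can't find it :(

There should be a tick mark under the up/down arrows next to the answer, where the answer's score is! :)

Thanks again!!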
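
For the exception discussion in the comments, one possible shape is to catch the specific urllib errors and report them instead of using a bare except with pass. This is a sketch under assumptions: fetch_html is a hypothetical helper name, and UTF-8 decoding is a guess about the page encoding.

```python
import urllib.error
import urllib.request


def fetch_html(url):
    """Return the page body as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url) as response:
            # Assumes UTF-8; undecodable bytes are replaced, not fatal.
            return response.read().decode('utf-8', errors='replace')
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print('HTTP error for {0}: {1} {2}'.format(url, err.code, err.reason))
    except urllib.error.URLError as err:
        print('Could not reach {0}: {1}'.format(url, err.reason))
    return None
```

A caller can then check for None before counting, and the printed message shows what actually went wrong.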