Python link downloader is slow (Python, BeautifulSoup, lxml)


Is there a way to speed up this script by using lxml or mechanize and cutting out BeautifulSoup altogether?

python:

import lxml.html as html
import urllib
import urlparse
from BeautifulSoup import BeautifulSoup
import re
import os, sys
print ("downloading and parsing bibles...")
root = html.parse(open('all.html'))
for link in root.findall('//a'):
  url = link.get('href')
  name = urlparse.urlparse(url).path.split('/')[-1]
  dirname = urlparse.urlparse(url).path.split('.')[-1]
  f = urllib.urlopen(url)
  s = f.read()
  if (os.path.isdir(dirname) == 0): 
    os.mkdir(dirname)
  soup = BeautifulSoup(s)
  articleTag = soup.html.body.article
  converted = str(articleTag)
  full_path = os.path.join(dirname, name)
  open(full_path, 'w').write(converted)
  print(name)
print("downloads complete!")

all.html

<a href="http://www.youversion.com/bible/gen.1.nmv-fas">http://www.youversion.com/bible/gen.1.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.2.nmv-fas">http://www.youversion.com/bible/gen.2.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.3.nmv-fas">http://www.youversion.com/bible/gen.3.nmv-fas</a>

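You should first measure which part of your script actually takes the time. Optimizing something that is not slow is a waste of your time.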
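One way to take that measurement is to accumulate wall-clock time per stage. The following is only a minimal sketch, written for Python 3 (the script in the question is Python 2); the names `timed`, `totals`, and `report` are hypothetical helpers introduced for illustration, not part of any library.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per label (hypothetical helper state).
totals = defaultdict(float)

@contextmanager
def timed(label):
    """Add the time spent inside the with-block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start

def report():
    """Return (label, seconds) pairs, slowest stage first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

In the script above, the `f.read()` call and the `BeautifulSoup(s)` call would each go inside their own `with timed("download"):` and `with timed("parse"):` block, and `report()` would be printed after the loop. If the download total dominates, changing the parser cannot help much.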

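It is probably the downloading, not the parsing. In that case, switching parsers will not help. Using threads (one per download) to speed up the download of many files may help, because another download can start before the first one has finished.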
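The threading idea can be sketched with a thread pool from the standard library. This is an illustration under stated assumptions rather than a drop-in replacement: it targets Python 3 (`urllib.request` instead of the Python 2 `urllib` used in the question), and `fetch`, `download_all`, and the default of 8 workers are hypothetical choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url, timeout=30):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()

def download_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch all URLs concurrently and return a {url: content} dict.

    fetch_one is injectable so the function can be exercised without
    network access. max_workers=8 is an arbitrary assumption; keep it
    modest to avoid hammering the remote server.
    """
    urls = list(urls)  # materialize once, since we iterate twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each
        # URL with its own content. An exception raised by fetch_one
        # is re-raised here when its result is consumed.
        return dict(zip(urls, pool.map(fetch_one, urls)))
```

With the question's script, the list of `href` values collected from `all.html` would be passed to `download_all`, and the parsing and file writing would then loop over the returned dictionary. Threads help here because each one spends most of its time waiting on the network, not running Python code.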

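Isn't it the downloading that takes the time?

It takes most of the time, but can't I use lxml instead of BeautifulSoup and improve the speed?

Those are both for parsing. If downloading takes most of the time, the parser does not matter.

Can I use mechanize to download faster?

If there are many files, you could try downloading them in separate threads. That might help.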