Python bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml

python web-scraping

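Can you suggest a fix? The script downloads almost all of the images from imgur pages, but it fails on one particular image. I don't know why it doesn't work in this case or how to fix it.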

elif 'imgur.com' in submission.url and not (submission.url.endswith('gif')
                                            or submission.url.endswith('webm')
                                            or submission.url.endswith('mp4')
                                            or 'all' in submission.url
                                            or '#' in submission.url
                                            or '/a/' in submission.url):
    # Single-image imgur page: download the page and scrape the image URL.
    html_source = requests.get(submission.url).text
    soup = BeautifulSoup(html_source, "lxml")  # needs the lxml package installed
    image_url = soup.select('img')[0]['src']
    if image_url.startswith('//'):
        # Protocol-relative URL: add a scheme so urllib2 can open it.
        image_url = 'http:' + image_url
    image_id = image_url[image_url.rfind('/') + 1:image_url.rfind('.')]
    try:
        image_file = urllib2.urlopen(image_url, timeout=5)
        with open('/home/mona/computer_vision/image_retrieval/images/' + category + '/' + 'imgur_' + datetime.datetime.now().strftime('%y-%m-%d-%s') + image_url[-9:], 'wb') as output_image:
            output_image.write(image_file.read())
    except urllib2.URLError as e:
        print(e)
        continue

The error is:

[LOG] Done Getting http://i.imgur.com/FoCjtI7.jpg
submission id is: 1alffm
[LOG] Getting url:  http://sphotos-a.ak.fbcdn.net/hphotos-ak-ash4/217834_10151246341237704_484810759_n.jpg
HTTP Error 403: Forbidden
[LOG] Getting url:  http://imgur.com/xp386
Traceback (most recent call last):
  File "download_images.py", line 67, in <module>
    soup = BeautifulSoup(html_source, "lxml")
  File "/usr/lib/python2.7/dist-packages/bs4/__init__.py", line 155, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Open a Python shell and try the following:

from bs4 import BeautifulSoup
myHTML = "<html><head></head><body><strong>Hi</strong></body></html>"
soup = BeautifulSoup(myHTML, "lxml")

I suggest these steps because you said the script ran for quite a while before crashing, in which case you would not be missing the parser, would you?

Added by the OP:

If you are using Python 2.7 on Ubuntu/Debian, this worked for me:

$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml 

Test it like this:

mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
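
If installing lxml on every machine is not practical, one possible workaround is to choose the parser at runtime. The sketch below is not part of the original script: pick_parser is a hypothetical helper that returns "lxml" when the module can be imported and otherwise falls back to "html.parser", the name Beautiful Soup uses for Python's built-in parser. Keep in mind that different parsers can build slightly different trees from malformed HTML, so results may not be identical across machines.

```python
def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if that module can be imported, else `fallback`.

    Hypothetical helper for illustration; the names are assumptions,
    not part of bs4 or of the original script.
    """
    try:
        __import__(preferred)
    except ImportError:
        # The preferred parser library is not installed on this machine.
        return fallback
    return preferred


# Assumed usage inside the script (requires bs4 to be installed):
#   from bs4 import BeautifulSoup
#   soup = BeautifulSoup(html_source, pick_parser())
```

With this in place the script keeps running on a machine without lxml, at the cost of using the slower built-in parser there.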

Thanks. The script had been running on a different machine, and I had not installed lxml on this new one.