Python 3.x: Error when accessing an FTP site with BeautifulSoup and ftplib

I am trying to access a web page in order to download the data it lists:

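from bs4 import BeautifulSoup
import requests  # required for requests.session() below
import urllib.request
from lxml import html

download_url = "ftp://nomads.ncdc.noaa.gov/NARR_monthly/"

s = requests.session()

# Fails with InvalidSchema: requests has no adapter for ftp:// URLs
page = BeautifulSoup(s.get(download_url).text, "lxml")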
But this raises:

Traceback (most recent call last):

  File "<ipython-input-271-59c5b15a7e34>", line 1, in <module>
    r = requests.get(download_url)

  File "/anaconda3/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)

  File "/anaconda3/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)

  File "/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)

  File "/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 612, in send
    adapter = self.get_adapter(url=request.url)

  File "/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 703, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)

InvalidSchema: No connection adapters were found for 'ftp://nomads.ncdc.noaa.gov/NARR_monthly/'
I also tried:

import ftplib

ftp = ftplib.FTP(download_url)
But this also raises:

Traceback (most recent call last):

  File "<ipython-input-284-60bd19e600fe>", line 1, in <module>
    ftp = ftplib.FTP(download_url)

  File "/anaconda3/lib/python3.6/ftplib.py", line 117, in __init__
    self.connect(host)

  File "/anaconda3/lib/python3.6/ftplib.py", line 152, in connect
    source_address=self.source_address)

  File "/anaconda3/lib/python3.6/socket.py", line 704, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):

  File "/anaconda3/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):

gaierror: [Errno 8] nodename nor servname provided, or not known

Unfortunately, requests does not support FTP links, but you can use the built-in urllib module:

import urllib.request

download_url = "ftp://nomads.ncdc.noaa.gov/NARR_monthly/"
with urllib.request.urlopen(download_url) as r:
    data = r.read()

print(data)
The response is not HTML, so it cannot be parsed with BeautifulSoup, but you can use regular expressions or string operations:

links = [
    download_url + line.split()[-1] 
    for line in data.decode().splitlines()
]
for link in links:
    print(link)
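One caveat with the list comprehension above is that a blank line in the listing makes `line.split()[-1]` raise an IndexError. Below is a minimal sketch of a more defensive version. The function name `listing_to_links` and the sample listing are hypothetical and only for illustration; the real server output may be laid out differently, and file names that contain spaces would need a stricter parser.

```python
def listing_to_links(listing, base_url):
    """Turn a text directory listing into a list of full URLs.

    Assumes the entry name is the last whitespace-separated field
    of each line, as in the string-splitting approach above.
    """
    base = base_url.rstrip("/") + "/"
    links = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines instead of raising IndexError
        name = fields[-1]
        if name in (".", ".."):
            continue  # skip the current/parent directory entries
        links.append(base + name)
    return links


# Hypothetical sample listing, NOT real output from the server.
SAMPLE_LISTING = """\
drwxr-xr-x   2 ftp  ftp  4096 Jan 01  2015 197901
drwxr-xr-x   2 ftp  ftp  4096 Jan 01  2015 197902

-rw-r--r--   1 ftp  ftp   512 Jan 01  2015 README.txt
"""

for link in listing_to_links(SAMPLE_LISTING, "ftp://nomads.ncdc.noaa.gov/NARR_monthly/"):
    print(link)
```

With the real data you would pass `data.decode()` instead of the sample string.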

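If you prefer, you can also use ftplib, but pass it only the host name, not the full URL. You can then change directory (cwd) into "NARR_monthly" and get the data: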

from ftplib import FTP

with FTP('nomads.ncdc.noaa.gov') as ftp:
    ftp.login() 
    ftp.cwd('NARR_monthly')
    data = ftp.nlst()

path = "ftp://nomads.ncdc.noaa.gov/NARR_monthly/"
links = [path + i for i in data]
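The gaierror in the question comes from passing the whole URL to `ftplib.FTP`, which expects a bare host name. If you only have the URL, one way to get the two pieces is `urllib.parse.urlsplit`. This is a small sketch, and the helper name `split_ftp_url` is made up for illustration:

```python
from urllib.parse import urlsplit


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory) for use with ftplib."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("expected an ftp:// URL, got %r" % url)
    # hostname drops any user info or port; strip the surrounding slashes
    # so the path can be passed directly to ftp.cwd()
    return parts.hostname, parts.path.strip("/")


host, directory = split_ftp_url("ftp://nomads.ncdc.noaa.gov/NARR_monthly/")
print(host)
print(directory)
```

`host` can then go to `FTP(host)` and `directory` to `ftp.cwd(directory)`.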

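Sometimes the host refuses the connection because too many clients are connected, so you may want to wrap the code in a try-except block.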
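A rough sketch of what that could look like: retry in a loop and return as soon as one attempt succeeds. The function name `list_with_retry`, the number of attempts, and the delay are all assumptions chosen for illustration, not values the server requires. The `ftp_factory` parameter is also hypothetical and is only there so the function can be exercised without a network connection.

```python
import time
from ftplib import FTP, all_errors


def list_with_retry(host, directory, attempts=3, delay=5.0, ftp_factory=FTP):
    """Return the names in `directory` on `host`, retrying on failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with ftp_factory(host) as ftp:
                ftp.login()          # anonymous login
                ftp.cwd(directory)
                return ftp.nlst()    # success: leave the loop
        except all_errors as exc:
            # all_errors covers ftplib errors plus OSError and EOFError,
            # which includes refused connections and DNS failures
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)    # wait before the next attempt
    raise ConnectionError(
        "could not list %s on %s after %d attempts" % (directory, host, attempts)
    ) from last_error
```

Calling `list_with_retry('nomads.ncdc.noaa.gov', 'NARR_monthly')` would then replace the `with FTP(...)` block above. If every attempt fails, the last underlying error is chained onto the ConnectionError so you can still see what went wrong.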

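Thanks. I keep running into host problems with R as well, which is annoying.

I'm afraid there isn't much we can do about connection errors. I can't help you with R, but if you are using Python you can put a try-except block inside a loop and break out once the connection succeeds.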