Python 3.x: accessing an HDFS cluster from pydoop

I have an HDFS cluster and Python on the same Google Cloud Platform project. I want to access files that live on the HDFS cluster from Python. I found that pydoop can do this, but I am struggling to give it the right arguments. Here is the code I have tried so far:

import pydoop.hdfs as hdfs
import pydoop

pydoop.hdfs.hdfs(host='url of the file system goes here',
                 port=9864, user=None, groups=None)

"""
 class pydoop.hdfs.hdfs(host='default', port=0, user=None, groups=None)

    A handle to an HDFS instance.

    Parameters

            host (str) – hostname or IP address of the HDFS NameNode. Set to an empty string (and port to 0) to connect to the local file system; set to 'default' (and port to 0) to connect to the default (i.e., the one defined in the Hadoop configuration files) file system.

            port (int) – the port on which the NameNode is listening

            user (str) – the Hadoop domain user name. Defaults to the current UNIX user. Note that, in MapReduce applications, since tasks are spawned by the JobTracker, the default user will be the one that started the JobTracker itself.

            groups (list) – ignored. Included for backwards compatibility.


"""

#print (hdfs.ls("/vs_co2_all_2019_v1.csv"))
It gives the following error:

RuntimeError: Hadoop config not found, try setting HADOOP_CONF_DIR
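From the error it looks like pydoop cannot find the Hadoop client configuration, so it cannot resolve the 'default' filesystem. A minimal sketch of the fix I understand the message to suggest, assuming the configs live in /etc/hadoop/conf (the usual location on a Dataproc node, not something I have verified here):

import os

# Point pydoop at the Hadoop client configuration before importing it;
# /etc/hadoop/conf is an assumption (typical for Dataproc nodes).
os.environ["HADOOP_CONF_DIR"] = "/etc/hadoop/conf"

import pydoop.hdfs as hdfs

# With the config visible, host='default' and port=0 should resolve to the
# filesystem defined in core-site.xml, as the docstring above describes.
fs = hdfs.hdfs(host="default", port=0)
print(fs.list_directory("/"))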
And if I execute this line of code:

print (hdfs.ls("/vs_co2_all_2019_v1.csv"))
nothing happens at all. The file "vs_co2_all_2019_v1.csv" does exist on the cluster, although it happened to be unavailable at the moment I took the screenshot.
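To rule out a path problem, this is the kind of explicit check I have in mind once the connection works (a sketch: 'namenode-host' is a placeholder, and 8020 is HDFS's default NameNode RPC port, assuming the cluster uses the defaults; see also the port question at the end):

import pydoop.hdfs as hdfs

# Fully-qualified HDFS path; 'namenode-host' is a placeholder and 8020 the
# default NameNode RPC port (both assumptions).
path = "hdfs://namenode-host:8020/vs_co2_all_2019_v1.csv"

if hdfs.path.exists(path):
    print(hdfs.ls(path))
else:
    print("file not found:", path)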

A screenshot of my HDFS web UI is shown below:

The credentials I have look like this:


Can anyone tell me what I am doing wrong? Which credentials do I need to pass to the pydoop API? Or maybe there is another, simpler way to solve this; any help would be much appreciated.

Question (from a comment): the standard HDFS port is 8020, so why does your request use port 9864? Because my HDFS cluster reports port 9864: when I browse to google cloud/dataproc/clusters/hdfs_nodename, it shows Port: 9864.
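My current understanding, which may well be wrong: 9864 is the DataNode web UI port in Hadoop 3.x, whereas HDFS clients such as pydoop talk to the NameNode RPC service, which defaults to 8020. A sketch of a connection attempt under that assumption ('namenode-host' is again a placeholder):

import pydoop.hdfs as hdfs

# 9864 appears to be the DataNode web UI port; client connections go to the
# NameNode RPC port, 8020 by default (assumption: cluster uses defaults).
fs = hdfs.hdfs(host="namenode-host", port=8020, user=None)
print(fs.exists("/vs_co2_all_2019_v1.csv"))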