
Why am I getting these strange connection errors when reading from or writing to the Hadoop file system with a Python script?

Tags: python, hadoop, read-write

I have written a Python script that reads from and writes to a Hadoop file system at the IP hdfs_ip. It takes three arguments:

  • Local file path
  • Remote file path
  • Read or write

To run it, I first need to export the namenode address and then run the script with those three arguments. However, I get some strange errors:

    $ export namenode='http://hdfs_ip1:50470,http://hdfs_ip2:50470'
    $ python3 python_hdfs.py ./1.png /user/testuser/new_1.png read
    ['http://hdfs_ip1:50470', 'http://hdfs_ip2:50470']
    ./1.png /user/testuser/new_1.png
    http://hdfs_ip1:50470
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 377, in _make_request
        httplib_response = conn.getresponse(buffering=True)
    TypeError: getresponse() got an unexpected keyword argument 'buffering'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
        body=body, headers=headers)
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
        httplib_response = conn.getresponse()
      File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
        response.begin()
      File "/usr/lib/python3.5/http/client.py", line 297, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python3.5/http/client.py", line 279, in _read_status
        raise BadStatusLine(line)
    http.client.BadStatusLine: 
    
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
        timeout=timeout
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
        _stacktrace=sys.exc_info()[2])
      File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 247, in increment
        raise six.reraise(type(error), error, _stacktrace)
      File "/usr/lib/python3/dist-packages/six.py", line 685, in reraise
        raise value.with_traceback(tb)
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
        body=body, headers=headers)
      File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
        httplib_response = conn.getresponse()
      File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
        response.begin()
      File "/usr/lib/python3.5/http/client.py", line 297, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python3.5/http/client.py", line 279, in _read_status
        raise BadStatusLine(line)
    requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine('\x15\x03\x03\x00\x02\x02\n',))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "python_hdfs.py", line 63, in <module>
        status, name, nnaddress= check_node_status(node)
      File "python_hdfs.py", line 18, in check_node_status
        request = requests.get("%s/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"%name,verify=False).json()
      File "/usr/lib/python3/dist-packages/requests/api.py", line 67, in get
        return request('get', url, params=params, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/api.py", line 53, in request
        return session.request(method=method, url=url, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
        r = adapter.send(request, **kwargs)
      File "/usr/lib/python3/dist-packages/requests/adapters.py", line 426, in send
        raise ConnectionError(err, request=request)
    requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine('\x15\x03\x03\x00\x02\x02\n',))
    

I ran into the same errors when I replaced the requests module with urllib, so the HTTP library itself is not the culprit. The confusing part is that Python prints chained tracebacks in chronological order, which can read as conceptually backwards: the real problem is the last exception in the trace.

More information here:

In your case, that last exception is the ConnectionError raised from python_hdfs.py, and that is the real problem.
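
As a minimal sketch of that chaining behavior (hypothetical names, not from the original post): raising a new exception inside an except block chains the two, and Python prints the original traceback first, then the "During handling of the above exception, another exception occurred:" banner, and finally the exception that actually propagated.

    def fetch():
        try:
            raise OSError("low-level socket failure")  # the inner, earlier error
        except OSError:
            # Raising here chains the exceptions: the printed report shows the
            # OSError first and this ConnectionError last, so the bottom of the
            # output is where to start reading.
            raise ConnectionError("Connection aborted.")

    fetch()

For reference, here is the full python_hdfs.py from the question: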

    import requests
    import json
    import os
    import kerberos
    import sys

    # List of candidate NameNode URLs, taken from the exported variable, e.g.
    # namenode='http://hdfs_ip1:50470,http://hdfs_ip2:50470'
    node = os.getenv("namenode").split(",")
    print(node)

    local_file_path = sys.argv[1]
    remote_file_path = sys.argv[2]
    read_or_write = sys.argv[3]
    print(local_file_path, remote_file_path)
    
    def check_node_status(node):
        # Query each NameNode's JMX servlet and stop at the first one that
        # reports itself active. Note: if no node reports "active", then
        # status/name/nnaddress are never bound and the return raises
        # UnboundLocalError.
        for name in node:
            print(name)
            request = requests.get("%s/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" % name,
                                   verify=False).json()
            status = request["beans"][0]["State"]
            if status == "active":
                nnhost = request["beans"][0]["HostAndPort"]
                splitaddr = nnhost.split(":")
                nnaddress = splitaddr[0]
                print(nnaddress)
                break
        return status, name, nnaddress
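    # For reference, the JMX query above returns JSON shaped roughly like this
    # (an illustrative example, not captured from this cluster; exact fields
    # vary by Hadoop version):
    #   {"beans": [{"name": "Hadoop:service=NameNode,name=NameNodeStatus",
    #               "State": "active",
    #               "HostAndPort": "hdfs_ip1:8020"}]}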
    
    def kerberos_auth(nnaddress):
        # Build a SPNEGO token for the HTTP service principal of the active
        # NameNode and wrap it in a Negotiate Authorization header.
        __, krb_context = kerberos.authGSSClientInit("HTTP@%s" % nnaddress)
        kerberos.authGSSClientStep(krb_context, "")
        negotiate_details = kerberos.authGSSClientResponse(krb_context)
        headers = {"Authorization": "Negotiate " + negotiate_details,
                   "Content-Type": "application/binary"}
        return headers
    
    def kerberos_hdfs_upload(status, name, headers):
        print("running upload function")
        if status == "active":
            print("if function")
            # WebHDFS CREATE answers with a 307 redirect to a DataNode;
            # allow_redirects=True lets requests follow it and resend the body.
            data = open('%s' % local_file_path, 'rb').read()
            write_req = requests.put("%s/webhdfs/v1%s?op=CREATE&overwrite=true" % (name, remote_file_path),
                                     headers=headers,
                                     verify=False,
                                     allow_redirects=True,
                                     data=data)
            print(write_req.text)
    
    def kerberos_hdfs_read(status, name, headers):
        if status == "active":
            # OPEN likewise redirects to a DataNode; on success, write the
            # payload to the local path.
            read = requests.get("%s/webhdfs/v1%s?op=OPEN" % (name, remote_file_path),
                                headers=headers,
                                verify=False,
                                allow_redirects=True)

            if read.status_code == 200:
                data = open('%s' % local_file_path, 'wb')
                data.write(read.content)
                data.close()
            else:
                print(read.content)
    
    
    status, name, nnaddress = check_node_status(node)
    headers = kerberos_auth(nnaddress)
    if read_or_write == "write":
        kerberos_hdfs_upload(status, name, headers)
    elif read_or_write == "read":
        print("fun")
        kerberos_hdfs_read(status, name, headers)
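
One more observation (my reading of the logged bytes, not something stated in the original post): the payload of the final BadStatusLine, \x15\x03\x03\x00\x02\x02\n, is a TLS alert record (0x15 = alert, 0x0303 = TLS 1.2, followed by a fatal unexpected_message alert). A server sends exactly that when it receives plaintext HTTP on a TLS socket, and 50470 is the default dfs.namenode.https-address port, yet the exported URLs use http://. If that is what is happening here, exporting the addresses with an https:// scheme should clear the BadStatusLine:

    $ export namenode='https://hdfs_ip1:50470,https://hdfs_ip2:50470'
    $ python3 python_hdfs.py ./1.png /user/testuser/new_1.png read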