
How to count the lines of a file on an FTP server in Python without downloading it locally


So, using Python, I need to be able to read a file from an FTP server and count its lines without downloading it to my local machine.

I know the code to connect to the server:

import ftplib

ftp = ftplib.FTP('example.com')  # ftp object bound to the server address
ftp.login('username', 'password')  # login info
ftp.retrlines('LIST')  # list the directory contents
ftp.cwd('/parent folder/another folder/file/')  # change the working directory
I also know the basic code to count the lines if the file has already been downloaded/stored locally:

with open('file') as f:
    count = sum(1 for line in f)
    print(count)
I just need to know how to tie these two pieces of code together, without downloading the file to my local system.

Any help is appreciated.
Thanks.

As far as I know, FTP does not provide any kind of functionality to read a file's contents without actually downloading it. You can, however, try streaming the download and processing it in memory instead of saving it (you haven't specified which version of Python you are using).


Please treat the code only as a reference.
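One way to sketch that idea is with ftplib.retrlines(), which streams the file and calls a callback for each line, so nothing is ever written to disk. The host, credentials, and path below are placeholders taken from the question:

import ftplib

def count_ftp_lines(host, user, password, path):
    """Count the lines of a remote file by streaming it through a callback."""
    count = 0

    def bump(_line):
        # retrlines() calls this once for every line it receives.
        nonlocal count
        count += 1

    ftp = ftplib.FTP(host)
    ftp.login(user, password)
    ftp.retrlines('RETR ' + path, bump)  # streamed line by line, never saved locally
    ftp.quit()
    return count

# Placeholder values from the question:
print(count_ftp_lines('example.com', 'username', 'password',
                      'parent folder/another folder/file'))

Note that the data is still transferred over the network; it just never touches the local disk.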

There is a way. I adapted a piece of code that I originally wrote to build CSV files "on the fly", implemented with a producer-consumer approach. Applying this pattern lets us assign each task to a thread (or process) and show partial results for large remote files. You can adapt it to an FTP request.

The download stream is kept in a queue and consumed "on the fly", so no extra hard-disk space is needed and it is memory-efficient. Tested with Python 3.5.2 (vanilla) on Fedora Core 25 x86_64.

Here is the source, written for retrieval over HTTP (the same pattern can be adapted to FTP):

from threading import Thread, Event
from queue import Queue, Empty
import urllib.request
import time
import argparse

FILE_URL = 'http://cdiac.ornl.gov/ftp/ndp030/CSV-FILES/nation.1751_2010.csv'


def download_task(url, chunk_queue, event):
    """Producer: stream the remote file and push raw byte chunks onto the queue."""
    CHUNK = 1 * 1024  # read the response in 1 KiB blocks
    response = urllib.request.urlopen(url)
    event.clear()

    print('%% - Starting Download  - %%')
    print('%% - ------------------ - %%')
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            print('%% - Download completed - %%')
            event.set()
            break
        chunk_queue.put(chunk)

def count_task(chunk_queue, event):
    """Consumer: pull chunks off the queue and count newline-terminated lines."""
    part = False       # True while the previous chunk ended in the middle of a line
    time.sleep(5)      # give the producer a head start
    M = 0              # line counter
    contador = 0       # how many times the consumer had to wait for more data
    # VT100 control codes used to redraw the progress lines.
    CURSOR_UP_ONE = '\x1b[1A'
    ERASE_LINE = '\x1b[2K'
    while True:
        try:
            # queue.get() normally blocks while the queue is empty; with
            # block=False it raises queue.Empty instead, which is caught below
            # to print a partial result while the download is still running.
            chunk = chunk_queue.get(block=False)
            for line in chunk.splitlines(True):
                if line.endswith(b'\n'):
                    if part:  # the previous chunk ended mid-line: glue the pieces together
                        line = linepart + line
                        part = False
                    M += 1
                else:
                    # The last line of a chunk usually lacks a newline; it is
                    # completed by the first line of the next chunk.
                    part = True
                    linepart = line
        except Empty:
            # QUEUE EMPTY 
            print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
            print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
            print('Downloading records ...')
            if M > 0:
                print('Partial result:  Lines: %d ' % M)  # M includes the header line
            if event.is_set() and chunk_queue.empty():
                # THE END: the download has finished and the queue is fully drained.
                if part:
                    M += 1  # count a final line that has no trailing newline
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print(CURSOR_UP_ONE + ERASE_LINE + CURSOR_UP_ONE)
                print('The consumer has waited %s times' % str(contador))
                print('RECORDS = ', M)
                break
            contador += 1
            time.sleep(1)  # give the producer time to load more records

def main():
    chunk_queue = Queue()
    event = Event()
    args = parse_args()
    url = args.url

    p1 = Thread(target=download_task, args=(url,chunk_queue,event,))
    p1.start()
    p2 = Thread(target=count_task, args=(chunk_queue,event,))
    p2.start()
    p1.join()
    p2.join()

# The user of this module can customize one parameter:
#   + URL where the remote file can be found.

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--url', default=FILE_URL,
                        help='remote-csv-file URL')
    return parser.parse_args()


if __name__ == '__main__':
    main()
Usage:
$ python ftp-data.py -u <ftp-file>

Example run and output:

python ftp-data-ol.py -u 'http://cdiac.ornl.gov/ftp/ndp030/CSV-FILES/nation.1751_2010.csv'
The consumer has waited 0 times
RECORDS =  16327

A CSV version of the script is available on GitHub.

Comments:

"You still need to download the data to count the lines; you only avoid keeping it on disk. Note: the VT100 console control codes only work on a VT100-compatible terminal, so they will not render correctly if you run this script on Windows."

"This is far beyond my current knowledge of Python (I'm still learning), and it will take me some time to work out how to use it. Thanks for the help :)"

"With some basic knowledge of multithreading and the queue data structure, the code is quite simple (I'm also at an early stage of learning Python) ;-)"
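The answer notes that the producer can be adapted to an FTP request. A minimal sketch of such a variant, assuming the same chunk_queue/event interface as download_task above (the host, credentials, and path are placeholders):

import ftplib

def ftp_download_task(host, user, password, path, chunk_queue, event):
    """Producer variant: feed the queue from an FTP server instead of HTTP."""
    event.clear()
    ftp = ftplib.FTP(host)
    ftp.login(user, password)
    # retrbinary() passes each downloaded block to the callback, so every
    # chunk goes straight onto the queue and nothing touches the disk.
    ftp.retrbinary('RETR ' + path, chunk_queue.put, blocksize=1024)
    ftp.quit()
    event.set()  # signal the consumer that no more chunks are coming

count_task can then consume the queue unchanged, since it only ever sees raw byte chunks.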