Jupyter notebook: how to load a CSV file in IPython that is too large?


How can I load a CSV file in IPython that is too large? It doesn't seem to fit into memory all at once.

You can use the code below to read the file in chunks and distribute the chunks across multiple processes:

import pandas as pd
import multiprocessing as mp

LARGE_FILE = "yourfile.csv"
CHUNKSIZE = 100000  # process 100,000 rows at a time

def process_frame(df):
    # placeholder: replace with your real per-chunk processing
    return len(df)

if __name__ == '__main__':
    # read_csv with chunksize returns an iterator of DataFrames
    reader = pd.read_csv(LARGE_FILE, chunksize=CHUNKSIZE)
    pool = mp.Pool(4)  # use 4 worker processes

    funclist = []
    for df in reader:
        # submit each chunk to the pool asynchronously
        f = pool.apply_async(process_frame, [df])
        funclist.append(f)

    result = 0
    for f in funclist:
        result += f.get(timeout=10)  # wait up to 10 seconds per chunk

    pool.close()
    pool.join()

    print("Total rows processed:", result)
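
One thing worth noting about this design: the loop submits every chunk with apply_async before any result is collected, so all chunks are read and queued up front. For a file much larger than RAM you may prefer to process each chunk (or a small batch of chunks) as it is read, so that only a bounded amount of data is in memory at a time.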

So the CSV file is larger than the available RAM? @shuttle87 I don't know. This is an exercise, and they suggested trying to download the data with a web browser, but I don't know how to do that... Are you using pandas? If so, try the chunksize parameter of the read_csv method.
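
As a minimal sketch of what the last comment suggests (without multiprocessing), assuming the file is named yourfile.csv and that summing a hypothetical column called "value" stands in for your real processing:

import pandas as pd

# Read the large CSV in chunks instead of all at once;
# each chunk is an ordinary DataFrame of up to 100,000 rows.
total = 0
for chunk in pd.read_csv("yourfile.csv", chunksize=100000):
    # hypothetical per-chunk work: sum a column named "value"
    total += chunk["value"].sum()

print(total)

Because only one chunk is held in memory at a time, this works even when the whole file would not fit in RAM.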