Python pandas: "pandas.errors.ParserError: Error tokenizing data. C error: Unknown error in IO callback"
Tags: python, windows, pandas, python-3.6, tokenize

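I'm using pandas read_csv to load a 3.8 GB pipe-delimited text file, but I get an error while the file is being read into memory.

The full error thrown by my read_in_files() function, the function itself, and the C source mentioned below are reproduced after the comments, in that order.

Googling this specific error only turns up the source code of the error handling in the program (lines 583-612).

After testing on a more powerful server, I now realize that the error is apparently caused by my 4 GB file, which has 114 columns, needing 25 to 35 GB of free RAM. This should really raise an out-of-memory error, but I presume the growth in RAM usage outpaced the tokenizer code's ability to check for the out-of-memory condition.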
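To get a feel for why a 4 GB text file can need several times that much RAM when every column is read with dtype=str, one option is to load a small sample and extrapolate. The following is only a rough sketch: the helper name estimate_memory_bytes, the sample size, and the in-memory demo data are assumptions made for illustration, and the real figure depends on the data and the pandas version.

```python
import io

import pandas as pd


def estimate_memory_bytes(source, total_rows, sample_rows=1000, sep="|"):
    """Extrapolate the in-memory size of a delimited file from its first rows.

    Hypothetical helper: `source` is a path or file-like object and
    `total_rows` is the (known or guessed) number of data rows in the file.
    """
    sample = pd.read_csv(source, dtype=str, sep=sep, nrows=sample_rows)
    if len(sample) == 0:
        return 0
    # deep=True asks pandas to measure the string data itself,
    # not just the per-cell references held by the columns.
    bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)
    return int(bytes_per_row * total_rows)


# Tiny in-memory stand-in for the real pipe-delimited file.
demo = io.StringIO("a|b|c\n" + "x|y|z\n" * 50)
estimate = estimate_memory_bytes(demo, total_rows=1_000_000, sample_rows=50)
print("Estimated bytes for 1,000,000 rows:", estimate)
```

If the estimate is far above the free RAM on the machine, loading the whole file in one read_csv call is unlikely to succeed.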

Comment: Where does the code fail? Can you remove the unnecessary lines?
Reply: @MattR, I just did.

Full error:

Reading in file C:\Users\cdabel\Desktop\_Temp\Master_Extract_Data_Mart_201909240935.txt
Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
  File "<stdin>", line 7, in read_in_files
  File "c:\python36\lib\site-packages\pandas\io\parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "c:\python36\lib\site-packages\pandas\io\parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "c:\python36\lib\site-packages\pandas\io\parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "c:\python36\lib\site-packages\pandas\io\parsers.py", line 2048, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 879, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 894, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 948, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 935, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2130, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Unknown error in IO callback

Code:

import os
import pandas as pd

# File
filepath = "C:\\Temp\\datafile.txt"
filename_w_ext = "datafile.txt"

# Read in TXT file
def read_in_files(filepath, filename_w_ext):
    filename, file_ext = os.path.splitext(filename_w_ext)
    print('Reading in file {}'.format(filepath))
    with open(filepath, "r", newline='') as file:
        global df_data
        # Here's where it errors:
        df_data = pd.read_csv(file, dtype=str, sep='|')
        return df_data.columns.values.tolist(), df_data.values.tolist()
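If the whole file does not fit in RAM, one common workaround is to pass chunksize to pd.read_csv and process the file piece by piece instead of materialising a single DataFrame. The sketch below is not the asker's code: count_rows_in_chunks, the chunk size, and the temporary demo file are hypothetical, and the row count stands in for whatever per-chunk work is really needed.

```python
import os
import tempfile

import pandas as pd


def count_rows_in_chunks(path, chunksize=100000, sep="|"):
    """Stream a delimited file in chunks; return (column names, row count)."""
    columns, total = None, 0
    # With chunksize set, read_csv yields DataFrames of at most
    # `chunksize` rows, so only one chunk is held in memory at a time.
    for chunk in pd.read_csv(path, dtype=str, sep=sep, chunksize=chunksize):
        if columns is None:
            columns = chunk.columns.tolist()
        total += len(chunk)  # placeholder for real per-chunk processing
    return columns, total


# Small temporary file standing in for the real multi-gigabyte extract.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w", newline="") as fh:
        fh.write("id|name\n")
        for i in range(25):
            fh.write("{}|row{}\n".format(i, i))
    columns, total = count_rows_in_chunks(demo_path, chunksize=10)
    print(columns, total)
```

Note that returning df_data.values.tolist(), as the original function does, builds a second full copy of the data as Python lists, so chunked processing only helps if the per-chunk results are kept small.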

Lines 583-612:

static int parser_buffer_bytes(parser_t *self, size_t nbytes) {
    int status;
    size_t bytes_read;

    status = 0;
    self->datapos = 0;
    self->data = self->cb_io(self->source, nbytes, &bytes_read, &status);
    TRACE((
        "parser_buffer_bytes self->cb_io: nbytes=%zu, datalen: %d, status=%d\n",
        nbytes, bytes_read, status));
    self->datalen = bytes_read;

    if (status != REACHED_EOF && self->data == NULL) {
        int64_t bufsize = 200;
        self->error_msg = (char *)malloc(bufsize);

        if (status == CALLING_READ_FAILED) {
            snprintf(self->error_msg, bufsize,
                     "Calling read(nbytes) on source failed. "
                     "Try engine='python'.");
        } else {
            snprintf(self->error_msg, bufsize, "Unknown error in IO callback");
        }
        return -1;
    }

    TRACE(("datalen: %d\n", self->datalen));

    return status;
}
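The branching in the C function above can be mirrored in a few lines of Python to make the error path easier to follow. This is only a model of the control flow shown, not pandas code: the numeric status values and the name OK are placeholders chosen for illustration, and the real constants are defined in the pandas C headers.

```python
# Placeholder status codes (assumed values, for illustration only).
OK, REACHED_EOF, CALLING_READ_FAILED = 0, 1, 2


def buffer_error_message(status, data):
    """Return the message the C branch would set, or None when no error is set.

    `data` models the buffer returned by the IO callback (None models NULL).
    """
    if status != REACHED_EOF and data is None:
        if status == CALLING_READ_FAILED:
            return "Calling read(nbytes) on source failed. Try engine='python'."
        # Every other status with a NULL buffer gets the generic text.
        return "Unknown error in IO callback"
    return None


print(buffer_error_message(OK, None))
print(buffer_error_message(CALLING_READ_FAILED, None))
print(buffer_error_message(REACHED_EOF, None))
```

In this model, any failure that leaves the buffer empty without setting the "read failed" status ends up with the generic message, which would be consistent with the asker's guess that a memory problem is being reported under an unrelated-looking error.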