Python 3.x: CParserError when reading a CSV file into Python in Spyder

I am trying to read a large CSV file (about 17 GB) into Python in Spyder using the pandas module. Here is my code:

data =pd.read_csv('example.csv', encoding = 'ISO-8859-1')
But I keep getting a CParserError error message:

Traceback (most recent call last):

File "<ipython-input-3-3993cadd40d6>", line 1, in <module>
data =pd.read_csv('newsall.csv', encoding = 'ISO-8859-1')

File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)

File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 325, in _read
return parser.read()

File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 815, in read
ret = self._engine.read(nrows)

File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1314, in read
data = self._reader.read(nrows)

File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)

File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)

File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)

File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)

File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)

CParserError: Error tokenizing data. C error: out of memory
Then I tried the following code:

data = pd.DataFrame()
reader = pd.read_csv('newsall.csv', encoding = 'ISO-8859-1', chunksize = 10000)
for chunk in reader:
    data.append(chunk, ignore_index=True)

But it does not return any result:

data.shape
Out[12]: (0, 0)

Then I tried:

data = pd.DataFrame()
reader = pd.read_csv('newsall.csv', encoding = 'ISO-8859-1', chunksize = 10000)
for chunk in reader:
    data = data.append(chunk, ignore_index=True)

It shows the out-of-memory error again. Here is the traceback:

Traceback (most recent call last):

File "<ipython-input-23-ee9021fcc9b4>", line 3, in <module>
for chunk in reader:

File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 795, in __next__
return self.get_chunk()

File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 836, in get_chunk
return self.read(nrows=size)

File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 815, in read
ret = self._engine.read(nrows)

File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1314, in read
data = self._reader.read(nrows)

File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)

File "pandas\parser.pyx", line 839, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9208)

File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)

File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)

File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)

CParserError: Error tokenizing data. C error: out of memory

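Your error looks clear to me: the computer is running out of memory. The file itself is 17 GB, and as a rule of thumb pandas takes up roughly twice that much space when it reads a file. So you would need about 34 GB of RAM to read this data directly.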
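
The factor of two is only a rule of thumb. As a rough sketch, you could measure the in-memory cost of a small sample and scale it up yourself. The helper name `estimate_bytes_per_row` and the tiny in-memory CSV are made up for illustration; with the real file you would pass its path and `encoding='ISO-8859-1'` instead.

```python
import io

import pandas as pd


def estimate_bytes_per_row(source, nrows=1000, **read_kwargs):
    """Read only the first `nrows` rows and return the average in-memory bytes per row."""
    sample = pd.read_csv(source, nrows=nrows, **read_kwargs)
    if len(sample) == 0:
        return 0.0
    # deep=True also counts the Python string objects held in object columns
    return sample.memory_usage(deep=True).sum() / len(sample)


# Tiny in-memory stand-in for the real CSV file (hypothetical data)
demo_csv = io.StringIO("id,text\n1,hello\n2,world\n3,pandas\n")
per_row = estimate_bytes_per_row(demo_csv, nrows=2)
print(f"about {per_row:.0f} bytes per row in memory")
```

Multiplying the per-row figure by the number of rows in the file gives a ballpark for the RAM a full read would need. Text-heavy columns usually make this noticeably larger than the size on disk.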

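Most computers these days have 4, 8, or 16 GB; some have 32. Your computer is simply running out of memory, and the C parser in pandas aborts the read with this error when that happens.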

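You can get around this by reading the data in chunks and doing whatever you need to do with each piece in turn. See the chunksize argument to pd.read_csv for more details, but basically you want something that looks like this: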

for chunk in pd.read_csv("...", chunksize=10000):
    do_something(chunk)  # process this chunk, keep only what you need
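
What goes inside the loop depends on your goal. Appending every chunk back into one DataFrame needs just as much memory as reading the file in one go, so the point is to keep only a small result per chunk. Here is a minimal sketch that counts rows per value of one column; the function name `count_rows_by_key`, the column name `source` and the demo data are assumptions for illustration, not something taken from your file.

```python
import io
from collections import Counter

import pandas as pd


def count_rows_by_key(source, key, chunksize=10000, **read_kwargs):
    """Stream a CSV in chunks and count how many rows each value of `key` has."""
    totals = Counter()
    for chunk in pd.read_csv(source, chunksize=chunksize, **read_kwargs):
        # value_counts() summarises this chunk only (missing values are skipped);
        # the chunk itself can be garbage-collected after this iteration
        totals.update(chunk[key].value_counts().to_dict())
    return totals


# Tiny in-memory stand-in for the real CSV file (hypothetical data)
demo_csv = io.StringIO("source,headline\na,x\nb,y\na,z\nb,w\na,v\n")
totals = count_rows_by_key(demo_csv, key="source", chunksize=2)
print(dict(totals))
```

With the real file you would pass the path plus `encoding='ISO-8859-1'`. Only the running totals stay in memory, so the peak usage is roughly one chunk rather than the whole file.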

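Thanks for your answer. I just want to read the data in as a single data frame; what should the do_something code be?
That is up to you.
Could you take a look at my edited question? It still produces the error.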