Python: Error tokenizing data. C error: Calling read(nbytes) on source failed with input gzip file

python pandas

I am using conda Python 2.7.

I use the following method to read a large gzip file:

df = pd.read_csv(os.path.join(filePath, fileName),
     sep='|', compression = 'gzip', dtype='unicode', error_bad_lines=False)

But when I read the file, I get the following error:

pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
Segmentation fault: 11

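I have read all the existing answers, but most of those questions were about errors such as extra columns. I am already handling that case with the error_bad_lines=False option.

What are my options?

I found something interesting when I tried to decompress the file: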

gunzip -k myfile.txt.gz 
gunzip: myfile.txt.gz: unexpected end of file
gunzip: myfile.txt.gz: uncompress failed

The input zip file is corrupted. Get a proper copy of this file from the source, or try a zip repair tool before passing it to pandas.
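One way to confirm the corruption from Python before handing the file to pandas is to decompress it once and see whether the stream ends cleanly. The following is a minimal sketch, assuming Python 3 and only the standard library; the function name gzip_is_intact and the chunk size are illustrative choices, not part of pandas or of the original question.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip stream at `path` decompresses cleanly.

    Hypothetical helper for illustration; not a pandas API.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so a large file is never held in memory at once.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ends early (truncated file).
        # OSError / zlib.error: bad header, bad checksum, or damaged data.
        return False
    return True
```

A file that gunzip rejects with "unexpected end of file" should make this return False, in which case read_csv cannot be expected to succeed either.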

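I did not really find a Python solution, but I managed to fix it with Unix tools:

First I ran zless myfile.txt.gz > uncompressedMyfile.txt, then I used sed to delete the last line, because I could clearly see that the last line was corrupted:

sed '$d' uncompressedMyfile.txt

Then I compressed the file again with gzip -k uncompressedMyfile.txt

After that I was able to read the file successfully with the following Python code: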

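import os
import pandas as pd
from pandas.parser import CParserError  # pandas 0.17.x; newer versions use pandas.errors.ParserError

def read_gzip_csv(filePath, fileName):  # wrapper name is illustrative
    try:
        return pd.read_csv(os.path.join(filePath, fileName),
                           sep='|', compression='gzip', dtype='unicode', error_bad_lines=False)
    except CParserError:
        print("Something is wrong with the file")
        return None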
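The Unix steps in this answer (decompress whatever is readable, drop the damaged last line, recompress) could also be approximated in Python. The sketch below assumes Python 3, a single-member gzip file, and newline-terminated rows; salvage_gzip is a hypothetical helper name. Note one difference from sed '$d': it drops only a trailing incomplete line, and it keeps everything when the stream turns out to be complete.

```python
import gzip
import zlib


def salvage_gzip(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy the readable rows of a possibly truncated gzip file to dst_path.

    Returns the discarded partial last line (empty bytes if nothing was lost).
    """
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: gzip framing
    pending = b""  # bytes after the last newline seen so far
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        while not decomp.eof:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # input ran out before the end-of-stream marker
            try:
                data = pending + decomp.decompress(chunk)
            except zlib.error:
                break  # damaged data: keep what was recovered so far
            # Write only complete lines; hold back the trailing partial line.
            cut = data.rfind(b"\n") + 1
            dst.write(data[:cut])
            pending = data[cut:]
        if decomp.eof:
            # The stream was complete, so the held-back tail is real data.
            dst.write(pending)
            pending = b""
    return pending
```

The rewritten file can then be passed to read_csv with compression='gzip' as before. Returning the discarded tail makes it easy to inspect what was lost.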

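This error sometimes appears when the file is already open elsewhere. Try closing the file and running the code again.

Most likely the path you passed is actually a folder path, not the path of the file that needs to be read.

pandas.read_csv cannot read a folder; it needs the explicit name of a compatible file.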
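A quick guard for the folder-versus-file mix-up is to check the path before calling read_csv. This is a minimal sketch, assuming Python 3; require_file is a hypothetical helper name, not something pandas provides.

```python
import os


def require_file(path):
    """Return `path` unchanged if it is a regular file; raise otherwise."""
    if os.path.isdir(path):
        # Typical mistake: a folder such as home/file.csv/ instead of the file in it.
        raise IsADirectoryError("expected a file but got a directory: %s" % path)
    if not os.path.isfile(path):
        raise FileNotFoundError("no such file: %s" % path)
    return path
```

It could be used as pd.read_csv(require_file(os.path.join(filePath, fileName)), ...), so a wrong path fails with a clear message instead of a tokenizer error.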

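We cannot tell without the data file or some sample data. Have you tried reading the data with the Python engine option, as the error message suggests?

Did you try adding engine='python' as the error message suggests? PS: pandas version?

@Boud pandas 0.17.1 np110py27_0 is what conda gives.

Did you also try the engine?

In my case, the folder had the same name as the file: home/file.csv/file.csv xDD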
