How do I fix the following exception when opening a simple txt file in Python?

Hi, I'm trying out Jupyter. I installed pandas, Python, and Jupyter, and to check that everything works I tried to open a txt file with pandas like this:

import pandas as pd


df=pd.read_csv("/authorprof/res_es.txt", sep=" ", header = None)
The txt file looks like this:

Running testing authorid
Running training authorprof
[[325 301]
 [236 191]
 [294 274]
 [354 357]
 [237 241]
 [344 335]
 [419 401]
 [312 286]
 [209 206]
However, I get the following exception:

-----------------------------------------------------------------------
CParserError                          Traceback (most recent call last)
<ipython-input-26-c970702c41ed> in <module>()
      3 print(sys.version)
      4 print(pd.__version__)
----> 5 df=pd.read_csv("/authorprof/res_es.txt", sep=" ", header = None)
      6 
      7 

/home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    560                     skip_blank_lines=skip_blank_lines)
    561 
--> 562         return _read(filepath_or_buffer, kwds)
    563 
    564     parser_f.__name__ = name

/home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    323         return parser
    324 
--> 325     return parser.read()
    326 
    327 _parser_defaults = {

/home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
    813                 raise ValueError('skip_footer not supported for iteration')
    814 
--> 815         ret = self._engine.read(nrows)
    816 
    817         if self.options.get('as_recarray'):

/home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
   1312     def read(self, nrows=None):
   1313         try:
-> 1314             data = self._reader.read(nrows)
   1315         except StopIteration:
   1316             if self._first_chunk:

pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8748)()

pandas/parser.pyx in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9003)()

pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)()

pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)()

pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:23325)()

CParserError: Error tokenizing data. C error: Expected 3 fields in line 71, saw 5
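
The message "Expected 3 fields in line 71, saw 5" means the C parser inferred the number of whitespace-separated fields from the first rows and then hit a row with a different count. A minimal way to look at the offending row (this inspection snippet is not part of the original post; it assumes the same path as above) is:

# Print the raw content of the line the parser complained about (1-based line 71)
# and count how many whitespace-separated fields it has.
with open("/authorprof/res_es.txt") as f:
    for i, line in enumerate(f, start=1):
        if i == 71:
            print(repr(line))
            print("fields:", len(line.split()))
            break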

I think you need the parameter skiprows to omit the first two rows of the txt file:

df=pd.read_csv("/authorprof/res_es.txt", sep="\s+", header = None, skiprows=2)
Sample:

import pandas as pd
import numpy as np
from pandas.compat import StringIO

temp=u"""Running testing authorid
Running training authorprof
[[325 301]
 [236 191]
 [294 274]
 [354 357]
 [237 241]
 [344 335]
 [419 401]
 [312 286]
 [209 206]"""
#after testing, replace StringIO(temp) with the filename
df = pd.read_csv(StringIO(temp),  delim_whitespace=True, header = None, skiprows=2)

print (df)
       0     1
0  [[325  301]
1   [236  191]
2   [294  274]
3   [354  357]
4   [237  241]
5   [344  335]
6   [419  401]
7   [312  286]
8   [209  206]
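
Note that the parsed columns still contain the leftover bracket characters ('[[325', '301]'). If numeric values are needed, one possible follow-up step (not part of the original answer; it assumes the DataFrame produced above) is:

# Strip the leading '[' and trailing ']' characters and convert both columns to integers.
df[0] = df[0].str.strip('[').astype(int)
df[1] = df[1].str.strip(']').astype(int)
print(df.dtypes)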

Thanks for the support, but I get the same error: --> 5 df=pd.read_csv("/authorprof/res_es.txt", sep=" ", header=None, skiprows=2), with the traceback again pointing into /home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, ...). My data has no defined structure; the contents look like: feats/spanish/user759.txt [u'00', u'000', u'10', u'100', u'11', u'12', u'13', u'14', u'15', u'17', u'18', u'19', u'20', u'2011', ... Do you know whether pandas is able to read something like this? Maybe that is the source of the problem.

Well, I think the best option is to share your file via gdocs, dropbox or wetransfer if it is not confidential, or email me (address is in my profile).

OK, it was only a test. I have learned that when a txt file has no defined structure, reading it raises an error. Thanks for the support, I will accept your answer, it was very useful.
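
For a file like the one described in these comments, where rows do not share a fixed number of fields, one option (not suggested in the original thread) is to bypass pandas' field inference and read the file line by line with plain Python, keeping only the rows of interest. A minimal sketch, assuming the path from the question and that the wanted rows are the two-number "[325 301]" entries:

# Read the file line by line and collect only rows that contain a pair of
# whitespace-separated integers; lines with any other structure are skipped.
import re

pairs = []
with open("/authorprof/res_es.txt") as f:
    for line in f:
        m = re.search(r"(\d+)\s+(\d+)", line)
        if m:
            pairs.append((int(m.group(1)), int(m.group(2))))

print(pairs[:5])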