Python Pandas cannot read line 216 of a jagged text file

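I have a jagged txt file (each row has a different number of columns) that I am trying to read with Pandas. For some reason it can read the first 216 rows, but not the first 217: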

>>> df = pd.read_table("test.txt", names = range(2000), nrows = 216)
>>> df = pd.read_table("test.txt", names = range(2000), nrows = 217)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/alexwhatley/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/alexwhatley/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 321, in _read
    return parser.read(nrows)
  File "/Users/alexwhatley/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 815, in read
    ret = self._engine.read(nrows)
  File "/Users/alexwhatley/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 1314, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 805, in pandas.parser.TextReader.read (pandas/parser.c:8748)
  File "pandas/parser.pyx", line 839, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9208)
  File "pandas/parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)
  File "pandas/parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas/parser.c:23325)
pandas.io.common.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

The file is available here: [link missing]. Does anyone know what is going on?

One workaround is to split the lines yourself and build the DataFrame from the resulting lists:

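import pandas as pd

# Read the file by hand so that rows may have different numbers of fields.
the_file = []
with open(r"./genes.txt", "r") as f:  # text mode, so split("\t") works on str
    for line in f:
        the_file.append(line.rstrip("\n").split("\t"))

# Size the frame to the widest row; shorter rows are padded with None.
df = pd.DataFrame(the_file, columns=range(max(len(l) for l in the_file)))

print(df[0])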
Result:

0                       KEGG_GLYCOLYSIS_GLUCONEOGENESIS
1                          KEGG_CITRATE_CYCLE_TCA_CYCLE
2                        KEGG_PENTOSE_PHOSPHATE_PATHWAY
3         KEGG_PENTOSE_AND_GLUCURONATE_INTERCONVERSIONS
4                  KEGG_FRUCTOSE_AND_MANNOSE_METABOLISM
5                             KEGG_GALACTOSE_METABOLISM
6                KEGG_ASCORBATE_AND_ALDARATE_METABOLISM
7                            KEGG_FATTY_ACID_METABOLISM
8                             KEGG_STEROID_BIOSYNTHESIS
9                   KEGG_PRIMARY_BILE_ACID_BIOSYNTHESIS
10                    KEGG_STEROID_HORMONE_BIOSYNTHESIS
11                       KEGG_OXIDATIVE_PHOSPHORYLATION
12                               KEGG_PURINE_METABOLISM
13                           KEGG_PYRIMIDINE_METABOLISM
14      KEGG_ALANINE_ASPARTATE_AND_GLUTAMATE_METABOLISM
15         KEGG_GLYCINE_SERINE_AND_THREONINE_METABOLISM
16              KEGG_CYSTEINE_AND_METHIONINE_METABOLISM
17       KEGG_VALINE_LEUCINE_AND_ISOLEUCINE_DEGRADATION
18      KEGG_VALINE_LEUCINE_AND_ISOLEUCINE_BIOSYNTHESIS
19                              KEGG_LYSINE_DEGRADATION
20                 KEGG_ARGININE_AND_PROLINE_METABOLISM
21                            KEGG_HISTIDINE_METABOLISM
22                             KEGG_TYROSINE_METABOLISM
23                        KEGG_PHENYLALANINE_METABOLISM
24                           KEGG_TRYPTOPHAN_METABOLISM
25                         KEGG_BETA_ALANINE_METABOLISM
26              KEGG_TAURINE_AND_HYPOTAURINE_METABOLISM
27                     KEGG_SELENOAMINO_ACID_METABOLISM
28                          KEGG_GLUTATHIONE_METABOLISM
29                   KEGG_STARCH_AND_SUCROSE_METABOLISM
                             ...                       
425                                      ST_GAQ_PATHWAY
426                                     ST_GA13_PATHWAY
427                                    ST_STAT3_PATHWAY
428                                    SA_FAS_SIGNALING
429                                  SA_G1_AND_S_PHASES
430    SIG_INSULIN_RECEPTOR_PATHWAY_IN_CARDIAC_MYOCYTES
431                       ST_T_CELL_SIGNAL_TRANSDUCTION
432                        ST_TYPE_I_INTERFERON_PATHWAY
433                            ST_PAC1_RECEPTOR_PATHWAY
434                 SIG_PIP3_SIGNALING_IN_B_LYMPHOCYTES
435                           SIG_BCR_SIGNALING_PATHWAY
436                                  SA_G2_AND_M_PHASES
437                          ST_B_CELL_ANTIGEN_RECEPTOR
438                            ST_INTERLEUKIN_4_PATHWAY
439                         ST_WNT_BETA_CATENIN_PATHWAY
440                          SA_MMP_CYTOKINE_CONNECTION
441                                 ST_JNK_MAPK_PATHWAY
442                            SA_PROGRAMMED_CELL_DEATH
443                            ST_FAS_SIGNALING_PATHWAY
444                               ST_MYOCYTE_AD_PATHWAY
445                                     SA_PTEN_PATHWAY
446                       SA_REG_CASCADE_OF_CYCLIN_EXPR
447                                    SA_TRKA_RECEPTOR
448                ST_PHOSPHOINOSITIDE_3_KINASE_PATHWAY
449                                 PID_FANCONI_PATHWAY
450                          PID_SMAD2_3NUCLEAR_PATHWAY
451                                   PID_FCER1_PATHWAY
452                              PID_ENDOTHELIN_PATHWAY
453                                    PID_BCR_5PATHWAY
454                    PID_PRL_SIGNALING_EVENTS_PATHWAY

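What happens if you delete the first 216 lines of the file and use nrows=1? Also, since the file has only about 100 columns, is there any particular reason for using 2000?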
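
Following up on the comment above, here is a minimal sketch of how one might check the per-line field counts and then pass only as many column names as the widest row needs. The helper names `field_counts` and `read_jagged` are made up for illustration, and `engine="python"` is an assumption on my part, not something the thread confirms as the fix.

```python
import pandas as pd


def field_counts(path, sep="\t"):
    """Return the number of sep-delimited fields on each line of the file."""
    with open(path, "r") as f:
        return [len(line.rstrip("\n").split(sep)) for line in f]


def read_jagged(path, sep="\t"):
    """Read a jagged delimited file; rows shorter than the widest are padded with NaN."""
    # Use the real maximum width instead of an arbitrary large number of names.
    width = max(field_counts(path, sep))
    return pd.read_csv(
        path,
        sep=sep,
        header=None,
        names=list(range(width)),
        engine="python",  # assumption: sidestep the C tokenizer that raised the error
    )
```

Calling `field_counts("test.txt")[210:220]` would show whether the rows around line 217 are unusually wide, and `read_jagged("test.txt")` would then build the frame with exactly that many columns.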