Unable to read text data into a DataFrame in Python

I am trying to read text data into a DataFrame. My code is:

dftxt = """
    0             1               2
1  10/1/2016    'stringvalue'     456
2  NaN          'anothersting'    NaN
3  NaN          'and another '    NaN
4  11/1/2016    'more strings'    943
5  NaN          'stringstring'    NaN
"""

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(dftxt), sep='\s+')
print(df)

But I get the following error:

Traceback (most recent call last):
  File "mydf.py", line 16, in <module>
    df = pd.read_csv(StringIO(dftxt), sep='\s+')
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 848, in pandas.parser.TextReader.read (pandas/parser.c:10415)
  File "pandas/parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10691)
  File "pandas/parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas/parser.c:11437)
  File "pandas/parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:11308)
  File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 4 fields in line 5, saw 6

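I cannot understand which 6 fields are being read in error: "Expected 4 fields in line 5, saw 6". Where is the problem, and how can it be solved?

Line 5 is this one: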

 3  NaN          'and another '    NaN
 1   2             3    4     5     6
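
The problem lies with your separator. With sep='\s+', every whitespace-separated word is treated as a separate column, so the space inside 'and another ' splits that one value into three fields. In this case, you need to

  • change the sep argument to \s{2,} (two or more whitespace characters), and
  • change the engine to 'python' to suppress the warning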
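
The two changes listed above can be sketched as follows. This is a minimal illustration that reuses the question's dftxt; the names problem_row, fields_any_ws and fields_wide_ws are introduced here only for the demonstration. It first counts the fields each separator produces on the failing row, then reads the whole text with the wider separator.

```python
import re
from io import StringIO

import pandas as pd

dftxt = """
    0             1               2
1  10/1/2016    'stringvalue'     456
2  NaN          'anothersting'    NaN
3  NaN          'and another '    NaN
4  11/1/2016    'more strings'    943
5  NaN          'stringstring'    NaN
"""

# The row that triggers the error: splitting on any run of whitespace
# breaks the quoted string apart, while splitting on two or more
# whitespace characters keeps it whole.
problem_row = "3  NaN          'and another '    NaN"
fields_any_ws = re.split(r"\s+", problem_row)
fields_wide_ws = re.split(r"\s{2,}", problem_row)
print(len(fields_any_ws), len(fields_wide_ws))

# A regex separator such as \s{2,} is handled by the Python parsing engine.
df = pd.read_csv(StringIO(dftxt), sep=r"\s{2,}", engine="python")
print(df)
```

Note that this relies on every pair of columns being separated by at least two spaces and on no value containing two consecutive spaces. The surrounding quotes are still part of the string values at this point.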

Also, I would use str.strip to remove the surrounding quotes:

df.iloc[:, 1] = df.iloc[:, 1].str.strip("'")
df

           0             1      2
1  10/1/2016   stringvalue  456.0
2        NaN  anothersting    NaN
3        NaN  and another     NaN
4  11/1/2016  more strings  943.0
5        NaN  stringstring    NaN


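Finally, from one pandas user to another, there is a handy little function I think you should take a look at: pd.read_clipboard. It reads data from the clipboard and accepts almost every argument that read_csv does.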
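It works very well. Why is engine='python' needed? The code seems to work without this option too.

@ Quite right. By default, the 'python' engine is used for variable-width separators, since the C engine does not seem to support them. But it still emits a warning. Explicit is better than implicit! (Remember the Zen of Python.)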
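
The warning mentioned in the comment above can be observed with a small sketch like the one below. The sample text and the names implicit and explicit are made up for illustration, and the exact warning message may differ between pandas versions.

```python
import warnings
from io import StringIO

import pandas as pd

# Hypothetical two-column sample; columns are separated by two spaces.
sample = "a  b\n1  x y\n2  z w\n"

# No engine given: pandas falls back to the Python engine and warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pd.read_csv(StringIO(sample), sep=r"\s{2,}")
implicit = [w for w in caught if issubclass(w.category, pd.errors.ParserWarning)]

# Engine stated explicitly: same result, no fallback warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pd.read_csv(StringIO(sample), sep=r"\s{2,}", engine="python")
explicit = [w for w in caught if issubclass(w.category, pd.errors.ParserWarning)]

print(len(implicit), len(explicit))
```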