Python DataFrame cannot read data [python, numpy, pandas, dataframe]

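I have a problem with pandas that I did not have a few months ago. I am trying to take a set of data chosen by the user (via tkinter) and put it into a DataFrame. Here is what the data looks like: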

1.000000    03/27/2016   13:29:26.098   1431.778943 0.092089
1.000000    03/27/2016   13:29:26.298   1432.410517 0.078570
1.000000    03/27/2016   13:29:26.498   1431.905258 0.089538
1.000000    03/27/2016   13:29:26.698   1431.399999 0.080930
5.000000    03/28/2016   00:00:00.098   1289.422164 0.392945
25.000000   03/28/2016   00:00:00.298   1289.295849 0.145016
25.000000   03/28/2016   00:00:00.498   1289.295849 0.183149
25.000000   03/28/2016   00:00:00.698   1288.790590 0.175114
26.000000   03/28/2016   00:25:16.698   1302.053644 0.162170
.....
There are always 5 columns, but a dataset usually has 200,000 to 800,000 rows.

Here is my code:

import pandas as pd
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename() #User selects file

file = pd.read_table(file_path, index_col=False)
df = pd.DataFrame(data=file, columns=['Measurement', 'Date', 'Time','CO2', 'Flow'], dtype=object)

print(file_path)
print(file)
print(df)
print(file_path) outputs the correct path, print(file) shows all the correct data, and print(df) shows the following:

 Measurement Date Time  CO2 Flow
0            NaN  NaN  NaN  NaN  NaN
1            NaN  NaN  NaN  NaN  NaN
2            NaN  NaN  NaN  NaN  NaN
3            NaN  NaN  NaN  NaN  NaN
4            NaN  NaN  NaN  NaN  NaN
5            NaN  NaN  NaN  NaN  NaN
6            NaN  NaN  NaN  NaN  NaN
7            NaN  NaN  NaN  NaN  NaN
8            NaN  NaN  NaN  NaN  NaN
.......
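I have done exactly this before, but I lost the script I was working on and had to start over. It worked fine before, and I am not sure what changed. I have tried several things to fix it:

  • Changed pd.read_table to pd.io.parsers.read_table
  • Changed index=, dtype=, and other arguments of pd.DataFrame
  • Converted the file to .csv and used pd.read_csv
  • Shortened the file considerably
  • Created a pd.Series with a single column and printed it, but every data point was still NaN
  • I can easily generate a set of random data and turn it into a pd.DataFrame without any problem (in IPython I used df2 = DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e']) and it displayed correctly)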


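    I built a NumPy array from the same data and it worked fine. I want to use pandas because I think it will make the analysis easier in the long run. I really hope this is something small that I am missing, but I have been working on it for a while, so I am willing to try anything.

    According to the documentation, you already have a DataFrame in file.

    Try this: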

    In [71]: f = pd.read_table('table.txt', names=['Measurement', 'Date', 'Time','CO2', 'Flow'])
    
    In [72]: f
    Out[72]:
       Measurement        Date          Time          CO2      Flow
    0            1  03/27/2016  13:29:26.098  1431.778943  0.092089
    1            1  03/27/2016  13:29:26.298  1432.410517  0.078570
    2            1  03/27/2016  13:29:26.498  1431.905258  0.089538
    3            1  03/27/2016  13:29:26.698  1431.399999  0.080930
    4            5  03/28/2016  00:00:00.098  1289.422164  0.392945
    5           25  03/28/2016  00:00:00.298  1289.295849  0.145016
    6           25  03/28/2016  00:00:00.498  1289.295849  0.183149
    7           25  03/28/2016  00:00:00.698  1288.790590  0.175114
    8           26  03/28/2016  00:25:16.698  1302.053644  0.162170
    
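    So why didn't you get the result you wanted? Notice that after the table is read without names, it does not have the column names you want; the first data row has been used as the header instead: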

    In [77]: file = pd.read_table('table.txt', index_col=False)
    
    In [78]: file
    Out[78]:
       1.000000  03/27/2016  13:29:26.098  1431.778943  0.092089
    0         1  03/27/2016  13:29:26.298  1432.410517  0.078570
    1         1  03/27/2016  13:29:26.498  1431.905258  0.089538
    2         1  03/27/2016  13:29:26.698  1431.399999  0.080930
    3         5  03/28/2016  00:00:00.098  1289.422164  0.392945
    4        25  03/28/2016  00:00:00.298  1289.295849  0.145016
    5        25  03/28/2016  00:00:00.498  1289.295849  0.183149
    6        25  03/28/2016  00:00:00.698  1288.790590  0.175114
    7        26  03/28/2016  00:25:16.698  1302.053644  0.162170
    
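    So when you call the DataFrame constructor with an existing DataFrame and a list of column names, you get all null values, because the input DataFrame has no columns with those names: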

    In [80]: df = pd.DataFrame(data=file, columns=['Measurement', 'Date', 'Time','CO2', 'Flow'], dtype=object)
    
    In [81]: df
    Out[81]:
      Measurement Date Time  CO2 Flow
    0         NaN  NaN  NaN  NaN  NaN
    1         NaN  NaN  NaN  NaN  NaN
    2         NaN  NaN  NaN  NaN  NaN
    3         NaN  NaN  NaN  NaN  NaN
    4         NaN  NaN  NaN  NaN  NaN
    5         NaN  NaN  NaN  NaN  NaN
    6         NaN  NaN  NaN  NaN  NaN
    7         NaN  NaN  NaN  NaN  NaN
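
The behaviour shown above can be reproduced without a file on disk. The following is a minimal sketch, not the original poster's script: the three-row `sample` string, the variable names `raw`, `reindexed`, and `renamed`, and the whitespace separator `sep=r"\s+"` are assumptions made for illustration. It shows that passing an existing DataFrame together with `columns=` selects columns by label, and that assigning to `.columns` relabels them instead.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same layout as the question's file.
sample = """\
1.000000    03/27/2016   13:29:26.098   1431.778943 0.092089
1.000000    03/27/2016   13:29:26.298   1432.410517 0.078570
5.000000    03/28/2016   00:00:00.098   1289.422164 0.392945
"""
names = ["Measurement", "Date", "Time", "CO2", "Flow"]

# No names are given, so the first data row is consumed as the header.
raw = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# columns= on an existing DataFrame selects by label; none of the
# requested labels exist in raw, so every column comes back as NaN.
reindexed = pd.DataFrame(data=raw, columns=names)

# Assigning to .columns relabels the existing columns and keeps the data
# (the first row is still missing because it became the header).
renamed = raw.copy()
renamed.columns = names

print(reindexed)
print(renamed)
```

Relabelling rescues the data that was read, but not the row that was swallowed as the header, which is why supplying the names at read time is the cleaner fix.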
    

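    I think you can omit
    df = pd.DataFrame(data=file, columns=['Measurement', 'Date', 'Time', 'CO2', 'Flow'], dtype=object)
    because file is already a DataFrame. When you pass an existing DataFrame as the data argument to the DataFrame constructor, you are effectively reindexing it. If you pass a NumPy array instead, it works:
    pd.DataFrame(data=file.values, columns=['Measurement', 'Date', 'Time', 'CO2', 'Flow'], dtype=object)
    Also, if your file has no column names, you can usually pass the names you want as an argument to read_table, though you generally also have to tell it that there is no header with header=None.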
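
As a rough illustration of the two suggestions in this answer, the sketch below reads a small in-memory sample with `header=None` and `names=`, and separately builds a DataFrame from the underlying array via `.values`. The `sample` string, the variable names, and the whitespace separator `sep=r"\s+"` are assumptions for the example; the real file may need a different `sep`.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same layout as the question's file.
sample = """\
1.000000    03/27/2016   13:29:26.098   1431.778943 0.092089
1.000000    03/27/2016   13:29:26.298   1432.410517 0.078570
5.000000    03/28/2016   00:00:00.098   1289.422164 0.392945
"""
names = ["Measurement", "Date", "Time", "CO2", "Flow"]

# header=None says the first line is data, not labels;
# names= supplies the column labels instead.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=names)

# Alternative: read without labels, then hand the raw array to the
# constructor, which assigns the names by position rather than by label.
unlabeled = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)
from_array = pd.DataFrame(data=unlabeled.values, columns=names)

print(df)
print(from_array.dtypes)
```

The first form keeps the numeric dtypes that the parser inferred, whereas going through `.values` on mixed data produces object columns, so reading with `names=` directly is usually the more convenient of the two.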
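    Your post contributed a great deal to my learning! Thank you!
    Thank you very much. I had read the documentation, but I did not read past "Read general delimited file into DataFrame". I thought I still needed to put the file I had read into a DataFrame. This was also a great way to explain it, thanks!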