Python DataFrame fails to read data
I have a problem with pandas that I did not have a few months ago. I am trying to take a set of data chosen through user input (using tkinter) and put it into a DataFrame. Here is what the data looks like:
1.000000 03/27/2016 13:29:26.098 1431.778943 0.092089
1.000000 03/27/2016 13:29:26.298 1432.410517 0.078570
1.000000 03/27/2016 13:29:26.498 1431.905258 0.089538
1.000000 03/27/2016 13:29:26.698 1431.399999 0.080930
5.000000 03/28/2016 00:00:00.098 1289.422164 0.392945
25.000000 03/28/2016 00:00:00.298 1289.295849 0.145016
25.000000 03/28/2016 00:00:00.498 1289.295849 0.183149
25.000000 03/28/2016 00:00:00.698 1288.790590 0.175114
26.000000 03/28/2016 00:25:16.698 1302.053644 0.162170
.....
There are always 5 columns, but a dataset usually has 200,000 to 800,000 rows.
Here is my code:
import pandas as pd
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename() #User selects file
file = pd.read_table(file_path, index_col=False)
df = pd.DataFrame(data=file, columns=['Measurement', 'Date', 'Time','CO2', 'Flow'], dtype=object)
print(file_path)
print(file)
print(df)
print(file_path) outputs the correct path, print(file) shows all of the correct data, and print(df) shows the following:
Measurement Date Time CO2 Flow
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN
.......
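I have done this same thing before, but I lost the script I was writing and need to start over. It worked fine before, and I am not sure what changed. I tried a few things to fix it:
I made a numpy array from the same data and it worked fine. I want to use pandas because I think it will make the analysis easier in the long run. I really hope this is something small that I am missing, but I have been working on this for a while, so I am willing to try anything.

According to the documentation, read_table already returns a DataFrame, so file is already a DataFrame. Try this: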
In [71]: f = pd.read_table('table.txt', names=['Measurement', 'Date', 'Time','CO2', 'Flow'])
In [72]: f
Out[72]:
Measurement Date Time CO2 Flow
0 1 03/27/2016 13:29:26.098 1431.778943 0.092089
1 1 03/27/2016 13:29:26.298 1432.410517 0.078570
2 1 03/27/2016 13:29:26.498 1431.905258 0.089538
3 1 03/27/2016 13:29:26.698 1431.399999 0.080930
4 5 03/28/2016 00:00:00.098 1289.422164 0.392945
5 25 03/28/2016 00:00:00.298 1289.295849 0.145016
6 25 03/28/2016 00:00:00.498 1289.295849 0.183149
7 25 03/28/2016 00:00:00.698 1288.790590 0.175114
8 26 03/28/2016 00:25:16.698 1302.053644 0.162170
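So why did you not get the desired result?
Notice that after reading the table the way you did, it does not have the desired column names. The first data row has been used as the header instead: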
In [77]: file = pd.read_table('table.txt', index_col=False)
In [78]: file
Out[78]:
1.000000 03/27/2016 13:29:26.098 1431.778943 0.092089
0 1 03/27/2016 13:29:26.298 1432.410517 0.078570
1 1 03/27/2016 13:29:26.498 1431.905258 0.089538
2 1 03/27/2016 13:29:26.698 1431.399999 0.080930
3 5 03/28/2016 00:00:00.098 1289.422164 0.392945
4 25 03/28/2016 00:00:00.298 1289.295849 0.145016
5 25 03/28/2016 00:00:00.498 1289.295849 0.183149
6 25 03/28/2016 00:00:00.698 1288.790590 0.175114
7 26 03/28/2016 00:25:16.698 1302.053644 0.162170
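So when you call the DataFrame constructor with an existing DataFrame and a list of column names, you get all null values, because the input DataFrame has no columns with those names: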
In [80]: df = pd.DataFrame(data=file, columns=['Measurement', 'Date', 'Time','CO2', 'Flow'], dtype=object)
In [81]: df
Out[81]:
Measurement Date Time CO2 Flow
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN
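The behaviour above can be reproduced without a file on disk. The following is a minimal sketch, assuming the data is tab-separated (the default separator for read_table); the three-row sample in raw and the variable names are only for illustration and are not from the original post.

```python
import io
import pandas as pd

# Three tab-separated rows in the same layout as the question's data.
# There is no header row, just like the real file.
raw = (
    "1.000000\t03/27/2016\t13:29:26.098\t1431.778943\t0.092089\n"
    "1.000000\t03/27/2016\t13:29:26.298\t1432.410517\t0.078570\n"
    "1.000000\t03/27/2016\t13:29:26.498\t1431.905258\t0.089538\n"
)

# With no names/header arguments, the first data row becomes the column labels,
# so only two data rows remain.
file = pd.read_table(io.StringIO(raw), index_col=False)
print(list(file.columns))

# Passing a DataFrame plus `columns` selects columns by label.
# None of the requested labels exist in `file`, so every cell is NaN.
wanted = ["Measurement", "Date", "Time", "CO2", "Flow"]
df = pd.DataFrame(data=file, columns=wanted)
print(df)
```

Printing file.columns is a quick way to check whether a data row has been swallowed as the header before building anything on top of the result.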
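I think you can omit
df = pd.DataFrame(data=file, columns=['Measurement', 'Date', 'Time', 'CO2', 'Flow'], dtype=object)
because file is already a DataFrame. When you pass an existing DataFrame as the data argument to the DataFrame constructor, you are effectively reindexing it. If you pass a NumPy array instead, it works:
pd.DataFrame(data=file.values, columns=['Measurement', 'Date', 'Time', 'CO2', 'Flow'], dtype=object)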
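As a rough illustration of the two ways to get the labels onto data that has already been read, here is a sketch under a few assumptions: the sample in raw is made up to match the question's layout, header=None is added so that no data row is consumed as a header, and to_numpy() is used as the current spelling of .values.

```python
import io
import pandas as pd

raw = (
    "1.000000\t03/27/2016\t13:29:26.098\t1431.778943\t0.092089\n"
    "1.000000\t03/27/2016\t13:29:26.298\t1432.410517\t0.078570\n"
    "5.000000\t03/28/2016\t00:00:00.098\t1289.422164\t0.392945\n"
)
names = ["Measurement", "Date", "Time", "CO2", "Flow"]

# header=None keeps every row as data; the columns get integer labels 0..4.
file = pd.read_table(io.StringIO(raw), header=None)

# Option A: relabel the existing DataFrame. Parsed dtypes are preserved.
relabeled = file.copy()
relabeled.columns = names

# Option B: hand the constructor a plain array, so the labels are applied
# by position instead of being looked up by name.
rebuilt = pd.DataFrame(data=file.to_numpy(), columns=names)

print(relabeled.dtypes)
print(rebuilt.dtypes)
```

Option A is usually the lighter choice. Going through an array of mixed types typically leaves the numeric columns as object dtype, which then have to be converted back before doing arithmetic on them.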
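Also, if your file has no column names, you can usually pass the names you want as an argument to read_table, but you generally have to tell it that there is no header row with header=None.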
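A short sketch of that suggestion, again using a made-up tab-separated sample in place of the real file. As far as I know, pandas treats names with the default header setting the same as header=None, so the explicit argument here mainly documents the intent; treat that as an assumption and check it against the pandas version you have installed.

```python
import io
import pandas as pd

raw = (
    "1.000000\t03/27/2016\t13:29:26.098\t1431.778943\t0.092089\n"
    "25.000000\t03/28/2016\t00:00:00.298\t1289.295849\t0.145016\n"
    "26.000000\t03/28/2016\t00:25:16.698\t1302.053644\t0.162170\n"
)
names = ["Measurement", "Date", "Time", "CO2", "Flow"]

# Name the columns at read time and state explicitly that the file has no header row.
df = pd.read_table(io.StringIO(raw), names=names, header=None)

# The same read relying on the default header handling, for comparison.
df_default = pd.read_table(io.StringIO(raw), names=names)

print(df)
print(df.equals(df_default))
```

With the real data, the io.StringIO(raw) argument would simply be replaced by file_path from the file dialog.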
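Your post contributed a lot to my learning! Thank you!
Thank you very much. I read the documentation, but I read right past "Read general delimited file into DataFrame". I thought I still needed to put the file I had read into a DataFrame. That is also a nice way of explaining it, thanks!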