Python: looping over multiple DataFrames with globals() as the variable reference to sort by datetime
I have 20 DataFrames in the following format, but with around 140,000 rows each. The dates are in year/month/day order (the sample values below use hyphens, i.e. '%Y-%m-%d' / YYYY-MM-DD).
In [1]: import pandas as pd
        data1 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
                              'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
                              'V': [1044.865, 1044.889, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})
        data2 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
                              'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
                              'V': [1044.865, 1044.887, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})
        data3 = pd.DataFrame({'Day': ['2020-04-07','2020-04-07', '2020-04-07','2020-08-11','2020-08-11','2020-08-11','2020-06-14','2020-06-14','2020-06-14'],
                              'Time': ['23:41:18', '23:42:56', '23:44:34','10:23:10','15:24:46','10:24:13','23:41:18','23:42:56','23:44:34'],
                              'V': [1044.865, 1044.888, 1044.914,320.014,320.033,320.018,1044.865,1044.889,1044.914]})
In [2]: data2.head(15)
Out[2]:
Day Time V
0 2020-04-07 23:41:18 1044.865
1 2020-04-07 23:42:56 1044.887
2 2020-04-07 23:44:34 1044.914
3 2020-08-11 10:23:10 320.014
4 2020-08-11 15:24:46 320.033
5 2020-08-11 10:24:13 320.018
6 2020-06-14 23:41:18 1044.865
7 2020-06-14 23:42:56 1044.889
8 2020-06-14 23:44:34 1044.914
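I'm using the loop below to sort each DataFrame by the date in the 'Day' column and then by the 'Time' column. In my real DataFrames there are about 3 measurements per minute.
My goal is to avoid typing the DataFrame names 20 times, and this approach has worked well for me with .index, .drop() and .reset_index().
But for some reason, .sort_values() has no effect when used in this loop: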
In [3]: for n in range(1, 4):
            globals()["data" + str(n)]['Day'] = pd.to_datetime(globals()["data" + str(n)]['Day'],
                                                               format='%Y-%m-%d')
            # sort_values() returns a sorted copy; here that result is discarded
            globals()["data" + str(n)].sort_values(by=['Day', 'Time'])
            globals()["data" + str(n)]['Day'] = globals()["data" + str(n)]['Day'].astype(str)
Out[3]:
Day Time V
0 2020-04-07 23:41:18 1044.865
1 2020-04-07 23:42:56 1044.887
2 2020-04-07 23:44:34 1044.914
3 2020-08-11 10:23:10 320.014
4 2020-08-11 15:24:46 320.033
5 2020-08-11 10:24:13 320.018
6 2020-06-14 23:41:18 1044.865
7 2020-06-14 23:42:56 1044.889
8 2020-06-14 23:44:34 1044.914
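A minimal sketch of why the loop above appears to do nothing, assuming a reasonably recent pandas: DataFrame.sort_values() returns a new sorted DataFrame unless inplace=True is passed, so the result has to be kept somewhere. The small frame `df` below is illustrative, not the original data.

```python
import pandas as pd

# Illustrative frame (not the original data), deliberately out of order.
df = pd.DataFrame({'Day': ['2020-08-11', '2020-04-07', '2020-06-14'],
                   'Time': ['10:23:10', '23:41:18', '23:42:56'],
                   'V': [320.014, 1044.865, 1044.889]})
unsorted_copy = df.copy()

# sort_values() returns a new, sorted DataFrame; df itself is left untouched.
sorted_df = df.sort_values(by=['Day', 'Time'])
assert df.equals(unsorted_copy)

# Option 1: rebind the name to the returned result.
df = df.sort_values(by=['Day', 'Time'])

# Option 2: sort in place (the call itself returns None).
other = unsorted_copy.copy()
other.sort_values(by=['Day', 'Time'], inplace=True)
```

Applied to the loop in the question, option 1 would read `globals()["data" + str(n)] = globals()["data" + str(n)].sort_values(by=['Day', 'Time'])`.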
However, if I use the loop only to convert the 'Day' column to datetime, and then call .sort_values() by typing the DataFrame name manually, it works:
In [4]: for n in range(1, 4):
            globals()["data" + str(n)]['Day'] = pd.to_datetime(globals()["data" + str(n)]['Day'],
                                                               format='%Y-%m-%d')
        data2.sort_values(by=['Day', 'Time'])
Out[4]:
Day Time V
0 2020-04-07 23:41:18 1044.865
1 2020-04-07 23:42:56 1044.887
2 2020-04-07 23:44:34 1044.914
6 2020-06-14 23:41:18 1044.865
7 2020-06-14 23:42:56 1044.889
8 2020-06-14 23:44:34 1044.914
3 2020-08-11 10:23:10 320.014
4 2020-08-11 10:24:13 320.018
5 2020-08-11 15:24:46 320.033
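As a side note, sketched under the assumption that every value is zero-padded exactly like the sample ('YYYY-MM-DD' and 'HH:MM:SS'): such strings already sort lexicographically in chronological order, so the to_datetime / astype(str) round trip is not strictly needed for sorting. The cross-check below builds a real timestamp from both columns and gets the same row order as Out[4].

```python
import pandas as pd

df = pd.DataFrame({'Day': ['2020-04-07', '2020-04-07', '2020-04-07',
                           '2020-08-11', '2020-08-11', '2020-08-11',
                           '2020-06-14', '2020-06-14', '2020-06-14'],
                   'Time': ['23:41:18', '23:42:56', '23:44:34',
                            '10:23:10', '15:24:46', '10:24:13',
                            '23:41:18', '23:42:56', '23:44:34'],
                   'V': [1044.865, 1044.887, 1044.914, 320.014, 320.033,
                         320.018, 1044.865, 1044.889, 1044.914]})

# Zero-padded ISO-style strings compare in chronological order as plain text.
by_strings = df.sort_values(by=['Day', 'Time'])

# Cross-check: sort on a real timestamp combined from both columns.
stamp = pd.to_datetime(df['Day'] + ' ' + df['Time'], format='%Y-%m-%d %H:%M:%S')
by_timestamp = df.loc[stamp.sort_values().index]

print(by_strings.index.tolist())  # [0, 1, 2, 6, 7, 8, 3, 5, 4]
```

If some files used a different separator or unpadded values, converting to datetime first (as in the question) remains the safer choice.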
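Any suggestions on how to make this more dynamic?

Comment: First thing: stop using globals()! It is almost never appropriate. For your use case, a dictionary or a plain list would probably be suitable.

Comment: I'll look into that, thanks for the link!

Comment: You have to either assign the result of sort_values to a (temporary) variable or use its inplace parameter.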
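Combining both comments, a minimal sketch of the dictionary approach: the frames are stored under their names in a dict instead of as separate globals, and the result of sort_values is written back. The helper `make_frame` and the three-row sample data are hypothetical stand-ins for the real 20 DataFrames.

```python
import pandas as pd

def make_frame(third_value):
    """Hypothetical helper: a tiny frame shaped like data1..data3."""
    return pd.DataFrame({'Day': ['2020-04-07', '2020-08-11', '2020-06-14'],
                         'Time': ['23:41:18', '10:23:10', '23:42:56'],
                         'V': [1044.865, 320.014, third_value]})

# Keep the frames in a dict keyed by name instead of separate global variables.
frames = {f'data{n}': make_frame(1044.886 + n / 1000) for n in range(1, 4)}

for name, df in frames.items():
    df['Day'] = pd.to_datetime(df['Day'], format='%Y-%m-%d')
    # sort_values returns a new frame, so store the result back under the same key.
    frames[name] = df.sort_values(by=['Day', 'Time']).reset_index(drop=True)
```

`frames['data2']` then takes the place of `data2`; with the real data the dict would be built with `range(1, 21)`, or the frames could simply be kept in a list.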