Python DataFrame: getting unique values from a timestamp column

My time series data looks like this:

1998-01-02 09:30:00,0.4298,0.4337,0.4258,0.4317,6426369
1999-01-02 09:45:00,0.4317,0.4337,0.4258,0.4298,10589080
2000-01-02 10:00:00,0.4298,0.4337,0.4278,0.4337,9507980
2001-01-02 10:15:00,0.4337,0.4416,0.4298,0.4416,13639022
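What I want is a list of the years:

years = ['1998', '1999', '2000', '2001']
so that I can use the list to know which years I can query in that DataFrame. Not every DataFrame contains the same years.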

data = pd.read_csv(str(inFileName), index_col=0, parse_dates=True, header=None)
#data.iloc[:,0]
print(pd.DatetimeIndex(data.iloc[:,0]).year)
#print(data.iloc[:,0])
#years = list(data.index)
#print(years)
for x in years:
I have tried many things, but none of them worked. Can someone explain how to solve a problem like this?

Edit 1: After some suggestions, I am now doing this:

data = pd.read_csv(str(inFileName), parse_dates=[0], header=None)
data.iloc[:, 0] = pd.to_datetime(data.iloc[:, 0])
data['year'] = data.iloc[:, 0].apply(lambda x: x.year)
year_list = data['year'].unique().tolist()
print(year_list)
for x in year_list:
    newDF = data[x]
    newDF.head()

    print(newDF.head(5))
I get the list:
[2017, 2018, 2019]

But I cannot create new DataFrames from the list. I want to create a new DataFrame for each value in the list. I get this error:

[2017, 2018, 2019]

Traceback (most recent call last):
  File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2017

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./massageSM.py", line 123, in <module>
    main(sys.argv[1:])
  File "./massageSM.py", line 33, in main
    newDF = data[x]
  File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/home/jason/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2017
And it produces this output:

[2017, 2018, 2019]
   years
0   2017
1   2018
2   2019
   years
0   2017
1   2018
2   2019
   years
0   2017
1   2018
2   2019
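But what I want to create is: a DataFrame with only the 2017 data, a DataFrame with only the 2018 data, and a DataFrame with only the 2019 data.

But I cannot hard-code this, because other files will not contain the same years. I need to list the available years and iterate over them.

Edit 3: I have also tried: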

data = pd.read_csv("RHE.SM", header=None, parse_dates=[0])
year_list = data[0].dt.year.unique().tolist()
print(year_list)
data.index = pd.DatetimeIndex(data[0])
print(type(data.index))
print(data.index)

for x in year_list:
    print(x)
    newDF = data[x]
    #newDF.head()

    #print(newDF.head(5))
I get the following output. It looks fine at first, but then I get an error when creating newDF:

[2017, 2018, 2019]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
DatetimeIndex(['2017-10-02 10:15:00', '2017-10-02 10:30:00',
               '2017-10-02 10:45:00', '2017-10-02 11:00:00',
               '2017-10-02 11:15:00', '2017-10-02 11:30:00',
               '2017-10-02 11:45:00', '2017-10-02 12:00:00',
               '2017-10-02 12:15:00', '2017-10-02 12:30:00',
               ...
               '2019-01-03 14:45:00', '2019-01-03 15:00:00',
               '2019-01-03 15:15:00', '2019-01-03 15:30:00',
               '2019-01-03 15:45:00', '2019-01-03 16:00:00',
               '2019-01-03 16:30:00', '2019-01-03 16:45:00',
               '2019-01-03 17:15:00', '2019-01-03 18:30:00'],
              dtype='datetime64[ns]', name=0, length=8685, freq=None)
2017

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3077             try:
-> 3078                 return self._engine.get_loc(key)
   3079             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 2017

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-19-f31493ccbf2a> in <module>
      9 for x in year_list:
     10     print(x)
---> 11     newDF = data[x]
     12     #newDF.head()
     13 

~/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2686             return self._getitem_multilevel(key)
   2687         else:
-> 2688             return self._getitem_column(key)
   2689 
   2690     def _getitem_column(self, key):

~/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2693         # get column
   2694         if self.columns.is_unique:
-> 2695             return self._get_item_cache(key)
   2696 
   2697         # duplicate columns & possible reduce dimensionality

~/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   2487         res = cache.get(item)
   2488         if res is None:
-> 2489             values = self._data.get(item)
   2490             res = self._box_item_values(item, values)
   2491             cache[item] = res

~/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

~/Applications/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3078                 return self._engine.get_loc(key)
   3079             except KeyError:
-> 3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3081 
   3082         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 2017

I haven't tested this, but I think it will work for you:

data.iloc[:, 0] = pd.to_datetime(data.iloc[:, 0])
data['year'] = data.iloc[:, 0].apply(lambda x: x.year)
year_list = data['year'].unique().tolist()
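It first converts the first column to datetime format. Then it creates a new column containing only the year component of each datetime. Finally, it produces a list of the unique values in that column.

If you also want to turn the resulting list into a new DataFrame, just add this line afterwards: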

df = pd.DataFrame({'years':year_list})
Edit: If you want to turn each item in the list into a new DataFrame, you can add the following:

df = []
for x in year_list:
    df.append(pd.DataFrame({'years':[x]}))
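
Note that the loop above builds small frames that hold only the year value itself. If the goal is one DataFrame per year containing all rows from that year, one option is a boolean mask on the 'year' column. The following is a minimal sketch under stated assumptions: it uses the four sample rows from the question, and the names csv_text and frames_by_year are made up for illustration. It also shows why data[x] fails: with x = 2017, pandas looks for a column labelled 2017, not for rows from that year, hence the KeyError.

```python
import io

import pandas as pd

# Sample rows copied from the question; in real use this would be the CSV file.
csv_text = """1998-01-02 09:30:00,0.4298,0.4337,0.4258,0.4317,6426369
1999-01-02 09:45:00,0.4317,0.4337,0.4258,0.4298,10589080
2000-01-02 10:00:00,0.4298,0.4337,0.4278,0.4337,9507980
2001-01-02 10:15:00,0.4337,0.4416,0.4298,0.4416,13639022
"""

data = pd.read_csv(io.StringIO(csv_text), header=None, parse_dates=[0])
data["year"] = data[0].dt.year
year_list = data["year"].unique().tolist()

# data[x] would look up a *column* named x. To select *rows*, compare the
# 'year' column against x and index with the resulting boolean mask.
frames_by_year = {}
for x in year_list:
    frames_by_year[x] = data[data["year"] == x]

print(year_list)
print(frames_by_year[year_list[0]].head())
```

Storing the results in a dict keyed by year avoids hard-coding variable names, so the same loop works for files that cover different years.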

In your case, the simplest approach is:

data = pd.read_csv(inFileName, header=None, parse_dates=[0])
data[0].dt.year.unique().tolist()

This takes advantage of the .dt accessor, which is fast and vectorized.
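
To make the comparison with the apply-based approach concrete, here is a small sketch on a made-up Series of timestamps (the variable names are illustrative only). Both routes give the same years. The accessor works on the whole column at once, while apply calls a Python function once per element.

```python
import pandas as pd

# Illustrative timestamps; any datetime64 column behaves the same way.
stamps = pd.Series(
    pd.to_datetime(
        ["2017-10-02 10:15:00", "2018-03-05 11:00:00", "2019-01-03 18:30:00"]
    )
)

via_accessor = stamps.dt.year                  # vectorized accessor
via_apply = stamps.apply(lambda ts: ts.year)   # one Python call per element

same = via_accessor.tolist() == via_apply.tolist()
print(via_accessor.unique().tolist(), same)
```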

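First, you need to make sure that you extract the year from a datetime type. Assuming you know the name of the column that stores the dates, you can do the following: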

df['datetime'] = pd.to_datetime(df['datetime'])
df['year'] = df['datetime'].apply(lambda x: x.year)
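If the dates are in the index, do the following instead:
# Assign from the index directly. A Series taken from reset_index() would be
# aligned on its new integer labels and leave the column filled with NaT.
df['datetime'] = pd.to_datetime(df.index)
df['year'] = df['datetime'].apply(lambda x: x.year)
years = df['year'].unique().tolist()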
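
When the dates are already a DatetimeIndex, the helper columns can be skipped. The sketch below is one possible way to do it, not the only one: it reads the years from df.index.year and then selects each year's rows with partial string indexing through .loc. The frame contents and the column name "close" are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex, similar in shape to the question's data.
idx = pd.DatetimeIndex(
    [
        "2017-10-02 10:15:00",
        "2017-10-02 10:30:00",
        "2018-05-01 09:30:00",
        "2019-01-03 18:30:00",
    ]
)
df = pd.DataFrame({"close": [0.43, 0.44, 0.45, 0.46]}, index=idx)

# Unique years straight from the index.
years = df.index.year.unique().tolist()

# .loc with a year string selects every row whose timestamp falls in that year.
per_year = {year: df.loc[str(year)] for year in years}

print(years)
print(per_year[years[0]])
```

The year is converted with str(year) because .loc[2017] with an integer would be treated as a label lookup rather than a date match.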

# Map each year to a DataFrame containing only that year's rows.
dfs = {
    year: sub_df.drop(columns=["year"])
    for year, sub_df in data.assign(year=lambda df: df[0].dt.year)
                            .groupby("year")
}

For the sample data in the question, dfs contains:

{1998:                     0       1       2       3       4        5
 0 1998-01-02 09:30:00  0.4298  0.4337  0.4258  0.4317  6426369,
 1999:                     0       1       2       3       4         5
 1 1999-01-02 09:45:00  0.4317  0.4337  0.4258  0.4298  10589080,
 2000:                     0       1       2       3       4        5
 2 2000-01-02 10:00:00  0.4298  0.4337  0.4278  0.4337  9507980,
 2001:                     0       1       2       3       4         5
 3 2001-01-02 10:15:00  0.4337  0.4416  0.4298  0.4416  13639022}

# Write each year's DataFrame to its own CSV file.
for year, df in dfs.items():
    filename = "base_name_{}.csv".format(year)
    df.to_csv(filename, index=False)