Fancy indexing in Python pandas


python pandas


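I have a pandas.Series object whose hierarchical index consists of two levels: (code, date). I also have a map {date -> code}. I would like to index the series by date only, so that for each date the code is looked up in the provided map, and then the pair (code, date) is looked up in the original series. What is the best way to do this in pandas?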

Thanks a lot for your help.

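Short answer: NDFrames (such as a Series) are usually indexed by labels, but it is also possible to index an NDFrame with an Index object.

So convert the dict into a MultiIndex, and use that MultiIndex to select rows from the Series:

series[index]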
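A minimal sketch of the idea on toy data (the values and string dates are made up for illustration): the key passed to the Series is itself a MultiIndex of (code, date) pairs, and the rows come back in the order of that key. It uses .loc, the explicit label-based form of the same selection.

```python
import pandas as pd

# A small Series with a two-level (code, date) MultiIndex (toy data).
s = pd.Series(
    [10, 20, 30, 40],
    index=pd.MultiIndex.from_tuples(
        [(0, "2000-01-01"), (0, "2000-01-02"), (1, "2000-01-01"), (1, "2000-01-02")],
        names=["code", "date"],
    ),
)

# The key is itself a MultiIndex listing the (code, date) pairs we want.
wanted = pd.MultiIndex.from_tuples([(1, "2000-01-01"), (0, "2000-01-02")])

# Label-based selection with an Index returns the matching rows in key order.
picked = s.loc[wanted]
print(picked.tolist())  # [30, 20]
```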

Suppose your series looks like this:

import numpy as np
import pandas as pd
np.random.seed(0)

N, M = 3, 5
big_dates = pd.date_range('2000-1-1', periods=M, freq='D')
index = pd.MultiIndex.from_product([np.arange(N), big_dates])
series = pd.Series(np.random.randint(10, size=N*M), index=index)
print(series)
# 0  2000-01-01    5
#    2000-01-02    0
#    2000-01-03    3
#    2000-01-04    3
#    2000-01-05    7
# 1  2000-01-01    9
#    2000-01-02    3
#    2000-01-03    5
#    2000-01-04    2
#    2000-01-05    4
# 2  2000-01-01    7
#    2000-01-02    6
#    2000-01-03    8
#    2000-01-04    8
#    2000-01-05    1
# dtype: int64

Suppose the dict (let's call it codemap) looks like this:

dates = pd.date_range('2000-1-1', periods=N, freq='D')
codes = np.arange(N)
np.random.shuffle(codes)
codemap = dict(zip(dates, codes))
# {Timestamp('2000-01-01 00:00:00', offset='D'): 0,
#  Timestamp('2000-01-02 00:00:00', offset='D'): 1,
#  Timestamp('2000-01-03 00:00:00', offset='D'): 2}

Then you can form a second MultiIndex from the codemap dict:

# Convert the dict views to lists: from_arrays expects list-like arrays.
codemap_index = pd.MultiIndex.from_arrays([list(codemap.values()), list(codemap.keys())])
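An equivalent way to build the same (code, date) pairs is pd.MultiIndex.from_tuples over the dict's items. The sketch below assumes fixed codes [2, 0, 1] instead of the shuffled ones, purely for illustration, and checks that both constructions agree. Both rely on a dict iterating keys, values and items in the same order, which Python guarantees.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=3, freq="D")
codes = np.array([2, 0, 1])  # fixed codes for illustration, not the shuffled ones
codemap = dict(zip(dates, codes))

# Build (code, date) pairs directly from the dict's (date, code) items.
via_tuples = pd.MultiIndex.from_tuples(
    [(code, date) for date, code in codemap.items()]
)

# Same pairs built from two parallel lists of codes and dates.
via_arrays = pd.MultiIndex.from_arrays([list(codemap.values()), list(codemap.keys())])

print(via_tuples.equals(via_arrays))  # True
```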

and use it to index the series:

result = series[codemap_index]
# 0  2000-01-01    5
# 1  2000-01-02    3
# 2  2000-01-03    8
# dtype: int64
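This lookup assumes every (code, date) pair built from codemap actually exists in the series. In recent pandas versions, label-based selection raises a KeyError if any pair is missing. If NaN is preferable for missing pairs, Series.reindex is one option. The sketch below uses a hypothetical codemap whose second date points at a code (7) that is not in the series, to show both behaviours:

```python
import pandas as pd

dates = pd.date_range("2000-01-01", periods=2, freq="D")
series = pd.Series(
    [5, 0, 9, 3],
    index=pd.MultiIndex.from_product([[0, 1], dates]),
)

# Hypothetical map: code 7 does not appear in the series.
codemap = {dates[0]: 1, dates[1]: 7}
wanted = pd.MultiIndex.from_tuples([(code, date) for date, code in codemap.items()])

# Strict lookup: every (code, date) pair must exist, otherwise KeyError.
raised = False
try:
    series.loc[wanted]
except KeyError:
    raised = True

# Lenient lookup: missing pairs become NaN, so the dtype becomes float.
lenient = series.reindex(wanted)
print(raised, lenient.tolist())  # True [9.0, nan]
```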

Finally, use droplevel to remove the code level from the index:

result.index = result.index.droplevel(0)
print(result)

yields

2000-01-01    5
2000-01-02    3
2000-01-03    8
dtype: int64
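Putting the steps together, a minimal sketch of a reusable helper. The name lookup_by_date is hypothetical, not a pandas function, and the toy series is made up. It uses the Series.droplevel method, which gives the same result as reassigning result.index to result.index.droplevel(0):

```python
import pandas as pd


def lookup_by_date(series: pd.Series, codemap: dict) -> pd.Series:
    """Hypothetical helper: for each date in codemap, fetch series[(codemap[date], date)].

    Assumes series has a two-level (code, date) MultiIndex, as in the question.
    """
    pairs = pd.MultiIndex.from_arrays([list(codemap.values()), list(codemap.keys())])
    # Select the (code, date) rows, then drop the code level so only dates remain.
    return series.loc[pairs].droplevel(0)


dates = pd.date_range("2000-01-01", periods=3, freq="D")
series = pd.Series(range(9), index=pd.MultiIndex.from_product([[0, 1, 2], dates]))
codemap = {dates[0]: 0, dates[1]: 1, dates[2]: 2}

result = lookup_by_date(series, codemap)
print(result.tolist())  # [0, 4, 8]
```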

Amazing! Thanks a lot, unutbu. What I didn't know is that I could index a Series with an Index.