Python: Get the latest value for each element in a Pandas DataFrame using a range index and a date column?

python pandas dataframe

I have a sample DataFrame:

import datetime

import pandas as pd

df = pd.DataFrame(data=[('foo', datetime.date(2014, 10, 1)),
                        ('foo', datetime.date(2014, 10, 2)),
                        ('bar', datetime.date(2014, 10, 3)),
                        ('bar', datetime.date(2014, 10, 1))],
                  columns=('name', 'date'))

It looks like this:

  name        date
0  foo  2014-10-01
1  foo  2014-10-02
2  bar  2014-10-03
3  bar  2014-10-01

I want to restrict the DataFrame to the last occurrence of each element in the name column. How do I do this?

I could awkwardly (at least I think it would be awkward) construct a boolean Series object to do this and pass it to the DataFrame's __getitem__, like this:

df[latest_name]
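
For illustration, here is a minimal sketch of how such a boolean mask could be built. The name latest_name comes from the snippet above; the groupby(...).transform('max') approach and the pd.to_datetime conversion are assumptions added here, not part of the original question. If two rows of the same name share the maximum date, both are kept.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    data=[('foo', datetime.date(2014, 10, 1)),
          ('foo', datetime.date(2014, 10, 2)),
          ('bar', datetime.date(2014, 10, 3)),
          ('bar', datetime.date(2014, 10, 1))],
    columns=('name', 'date'))

# Assumption: convert the date objects to datetime64 so that the
# group-wise aggregation works on a proper datetime column.
df['date'] = pd.to_datetime(df['date'])

# Broadcast each group's maximum date back onto its rows, then compare:
# the mask is True where a row carries the latest date for its name.
latest_name = df['date'] == df.groupby('name')['date'].transform('max')

# Boolean indexing goes through DataFrame.__getitem__.
result = df[latest_name]
print(result)
```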

How do I most elegantly get the latest entry for each name element?
A colleague just asked me a very similar question.
Given a DataFrame object like this:
  name        date
0  foo  2014-10-01
1  foo  2014-10-02
2  bar  2014-10-03
3  bar  2014-10-01

You can sort by date and then drop the duplicates, keeping the last one, like this:
# On current pandas, sort_values replaces the removed DataFrame.sort,
# subset replaces cols, and keep='last' replaces take_last=True
last = df.sort_values('date').drop_duplicates(subset='name', keep='last')

The variable last is now:
  name        date
1  foo  2014-10-02
2  bar  2014-10-03

[2 rows x 2 columns]
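
As an alternative that avoids sorting the whole frame, the latest row per name can also be selected by index label. This is a hedged sketch rather than the method used in this answer: the groupby(...).idxmax() approach, the pd.to_datetime conversion, and the names latest_idx and last_by_group are assumptions introduced for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    data=[('foo', datetime.date(2014, 10, 1)),
          ('foo', datetime.date(2014, 10, 2)),
          ('bar', datetime.date(2014, 10, 3)),
          ('bar', datetime.date(2014, 10, 1))],
    columns=('name', 'date'))

# Assumption: work with datetime64 values rather than date objects.
df['date'] = pd.to_datetime(df['date'])

# For each name, idxmax returns the index label of the row holding
# the largest (latest) date.
latest_idx = df.groupby('name')['date'].idxmax()

# Select those rows from the original frame by label.
last_by_group = df.loc[latest_idx]
print(last_by_group)
```

The rows come back in group order (sorted by name) rather than in date order, so sort afterwards if the order matters.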

But if we can drop the old index, it would be better to set the date as the index and then simply sort on the index:
df = df.set_index('date')
df = df.sort_index()  # returns a new sorted frame by default, so assign the result

df now returns:
           name
date           
2014-10-01  foo
2014-10-01  bar
2014-10-02  foo
2014-10-03  bar

Now, let's take the last elements:
last_elements_frame = df.drop_duplicates(keep='last')  # keep='last' replaces take_last=True

last_elements_frame is now:
           name
date           
2014-10-02  foo
2014-10-03  bar
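
One caveat worth sketching: drop_duplicates ignores the index and, without subset, compares rows on all columns. That works above because name is the only column left. The example below adds a hypothetical value column (not part of the original data) to show why subset='name' may be needed on wider frames; the variable names are likewise illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar'],
    'date': pd.to_datetime(['2014-10-01', '2014-10-02',
                            '2014-10-03', '2014-10-01']),
    'value': [1, 2, 3, 4],  # hypothetical extra column for illustration
})

by_date = df.set_index('date').sort_index()

# Without subset, rows are compared on both name and value; every
# (name, value) pair is unique here, so nothing is dropped.
all_columns = by_date.drop_duplicates(keep='last')

# Restricting the comparison to name keeps only the latest row per name.
latest = by_date.drop_duplicates(subset='name', keep='last')
print(latest)
```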

Hmmm... two upvotes, two downvotes. I wonder why?