Python: selecting days from a DataFrame

python pandas dataframe

I have a pandas DataFrame like this:

╔════════════╦═══════╗
║ DATE       ║ VALUE ║
╠════════════╬═══════╣
║ 2011-01-07 ║ 1     ║
╠════════════╬═══════╣
║ 2011-01-08 ║ 2     ║
╠════════════╬═══════╣
║ 2011-01-09 ║ 1     ║
╠════════════╬═══════╣
║ 2011-01-10 ║ 1     ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1     ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1     ║
╚════════════╩═══════╝

What I want to end up with is the following DataFrame:

╔════════════╦═══════╗
║ DATE       ║ VALUE ║
╠════════════╬═══════╣
║ 2011-01-09 ║ 1     ║
╠════════════╬═══════╣
║ 2011-01-10 ║ 1     ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1     ║
╠════════════╬═══════╣
║ 2011-01-20 ║ 1     ║
╚════════════╩═══════╝

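What I do not want to do is `groupby` or resample the DataFrame, or anything along those lines, because I need to keep the structure for the processing that follows. Does anyone know how I can solve this? Thanks in advance.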

You can create a consecutive id column, so that each date gets a unique id that increases with the date, and then subset on that id column:

import pandas as pd
# parse the `DATE` column and give each distinct date a consecutive id;
# groupby sorts its keys, so the ids increase with the date
df['DATE'] = pd.to_datetime(df['DATE'])
df['DateId'] = df.groupby('DATE').ngroup()

# find out the id for the target date (a scalar, not an array)
MaxId = df.loc[df['DATE'] == '2011-01-20', 'DateId'].iloc[0]

# subset based on the id column and the MaxId
df.loc[df['DateId'].isin(range(MaxId - 2, MaxId + 1)), ['DATE', 'VALUE']]

#         DATE  VALUE
# 2 2011-01-09      1
# 3 2011-01-10      1
# 4 2011-01-20      1
# 5 2011-01-20      1
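
For anyone who wants to run the idea end to end, here is a minimal, self-contained sketch. The sample data is copied from the table in the question; the helper name `last_n_dates` and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

# sample data copied from the table in the question
df = pd.DataFrame({
    "DATE": ["2011-01-07", "2011-01-08", "2011-01-09",
             "2011-01-10", "2011-01-20", "2011-01-20"],
    "VALUE": [1, 2, 1, 1, 1, 1],
})
df["DATE"] = pd.to_datetime(df["DATE"])


def last_n_dates(frame, target, n=3, date_col="DATE"):
    """Return the rows for the n distinct dates ending at `target`.

    Hypothetical helper, written only to illustrate the id-column idea.
    """
    # ngroup() numbers the groups 0, 1, 2, ... in sorted key order
    date_id = frame.groupby(date_col).ngroup()
    matches = date_id[frame[date_col] == pd.Timestamp(target)]
    if matches.empty:
        raise KeyError(f"{target} not found in column {date_col!r}")
    max_id = matches.iloc[0]
    # a boolean mask keeps the original row order and index
    return frame[date_id.between(max_id - n + 1, max_id)]


result = last_n_dates(df, "2011-01-20")
print(result)
```

The ids live in a separate Series here rather than in a new column, so the DataFrame itself is left untouched.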

Try this as a hint (`df.ix` was removed in pandas 1.0, so use `df.loc` for label-based slicing):

df.loc[start:stop]

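It is not clear what you are trying to achieve... Do you want to select all dates between 2011-01-17 and 2011-01-20? In that case I don't understand your desired dataset...

No, I don't want to resample the DataFrame. I want the three distinct dates up to and including 2011-01-20, keeping every repeated occurrence of those dates, which gives the DataFrame shown in the table above.

What a clever idea. It solved my problem. Thanks!

I like your "ranking" idea! I think we could also use the rank() method, with method='dense' so that repeated dates share a rank and the ranks have no gaps:

df.assign(date_rank=df.DATE.rank(method='dense').astype(int))

@MaxU The rank() method is a more concise way to solve this problem. I didn't know about it before. Your answer is certainly correct.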
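
The rank-based variant from the comments can be sketched as follows. This is only an illustration under the assumption that the column is called `DATE` and holds the question's sample data; the variable names are made up.

```python
import pandas as pd

# sample data copied from the table in the question
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2011-01-07", "2011-01-08", "2011-01-09",
                            "2011-01-10", "2011-01-20", "2011-01-20"]),
    "VALUE": [1, 2, 1, 1, 1, 1],
})

# dense ranking: tied dates share a rank and the ranks have no gaps
date_rank = df["DATE"].rank(method="dense").astype(int)

# rank of the target date
target_rank = date_rank[df["DATE"] == pd.Timestamp("2011-01-20")].iloc[0]

# keep the target date and the two distinct dates before it
result = df[date_rank.between(target_rank - 2, target_rank)]
print(result)
```

With `method='min'` a repeated earlier date would leave a gap in the ranks, and subtracting 2 from the target rank would no longer step back exactly two dates.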

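df['Date'] = pd.to_datetime(df['Date'])
# label-based slice: two rows before the first 2011-01-20 row through the last one
# (assumes the default 0..n-1 index and rows already in date order)
idx = df.index[df['Date'] == '2011-01-20']
df.loc[idx[0] - 2: idx.max()]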

       Date   Value
2 2011-01-09      1
3 2011-01-10      1
4 2011-01-20      1
5 2011-01-20      1
6 2011-01-20      1
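
Note that the slice above steps back by rows, not by dates, so it only returns three distinct dates when the earlier dates are unique and the index is the default 0..n-1. A sketch that counts distinct dates instead is shown below; the data is made up for illustration (2011-01-10 appears twice) and the column names `Date` and `Value` follow this answer.

```python
import pandas as pd

# illustrative data (not from the question): 2011-01-10 appears twice
df = pd.DataFrame({
    "Date": pd.to_datetime(["2011-01-08", "2011-01-09", "2011-01-10",
                            "2011-01-10", "2011-01-20", "2011-01-20"]),
    "Value": [2, 1, 1, 1, 1, 1],
})
target = pd.Timestamp("2011-01-20")

# the three latest distinct dates that are not after the target
unique_dates = df["Date"].drop_duplicates().sort_values()
wanted = unique_dates[unique_dates <= target].tail(3)

# isin() keeps every occurrence of those dates, in the original row order
result = df[df["Date"].isin(wanted)]
print(result)
```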