Python: iterate over a DataFrame by time unit instead of by row


I have a pandas.DataFrame that looks like this:

Time(minutes)    column2       column1
420              1             5
420              2             10
420              3             8
421              1             4
421              2             9
421              3             7

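I know how to iterate row by row with iterrows(), but is there an efficient way to iterate by time unit over the time column, so that each iteration works with the data for one given time? Something like: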

time = 420
while time <= max_time:
    temp = ...  # fetch the sub-dataframe for the given time
    process(temp)
    # update the original df with temp; guaranteed not to affect any rows outside the current set
    time += 1

You can use groupby to iterate by time instead of by row; the test code and its result are shown further down.
There are two ways. The first basically keeps the iterative form, i.e. it subsets the DataFrame manually:
for time in df['time_minutes'].unique():
    temp = df.loc[df['time_minutes'] == time]
    process(temp)
    # or alternatively, make your changes directly on df (depending on what they are),
    # selecting the rows for the current time, for example something like this:
    # df.loc[df['time_minutes'] == time, 'some_column_name'] = assign_something_here
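As a runnable sketch of the subsetting loop above: the sample data is taken from the question, while the body of `process` is purely hypothetical (it subtracts the smallest `column1` value within each time unit) and stands in for whatever the real processing is. The result is written back through the same boolean mask, so only the rows of the current time unit are touched.

```python
import pandas as pd

# Sample data from the question, using the column name from this answer
df = pd.DataFrame({
    "time_minutes": [420, 420, 420, 421, 421, 421],
    "column2": [1, 2, 3, 1, 2, 3],
    "column1": [5, 10, 8, 4, 9, 7],
})


def process(temp):
    # Hypothetical processing step: shift column1 so the smallest
    # value within this time unit becomes 0
    out = temp.copy()
    out["column1"] = out["column1"] - out["column1"].min()
    return out


for time in df["time_minutes"].unique():
    mask = df["time_minutes"] == time      # rows belonging to this time unit
    temp = process(df.loc[mask])           # work on the sub-DataFrame
    df.loc[mask, "column1"] = temp["column1"]  # write back only those rows

print(df)
```

Each pass builds a fresh boolean mask over the whole frame, which is simple but means the cost grows with the number of distinct time values.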

The other, possibly more efficient, way is to use groupby as suggested above.
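What does process(temp) do in this case? It looks like you could work from .groupby()… Is there an efficient way to update the old table (i.e. replace the old groups with the new ones)?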
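On the question of replacing old groups with new ones, here is one possible sketch rather than a definitive answer. It assumes `process` returns a DataFrame with the same index as the group it received; the body of `process` is hypothetical (gap to the largest `column1` value in that time unit). The general route rebuilds the table from the processed groups with `pd.concat`; for a simple column-wise calculation, `transform` returns a result aligned with the original index.

```python
import pandas as pd

df = pd.DataFrame({
    "Time(minutes)": [420, 420, 420, 421, 421, 421],
    "column2": [1, 2, 3, 1, 2, 3],
    "column1": [5, 10, 8, 4, 9, 7],
})


def process(group):
    # Hypothetical per-time-unit step: gap to the largest column1
    # value within the same time unit
    out = group.copy()
    out["column1"] = out["column1"].max() - out["column1"]
    return out


# General case: process every group and stitch the results back together
updated = pd.concat([process(group) for _, group in df.groupby("Time(minutes)")])

# Column-wise case: transform keeps the original index, so the result
# can be combined with (or assigned to) columns of df directly
gap = df.groupby("Time(minutes)")["column1"].transform("max") - df["column1"]

print(updated)
print(gap)
```

If the groups are not contiguous in the original table, `updated.sort_index()` restores the original row order.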
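Test code:

from io import StringIO

import pandas as pd

# Parse the fixed-width sample table from a string
df = pd.read_fwf(StringIO("""
    Time(minutes)    column2       column1
    420              1             5
    420              2             10
    420              3             8
    421              1             4
    421              2             9
    421              3             7"""), header=1)

print(df)
# Iterating over a groupby yields one (time, sub-DataFrame) tuple per time value
for grp in df.groupby('Time(minutes)'):
    print(grp)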

Result:

   Time(minutes)  column2  column1
0            420        1        5
1            420        2       10
2            420        3        8
3            421        1        4
4            421        2        9
5            421        3        7

(420,    Time(minutes)  column2  column1
0            420        1        5
1            420        2       10
2            420        3        8)
(421,    Time(minutes)  column2  column1
3            421        1        4
4            421        2        9
5            421        3        7)
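As the result above shows, each item produced by the groupby iteration is a `(time, sub-DataFrame)` tuple. A minimal sketch of unpacking that tuple directly in the loop follows; the `totals` dictionary and the per-group sum are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Time(minutes)": [420, 420, 420, 421, 421, 421],
    "column2": [1, 2, 3, 1, 2, 3],
    "column1": [5, 10, 8, 4, 9, 7],
})

totals = {}
for time, group in df.groupby("Time(minutes)"):
    # `time` is the group key, `group` holds only the rows for that time
    totals[int(time)] = int(group["column1"].sum())

print(totals)  # {420: 23, 421: 20}
```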
