Python: subtracting values between grouped DataFrames in pandas

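I have a set of IDs and timestamps, and I want to calculate the "total time elapsed per ID" by taking the difference between the latest and earliest timestamps, grouped by ID.

Data

Expected result

(I'd like the delta converted to minutes.)

I have a for loop, but it is very slow (10+ minutes for 1M+ rows). I'm wondering whether this can be achieved with built-in functions instead.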

import pandas as pd

# gb is a DataFrameGroupBy object, grouped by id
# (a scalar key keeps group_id a plain value rather than a 1-tuple)
gb = df.groupby('id')

# Create the resulting df
cycletime = pd.DataFrame(columns=['id', 'timeDeltaMin'])

def calculate_delta():
    for group_id, groupdf in gb:
        # timestamp rows for the current id
        time = groupdf.timestamp

        time_delta = time.max() - time.min()

        # convert Timedelta object to minutes
        time_delta = time_delta / pd.Timedelta(minutes=1)

        # insert result to cycletime df
        cycletime.loc[-1] = [group_id, time_delta]
        cycletime.index += 1

Things I'm considering trying next:

- multiprocessing

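You can sort by id and timestamp, then group by id, and then find the difference between the minimum and maximum timestamps of each group.

import numpy as np
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])
result = df.sort_values(['id', 'timestamp']).groupby('id')['timestamp'].agg(['min', 'max'])
# dividing by a one-minute timedelta yields the difference in minutes as a float
result['diff'] = (result['max'] - result['min']) / np.timedelta64(1, 'm')
result.reset_index()[['id', 'diff']]

Output: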

    id  diff
0   1   1.0
1   2   62.0
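
As a self-contained check of the min/max aggregation described above, here is a minimal sketch. It assumes the sample rows that another answer on this page constructs; the helper name `elapsed_minutes` and the `timeDeltaMin` output column (borrowed from the question's `cycletime` frame) are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample rows taken from another answer on this page (assumed to match the question's data).
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime([
        "2018-02-01 03:00:00", "2018-02-01 03:01:00",
        "2018-02-02 10:03:00", "2018-02-02 10:04:00", "2018-02-02 11:05:00",
    ]),
})

def elapsed_minutes(frame):
    """Hypothetical helper: minutes between the first and last timestamp per id."""
    span = frame.groupby("id")["timestamp"].agg(["min", "max"])
    # total_seconds() includes whole days, so spans longer than 24 hours stay correct
    minutes = (span["max"] - span["min"]).dt.total_seconds() / 60
    return minutes.rename("timeDeltaMin").reset_index()

cycletime = elapsed_minutes(df)
print(cycletime)
```

The result has one row per id, in the same two-column layout the question's loop was building, without any Python-level iteration over groups.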

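First, make sure the timestamps are proper datetimes:

df.timestamp = pd.to_datetime(df.timestamp)

Now find the number of minutes in the difference between the maximum and minimum for each id: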

import numpy as np

>>> (df.timestamp.groupby(df.id).max() - df.timestamp.groupby(df.id).min()) / np.timedelta64(1, 'm')
id
1     1.0
2    62.0
Name: timestamp, dtype: float64
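
The same max-minus-min idea can also be written so that the grouping is set up once and reused for both reductions. The sketch below is one possible variant, not the answerer's code: it assumes the sample rows used elsewhere on this page and divides by `pd.Timedelta(minutes=1)`, as the question's loop does, instead of `np.timedelta64`.

```python
import pandas as pd

# Sample rows assumed from another answer on this page.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime([
        "2018-02-01 03:00:00", "2018-02-01 03:01:00",
        "2018-02-02 10:03:00", "2018-02-02 10:04:00", "2018-02-02 11:05:00",
    ]),
})

# Build the grouped object once, then reuse it for both reductions.
grouped = df.groupby("id")["timestamp"]

# Timedelta Series divided by a one-minute Timedelta gives float minutes.
minutes = (grouped.max() - grouped.min()) / pd.Timedelta(minutes=1)
print(minutes)
```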

Another option:

import pandas as pd

ids = [1, 1, 2, 2, 2]
times = ['2018-02-01 03:00:00', '2018-02-01 03:01:00', '2018-02-02 10:03:00',
         '2018-02-02 10:04:00', '2018-02-02 11:05:00']
df = pd.DataFrame({'id': ids, 'timestamp': pd.to_datetime(pd.Series(times))})
df.set_index('id', inplace=True)
# Sum the row-to-row differences within each id, then convert to minutes.
# groupby(level=0).sum() replaces sum(level=0), which newer pandas no longer accepts.
print(df.groupby(level=0)['timestamp'].diff().groupby(level=0).sum().dt.total_seconds() / 60)
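
One detail worth checking when converting timedeltas to minutes: `.dt.seconds` returns only the seconds component of each value (the part left over after whole days), while `.dt.total_seconds()` covers the full duration. The two agree for spans under 24 hours and diverge beyond that. A small illustration with made-up values:

```python
import pandas as pd

# Two illustrative durations: one minute, and one day plus one minute.
deltas = pd.Series(pd.to_timedelta(["0 days 00:01:00", "1 days 00:01:00"]))

# .dt.seconds ignores the days component.
component_minutes = deltas.dt.seconds / 60

# .dt.total_seconds() includes it.
total_minutes = deltas.dt.total_seconds() / 60

print(component_minutes.tolist())  # [1.0, 1.0]
print(total_minutes.tolist())      # [1.0, 1441.0]
```

If any id can span more than a day, `total_seconds()` (or division by a one-minute timedelta) is the safer conversion.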