Python: combining groupby code with and without Grouper


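I have written these two groupby functions on my dataset. The first one groups my data and splits the detection date-time into a start date-time and an end date-time.

This is the dataset: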

Blast Hole  East Coordinate  North Coordinate  Collar   Theoritical Depth  Tag Detector ID  Date and Time          Detection_Location  Detection Date & Time
64          16745.42         107390.32         2634.45  15.95              385656531        23-08-2018 2:39:34 PM  CV23                2018-09-08 14:18:17
61          16773.48         107382.6          2634.68  16.18              385760755        23-08-2018 2:38:32 PM  CV23                2018-09-08 14:24:19
63          16755.07         107387.68         2634.58  16.08              385262370        23-08-2018 2:39:30 PM  CV23                2018-09-08 14:12:42
105         16764.83         107347.67         2634.74  16.24              385742468        23-08-2018 2:41:29 PM  CV22                2018-09-06 20:02:46
100         16752.74         107360.32         2634.33  15.83              385112050        23-08-2018 2:41:08 PM  CV22                2018-09-06 20:15:42
99          16743.1          107362.96         2634.36  15.86              385087366        23-08-2018 2:41:05 PM  CV22                2018-09-06 20:49:21
35          16747.75         107417.68         2635.9   17.4               385453358        23-08-2018 2:36:09 PM  CV22                2018-09-23 05:47:44
5           16757.27         107452.4          2636     17.5               385662254        23-08-2018 2:35:03 PM  CV22                2018-09-23 05:01:12
19          16770.89         107420.83         2634.81  16.31              385826979        23-08-2018 2:35:50 PM  CV22                2018-09-23 05:52:54

The first idea applies if you need different keys in each groupby: the first result, df21, groups by Grouper together with Detection_Location, while the second, df22, groups by Grouper only.

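import pandas as pd

# 'Date and Time' is day-first (23-08-2018 ...), so parse it accordingly
df1['Date and Time'] = pd.to_datetime(df1['Date and Time'], dayfirst=True)
df1['Detection Date & Time'] = pd.to_datetime(df1['Detection Date & Time'])

# First groupby: hourly bins plus location -> first/last timestamp and row count
# (freq='H' means hourly; pandas 2.2+ prefers the lowercase alias 'h')
df21 = (df1.groupby([pd.Grouper(key='Detection Date & Time', freq='H'),
                     df1.Detection_Location])
           ['Detection Date & Time'].agg(['first', 'last', 'size']))
#print (df21)

# Second groupby: hourly bins only -> comma-joined values per bin
f = lambda x: ','.join(x.astype(str))
df22 = (df1.groupby(pd.Grouper(key='Detection Date & Time', freq='H')).agg({
        'Blast Hole': f,
        'East Coordinate': f,
        'North Coordinate': f,
        'Tag Detector ID': f,
        'Detection_Location': 'min',
        'Detection Date & Time': 'size'})
        .dropna()  # drop hourly bins that contain no rows
        .rename(columns={'Detection Date & Time': 'Tags'})
        .set_index('Detection_Location', append=True))

#print (df22)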
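
The comments further down suggest combining the two results with join. As a minimal sketch of that idea: when both aggregations use the same two keys (hourly bin and `Detection_Location`), the results share their index levels and can be joined directly. The three-row `toy` frame and the names `keys`, `stats`, `joined_text` and `combined` are made up for illustration, and `'h'` is used as the lowercase spelling of the hourly alias.

```python
import pandas as pd

# Toy data (made up for illustration): two detections in one hour at CV23,
# one detection in another hour at CV22.
toy = pd.DataFrame({
    'Blast Hole': [64, 61, 105],
    'Detection_Location': ['CV23', 'CV23', 'CV22'],
    'Detection Date & Time': pd.to_datetime([
        '2018-09-08 14:18:17', '2018-09-08 14:24:19', '2018-09-06 20:02:46']),
})

# Same keys for both aggregations: hourly bin and location (by column name).
keys = [pd.Grouper(key='Detection Date & Time', freq='h'), 'Detection_Location']

# Frame 1: first/last timestamp and row count per (hour, location).
stats = toy.groupby(keys)['Detection Date & Time'].agg(['first', 'last', 'size'])

# Frame 2: comma-joined blast holes per (hour, location).
joined_text = toy.groupby(keys).agg(
    {'Blast Hole': lambda x: ','.join(x.astype(str))})

# Both frames carry the same two index levels, so join aligns them by group.
combined = stats.join(joined_text).reset_index()
print(combined)
```

Because the column names of the two frames do not overlap here, no `lsuffix` or `rsuffix` is needed in this sketch.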

Edit:

If you need to group by Grouper and a column:

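import pandas as pd

# 'Date and Time' is day-first (23-08-2018 ...), so parse it accordingly
df1['Date and Time'] = pd.to_datetime(df1['Date and Time'], dayfirst=True)
df1['Detection Date & Time'] = pd.to_datetime(df1['Detection Date & Time'])

# join each group's values into one comma-separated string
f = lambda x: ','.join(x.astype(str))
df2 = (df1.groupby([pd.Grouper(key='Detection Date & Time', freq='H'),
                    df1.Detection_Location]).agg({
        'Blast Hole': f,
        'East Coordinate': f,
        'North Coordinate': f,
        'Tag Detector ID': f,
        'Detection Date & Time': ['first', 'last', 'size']})
       .reset_index()
       # blank out the redundant level names so the flattened labels stay short
       .rename(columns={'Detection Date & Time': '', '<lambda>': ''}))

# collapse the two column levels into one, then restore the emptied key name
df2.columns = df2.columns.map(''.join)
df2 = df2.rename(columns={'': 'Detection Date & Time'})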
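
As an alternative to flattening the two-level columns by hand, pandas named aggregation (available since pandas 0.25) lets you choose the output column names up front. This is only a sketch on made-up toy data, not the answer's own method; the `toy` frame and the helper name `join_as_text` are assumptions for illustration.

```python
import pandas as pd

# Toy data (made up for illustration).
toy = pd.DataFrame({
    'Blast Hole': [64, 61, 105],
    'Detection_Location': ['CV23', 'CV23', 'CV22'],
    'Detection Date & Time': pd.to_datetime([
        '2018-09-08 14:18:17', '2018-09-08 14:24:19', '2018-09-06 20:02:46']),
})


def join_as_text(values):
    """Comma-join one group's values as strings."""
    return ','.join(values.astype(str))


# Each entry maps an output column name to (input column, aggregation).
# The dict is unpacked with ** because some names contain spaces.
result = (toy.groupby([pd.Grouper(key='Detection Date & Time', freq='h'),
                       'Detection_Location'])
             .agg(**{
                 'Blast Hole': ('Blast Hole', join_as_text),
                 'first': ('Detection Date & Time', 'first'),
                 'last': ('Detection Date & Time', 'last'),
                 'size': ('Detection Date & Time', 'size'),
             })
             .reset_index())
print(result)
```

The result already has a single level of column labels, so no rename or `map(''.join)` step is needed afterwards.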
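This might work for you (I know what your data looks like from your previous question). You can use agg(list).

Then merge it with the result of the other aggregation (from the other question; here df2) as follows.

The output obtained looks like this: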

Detection_Date&Time     Detection_Location  first   last    size    Blast_Hole  East_Coordinate     North_Coordinate    Collar  Theoritical_Depth   Tag_Detector_ID     Date_and_Time
0   2018-09-08 14:00:00     CV23    2018-09-08 14:18:00     2018-09-08 14:12:00     3   [64, 61, 63]    [16745.42, 16773.48, 16755.07]  [107390.32, 107382.6, 107387.68]    [2634.45, 2634.68, 2634.58]     [15.95, 16.18, 16.08]   [385656531, 385760755, 385262370]   [23-08-2018 2:39:34 PM, 23-08-2018 2:38:32 PM,...
1   2018-09-06 20:00:00     CV22    2018-09-06 20:02:00     2018-09-06 20:49:00     3   [105, 100, 99]  [16764.83, 16752.74, 16743.1]   [107347.67, 107360.32, 107362.96]   [2634.74, 2634.33, 2634.36]     [16.24, 15.83, 15.86]   [385742468, 385112050, 385087366]   [23-08-2018 2:41:29 PM, 23-08-2018 2:41:08 PM,...
2   2018-09-23 05:00:00     CV22    2018-09-23 05:47:00     2018-09-23 05:52:00     3   [35, 5, 19]     [16747.75, 16757.27, 16770.89]  [107417.68, 107452.4, 107420.83]    [2635.9, 2636.0, 2634.81]   [17.4, 17.5, 16.31]     [385453358, 385662254, 385826979]   [23-08-2018 2:36:09 PM, 23-08-2018 2:35:03 PM,...

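Hi @jezrael - this gave me the error: columns overlap but no suffix specified: Index(['Detection_Location'], dtype='object'). I think a few columns overlap and I need to cut a few lines of code. I will work on it. Thanks
@ShrutiGaur - Try df2=df21.join(df22,lsuffix='')
@jezrael - df22 shows as empty after joining in df2
Perfect!! Thank you!
In the DataFrames, 'Detection Date & Time' and 'Detection_Location' are the same. It works for me. Please see the previous question; there are some pictures there of what is expected. That is why I mentioned that I know the data.
@mohanys - Thanks. However, when we create the first DataFrame df3, the column Detection_Location already exists, and when we pass (list) to the aggregate function it gives the following error: cannot insert Detection_Location, already exists
@jezrael - Yes, the data and expected result for this question are only in this question. You are right
@mohanys - The output you showed is perfect, I will try the code again. Thanks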
print (df2)
  Detection Date & Time Detection_Location  Blast Hole  \
0   2018-09-06 20:00:00               CV22  105,100,99   
1   2018-09-08 14:00:00               CV23    64,61,63   
2   2018-09-23 05:00:00               CV22     35,5,19   

              East Coordinate               North Coordinate  \
0   16764.83,16752.74,16743.1  107347.67,107360.32,107362.96   
1  16745.42,16773.48,16755.07   107390.32,107382.6,107387.68   
2  16747.75,16757.27,16770.89   107417.68,107452.4,107420.83   

                 Tag Detector ID               first                last  size  
0  385742468,385112050,385087366 2018-09-06 20:02:46 2018-09-06 20:49:21     3  
1  385656531,385760755,385262370 2018-09-08 14:18:17 2018-09-08 14:12:42     3  
2  385453358,385662254,385826979 2018-09-23 05:47:44 2018-09-23 05:52:54     3  
# Group by the column name rather than the Series: this keeps Detection_Location
# out of the aggregated columns, so reset_index() can insert it without a clash.
keys = [pd.Grouper(key='Detection_Date&Time', freq='H'), 'Detection_Location']
df3 = df.groupby(keys, sort=False).agg(list).reset_index()
df2 = (df.groupby(keys, sort=False)['Detection_Date&Time']
         .agg(['first', 'last', 'size'])
         .reset_index())

df4 = pd.merge(df2, df3, on=['Detection_Date&Time', 'Detection_Location'])
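
To try the agg(list) and merge approach without the full dataset, here is a minimal self-contained sketch. The toy frame is made up for illustration, the names `keys`, `lists`, `stats` and `merged` are hypothetical, and the groups are left in sorted order. It assumes that grouping by the column name `'Detection_Location'` is what keeps that column out of the list aggregation, which would avoid the "cannot insert Detection_Location, already exists" error mentioned in the comments.

```python
import pandas as pd

# Toy data (made up for illustration).
df = pd.DataFrame({
    'Blast_Hole': [64, 61, 105],
    'Tag_Detector_ID': [385656531, 385760755, 385742468],
    'Detection_Location': ['CV23', 'CV23', 'CV22'],
    'Detection_Date&Time': pd.to_datetime([
        '2018-09-08 14:18:17', '2018-09-08 14:24:19', '2018-09-06 20:02:46']),
})

# Hourly bin plus location, with the location given by column name.
keys = [pd.Grouper(key='Detection_Date&Time', freq='h'), 'Detection_Location']

# Every non-key column collected into a Python list per group.
lists = df.groupby(keys).agg(list).reset_index()

# First/last detection time and row count per group.
stats = (df.groupby(keys)['Detection_Date&Time']
           .agg(['first', 'last', 'size'])
           .reset_index())

# Both frames expose the two keys as ordinary columns, so merge on them.
merged = pd.merge(stats, lists, on=['Detection_Date&Time', 'Detection_Location'])
print(merged)
```

The merged frame holds the summary columns first, followed by the list-valued columns.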