How to force pandas apply to return all columns of the parent DataFrame?


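After using groupby on some columns of a DataFrame and then using apply to test whether a string is present in another column, pandas returns only the grouped-by columns and the last column created by apply. Is it possible to return all columns associated with the groupby and the test? For example, group by the unique identifier of a conversation thread and test whether a string is present in another column, but then also include the other columns that exist in the DataFrame and belong to that group.

I tried using groupby followed by apply with an anonymous function: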

df.head()

 shipment_id shipper_id courier_id  Question                                sender
0   14      9962    228898  Let's get your furbabys home Apple pet transpo...   courier
1   91919   190872  196838  Hi I'm kevin thims and I'm happy to do the job...   courier
2   92187   191128  196838  Hi I'm kevin thims and I'm happy to do the job...   shipper

unique_thread_indentifier = ['shipment_id', 'shipper_id', 'courier_id']
required_variables = ['shipment_id', 'shipper_id', 'courier_id', 'Question', 'sender']

df_new = (
    df
    .groupby(unique_thread_indentifier)[required_variables]
    .apply(lambda group: 'shipper' in group['sender'].unique())
    .to_frame(name='shipper_replied')
    .reset_index()
)

df_new.head()
    shipment_id shipper_id  courier_id  shipper_replied
0   14      9962            228898          False
1   91919   190872          196838          False
2   92187   191128          196838          True

My goal is to include the Question and sender columns in the final DataFrame. The expected output looks like this:

 shipment_id shipper_id courier_id  Question                                sender        shipper_replied
0   14      9962    228898  Let's get your furbabys home Apple pet transpo...   courier       False
1   91919   190872  196838  Hi I'm kevin thims and I'm happy to do the job...   courier       False
2   92187   191128  196838  Hi I'm kevin thims and I'm happy to do the job...   shipper       True

I believe you need:

Another solution:

df['shipper_replied'] = (df.assign(new = df['sender'].eq('shipper'))
                           .groupby(unique_thread_indentifier)['new']
                           .transform('any'))

print (df)
   shipment_id  shipper_id  courier_id  \
0           14        9962      228898   
1        91919      190872      196838   
2        92187      191128      196838   

                                          Question   sender  shipper_replied  
0  Let's get your furbabys home Apple pet transpo.  courier            False  
1   Hi I'm kevin thims and I'm happy to do the job  courier            False  
2   Hi I'm kevin thims and I'm happy to do the job  shipper             True
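The idea above can be tried end to end with the following minimal sketch. The sample rows are made up for illustration (the first thread is given two messages so that the group-level result visibly spreads over several rows), and the names keys, by_transform, per_group and by_merge are hypothetical. It also shows an aggregate-then-merge variant as one possible alternative, which is not taken from the answer above.

```python
import pandas as pd

# Hypothetical sample data; the first thread has two messages.
df = pd.DataFrame({
    "shipment_id": [14, 14, 91919, 92187],
    "shipper_id": [9962, 9962, 190872, 191128],
    "courier_id": [228898, 228898, 196838, 196838],
    "Question": ["msg a", "msg b", "msg c", "msg d"],
    "sender": ["courier", "shipper", "courier", "shipper"],
})
keys = ["shipment_id", "shipper_id", "courier_id"]

# Option 1: transform('any') computes one value per group and broadcasts it
# back to every row of that group, so the result aligns with df's index.
flag = (
    df.assign(new=df["sender"].eq("shipper"))
      .groupby(keys)["new"]
      .transform("any")
)
by_transform = df.assign(shipper_replied=flag)

# Option 2 (assumed alternative): reduce to one row per group, then merge
# the group-level flag back onto the original rows.
per_group = (
    df.assign(new=df["sender"].eq("shipper"))
      .groupby(keys)["new"]
      .any()
      .rename("shipper_replied")
      .reset_index()
)
by_merge = df.merge(per_group, on=keys, how="left")

print(by_transform)
```

Both variants keep Question and sender because they attach the flag to the original rows instead of returning only the reduced, one-row-per-group result that apply produces.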

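Can you add the expected output to the question?

This is not what I want. I would like the Question column to be included in the resulting DataFrame, as in the expected output.

@homeStayProg - can you check now?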
