Python: Combining results into multiple columns of a single DataFrame (Python, Pandas)


python pandas


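Summary

Suppose you apply a function to a groupby object, so that every g.apply for g in df.groupby(...) gives you a Series/DataFrame for each g. How do I combine these results into a single DataFrame, but with the group names as columns?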

Details

I have a DataFrame event_df that looks like this:

index   event   note   time
0       on      C      0.5
1       on      D      0.75
2       off     C      1.0
...

I want to create a sampling of the event for each note, with the samples taken at the times given in t_df:

index    t
0        0
1        0.5
2        1.0
...

so that I get something like this:

t        C         D        
0        off       off
0.5      on        off
1.0      off       on
...

What I've done so far:

import numpy as np
import pandas as pd


def get_t_note_series(notedata_row, t_arr):
    """Return the time index in the sampling that corresponds to the event."""
    # First sample time at or after the event time
    t_idx = np.argwhere(t_arr >= notedata_row['time']).flatten()[0]
    return t_idx


def get_t_for_gb(group, t_arr):
    # Map each event of this note to its sample index and index the group by it
    t_idxs = group.apply(get_t_note_series, args=(t_arr,), axis=1)
    t_idxs = t_idxs.rename('t_arr_idx')
    group_with_t = pd.concat([group, t_idxs], axis=1).set_index('t_arr_idx')
    print(group_with_t)
    return group_with_t


t_arr = np.arange(0, 10, 0.5)
t_df = pd.DataFrame({'t': t_arr}).rename_axis('t_arr_idx')
gb = event_df.groupby('note')
# groupby.apply forwards extra keyword arguments to the function
gb.apply(get_t_for_gb, t_arr=t_arr)

So what I get is one DataFrame per note, all of the same size (the same as t_df):

How do I get from here to the DataFrame I want, where each group corresponds to a column in a single new DataFrame and the index is …?

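Edit: Sorry, in the code below I didn't take into account that you rescale the time column. I can't give a complete solution right now because I have to leave, but I think you can do the rescaling with pandas.merge_asof on your two DataFrames to get the nearest "rescaled" time. From the merged DataFrame you can then try the code below. I hope this is what you want.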

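import io

import pandas as pd

sio = io.StringIO("""index   event   note   time
0       on      C      0.5
1       on      D      0.75
2       off     C      1.0""")
# Raw string so '\s' is not treated as a string escape
df = pd.read_csv(sio, sep=r'\s+', index_col=0)

# First event per (time, note), 'note' level pivoted into columns, gaps -> 'off'
df.groupby(['time', 'note']).agg({'event': 'first'}).unstack(-1).fillna('off')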

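With agg({'event': 'first'}) you take the first row of each (time, note) group. unstack(-1) then takes the note index level and pivots it, so the note values become columns. Finally, fillna('off') fills every cell for which no data point was found with 'off'.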

This yields:

Out[28]: 
     event     
note     C    D
time           
0.50    on  off
0.75   off   on
1.00   off  off
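
For the rescaling mentioned in the edit above, here is a minimal sketch with pd.merge_asof, assuming the three-row event_df from the question and a t_df whose sample times sit in a plain column t (rather than the t_arr_idx index). It uses direction='forward' so that each event maps to the first sample time at or after it, mirroring np.argwhere in the question; direction='nearest' would give the "nearest" time the answer talks about. The reshaping step uses pivot_table, which does the same job as the groupby/agg/unstack chain. Forward-filling a note's state between events is an assumption about what "sampling" should mean, and the name sampled is illustrative.

```python
import numpy as np
import pandas as pd

# Assumed inputs: the three-row event_df from the question and a t_df whose
# sample times live in a plain column 't' (not the t_arr_idx index)
event_df = pd.DataFrame({
    'event': ['on', 'on', 'off'],
    'note': ['C', 'D', 'C'],
    'time': [0.5, 0.75, 1.0],
})
t_df = pd.DataFrame({'t': np.arange(0, 10, 0.5)})

# merge_asof requires both frames to be sorted on their join keys.
# direction='forward' matches each event to the first sample time t >= time,
# like np.argwhere(t_arr >= time).flatten()[0] in the question.
merged = pd.merge_asof(
    event_df.sort_values('time'),
    t_df,
    left_on='time',
    right_on='t',
    direction='forward',
)

sampled = (
    # one cell per (t, note); if several events land on it, keep the last one
    merged.pivot_table(index='t', columns='note', values='event', aggfunc='last')
    .reindex(t_df['t'])  # one row per sample time
    .ffill()             # assumption: a note keeps its state until its next event
    .fillna('off')       # before a note's first event -> 'off'
)
print(sampled.head(3))
```

With this data the first three rows come out as off/off at t = 0.0, on/off at 0.5 and off/on at 1.0, which is the table the question asks for.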

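You might also want to try min or max in case on/off is ambiguous for a time/note combination (that is, if there are several rows for the same time/note, some on and some off) and you prefer one of those values (for example, if there is at least one on, you want on no matter how many offs there are). If you want a majority vote, I suggest adding a majority-vote column to the aggregated DataFrame (before unstack()).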
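
A small sketch of the min/max and majority-vote variants, using hypothetical conflicting rows (three events for C at time 0.5). It relies on strings comparing lexicographically, with 'off' < 'on', so max favours 'on' and min favours 'off'. The majority vote is computed here directly with mode() inside agg instead of as an extra column before unstack(); that is one possible way to do what the answer suggests, not the answerer's code.

```python
import pandas as pd

# Hypothetical rows: three conflicting events for C at the same time
df = pd.DataFrame({
    'event': ['on', 'off', 'off', 'on'],
    'note': ['C', 'C', 'C', 'D'],
    'time': [0.5, 0.5, 0.5, 0.5],
})
grouped = df.groupby(['time', 'note'])['event']

# Strings compare lexicographically and 'off' < 'on':
# max keeps 'on' if any row is 'on', min keeps 'off' if any row is 'off'
prefer_on = grouped.max().unstack('note').fillna('off')
prefer_off = grouped.min().unstack('note').fillna('off')

# Majority vote: the most frequent event per (time, note);
# ties are broken by taking the first value returned by mode()
majority = grouped.agg(lambda s: s.mode().iloc[0]).unstack('note').fillna('off')
print(majority)
```
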
Oh, I found it! All I had to do was unstack the groupby result. Going back to where the groupby result is generated:
import numpy as np
import pandas as pd


def get_t_note_series(notedata_row, t_arr):
    """Return the time index in the sampling that corresponds to the event."""
    # First sample time at or after the event time
    t_idx = np.argwhere(t_arr >= notedata_row['time']).flatten()[0]
    return t_idx


def get_t_for_gb(group, t_arr):
    # Map each event of this note to its sample index and index the group by it
    t_idxs = group.apply(get_t_note_series, args=(t_arr,), axis=1)
    t_idxs = t_idxs.rename('t_arr_idx')
    group_with_t = pd.concat([group, t_idxs], axis=1).set_index('t_arr_idx')
    ## print(group_with_t) ## unnecessary!
    return group_with_t


t_arr = np.arange(0, 10, 0.5)
t_df = pd.DataFrame({'t': t_arr}).rename_axis('t_arr_idx')
gb = event_df.groupby('note')
# groupby.apply forwards extra keyword arguments to the function
result = gb.apply(get_t_for_gb, t_arr=t_arr)

At this point, result is a DataFrame with note as the outer level of its index:
>> print(result)

          event
note  t
C     0    off
      0.5  on
      1.0  off
....
D     0    off
      0.5  off
      1.0  on
....

Running result = result.unstack('note') does the trick:
>> result = result.unstack('note')
>> print(result)

         event
note     C      D
t
0        off    off
0.5      on     off
1.0      off    on
....
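
To see what the two unstack spellings do, here is a sketch that rebuilds the (note, t)-indexed frame printed above, truncated to the first three sample times. unstack('note') moves the note level into the columns but keeps event as an outer column level; selecting the event column first gives a Series, and unstacking its level 0 leaves only the note names as column labels.

```python
import pandas as pd

# The (note, t)-indexed frame printed above, truncated to the first three sample times
index = pd.MultiIndex.from_product(
    [['C', 'D'], [0.0, 0.5, 1.0]], names=['note', 't']
)
result = pd.DataFrame({'event': ['off', 'on', 'off', 'off', 'off', 'on']}, index=index)

# unstack('note') moves the note level into the columns,
# giving two column levels: ('event', 'C') and ('event', 'D')
by_name = result.unstack('note')

# Selecting the column first yields a Series, so unstacking level 0 (note)
# leaves just C and D as column labels
flat = result['event'].unstack(0)
print(flat)
```
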

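I think I already have all the columns I wanted, as described in my original post. The only problem was that I didn't know how to put all the results into a single DataFrame as multiple columns. One option is to loop with for n, g in gb: g.apply(...) and then build a list of columns to concatenate, but I wondered whether there is a simpler way that doesn't require iterating over the groups.

Anyway, I found the answer to my question, but this is still a good answer showing how to get from the original DataFrame to a similar final result. Thanks!

Even better: result = result['event'].unstack(0)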
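
Putting the pieces together, a sketch of the whole pipeline without an explicit loop over groups. sample_note is a hypothetical helper, not the OP's get_t_for_gb: it uses np.searchsorted(..., side='left') as a vectorised equivalent of np.argwhere(t_arr >= time).flatten()[0], and it assumes that a note keeps its last state until its next event and is 'off' before the first one. For comparison, wide_loop builds the same table with the loop-and-concatenate approach mentioned above, passing a dict to pd.concat so that the group names become the column labels.

```python
import numpy as np
import pandas as pd

event_df = pd.DataFrame({
    'event': ['on', 'on', 'off'],
    'note': ['C', 'D', 'C'],
    'time': [0.5, 0.75, 1.0],
})
t_arr = np.arange(0, 10, 0.5)


def sample_note(group, t_arr):
    """Hypothetical helper: the state of one note at every sample time."""
    # Position of the first sample time >= each event time
    # (same as np.argwhere(t_arr >= time).flatten()[0] for each row)
    pos = np.searchsorted(t_arr, group['time'].to_numpy(), side='left')
    events = pd.Series(group['event'].to_numpy(), index=t_arr[pos])
    # If several events land on one sample, keep the last one
    events = events[~events.index.duplicated(keep='last')]
    # Hold each state until the next event; 'off' before the first event
    sampled = events.reindex(t_arr).ffill().fillna('off')
    return sampled.rename_axis('t').to_frame('event')


# groupby.apply stacks the per-note frames into one (note, t)-indexed frame
stacked = event_df.groupby('note')[['event', 'time']].apply(sample_note, t_arr=t_arr)
wide = stacked['event'].unstack(0)

# The loop alternative: one column per group, concatenated by group name
wide_loop = pd.concat(
    {note: sample_note(g, t_arr)['event'] for note, g in event_df.groupby('note')},
    axis=1,
)
print(wide.head(3))
```

Both versions give off/off at t = 0.0, on/off at 0.5 and off/on at 1.0; the apply version avoids writing the loop yourself, while the dict passed to pd.concat is what turns the group names into columns in the loop version.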