Python: reorganizing a DataFrame


I have some data in a pandas DataFrame, as shown below:

         Sample    Signal
225   TGBb_0m-2  1.943295
226   TGBb_5m-2  4.659431
227  TGBb_15m-2  1.713407
228  TGBb_30m-2  2.524867
229  TGBb_45m-2  2.776531
230  TGBb_90m-2  2.196248
231   TGBb_0m-1  2.329916
232   TGBb_5m-1  1.916303
233  TGBb_15m-1  3.892828
234  TGBb_30m-1  2.380105
235  TGBb_45m-1  2.667500
236  TGBb_90m-1  2.377786
237   TGBb_0m-3  1.836953
238  TGBb_15m-3  2.208754
239  TGBb_30m-3  1.561843
240  TGBb_45m-3  2.613384
241  TGBb_90m-3  2.081838
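
Here I have three replicate experiments, each with 6 time points, except replicate 3, which has only 5. I would like to reorder this DataFrame so that it is grouped by time point rather than by experiment. I think the best approach would be to split the large DataFrame into smaller DataFrames, each containing all the data for a single time point. Does anyone know how I can do this?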

For example, the desired output might look like this:

         Sample    Signal
225   TGBb_0m-2  1.943295
231   TGBb_0m-1  2.329916
237   TGBb_0m-3  1.836953


         Sample    Signal
226   TGBb_5m-2  4.659431
232   TGBb_5m-1  1.916303    #missing third data point


         Sample    Signal
227  TGBb_15m-2  1.713407
233  TGBb_15m-1  3.892828
238  TGBb_15m-3  2.208754

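I think this can be achieved with DataFrame.groupby. However, you may need to modify the columns slightly to separate the time point from the experiment number (for example, 'TGBb_0m-2' in the Sample column would be split into 'TGBb_0m', kept in Sample, and '2', stored in a new column).

Note that this creates a special kind of object, a groupby DataFrame. So if you want to print it, simply typing its name will not work; you have to use:

df.head()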
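
The groupby idea above can be sketched roughly as follows. This is only an illustration on a few rows of the question's data: the column names Time and Experiment, the use of str.rsplit for the split, and the frames dictionary are assumptions made for the example, not something the answer specifies.

```python
import pandas as pd

# A few rows of the question's data, enough to show the idea.
df = pd.DataFrame(
    {
        "Sample": ["TGBb_0m-2", "TGBb_5m-2", "TGBb_0m-1", "TGBb_5m-1", "TGBb_0m-3"],
        "Signal": [1.943295, 4.659431, 2.329916, 1.916303, 1.836953],
    },
    index=[225, 226, 231, 232, 237],
)

# Split "TGBb_0m-2" into the time point ("TGBb_0m") and the replicate ("2").
parts = df["Sample"].str.rsplit("-", n=1, expand=True)
df["Time"] = parts[0]
df["Experiment"] = parts[1]

# sort=False keeps the groups in order of first appearance.
grouped = df.groupby("Time", sort=False)

# One smaller DataFrame per time point, keyed by the time label.
frames = {time: group[["Sample", "Signal"]] for time, group in grouped}

for time, frame in frames.items():
    print(time)
    print(frame)
```

Iterating over the grouped object yields (key, sub-DataFrame) pairs, so each value in frames is an ordinary DataFrame that can be printed or processed on its own.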

Your data (for reproducibility):

         Sample    Signal
225   TGBb_0m-2  1.943295
226   TGBb_5m-2  4.659431
227  TGBb_15m-2  1.713407
228  TGBb_30m-2  2.524867
229  TGBb_45m-2  2.776531
230  TGBb_90m-2  2.196248
231   TGBb_0m-1  2.329916
232   TGBb_5m-1  1.916303
233  TGBb_15m-1  3.892828
234  TGBb_30m-1  2.380105
235  TGBb_45m-1  2.667500
236  TGBb_90m-1  2.377786
237   TGBb_0m-3  1.836953
238  TGBb_15m-3  2.208754
239  TGBb_30m-3  1.561843
240  TGBb_45m-3  2.613384
241  TGBb_90m-3  2.081838

Since you need to group by part of the text in the Sample column, I would use str.extract as follows:

df[['Time', 'Experiment']] = df['Sample'].str.extract(r'(.+)-(\d+)')
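
To make clearer what that str.extract call returns, here is a small sketch on a standalone Series. The value "blank" is a made-up label added only to show what happens when a row does not match the pattern; it is not part of the question's data.

```python
import pandas as pd

samples = pd.Series(["TGBb_0m-2", "TGBb_15m-1", "TGBb_90m-3", "blank"])

# Two capture groups give two columns: everything before the last "-",
# and the trailing digits. expand=True is the default; it is spelled out here.
extracted = samples.str.extract(r"(.+)-(\d+)", expand=True)
extracted.columns = ["Time", "Experiment"]
print(extracted)

# Rows that do not match the pattern come back as NaN in every column.
unmatched = samples[extracted["Time"].isna()]
print(unmatched.tolist())
```

Note that the extracted Experiment values are strings, so they would need converting if you want to sort or compare them as numbers.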
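
If you want to operate on the sub-DataFrames that share a time point, I would use a for loop that filters the original df, with one iteration per unique value of Time:

for time_period in df['Time'].unique():
    # keep only the rows for this time point, and only the original columns
    df_group = df[df['Time'] == time_period][['Sample', 'Signal']]
    print(df_group)

This produces the following output: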

        Sample    Signal
225  TGBb_0m-2  1.943295
231  TGBb_0m-1  2.329916
237  TGBb_0m-3  1.836953
        Sample    Signal
226  TGBb_5m-2  4.659431
232  TGBb_5m-1  1.916303
         Sample    Signal
227  TGBb_15m-2  1.713407
233  TGBb_15m-1  3.892828
238  TGBb_15m-3  2.208754
         Sample    Signal
228  TGBb_30m-2  2.524867
234  TGBb_30m-1  2.380105
239  TGBb_30m-3  1.561843
         Sample    Signal
229  TGBb_45m-2  2.776531
235  TGBb_45m-1  2.667500
240  TGBb_45m-3  2.613384
         Sample    Signal
230  TGBb_90m-2  2.196248
236  TGBb_90m-1  2.377786
241  TGBb_90m-3  2.081838
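
If your goal is simply to reorder the DataFrame by time rather than by experiment, then after the str.extract step above you only need:

df.sort_values('Time')[['Sample', 'Signal']]

which gives the following result: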

         Sample    Signal
231   TGBb_0m-1  2.329916
237   TGBb_0m-3  1.836953
225   TGBb_0m-2  1.943295
233  TGBb_15m-1  3.892828
227  TGBb_15m-2  1.713407
238  TGBb_15m-3  2.208754
228  TGBb_30m-2  2.524867
234  TGBb_30m-1  2.380105
239  TGBb_30m-3  1.561843
229  TGBb_45m-2  2.776531
235  TGBb_45m-1  2.667500
240  TGBb_45m-3  2.613384
226   TGBb_5m-2  4.659431
232   TGBb_5m-1  1.916303
236  TGBb_90m-1  2.377786
230  TGBb_90m-2  2.196248
241  TGBb_90m-3  2.081838
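
One thing visible in the sorted output above is that the time labels are compared as strings, so TGBb_5m lands after TGBb_45m. If chronological order matters, one possible approach is to sort on the number of minutes instead. The sketch below assumes every label has the form shown in the question (digits between "_" and "m-"); the Minutes column is a helper introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sample": ["TGBb_45m-2", "TGBb_5m-2", "TGBb_0m-1", "TGBb_5m-1", "TGBb_45m-1"],
        "Signal": [2.776531, 4.659431, 2.329916, 1.916303, 2.667500],
    },
    index=[229, 226, 231, 232, 235],
)

# Pull the digits between "_" and "m-" and convert them to integers.
# "Minutes" is a helper column introduced for this sketch only.
df["Minutes"] = df["Sample"].str.extract(r"_(\d+)m-", expand=False).astype(int)

# A stable sort keeps the original row order within each time point.
ordered = df.sort_values("Minutes", kind="stable")[["Sample", "Signal"]]
print(ordered)
```

With kind="stable", rows that share a time point stay in the order they had in the original DataFrame, which the default sort does not guarantee.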