Python: reorganizing a DataFrame
I have some data in a pandas DataFrame, as shown below:
Sample Signal
225 TGBb_0m-2 1.943295
226 TGBb_5m-2 4.659431
227 TGBb_15m-2 1.713407
228 TGBb_30m-2 2.524867
229 TGBb_45m-2 2.776531
230 TGBb_90m-2 2.196248
231 TGBb_0m-1 2.329916
232 TGBb_5m-1 1.916303
233 TGBb_15m-1 3.892828
234 TGBb_30m-1 2.380105
235 TGBb_45m-1 2.667500
236 TGBb_90m-1 2.377786
237 TGBb_0m-3 1.836953
238 TGBb_15m-3 2.208754
239 TGBb_30m-3 1.561843
240 TGBb_45m-3 2.613384
241 TGBb_90m-3 2.081838
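Here I have three replicate experiments, each with 6 time points, except replicate 3, which has only 5. I would like to reorder this DataFrame so that it is grouped by time point rather than by experiment. I think the best way would be to split the large DataFrame into smaller DataFrames, each containing all the data for a single time point. Does anyone know how I can do this?
For example, the desired output might look like this: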
Sample Signal
225 TGBb_0m-2 1.943295
231 TGBb_0m-1 2.329916
237 TGBb_0m-3 1.836953
Sample Signal
226 TGBb_5m-2 4.659431
232 TGBb_5m-1 1.916303 #missing third data point
Sample Signal
227 TGBb_15m-2 1.713407
233 TGBb_15m-1 3.892828
238 TGBb_15m-3 2.208754
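I think this can be achieved with DataFrame.groupby. However, you may need to modify the columns slightly so that the time point and the experiment number are separate (for example, "TGBb_0m-2" in the Sample column would be split into "TGBb_0m", with the "2" going into a new column). Note that grouping creates a special kind of object, a GroupBy object. If you want to print it, just typing its name will not work; you have to call a method on it, for example (with df being the grouped object here):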
df.head()
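As a minimal sketch of the groupby idea described above (not necessarily what the answerer had in mind), the snippet below rebuilds a small subset of the question's data by hand, splits Sample on the hyphen, and iterates over the groups. The column names Time and Experiment and the variable names grouped and zero_minutes are assumptions chosen for illustration.

```python
import pandas as pd

# A small subset of the question's data, rebuilt by hand for illustration.
df = pd.DataFrame(
    {
        "Sample": ["TGBb_0m-2", "TGBb_5m-2", "TGBb_0m-1", "TGBb_5m-1", "TGBb_0m-3"],
        "Signal": [1.943295, 4.659431, 2.329916, 1.916303, 1.836953],
    },
    index=[225, 226, 231, 232, 237],
)

# Split "TGBb_0m-2" into the time point ("TGBb_0m") and the replicate ("2").
df[["Time", "Experiment"]] = df["Sample"].str.split("-", n=1, expand=True)

# sort=False keeps the groups in order of first appearance.
grouped = df.groupby("Time", sort=False)

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs.
for time_point, group in grouped:
    print(time_point)
    print(group[["Sample", "Signal"]])

# A single group can also be pulled out by its key.
zero_minutes = grouped.get_group("TGBb_0m")
```

Printing grouped itself only shows the object's representation, which is why the loop or get_group is needed to see the rows.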
Your data (for reproducibility):
Sample Signal
225 TGBb_0m-2 1.943295
226 TGBb_5m-2 4.659431
227 TGBb_15m-2 1.713407
228 TGBb_30m-2 2.524867
229 TGBb_45m-2 2.776531
230 TGBb_90m-2 2.196248
231 TGBb_0m-1 2.329916
232 TGBb_5m-1 1.916303
233 TGBb_15m-1 3.892828
234 TGBb_30m-1 2.380105
235 TGBb_45m-1 2.667500
236 TGBb_90m-1 2.377786
237 TGBb_0m-3 1.836953
238 TGBb_15m-3 2.208754
239 TGBb_30m-3 1.561843
240 TGBb_45m-3 2.613384
241 TGBb_90m-3 2.081838
Since you need to group by part of the text in the Sample column, I would use str.extract, as follows:
df[['Time', 'Experiment']] = df['Sample'].str.extract(r'(.+)-(\d+)')
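To make the extraction step easier to try out, here is a hedged sketch that loads a few of the rows from whitespace-separated text and applies the same regular expression with named groups. The variable names raw and parts are assumptions, and reading the data through io.StringIO is just one convenient way to rebuild it, not necessarily how the answerer did.

```python
import io

import pandas as pd

# A few rows from the question; the header has one name fewer than the
# data rows, so pandas uses the first column as the index.
raw = """Sample Signal
225 TGBb_0m-2 1.943295
226 TGBb_5m-2 4.659431
231 TGBb_0m-1 2.329916
232 TGBb_5m-1 1.916303
237 TGBb_0m-3 1.836953
238 TGBb_15m-3 2.208754
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Named groups become the column names of the extracted DataFrame.
# Everything before the last "-<digits>" is the time point; the digits
# are the experiment (replicate) number, kept as a string.
parts = df["Sample"].str.extract(r"(?P<Time>.+)-(?P<Experiment>\d+)")
df = df.join(parts)
print(df)
```

With named groups the result can be joined back without listing the new column names separately.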
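If you want to perform operations on sub-DataFrames that share the same time point, I would use a for loop that filters the initial df, matching Time against each unique time period:
for time_period in df['Time'].unique():
    df_group = df[df['Time'] == time_period][['Sample', 'Signal']]
    print(df_group)
This produces the following output: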
Sample Signal
225 TGBb_0m-2 1.943295
231 TGBb_0m-1 2.329916
237 TGBb_0m-3 1.836953
Sample Signal
226 TGBb_5m-2 4.659431
232 TGBb_5m-1 1.916303
Sample Signal
227 TGBb_15m-2 1.713407
233 TGBb_15m-1 3.892828
238 TGBb_15m-3 2.208754
Sample Signal
228 TGBb_30m-2 2.524867
234 TGBb_30m-1 2.380105
239 TGBb_30m-3 1.561843
Sample Signal
229 TGBb_45m-2 2.776531
235 TGBb_45m-1 2.667500
240 TGBb_45m-3 2.613384
Sample Signal
230 TGBb_90m-2 2.196248
236 TGBb_90m-1 2.377786
241 TGBb_90m-3 2.081838
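If the operation to run on each time point is an aggregate, the loop can often be replaced by a groupby aggregation. The sketch below is one possible approach under that assumption; it uses a hand-built subset of the data, and the name summary and the choice of mean and count are illustrative only.

```python
import pandas as pd

# Subset of the question's data: three replicates at 0m, two at 5m.
df = pd.DataFrame(
    {
        "Sample": ["TGBb_0m-2", "TGBb_5m-2", "TGBb_0m-1", "TGBb_5m-1", "TGBb_0m-3"],
        "Signal": [1.943295, 4.659431, 2.329916, 1.916303, 1.836953],
    },
    index=[225, 226, 231, 232, 237],
)

# expand=False returns a Series because the pattern has a single group.
df["Time"] = df["Sample"].str.extract(r"(.+)-\d+", expand=False)

# One row per time point: mean signal and number of replicates present.
summary = df.groupby("Time", sort=False)["Signal"].agg(["mean", "count"])
print(summary)
```

The count column makes the missing replicate at the 5-minute time point visible without inspecting each group by hand.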
If your goal is simply to reorder the DataFrame by time rather than by experiment, then after the str.extract step above you only need df.sort_values('Time')[['Sample', 'Signal']]. Note that Time is sorted as text, so TGBb_5m comes after TGBb_45m. The result is:
Sample Signal
231 TGBb_0m-1 2.329916
237 TGBb_0m-3 1.836953
225 TGBb_0m-2 1.943295
233 TGBb_15m-1 3.892828
227 TGBb_15m-2 1.713407
238 TGBb_15m-3 2.208754
228 TGBb_30m-2 2.524867
234 TGBb_30m-1 2.380105
239 TGBb_30m-3 1.561843
229 TGBb_45m-2 2.776531
235 TGBb_45m-1 2.667500
240 TGBb_45m-3 2.613384
226 TGBb_5m-2 4.659431
232 TGBb_5m-1 1.916303
236 TGBb_90m-1 2.377786
230 TGBb_90m-2 2.196248
241 TGBb_90m-3 2.081838
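Sorting on the Time strings is lexicographic, which is why the 5-minute rows land after the 45-minute rows. If chronological order is wanted, one option is to sort on the number of minutes instead. This is a sketch that assumes every sample name follows the pattern prefix_<minutes>m-<replicate>; the Minutes column and the names minutes and ordered are hypothetical helpers.

```python
import pandas as pd

# A few rows from the question, deliberately out of chronological order.
df = pd.DataFrame(
    {
        "Sample": ["TGBb_45m-2", "TGBb_5m-2", "TGBb_0m-1", "TGBb_5m-1", "TGBb_90m-3"],
        "Signal": [2.776531, 4.659431, 2.329916, 1.916303, 2.081838],
    },
    index=[229, 226, 231, 232, 241],
)

# Pull the number of minutes out of names like "TGBb_45m-2" (assumed pattern).
minutes = df["Sample"].str.extract(r"_(\d+)m-", expand=False).astype(int)

# kind="stable" keeps rows with equal minutes in their original order.
ordered = (
    df.assign(Minutes=minutes)
      .sort_values("Minutes", kind="stable")
      [["Sample", "Signal"]]
)
print(ordered)
```

The stable sort also avoids the shuffled within-group order that a plain sort_values can produce.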