Python: how to stack several rows into one row and assign an id

python pandas dataframe

I have a data frame like this:

band    mean    raster
1   894.343482  D:/Python/Copied/selection/20170219_095504.tif
2   1159.282304 D:/Python/Copied/selection/20170219_095504.tif
3   1342.291595 D:/Python/Copied/selection/20170219_095504.tif
4   3056.809463 D:/Python/Copied/selection/20170219_095504.tif
1   516.9624071 D:/Python/Copied/selection/20170325_095551.tif
2   720.1932533 D:/Python/Copied/selection/20170325_095551.tif
3   689.6287879 D:/Python/Copied/selection/20170325_095551.tif
4   4561.576329 D:/Python/Copied/selection/20170325_095551.tif
1   566.2016867 D:/Python/Copied/selection/20170527_095700.tif
2   812.9927101 D:/Python/Copied/selection/20170527_095700.tif
3   760.4621212 D:/Python/Copied/selection/20170527_095700.tif
4   5009.537164 D:/Python/Copied/selection/20170527_095700.tif

I would like to reshape it into:

band1_mean  band2_mean  band3_mean  band4_mean  raster_name         id
894.343482  1159.282304 1342.291595 3056.809463 20170219_095504.tif 1
516.9624071 720.1932533 689.6287879 4561.576329 20170325_095551.tif 2
566.2016867 812.9927101 760.4621212 5009.537164 20170527_095700.tif 3

All 4 bands belong to one raster, so their values must all end up in a single row. I don't know how to stack them without a key id for each raster. Thanks.

Use

df.pivot(index='raster', columns='band', values='mean')

and you will get

band                          1            2            3            4
raster                                                                
20170219_095504.tif  894.343482  1159.282304  1342.291595  3056.809463
20170325_095551.tif  516.962407   720.193253   689.628788  4561.576329
20170527_095700.tif  566.201687   812.992710   760.462121  5009.537164

This is a case for a pivot:

# extract the raster name (raw string so the regex backslashes are kept as-is):
df['raster_name'] = df.raster.str.extract(r'(\d+_\d+\.tif)')

# pivot
new_df = df.pivot(index='raster_name', columns='band', values='mean')

# rename the columns:
new_df.columns = [f'band{i}_mean' for i in new_df.columns]

Output:

                     band1_mean   band2_mean   band3_mean   band4_mean
raster_name                                                           
20170219_095504.tif  894.343482  1159.282304  1342.291595  3056.809463
20170325_095551.tif  516.962407   720.193253   689.628788  4561.576329
20170527_095700.tif  566.201687   812.992710   760.462121  5009.537164

If you want raster_name to be a regular column, you can call reset_index on new_df.
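The answer above stops before the id column that the question asks for. Below is a minimal end-to-end sketch of the same steps with that column added. It assumes the id is simply a 1-based running number over the rasters in sorted raster_name order (the question does not say how ids should be assigned), and it uses only the first two rasters from the question as sample data.

```python
import pandas as pd

# Sample data copied from the question (first two rasters only).
df = pd.DataFrame({
    "band": [1, 2, 3, 4, 1, 2, 3, 4],
    "mean": [894.343482, 1159.282304, 1342.291595, 3056.809463,
             516.9624071, 720.1932533, 689.6287879, 4561.576329],
    "raster": ["D:/Python/Copied/selection/20170219_095504.tif"] * 4
              + ["D:/Python/Copied/selection/20170325_095551.tif"] * 4,
})

# Extract the file name, pivot, and rename the band columns.
df["raster_name"] = df["raster"].str.extract(r"(\d+_\d+\.tif)", expand=False)
new_df = df.pivot(index="raster_name", columns="band", values="mean")
new_df.columns = [f"band{i}_mean" for i in new_df.columns]

# Turn raster_name back into a regular column.
new_df = new_df.reset_index()

# Assumption: the id is a 1-based running number in sorted raster_name order.
new_df["id"] = range(1, len(new_df) + 1)

# Put the columns in the order shown in the question.
band_cols = [c for c in new_df.columns if c.startswith("band")]
new_df = new_df[band_cols + ["raster_name", "id"]]
print(new_df)
```

Since pivot sorts the index, the ids follow the sorted file names, which for these date-stamped names is also chronological order.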

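Thanks, that's really cool. I get NaN values for the first row, though. I think the pattern in str.extract('(\d+_\d+\.tif)') is not right.

That part extracts digits_digits.tif, so if a file name does not follow that pattern, it returns NaN. You can replace that part with something else, for example with split.

Solved it with df['raster_name'] = df.raster_name.str.split('/').str[4]. My original path was a bit longer. Thanks a lot for the great help! :)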
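A fixed position such as str[4] only works while every path has the same number of folders. As a possible alternative (not something the posters used), the sketch below splits once from the right and keeps the last piece, so the folder depth no longer matters. The second path is made up for illustration.

```python
import pandas as pd

# Paths of different depth; only the first one comes from the question,
# the second is a hypothetical example.
paths = pd.Series([
    "D:/Python/Copied/selection/20170219_095504.tif",
    "D:/some/longer/example/path/20170325_095551.tif",
])

# Split once from the right and keep the last piece, whatever the folder depth.
raster_name = paths.str.rsplit("/", n=1).str[-1]
print(raster_name.tolist())
```

This assumes forward slashes as in the question; paths with backslashes would need a different separator.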