Python: Reshape a DataFrame based on column values


python pandas dataframe


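I want to reshape a DataFrame based on the values in a particular column, so that I get one new column for each (value, column) pair in the starting DataFrame. I want to go from this: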

import pandas as pd

d = {'city': ['Berlin', 'Berlin', 'Berlin', 'London', 'London', 'London'],
     'weather': ['sunny', 'sunny', 'cloudy','sunny', 'cloudy', 'cloudy'], 'temp': [20,22,19, 21, 18, 17]}
df = pd.DataFrame(data=d)
df

    city    weather temp
0   Berlin  sunny   20
1   Berlin  sunny   22
2   Berlin  cloudy  19
3   London  sunny   21
4   London  cloudy  18
5   London  cloudy  17

to this:

d_2 = {'Berlin_weather': ['sunny', 'sunny', 'cloudy'], 'Berlin_temp': [20,22,19],
     'London_weather': ['sunny', 'cloudy', 'cloudy'], 'London_temp': [21, 18, 17]}
df_2 = pd.DataFrame(data=d_2)
df_2

    Berlin_weather  Berlin_temp London_weather  London_temp
0   sunny           20          sunny           21
1   sunny           22          cloudy          18
2   cloudy          19          cloudy          17

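I have tried using .unstack(), but couldn't get it to work. Looping is the obvious approach, but the size of my actual dataset makes that rather impractical.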

Let's create a new index (the city plus each row's position within its city) and then use unstack:
# drop() needs the keyword form: the positional axis argument was removed in pandas 2.0
df1 = df.set_index([df['city'], df.groupby('city').cumcount()]).drop(columns='city').unstack(0)

Then flatten the MultiIndex columns:
df1.columns = [f'{y}_{x}' for x,y in df1.columns]
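
The two steps above can be put together as a small self-contained sketch. It is a slight variant rather than the exact code above: it assumes that passing the column label 'city' to set_index (instead of the Series df['city']) is acceptable, which makes the separate drop unnecessary. The names pos and wide are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Berlin', 'Berlin', 'Berlin', 'London', 'London', 'London'],
    'weather': ['sunny', 'sunny', 'cloudy', 'sunny', 'cloudy', 'cloudy'],
    'temp': [20, 22, 19, 21, 18, 17],
})

# Position of each row within its city group: 0, 1, 2 for Berlin, then 0, 1, 2 for London
pos = df.groupby('city').cumcount()

# Index by (city, position), then move the city level into the columns
wide = df.set_index(['city', pos]).unstack(0)

# Each column is now an (original column, city) pair; join it as "city_column"
wide.columns = [f'{city}_{col}' for col, city in wide.columns]
print(wide)
```

The cumcount step is what makes unstack workable here: city alone is not a unique index, but (city, position) is, so every value has exactly one cell to land in.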


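If the column order matters, we can use pd.CategoricalIndex before flattening the columns (that is, run this on the unstacked df1 in place of the flattening step above):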

cati = pd.CategoricalIndex(df1.columns.get_level_values(0).unique(),
                    ['weather','temp'],
                    ordered=True)

df1.columns = df1.columns.set_levels(cati, level=0)

df1 = df1.sort_index(axis=1, level=1)  # sort the columns (axis=1) by city (level=1); keyword arguments are required in pandas 2.0+
df1.columns = [f'{y}_{x}' for x,y in df1.columns]


  Berlin_weather  Berlin_temp London_weather  London_temp
0          sunny           20          sunny           21
1          sunny           22         cloudy           18
2         cloudy           19         cloudy           17
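
As an alternative to the categorical sort, the column order can also be spelled out explicitly with reindex. This is only a sketch under assumptions, not part of the original answer: value_cols, cities, and ordered are hypothetical names, and the desired order is assumed to be "each city in order of first appearance, weather before temp".

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Berlin', 'Berlin', 'Berlin', 'London', 'London', 'London'],
    'weather': ['sunny', 'sunny', 'cloudy', 'sunny', 'cloudy', 'cloudy'],
    'temp': [20, 22, 19, 21, 18, 17],
})

value_cols = ['weather', 'temp']   # desired order within each city (assumed)
cities = df['city'].unique()       # cities in order of first appearance

# Same reshape as before: index by (city, position in city), unstack the city
wide = df.set_index(['city', df.groupby('city').cumcount()]).unstack(0)

# Build the exact (column, city) order we want and reindex the columns to it
ordered = pd.MultiIndex.from_tuples(
    [(col, city) for city in cities for col in value_cols]
)
wide = wide.reindex(columns=ordered)

# Flatten to "city_column" names
wide.columns = [f'{city}_{col}' for col, city in wide.columns]
print(wide)
```

Listing the order explicitly avoids depending on how the column levels sort, at the cost of having to name the value columns by hand.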
