Python: reshape a DataFrame
Suppose I have the following DataFrame, which holds counts of some variable for 2 different seasons, 2 different years and 3 different locations. The data is currently structured so that each row is a season/location combination, with one count column per year. It looks like this:
>>> import pandas as pd
>>> df=pd.DataFrame([['Summer', 'A', 1, 2],
...                  ['Winter', 'A', 3, 4],
...                  ['Summer', 'B', 5, 6],
...                  ['Winter', 'B', 7, 8],
...                  ['Summer', 'C', 9, 10],
...                  ['Winter', 'C', 11, 12]],
...                 columns=['Season', 'Location', 'Count_2014', 'Count_2015'])
>>> df
Season Location Count_2014 Count_2015
0 Summer A 1 2
1 Winter A 3 4
2 Summer B 5 6
3 Winter B 7 8
4 Summer C 9 10
5 Winter C 11 12
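I want to restructure the data so that there is one row for each season, location and year combination (which means I would have 2 x 3 x 2 = 12 rows). My current approach (below) is certainly not the most efficient. What is the best way to restructure this dataset?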
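df.set_index(['Season', 'Location'], inplace=True)
ListOfDFs = []
# Year suffixes taken from the column names: '2014', '2015'
for Year in [x[-4:] for x in df.columns]:
    # .copy() so that adding the 'Year' column below does not trigger SettingWithCopyWarning
    SubD = df[['Count_' + Year]].copy()
    SubD.columns = ['Count']
    SubD['Year'] = Year
    # Move 'Year' into the index next to Season/Location
    SubD.set_index('Year', append=True, inplace=True)
    ListOfDFs.append(SubD)
# Stack the per-year frames vertically
df2 = pd.concat(ListOfDFs)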
>>> df2
Count
Season Location Year
Summer A 2014 1
Winter A 2014 3
Summer B 2014 5
Winter B 2014 7
Summer C 2014 9
Winter C 2014 11
Summer A 2015 2
Winter A 2015 4
Summer B 2015 6
Winter B 2015 8
Summer C 2015 10
Winter C 2015 12
You are looking for `pd.melt`, which lets you do this in a single call:
df_new = pd.melt(df, id_vars=['Season', 'Location'],
                 value_vars=['Count_2014', 'Count_2015'],
                 var_name='Year',
                 value_name='Count')
You can then use `apply` (there may be a better way) to strip the `Count_` prefix from the `Year` values and get the layout above:
df_new['Year'] = df_new['Year'].apply(lambda x: x[-4:])
Output:
Season Location Year Count
0 Summer A 2014 1
1 Winter A 2014 3
2 Summer B 2014 5
3 Winter B 2014 7
4 Summer C 2014 9
5 Winter C 2014 11
6 Summer A 2015 2
7 Winter A 2015 4
8 Summer B 2015 6
9 Winter B 2015 8
10 Summer C 2015 10
11 Winter C 2015 12
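The following self-contained script puts the `melt` answer together and is a sketch under two assumptions. First, it uses the vectorized `.str[-4:]` accessor instead of `apply`. Second, the `astype(int)` conversion of `Year` and the final `set_index` step are additions of mine and do not come from the answer. The answer leaves `Year` as a string in an ordinary column.

```python
import pandas as pd

# Sample data from the question: one row per Season/Location, one count column per year
df = pd.DataFrame(
    [['Summer', 'A', 1, 2], ['Winter', 'A', 3, 4],
     ['Summer', 'B', 5, 6], ['Winter', 'B', 7, 8],
     ['Summer', 'C', 9, 10], ['Winter', 'C', 11, 12]],
    columns=['Season', 'Location', 'Count_2014', 'Count_2015'],
)

# Unpivot the year columns into (Year, Count) pairs: 6 rows x 2 years = 12 rows
df_new = pd.melt(
    df,
    id_vars=['Season', 'Location'],
    value_vars=['Count_2014', 'Count_2015'],
    var_name='Year',
    value_name='Count',
)

# Keep only the 4-digit suffix of 'Count_2014' / 'Count_2015' (assumption: store it as int)
df_new['Year'] = df_new['Year'].str[-4:].astype(int)

# Optional (assumption): index by Season/Location/Year to mirror the question's df2
df2 = df_new.set_index(['Season', 'Location', 'Year'])
print(df2)
```

`melt` emits all rows for the first value column before the second. The rows therefore come out year by year, which is the same order as the output above.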
As another option, it looks like `stack()` can also do the job:
>>> import pandas as pd
>>> df=pd.DataFrame([['Summer','A',1,2],['Winter','A',3,4],['Summer','B',5,6],['Winter','B',7,8],['Summer','C',9,10],['Winter','C',11,12]], columns=['Season','Location','Count_2014','Count_2015'])
>>>
>>> df.set_index(['Season','Location'], inplace=True)
>>> df.columns=pd.MultiIndex.from_tuples([(col[-4:],col[:-5]) for col in df.columns], names=['Year','Count'])
>>> df=df.stack(level=0)
>>> df
Count Count
Season Location Year
Summer A 2014 1
2015 2
Winter A 2014 3
2015 4
Summer B 2014 5
2015 6
Winter B 2014 7
2015 8
Summer C 2014 9
2015 10
Winter C 2014 11
2015 12
>>>
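A third route that neither answer mentions is `pd.wide_to_long`. This function is built for columns named `<stub><sep><suffix>`, such as `Count_2014`. The sketch below assumes that every year column follows that `Count_<digits>` pattern and that Season/Location uniquely identify each original row, since `wide_to_long` requires this.

```python
import pandas as pd

df = pd.DataFrame(
    [['Summer', 'A', 1, 2], ['Winter', 'A', 3, 4],
     ['Summer', 'B', 5, 6], ['Winter', 'B', 7, 8],
     ['Summer', 'C', 9, 10], ['Winter', 'C', 11, 12]],
    columns=['Season', 'Location', 'Count_2014', 'Count_2015'],
)

# Split each 'Count_<year>' column into the stub 'Count' and the suffix '<year>'.
# i: columns that uniquely identify each original row; j: name of the new suffix level.
long_df = pd.wide_to_long(
    df,
    stubnames='Count',
    i=['Season', 'Location'],
    j='Year',
    sep='_',
)
print(long_df.sort_index())
```

The result is indexed by (Season, Location, Year) and has a single `Count` column, which matches the shape of the question's `df2`. The numeric suffix is usually converted to a number, but check the dtype in your pandas version.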
You can use the `str` methods instead of `apply`: write `df_new['Year'] = df_new['Year'].str[-4:]` in place of `df_new['Year'] = df_new['Year'].apply(lambda x: x[-4:])`.
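The comment's suggestion is compared with `apply` in the small example below, which uses a hypothetical Series standing in for the melted `Year` column. The regex variant with `.str.extract` is an extra option of mine that does not appear in the thread. It assumes that the year is always the last four digits of the column name.

```python
import pandas as pd

# Hypothetical stand-in for the 'Year' column produced by pd.melt(..., var_name='Year')
years = pd.Series(['Count_2014', 'Count_2014', 'Count_2015'], name='Year')

# Row-by-row Python lambda (the approach in the melt answer)
via_apply = years.apply(lambda x: x[-4:])

# Vectorized slicing through the .str accessor (the comment's suggestion)
via_str = years.str[-4:]

# Assumption: pull out the trailing four digits with a regex; expand=False returns a Series
via_regex = years.str.extract(r'(\d{4})$', expand=False)

print(via_apply.tolist(), via_str.tolist(), via_regex.tolist())
```

Both `.str` variants avoid calling a Python function once per row. The regex form also keeps working if the prefix length changes.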