Python 3.x: Sorting a MultiIndex DataFrame
I have a DataFrame census_df that contains county and state data for the United States. The DataFrame uses a nested index: the first level is the state (STNAME) and the second level is the county (CTYNAME); the population is stored in the CENSUS2010POP column. The DataFrame is shown below:
CENSUS2010POP
STNAME CTYNAME
Alabama Alabama 4779736
Autauga County 54571
Baldwin County 182265
Barbour County 27457
Bibb County 22915
Blount County 57322
Bullock County 10914
Butler County 20947
Calhoun County 118572
Chambers County 34215
Cherokee County 25989
Chilton County 43643
Choctaw County 13859
Clarke County 25833
Clay County 13932
Cleburne County 14972
Coffee County 49948
Colbert County 54428
Conecuh County 13228
Coosa County 11539
Covington County 37765
Crenshaw County 13906
Cullman County 80406
Dale County 50251
Dallas County 43820
DeKalb County 71109
Elmore County 79303
Escambia County 38319
Etowah County 104430
Fayette County 17241
... ...
Wisconsin Washington County 131887
Waukesha County 389891
Waupaca County 52410
Waushara County 24496
Winnebago County 166994
Wood County 74749
Wyoming Wyoming 563626
Albany County 36299
Big Horn County 11668
Campbell County 46133
Carbon County 15885
Converse County 13833
Crook County 7083
Fremont County 40123
Goshen County 13249
Hot Springs County 4812
Johnson County 8569
Laramie County 91738
Lincoln County 18106
Natrona County 75450
Niobrara County 2484
Park County 28205
Platte County 8667
Sheridan County 29116
Sublette County 10247
Sweetwater County 43806
Teton County 21294
Uinta County 21118
Washakie County 8533
Weston County 7208
Now I want to sort the second index level (the counties) within each state by the values of the population column. I tried:
census_df = census_df.sort('CENSUS2010POP')
However, this sorts all rows globally, across states:
CENSUS2010POP
STNAME CTYNAME
Texas Loving County 82
Hawaii Kalawao County 90
Texas King County 286
Kenedy County 416
Nebraska Arthur County 460
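How can I sort the counties by population within each state? Any help is much appreciated.

I think you need to group by the first index level, STNAME, and then apply sort_values to the CENSUS2010POP column. That way STNAME stays fixed and only the second level is reordered by the values of this column: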
print (census_df.groupby(level=0)['CENSUS2010POP']
.apply(lambda x: x.sort_values())
.reset_index(level=0,drop=True))
STNAME CTYNAME
Alabama Bullock County 10914
Choctaw County 13859
Butler County 20947
Bibb County 22915
Clarke County 25833
Cherokee County 25989
Barbour County 27457
Chambers County 34215
Chilton County 43643
Autauga County 54571
Blount County 57322
Calhoun County 118572
Baldwin County 182265
Alabama 4779736
Wisconsin Waushara County 24496
Waupaca County 52410
Wood County 74749
Washington County 131887
Winnebago County 166994
Waukesha County 389891
Wyoming Crook County 7083
Big Horn County 11668
Goshen County 13249
Converse County 13833
Carbon County 15885
Albany County 36299
Fremont County 40123
Campbell County 46133
Wyoming 563626
Name: CENSUS2010POP, dtype: int64
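As a possible alternative to the groupby/apply approach above, here is a minimal sketch that sorts on the state level and the population column in a single call. It assumes a pandas version in which sort_values accepts index level names in `by`. The small frame is a hypothetical subset built from rows in the question, not the full census data.

```python
import pandas as pd

# Hypothetical six-row subset with the same shape as census_df.
census_df = pd.DataFrame(
    {
        "STNAME": ["Alabama", "Alabama", "Alabama",
                   "Wyoming", "Wyoming", "Wyoming"],
        "CTYNAME": ["Autauga County", "Baldwin County", "Bibb County",
                    "Albany County", "Crook County", "Teton County"],
        "CENSUS2010POP": [54571, 182265, 22915, 36299, 7083, 21294],
    }
).set_index(["STNAME", "CTYNAME"])

# Sort by the STNAME index level first, then by population within each state.
by_state = census_df.sort_values(["STNAME", "CENSUS2010POP"])
print(by_state)
```

Passing ascending=[True, False] instead would keep the states in alphabetical order while listing the largest county of each state first.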
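census_df.sort_index(level='CTYNAME')
See the docs, or use census_df.sort_index(level='CTYNAME', ascending=False) if you need to sort from largest to smallest.
@EdChum: Unfortunately, this doesn't give me what I want. I don't want to change the main index. I only want to sort the counties within each state by population, which is given as a column.
@EdChum: It only sorts the county names alphabetically. They should be sorted by their population instead.
Sorry, I misunderstood. How about sorting first and then grouping: census_df = census_df.sort('CENSUS2010POP') as before, followed by census_df.groupby(level=[0,1])
@EdChum: I think I did not explain it properly. Suppose we ask which five counties in Texas have the largest population, then the same for Nevada, and so on. Even if a county has fewer people than counties in other states, it should appear as the first entry for its state if it is the most populous county in that state.
Yes, that is right. Can you explain your command?
OK, wait a moment.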
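The "most populous counties per state" request in the comments could be sketched as follows. This is only an illustration: the sample rows are taken from the output in the question, and `top_n` is a hypothetical parameter name. On the full data, the state summary rows (where CTYNAME equals STNAME) would presumably need to be filtered out first, since they would otherwise rank first in each state.

```python
import pandas as pd

# Hypothetical subset built from the rows shown in the question.
census_df = pd.DataFrame(
    {
        "STNAME": ["Texas", "Hawaii", "Texas", "Texas", "Nebraska"],
        "CTYNAME": ["Loving County", "Kalawao County", "King County",
                    "Kenedy County", "Arthur County"],
        "CENSUS2010POP": [82, 90, 286, 416, 460],
    }
).set_index(["STNAME", "CTYNAME"])

top_n = 2  # the comment asks for five; two keeps this sample short

# Order states alphabetically and counties by descending population,
# then keep the first top_n rows of each state.
top = (
    census_df.sort_values(["STNAME", "CENSUS2010POP"], ascending=[True, False])
    .groupby(level="STNAME")
    .head(top_n)
)
print(top)
```

Because head keeps the rows in the order of the sorted frame, each state's block starts with its largest county, regardless of how that county compares with counties in other states.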