Python: set_index and sort by a specific column
I am trying to prepare this data in a specific format:
import pandas as pd
voting = pd.read_json("GE2000.json")
voting.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True)
print(voting)
This returns:
                                               vote
county_fips candidate_name  pty vote_pct
2000        Howard Phillips CS  0               596
            John Hagelin    NL  0               919
            Harry Browne    LB  1              2636
            George W. Bush  R   59           167398
            Al Gore         D   28            79004
1001        Howard Phillips I   0                 9
            John Hagelin    I   0                 5
            Harry Browne    LB  0                51
            George W. Bush  R   70            11993
            Al Gore         D   29             4942
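After this, I want to sort by vote_pct and keep only the row with the maximum value for each county_fips. I tried sort_values, sort_index, and so on, but could not get the desired output.
Here is the sample data: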
[
{
"office" : "PRESIDENT",
"county_name" : "Alaska",
"vote_pct" : "0",
"county_fips" : "2000",
"pty" : "CS",
"candidate_name" : "Howard Phillips",
},
{
"office" : "PRESIDENT",
"county_name" : "Alaska",
"vote_pct" : "0",
"county_fips" : "2000",
"pty" : "NL",
"candidate_name" : "John Hagelin",
}
]
(The data continues in the same way.)

You can use groupby to get the maximum vote_pct per county, for example:
voting.groupby('county_fips')['vote_pct'].max()
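As a minimal sketch of the groupby suggestion, the snippet below rebuilds the ten rows shown in the question as an in-memory DataFrame (instead of reading GE2000.json, which is not available here) and takes the per-county maximum. The `idxmax` step and the `pd.to_numeric` conversion are additions for illustration, on the assumption that the whole winning row is wanted and that vote_pct may arrive as strings, as in the sample JSON.

```python
import pandas as pd

# Rows copied from the output shown in the question; vote_pct is kept as
# strings here to mirror the sample JSON.
voting = pd.DataFrame({
    "county_fips": [2000] * 5 + [1001] * 5,
    "candidate_name": ["Howard Phillips", "John Hagelin", "Harry Browne",
                       "George W. Bush", "Al Gore"] * 2,
    "pty": ["CS", "NL", "LB", "R", "D", "I", "I", "LB", "R", "D"],
    "vote_pct": ["0", "0", "1", "59", "28", "0", "0", "0", "70", "29"],
    "vote": [596, 919, 2636, 167398, 79004, 9, 5, 51, 11993, 4942],
})

# Compare numbers, not strings: as text, "9" would sort above "70".
voting["vote_pct"] = pd.to_numeric(voting["vote_pct"])

# Maximum vote_pct per county (a Series indexed by county_fips).
max_pct = voting.groupby("county_fips")["vote_pct"].max()

# idxmax gives the row label of the first maximum in each group, so .loc
# returns the complete row of the leading candidate per county.
winners = voting.loc[voting.groupby("county_fips")["vote_pct"].idxmax()]
print(max_pct)
print(winners)
```

Note that `idxmax` keeps only the first row when two candidates tie within a county; if ties matter, a boolean mask against the group maximum keeps all of them.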
A more detailed answer is also available here:
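Before calling set_index, you can use groupby and apply to select the rows with the maximum value in each group, and only then set the index. This lets you group on a column rather than on the index (which is awkward):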
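import pandas as pd

voting = pd.read_json("GE2000.json")
# Keep the rows whose vote_pct equals the maximum within each county group.
get_largest_vote_pct = lambda group: group[group.vote_pct == group.vote_pct.max()]
largest = voting.groupby('county_fips').apply(get_largest_vote_pct)
# Build the multi-level index only after filtering.
largest.set_index(['county_fips', 'candidate_name', 'pty', 'vote_pct'], inplace=True)
print(largest)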
Output:
                                              vote
county_fips candidate_name pty vote_pct
1001        George W. Bush R   70            11993
2000        George W. Bush R   59           167398
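An alternative sketch, not taken from the answers above: the same filter can be written with `groupby(...).transform("max")` and a boolean mask. This avoids `apply`, whose handling of the grouping column has changed across pandas versions. The DataFrame is rebuilt inline from the rows shown in the question as a stand-in for GE2000.json.

```python
import pandas as pd

# Rows copied from the output shown in the question (stand-in for GE2000.json).
voting = pd.DataFrame({
    "county_fips": [2000] * 5 + [1001] * 5,
    "candidate_name": ["Howard Phillips", "John Hagelin", "Harry Browne",
                       "George W. Bush", "Al Gore"] * 2,
    "pty": ["CS", "NL", "LB", "R", "D", "I", "I", "LB", "R", "D"],
    "vote_pct": [0, 0, 1, 59, 28, 0, 0, 0, 70, 29],
    "vote": [596, 919, 2636, 167398, 79004, 9, 5, 51, 11993, 4942],
})

# transform("max") broadcasts each county's maximum back onto every row,
# so the result lines up with the original frame.
group_max = voting.groupby("county_fips")["vote_pct"].transform("max")

# Keep the rows that reach their county's maximum (ties are all kept).
largest = voting[voting["vote_pct"] == group_max]

# Set the multi-level index after filtering, then sort it for display.
largest = largest.set_index(
    ["county_fips", "candidate_name", "pty", "vote_pct"]
).sort_index()
print(largest)
```

Because the mask is computed on ordinary columns, county_fips remains a regular column until `set_index` is called, so no ambiguity between index levels and columns arises.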