Python: selecting from MultiIndex columns
I have read other posts about selecting from MultiIndex columns, but they did not help with my scenario. I have sample columns like the following:
('', 'X', 'Name'),
('', 'Y', 'Name'),
('S1', 'X', 'OVERALL TOTALS OF ALL SUBJECTS'),
('S1', 'X', 'OVERALL PERCENTAGES OF ALL SUBJECTS'),
('S2', 'Y', 'OVERALL TOTALS OF ALL SUBJECTS'),
('S2', 'Y', 'OVERALL PERCENTAGES OF ALL SUBJECTS')
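I tried df.columns.get_level_values(2) together with df.columns.set_levels(newlist, level=2) to change the column names using a new list of values, but I got the error "Level values must be unique". I also tried df.rename(columns={('', 'Y', 'Name'): ('', 'Y', 'Name2')}, inplace=True), but nothing happened. df.rename(columns={'Name': 'Name2'}, level=2, inplace=True) renames both "Name" columns, so that does not work either.

When I try df[[('', 'X', 'Name')]].str.lower(), I get the error "'DataFrame' object has no attribute 'str'". I am not sure why. The column ('', 'Y', 'Name') contains only names, all of them strings, and df[[('', 'Y', 'Name')]].dtypes returns "object". I suspect the error is again caused by the way I am selecting the column.

I have multiple files to read, so the column names may differ from file to file. No single level has unique names, but the combination of all 3 levels is always unique.

I need to work on one very specific column at level 2. So I want a reliable way to select exactly one such column, using something like df[('A', 'B', 'C')], in order to rename that column and convert its row values to lowercase. How can I do that?

You have to recreate the MultiIndex, as follows: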
import numpy as np
import pandas as pd

# Create the DataFrame
cols = [('', 'X', 'Name'),
        ('', 'Y', 'Name'),
        ('S1', 'X', 'OVERALL TOTALS OF ALL SUBJECTS'),
        ('S1', 'X', 'OVERALL PERCENTAGES OF ALL SUBJECTS'),
        ('S2', 'Y', 'OVERALL TOTALS OF ALL SUBJECTS'),
        ('S2', 'Y', 'OVERALL PERCENTAGES OF ALL SUBJECTS')]
df = pd.DataFrame(
    np.random.choice(['FOO', 'BAR', 'BAZ'], (3, len(cols))),
    columns=pd.MultiIndex.from_tuples(cols))

# 1. Rename the column by rebuilding the MultiIndex from its tuples
df.columns = pd.MultiIndex.from_tuples([
    x if x != ('', 'Y', 'Name') else ('', 'Y', 'Name2')
    for x in df.columns])

# 2. Set the values (title-cased here; use .str.lower() for lowercase)
df['', 'X', 'Name'] = df['', 'X', 'Name'].str.title()

# Print (transposed for readability)
df.T
Output (the values are random, so yours will differ):
                                            0    1    2
   X Name                                 Bar  Foo  Baz
   Y Name2                                BAR  BAR  BAZ
S1 X OVERALL TOTALS OF ALL SUBJECTS       BAR  FOO  BAZ
     OVERALL PERCENTAGES OF ALL SUBJECTS  BAZ  BAR  BAR
S2 Y OVERALL TOTALS OF ALL SUBJECTS       BAZ  BAZ  FOO
     OVERALL PERCENTAGES OF ALL SUBJECTS  BAZ  BAZ  FOO
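
As for the "'DataFrame' object has no attribute 'str'" error in the question: it appears to come from the double brackets. A minimal sketch of the difference follows, using a shortened version of the same column layout. The helper name rename_one_column is hypothetical (it is not a pandas function), and the seeded generator is only an assumption made here to keep the example reproducible. Since every file is said to have unique three-level combinations, the helper matches on the full tuple.

```python
import numpy as np
import pandas as pd

cols = [
    ("", "X", "Name"),
    ("", "Y", "Name"),
    ("S1", "X", "OVERALL TOTALS OF ALL SUBJECTS"),
    ("S2", "Y", "OVERALL TOTALS OF ALL SUBJECTS"),
]
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pd.DataFrame(
    rng.choice(["FOO", "BAR", "BAZ"], size=(3, len(cols))),
    columns=pd.MultiIndex.from_tuples(cols),
)

# A full tuple key selects exactly one column and returns a Series,
# so the .str accessor is available.
as_series = df[("", "X", "Name")]

# A list of tuples returns a DataFrame, even for a single column,
# and a DataFrame has no .str accessor.
as_frame = df[[("", "X", "Name")]]


def rename_one_column(frame, old, new):
    """Hypothetical helper: return a copy of `frame` in which the one column
    whose full tuple equals `old` is relabelled as `new`."""
    if old not in frame.columns.tolist():
        raise KeyError(old)
    out = frame.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [new if col == old else col for col in frame.columns],
        names=frame.columns.names,
    )
    return out


renamed = rename_one_column(df, ("", "Y", "Name"), ("", "Y", "Name2"))
renamed[("", "X", "Name")] = renamed[("", "X", "Name")].str.lower()
```

Matching on the whole tuple leaves the other "Name" column alone, unlike a rename restricted to level 2, and raising KeyError makes a missing column in one of the files fail loudly instead of silently doing nothing.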