Python: How to add a DataFrame inside a column of a DataFrame


python pandas


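I am creating a DataFrame to store information about samples. Some of my column labels have the form index:subindex. Is there a better way to do this? I was looking at pd.MultiIndex, but my sub-indices are specific to each index.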

import numpy as np
import pandas as pd
df = pd.DataFrame(
    np.random.random(size=(1234, 6)),
    columns=['ID',
             'Charge:pH2', 'Charge:pH4', 'Charge:pH6',
             'Extinction:Wavelength200nm', 'Extinction:Wavelength500nm'])

I would like to be able to call

df.loc[:, 'ID']

or

df.loc[:, 'Charge']

or

df.loc[:, ('Charge', 'pH6')]

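I think the best approach is to move the columns that cannot be split (those with no separator) into the index or a MultiIndex, and then use str.split with expand=True to turn the remaining column labels into a MultiIndex: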
np.random.seed(2019)
df = pd.DataFrame(
    np.random.random(size=(3, 6)),
    columns=['ID',
             'Charge:pH2', 'Charge:pH4', 'Charge:pH6',
             'Extinction:Wavelength200nm', 'Extinction:Wavelength500nm'])

df = df.set_index('ID')
df.columns = df.columns.str.split(':', expand=True)
print (df)
            Charge                          Extinction                
               pH2       pH4       pH6 Wavelength200nm Wavelength500nm
ID                                                                    
0.903482  0.393081  0.623970  0.637877        0.880499        0.299172
0.702198  0.903206  0.881382  0.405750        0.452447        0.267070
0.162865  0.889215  0.148476  0.984723        0.032361        0.515351

A solution without setting ID as the index is also possible, but column names that were not split get NaN in the second level:
df.columns = df.columns.str.split(':', expand=True)
print (df)
         ID    Charge                          Extinction                
        NaN       pH2       pH4       pH6 Wavelength200nm Wavelength500nm
0  0.903482  0.393081  0.623970  0.637877        0.880499        0.299172
1  0.702198  0.903206  0.881382  0.405750        0.452447        0.267070
2  0.162865  0.889215  0.148476  0.984723        0.032361        0.515351
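If the NaN in the second level is unwanted, one alternative is to build the tuples yourself and pass them to pd.MultiIndex.from_tuples. The following is only a sketch, not part of the original answer: it assumes an empty string is an acceptable placeholder for labels without a separator, and the variable name pairs is made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame(
    np.random.random(size=(3, 6)),
    columns=['ID',
             'Charge:pH2', 'Charge:pH4', 'Charge:pH6',
             'Extinction:Wavelength200nm', 'Extinction:Wavelength500nm'])

# Split each label on the first ':'. str.partition returns an empty
# string for the part after the separator when no separator is found,
# so 'ID' becomes ('ID', '') instead of ('ID', NaN).
pairs = []
for label in df.columns:
    top, _, sub = label.partition(':')
    pairs.append((top, sub))

df.columns = pd.MultiIndex.from_tuples(pairs)

print(df.columns.tolist())
print(df['Charge'].shape)            # all sub-columns under 'Charge'
print(df[('Charge', 'pH6')].shape)   # a single sub-column
```

Because the tuples are listed explicitly, each top-level label can carry its own set of sub-labels, which is the situation described in the question.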

Finally, select by column name; DataFrame.xs also works, and it is the one to use if you need to select by the second level:
print (df['Charge'])
               pH2       pH4       pH6
ID                                    
0.903482  0.393081  0.623970  0.637877
0.702198  0.903206  0.881382  0.405750
0.162865  0.889215  0.148476  0.984723

print (df.xs('Charge', axis=1, level=0))
               pH2       pH4       pH6
ID                                    
0.903482  0.393081  0.623970  0.637877
0.702198  0.903206  0.881382  0.405750
0.162865  0.889215  0.148476  0.984723

print (df.xs('pH4', axis=1, level=1))
            Charge
ID                
0.903482  0.623970
0.702198  0.881382
0.162865  0.148476
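The .loc calls from the question can be checked against the frame that has ID as its index. This is a small sketch under that assumption; the variable names charge, ph6 and ids are only illustrative. Note that with ID in the index, the IDs are read from df.index rather than from df.loc[:, 'ID'].

```python
import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame(
    np.random.random(size=(3, 6)),
    columns=['ID',
             'Charge:pH2', 'Charge:pH4', 'Charge:pH6',
             'Extinction:Wavelength200nm', 'Extinction:Wavelength500nm'])

# Move the unsplittable column into the row index, then split the rest.
df = df.set_index('ID')
df.columns = df.columns.str.split(':', expand=True)

charge = df.loc[:, 'Charge']         # DataFrame with columns pH2, pH4, pH6
ph6 = df.loc[:, ('Charge', 'pH6')]   # Series for one sub-column
ids = df.index.to_numpy()            # ID values now live in the row index

print(charge.shape)
print(ph6.shape)
print(df.index.name, len(ids))
```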

With ID kept as a column (so its second level is NaN) on the original 1234-row frame, all of the required indexing schemes work:
print(df.loc[:, 'ID'].shape)
print(df.loc[:, 'Charge'].shape)
print(df.loc[:, ('Charge', 'pH6')].shape)

Output (the first ten rows of df, then the three shapes):
         ID    Charge       ...            Extinction                
        NaN       pH2       ...       Wavelength200nm Wavelength500nm
0  0.301592  0.137384       ...              0.074137        0.339948
1  0.737711  0.557524       ...              0.813727        0.586845
2  0.615398  0.529687       ...              0.148700        0.466916
3  0.411509  0.725513       ...              0.380019        0.876992
4  0.031172  0.623944       ...              0.311610        0.488207
5  0.022140  0.450630       ...              0.422927        0.479094
6  0.119681  0.221624       ...              0.710848        0.719201
7  0.252039  0.632321       ...              0.453235        0.952687
8  0.379501  0.356493       ...              0.141977        0.028836
9  0.249950  0.316020       ...              0.307337        0.881437

[10 rows x 6 columns]

(1234, 1)
(1234, 3)
(1234,)