How can I efficiently search and access certain cells in a DataFrame in Python?

python pandas

I have the following DataFrame structure:

      clientID  month  savings
0         10      2       15
1         20      2        2
2         30      2       10
3         40      2        5
4         50      2        7
5         60      2        9
6         10      3       10
7         20      3       10
8         30      3       11
9         10      4       13
10        30      4       15
11        40      4       16

I want to transform it into a new DataFrame like this:

    clientID   2     3     4
 0     10     15    10    13
 1     20      2    10  NULL
 2     30     10    11    15
 3     40      5  NULL    16
 4     50      7  NULL  NULL
 5     60      9  NULL  NULL

I did manage to solve this, but with very un-Pythonic code, which I suspect is why it runs so slowly (my real DataFrame has more than 2 million rows). Here is my code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'clientID': [10, 20, 30, 40, 50, 60, 10, 20, 30, 10, 30, 40],
                   'month': [2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
                   'savings': [15, 2, 10, 5, 7, 9, 10, 10, 11, 13, 15, 16]})
myDF = pd.DataFrame(df['clientID'].unique(), columns=['User'])
columnsToAdd = df['month'].unique()
for col in columnsToAdd:
    columnName = str(col)
    myDF[columnName] = 'NULL'
    # index labels in df for which month == col
    idxMonth = df[df['month'] == col].index.tolist()
    print(columnName, '\n')
    # user IDs for which month == col
    idxlabel = df['clientID'].loc[idxMonth]

    for i in np.arange(0, len(idxlabel)):
        zidx = myDF[myDF['User'] == idxlabel.iloc[i]].index.tolist()
        # a single .loc call avoids chained assignment, which may write to a copy
        myDF.loc[zidx, columnName] = df['savings'].loc[idxlabel.index[i]]

Can you suggest an efficient way to solve this?

Here is a solution using pivot_table.

PS: call .reset_index() on the result if you want clientID back as a regular column.

Option 1

pd.pivot_table(df,values='savings',index=['clientID'],columns=['month'],aggfunc='sum')
Out[429]: 
month        2     3     4
clientID                  
10        15.0  10.0  13.0
20         2.0  10.0   NaN
30        10.0  11.0  15.0
40         5.0   NaN  16.0
50         7.0   NaN   NaN
60         9.0   NaN   NaN
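To get closer to the layout asked for in the question, the .reset_index() hint can be combined with Option 1 roughly as follows. This is a minimal sketch on the question's sample data; the variable name `wide` and the step that clears the column-axis name are my own additions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'clientID': [10, 20, 30, 40, 50, 60, 10, 20, 30, 10, 30, 40],
    'month':    [2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'savings':  [15, 2, 10, 5, 7, 9, 10, 10, 11, 13, 15, 16],
})

# One row per client, one column per month; missing pairs become NaN.
wide = pd.pivot_table(df, values='savings', index='clientID',
                      columns='month', aggfunc='sum')

# Move clientID from the index back into an ordinary column.
wide = wide.reset_index()

# Drop the leftover 'month' label on the column axis (purely cosmetic).
wide.columns.name = None

print(wide)
```

Missing client/month pairs show up as NaN rather than the string 'NULL', which is usually what you want for further numeric work.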

Option 2, using unstack

df.set_index(['clientID','month']).unstack(-1)
Out[432]: 
         savings            
month          2     3     4
clientID                    
10          15.0  10.0  13.0
20           2.0  10.0   NaN
30          10.0  11.0  15.0
40           5.0   NaN  16.0
50           7.0   NaN   NaN
60           9.0   NaN   NaN
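Note that the unstack output above has two column levels (savings on top, month below). If that is not wanted, one possible variation is to select the savings column before unstacking. This is a sketch on the question's sample data, and the name `wide` is just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'clientID': [10, 20, 30, 40, 50, 60, 10, 20, 30, 10, 30, 40],
    'month':    [2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'savings':  [15, 2, 10, 5, 7, 9, 10, 10, 11, 13, 15, 16],
})

# Selecting the 'savings' Series first means unstack produces
# a single level of columns (the months) instead of a MultiIndex.
wide = df.set_index(['clientID', 'month'])['savings'].unstack('month')

print(wide)
```

Like Option 2 itself, this assumes each (clientID, month) pair occurs at most once; unstack raises an error on duplicate index entries.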

By the way, you should tag this question with pandas.

Option 1 could also be df.pivot(index='clientID', columns='month')

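@Abdou, I used pivot_table in case he has several savings entries for the same client within one month ~ :), but you are right ~ :)

Thank you so much! You saved me a lot of time! I didn't know about the "accept" button; this is my first time asking a question here.

@Ame.Lia np, now you know how this site works. Happy coding :)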
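For later readers, the pivot versus pivot_table point from the comments can be illustrated like this. It is a small sketch on made-up data (the frame `dup` is hypothetical, not from the question) in which client 10 has two entries for month 2.

```python
import pandas as pd

# Hypothetical data: client 10 appears twice in month 2.
dup = pd.DataFrame({
    'clientID': [10, 10, 20],
    'month':    [2, 2, 2],
    'savings':  [15, 5, 2],
})

# pivot only reshapes, so duplicate (clientID, month) pairs are an error.
try:
    dup.pivot(index='clientID', columns='month', values='savings')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print('pivot failed:', exc)

# pivot_table aggregates the duplicates instead (here by summing them).
summed = dup.pivot_table(values='savings', index='clientID',
                         columns='month', aggfunc='sum')
print(summed)
```

So pivot is fine when every client has at most one row per month, and pivot_table is the safer choice when that is not guaranteed.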