Python for loop: optimizing code speed when replacing cat codes with original values

python python-3.x pandas dataframe

I want to optimize the speed of the following code:

 true_results_str = true_results.astype(str)
 for i in range(0, len(true_results)):
     for k in range(1, true_results.shape[1]):
         if not pd.isna(true_results.iloc[i][k]):
             true_results_str.iat[i, k] = df_data_.Type[df_.cat_code == mapping_item_id[true_results.iloc[i][k]]].item()

A sample of the true_results dataframe looks like this:

      user   rec_2  rec_3  
0       16     nan    nan          
1       18      0      4          
2       51      3      0        
3       52      3     nan        
4       58      3      0


A sample of the dictionary mapping_item_id looks like this:

Key    Type   Size   Value
0      int64    1      13     
3      int64    1      14     
4      int64    1      15      
6      int64    1      16


A sample of the dataframe df_ looks like this:

    _Type              cat_code
    Car                   13
    Shirt                 14
    Tops                  15
    Shoes                 16

The dataframe with its values replaced by the cat_code from df_:

      user   rec_2  rec_3  
0       16     nan     nan         
1       18     13     16          
2       51     14     13        
3       52     14     nan        
4       58     14     13

The final true_results dataframe should be:

      user   rec_2     rec_3  
0       16     nan      nan         
1       18     Car     Shoes          
2       51     Shirt    Car        
3       52     Shirt    nan        
4       58     Shirt    Car   


for test_results of length of 44550
Time Started at :  12262.5898183
Time Stopped at :  12317.2825541
Time Completed at :  54.692735799999355

You can use mapping_item_id and df_ to build the final dictionary, then select all columns except the first and, inside a lambda function, map the values with a Series call.
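The answer's own code is not present in the text above, so the following is only a minimal sketch of how the described steps might look. It assumes that the column selection is done with `iloc` and the mapping with `Series.map`, that `mapping_item_id` is a plain dict (as stated in the question), and that the type column is named `_Type` as in the sample table. The name `item_to_type` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
true_results = pd.DataFrame({
    "user": [16, 18, 51, 52, 58],
    "rec_2": [np.nan, 0, 3, 3, 3],
    "rec_3": [np.nan, 4, 0, np.nan, 0],
})
mapping_item_id = {0: 13, 3: 14, 4: 15, 6: 16}  # plain dict: item id -> cat_code
df_ = pd.DataFrame({
    "_Type": ["Car", "Shirt", "Tops", "Shoes"],
    "cat_code": [13, 14, 15, 16],
})

# Final lookup (hypothetical name): item id -> cat_code -> _Type.
# A dict has no set_index method, so it is wrapped in a Series first.
item_to_type = pd.Series(mapping_item_id).map(df_.set_index("cat_code")["_Type"])

# Map every column except the first ("user"); NaN entries stay NaN.
mapped = true_results.iloc[:, 1:].apply(lambda col: col.map(item_to_type))
true_results_str = pd.concat([true_results.iloc[:, :1], mapped], axis=1)
print(true_results_str)
```

With the mapping exactly as listed in the question, item id 4 resolves to cat_code 15 (Tops), so the value for user 18 in rec_3 differs from the expected table in the question, which shows 16 (Shoes) there.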

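There is a similar example in R; I am going to test this now. Originally, the code above took 1 hr 45 min on a dataframe with 1.2 million rows. I will report back on the speed.

s = mapping_item_id.set_index('Key')['Value'].map(df_.set_index('cat_code')['_Type']) raises AttributeError: 'dict' object has no attribute 'set_index'. df_ is a dataframe and mapping_item_id is a dictionary.

mapping_item_id.set_index('Key')['Value'] AttributeError: 'dict' object has no attribute 'set_index'

Time Started at :  12955.066432
Time Stopped at :  12955.2114821
Time Completed at :  0.14505009999349
for test_results of length of 44550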
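To reproduce this kind of comparison, a rough timing sketch such as the one below could be used. Everything in it is an assumption for illustration: the synthetic data, the row count, the random seed, and the function names `loop_version` and `map_version` are hypothetical, and the loop keeps missing values as NaN instead of converting the frame to strings. Timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data; size and seed are arbitrary.
rng = np.random.default_rng(0)
mapping_item_id = {0: 13, 3: 14, 4: 15, 6: 16}
df_ = pd.DataFrame({"_Type": ["Car", "Shirt", "Tops", "Shoes"],
                    "cat_code": [13, 14, 15, 16]})
n_rows = 1_000
choices = np.array([0, 3, 4, 6, np.nan])
true_results = pd.DataFrame({
    "user": np.arange(n_rows),
    "rec_2": rng.choice(choices, n_rows),
    "rec_3": rng.choice(choices, n_rows),
})


def loop_version(frame):
    """Cell-by-cell lookup, in the spirit of the question's loop."""
    out = frame.astype(object)  # astype returns a copy
    for i in range(len(frame)):
        for k in range(1, frame.shape[1]):
            value = frame.iat[i, k]
            if not pd.isna(value):
                code = mapping_item_id[value]
                out.iat[i, k] = df_.loc[df_["cat_code"] == code, "_Type"].item()
    return out


def map_version(frame):
    """Column-wise lookup with Series.map."""
    lookup = pd.Series(mapping_item_id).map(df_.set_index("cat_code")["_Type"])
    mapped = frame.iloc[:, 1:].apply(lambda col: col.map(lookup))
    return pd.concat([frame.iloc[:, :1].astype(object), mapped], axis=1)


start = time.perf_counter()
slow = loop_version(true_results)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = map_version(true_results)
map_seconds = time.perf_counter() - start

# Both versions should produce the same values.
pd.testing.assert_frame_equal(slow, fast, check_dtype=False)
print(f"loop: {loop_seconds:.3f} s, map: {map_seconds:.3f} s")
```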