Python: create new rows based on a specific condition and iterate over a list

python pandas

I have a DataFrame as shown below:

B_ID   No_Show   Session  slot_num  Cumulative_no_show
    1     0.4       S1        1       0.4   
    2     0.3       S1        2       0.7      
    3     0.8       S1        3       1.5        
    4     0.3       S1        4       1.8       
    5     0.6       S1        5       2.4         
    6     0.8       S1        6       3.2       
    7     0.9       S1        7       4.1        
    8     0.4       S1        8       4.5   
    9     0.6       S1        9       5.1     
    12    0.9       S2        1       0.9    
    13    0.5       S2        2       1.4       
    14    0.3       S2        3       1.7        
    15    0.7       S2        4       2.4         
    20    0.7       S2        5       3.1          
    16    0.6       S2        6       3.7       
    17    0.8       S2        7       4.5        
    19    0.3       S2        8       4.8

The code to create the DataFrame above is:

import pandas as pd
import numpy as np
df = pd.DataFrame({'B_ID': [1,2,3,4,5,6,7,8,9,12,13,14,15,20,16,17,19],
                   'No_Show': [0.4,0.3,0.8,0.3,0.6,0.8,0.9,0.4,0.6,0.9,0.5,0.3,0.7,0.7,0.6,0.8,0.3],
                   'Session': ['s1','s1','s1','s1','s1','s1','s1','s1','s1','s2','s2','s2','s2','s2','s2','s2','s2'],
                   'slot_num': [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8],
                   })
df['Cumulative_no_show'] = df.groupby(['Session'])['No_Show'].cumsum()

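I also have a list named walkin_no_show = [0.3, 0.4, 0.3, 0.4, 0.3, 0.4, ...] of length 1000.

Whenever u_cumulative > 0.8 (where u_cumulative is the running cumulative No_Show within a session, after the walk-in adjustments), I want to create a new row just below that row, with

 df['No_Show'] = walkin_no_show[i]

Its Session and slot_num should be the same as those of the previous row, and a new column named u_cumulative should be created by subtracting (1 - walkin_no_show[i]) from the previous row's value.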
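To make the rule concrete, here is a minimal sketch of the arithmetic for a single walk-in row. The helper name `walkin_u_cumulative` is hypothetical (it is not part of the original post), and the rounding is only there to hide floating-point noise.

```python
def walkin_u_cumulative(prev_u, walkin_p):
    """Hypothetical helper: u_cumulative of a walk-in row inserted below
    a row whose u_cumulative is prev_u."""
    # subtract (1 - walk-in no-show probability) from the previous value
    return round(prev_u - (1 - walkin_p), 10)

# B_ID 3 in the first session has u_cumulative 1.5 and the first
# walk-in has No_Show 0.3, so the walk-in row gets 1.5 - 0.7
print(walkin_u_cumulative(1.5, 0.3))  # 0.8
```

The row after the walk-in then continues from that value: 0.8 plus its own No_Show.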

Expected output:

B_ID   No_Show   Session  slot_num  Cumulative_no_show    u_cumulative
    1     0.4       S1        1       0.4                 0.4
    2     0.3       S1        2       0.7                 0.7
    3     0.8       S1        3       1.5                 1.5
walkin1   0.3       S1        3       1.5                 0.8
    4     0.3       S1        4       1.8                 1.1      
walkin2   0.4       S1        4       1.8                 0.5
    5     0.6       S1        5       2.4                 1.1    
walkin3   0.3       S1        5       2.4                 0.4
    6     0.8       S1        6       3.2                 1.2      
walkin4   0.4       S1        6       3.2                 0.6
    7     0.9       S1        7       4.1                 1.5               
walkin5   0.3       S1        7       4.1                 0.8   
    8     0.4       S1        8       4.5                 1.2
walkin6   0.4       S1        8       4.5                 0.6
    9     0.6       S1        9       5.1                 1.2
    12    0.9       S2        1       0.9                 0.9
walkin1   0.3       S2        1       0.9                 0.2
    13    0.5       S2        2       1.4                 0.7           
    14    0.3       S2        3       1.7                 1.0
walkin2   0.4       S2        3       1.7                 0.4
    15    0.7       S2        4       2.4                 1.1
walkin3   0.3       S2        4       2.4                 0.4      
    20    0.7       S2        5       3.1                 1.1
walkin4   0.4       S2        5       3.1                 0.5       
    16    0.6       S2        6       3.7                 1.1
walkin5   0.3       S2        6       3.7                 0.4                    
    17    0.8       S2        7       4.5                 1.2
walkin6   0.4       S2        7       4.5                 0.6       
    19    0.3       S2        8       4.8                 0.9

I tried the code below, with small edits, as suggested by @Ben.T in the answer to my earlier question.

Thanks, @Ben.T. Full credit to you.

def create_u_columns (ser):
    l_index = []
    arr_ns = ser.to_numpy()
    # array for latter insert
    arr_idx = np.zeros(len(ser), dtype=int)
    walkin_id = 1
    for i in range(len(arr_ns)-1):
        if arr_ns[i]>0.8:
            # remove 1 to u_no_show
            arr_ns[i+1:] -= (1-walkin_no_show[arr_idx])
            # increment later idx to add
            arr_idx[i] = walkin_id
            walkin_id +=1
    #return a dataframe with both columns
    return pd.DataFrame({'u_cumulative': arr_ns, 'mask_idx':arr_idx}, index=ser.index)

df[['u_cumulative', 'mask_idx']]= df.groupby(['Session'])['Cumulative_no_show'].apply(create_u_columns)


# select the rows
df_toAdd = df.loc[df['mask_idx'].astype(bool), :].copy()
# replace the values as wanted
df_toAdd['No_Show'] = walkin_no_show[mask_idx]
df_toAdd['B_ID'] = 'walkin'+df_toAdd['mask_idx'].astype(str)
df_toAdd['u_cumulative'] -= 1
# add 0.5 to index for later sort
df_toAdd.index += 0.5 

# a dot is not allowed in a variable name, so use an underscore
new_df_0_8 = pd.concat([df,df_toAdd]).sort_index()\
           .reset_index(drop=True).drop('mask_idx', axis=1)

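In addition, I would like to iterate over a list of thresholds: change the condition (arr_ns[i] > 0.8) over [0.8, 0.9, 1.0] and create three DataFrames, such as new_df_0.8, new_df_0.9 and new_df_1.0.

The only trick to consider is the way the index values are incremented. Here is a solution:

walkin_no_show = [0.3,0.4,0.3,0.4,0.3]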

df = pd.DataFrame({'B_ID': [1,2,3,4,5],
                   'No_Show': [0.1,0.1,0.3,0.5,0.6],
                   'Session': ['s1','s1','s1','s2','s2'],
                   'slot_num': [1,2,3,1,2],
                   'Cumulative_no_show': [1.5, 0.4, 1.6, 0.3, 1.9]
                   })
df = df[['B_ID', 'No_Show', 'Session', 'slot_num', 'Cumulative_no_show']]
df['u_cumulative'] = df['Cumulative_no_show']

print(df.head())

Output:

   B_ID  No_Show Session  slot_num  Cumulative_no_show  u_cumulative
0     1      0.1      s1         1                 1.5           1.5
1     2      0.1      s1         2                 0.4           0.4
2     3      0.3      s1         3                 1.6           1.6
3     4      0.5      s2         1                 0.3           0.3
4     5      0.6      s2         2                 1.9           1.9

Then, after the walk-in rows are inserted, the output is:

      B_ID  No_Show Session  slot_num  Cumulative_no_show  u_cumulative
0        1      0.1      s1         1                 1.5           1.5
1  walkin1      0.3      s1         1                 1.5           0.7
2        2      0.1      s1         2                 0.4           0.4
3        3      0.3      s1         3                 1.6           1.6
4  walkin2      0.3      s1         3                 1.6           0.7
5        4      0.5      s2         1                 0.3           0.3
6        5      0.6      s2         2                 1.9           1.9
7  walkin3      0.3      s2         2                 1.9           0.7

I hope this helps.
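The index trick mentioned in this answer can be sketched roughly as follows. This is not the answer's own code: it is a minimal illustration that assumes a threshold of 0.8 and a simple `walkin<k>` label, and it only shows how a half-step index places each new row directly below its source row.

```python
import pandas as pd

df = pd.DataFrame({'B_ID': [1, 2, 3], 'u_cumulative': [1.5, 0.4, 1.6]})

# rows that should be followed by an extra row (threshold 0.8 is assumed)
extra = df.loc[df['u_cumulative'] > 0.8].copy()
extra['B_ID'] = ['walkin' + str(k) for k in range(1, len(extra) + 1)]
# shift the copies half a step so they sort right after their source row
extra.index = extra.index + 0.5

# sort on the mixed index, then renumber the rows from 0
combined = pd.concat([df, extra]).sort_index().reset_index(drop=True)
print(combined)
```

Because the original index holds integers, any offset strictly between 0 and 1 keeps each inserted row between its source row and the next one.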

IIUC, you can do it this way:

def create_u_columns (ser, threshold_ns = 0.8):

    # work on a copy so the in-place updates below never touch the original data
    arr_ns = ser.to_numpy(copy=True)
    # array of walk-in ids (0 = no walk-in), used later to insert the rows
    arr_idx = np.zeros(len(ser), dtype=int)
    walkin_id = 0 #start at 0 not 1 for list indexing
    for i in range(len(arr_ns)-1):
        if arr_ns[i]>threshold_ns:
            # subtract (1 - walk-in no-show) from all later rows
            arr_ns[i+1:] -= (1-walkin_no_show[walkin_id]) #this is slightly different
            # store the 1-based walk-in id for this row
            arr_idx[i] = walkin_id+1
            walkin_id +=1
    #return a dataframe with both columns
    return pd.DataFrame({'u_cumulative': arr_ns, 'mask_idx':arr_idx}, index=ser.index)

#create empty dict for storing the dataframes
d_dfs = {}
#iterate over the value for the threshold
for th_ns in [0.8, 0.9, 1.0]:
    #create a copy and do the same kind of operation
    df_ = df.copy()
    # group_keys=False keeps the original row index so the assignment aligns
    df_[['u_cumulative', 'mask_idx']]= \
        df_.groupby(['Session'], group_keys=False)['Cumulative_no_show']\
           .apply(lambda x: create_u_columns(x, threshold_ns=th_ns))

    # select the rows
    df_toAdd = df_.loc[df_['mask_idx'].astype(bool), :].copy()
    # replace the values as wanted
    df_toAdd['No_Show'] = np.array(walkin_no_show)[df_toAdd.groupby('Session').cumcount()] 
    df_toAdd['B_ID'] = 'walkin'+df_toAdd['mask_idx'].astype(str)
    df_toAdd['u_cumulative'] -= (1 - df_toAdd['No_Show'])
    # add 0.5 to index for later sort
    df_toAdd.index += 0.5 

    d_dfs[th_ns] = pd.concat([df_,df_toAdd]).sort_index()\
                       .reset_index(drop=True).drop('mask_idx', axis=1)

Then, if you want to access the DataFrames, you can do, for example:

for th, df_ in d_dfs.items():
    print (th)
    print (df_.head(4))
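As a cross-check on the expected output in the question, the same rule can also be written as a plain row-by-row loop. This is a sketch, not the answer's method: the helper name `add_walkins` is hypothetical, the walk-in list is assumed to alternate 0.3 and 0.4, the walk-in counter is assumed to restart in every session, and values are rounded to hide floating-point noise.

```python
import pandas as pd

# Assumption: the walk-in list simply alternates 0.3 and 0.4 (length 1000)
walkin_no_show = [0.3, 0.4] * 500

df = pd.DataFrame({
    'B_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 20, 16, 17, 19],
    'No_Show': [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    'Session': ['s1'] * 9 + ['s2'] * 8,
    'slot_num': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8],
})
df['Cumulative_no_show'] = df.groupby('Session')['No_Show'].cumsum()

def add_walkins(frame, threshold=0.8):
    """Hypothetical helper: rebuild the frame row by row, per session."""
    out = []
    for _, grp in frame.groupby('Session', sort=False):
        u = 0.0        # running u_cumulative within the session
        walkin_id = 0  # walk-in counter restarts for every session
        last = len(grp) - 1
        for pos, row in enumerate(grp.to_dict('records')):
            u = round(u + row['No_Show'], 10)
            row['u_cumulative'] = u
            out.append(row)
            # the last slot of a session never gets a walk-in
            if u > threshold and pos < last:
                p = walkin_no_show[walkin_id]
                walkin_id += 1
                u = round(u - (1 - p), 10)
                # the walk-in copies Session, slot_num and Cumulative_no_show
                out.append({**row, 'B_ID': f'walkin{walkin_id}',
                            'No_Show': p, 'u_cumulative': u})
    return pd.DataFrame(out)

result = add_walkins(df)
print(result.head(6))
```

A loop like this is slower than the vectorised version on large frames, but it is easy to compare line by line with the expected table.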

I think you misunderstood my logic. Can you use my same input and reproduce the same expected output?

@Danish, try it now!

Can you share your output? I have added the code that creates the input dataset.

The output I get is slightly different from the expected one. The code to create the input data is given in the question.

Yes, you are right. That was my mistake, sorry. But the output I get has NaN values in the u_cumulative column. Please share your output.

If you have time, please look into the question below. It is similar to the question above, with small changes.

Is the question above complicated? It is hard for me to solve.
