Python: create new rows based on a specific condition and iterate over a list
I have a df as shown below:
B_ID No_Show Session slot_num Cumulative_no_show
1 0.4 S1 1 0.4
2 0.3 S1 2 0.7
3 0.8 S1 3 1.5
4 0.3 S1 4 1.8
5 0.6 S1 5 2.4
6 0.8 S1 6 3.2
7 0.9 S1 7 4.1
8 0.4 S1 8 4.5
9 0.6 S1 9 5.1
12 0.9 S2 1 0.9
13 0.5 S2 2 1.4
14 0.3 S2 3 1.7
15 0.7 S2 4 2.4
20 0.7 S2 5 3.1
16 0.6 S2 6 3.7
17 0.8 S2 7 4.5
19 0.3 S2 8 4.8
The code to create the above df is as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame({'B_ID': [1,2,3,4,5,6,7,8,9,12,13,14,15,20,16,17,19],
'No_Show': [0.4,0.3,0.8,0.3,0.6,0.8,0.9,0.4,0.6,0.9,0.5,0.3,0.7,0.7,0.6,0.8,0.3],
'Session': ['s1','s1','s1','s1','s1','s1','s1','s1','s1','s2','s2','s2','s2','s2','s2','s2','s2'],
'slot_num': [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8],
})
df['Cumulative_no_show'] = df.groupby(['Session'])['No_Show'].cumsum()
I also have a list named walkin_no_show = [0.3, 0.4, 0.3, 0.4, 0.3, 0.4, and so on, with length 1000].
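For experimenting with the code in this thread, a list of that shape can be built as sketched below. The strict 0.3/0.4 alternation is an assumption extrapolated from the six values shown; the full list is not given.

```python
# Assumption: the list alternates 0.3 and 0.4 for all 1000 entries.
walkin_no_show = [0.3, 0.4] * 500

print(len(walkin_no_show), walkin_no_show[:6])
```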
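From the above, whenever u_cumulative > 0.8 for a row, I want to create a new row directly below it with
df[No_Show] = walkin_no_show[i]
Its Session and slot_num should be the same as in the previous row, and a new column named u_cumulative should be created by subtracting (1 - walkin_no_show[i]) from the previous value.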
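As a small worked example of this rule, the sketch below applies it by hand to the first four slots of session S1, using the first two walk-in values from the question. The variable names are only illustrative.

```python
# Worked example of the rule for Session S1 with threshold 0.8.
walkin_no_show = [0.3, 0.4]          # first two walk-in values from the question
cumulative = [0.4, 0.7, 1.5, 1.8]    # Cumulative_no_show for slots 1 to 4

# Slot 3 is the first row above the threshold (1.5 > 0.8).
reduction = 1 - walkin_no_show[0]                  # 0.7
walkin1_u = round(cumulative[2] - reduction, 2)    # walk-in row placed below slot 3
slot4_u = round(cumulative[3] - reduction, 2)      # slot 4 carries the same reduction

print(walkin1_u, slot4_u)
```

These two values correspond to the walkin1 row and the B_ID 4 row in the expected output.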
Expected output:
B_ID No_Show Session slot_num Cumulative_no_show u_cumulative
1 0.4 S1 1 0.4 0.4
2 0.3 S1 2 0.7 0.7
3 0.8 S1 3 1.5 1.5
walkin1 0.3 S1 3 1.5 0.8
4 0.3 S1 4 1.8 1.1
walkin2 0.4 S1 4 1.8 0.5
5 0.6 S1 5 2.4 1.1
walkin3 0.3 S1 5 2.4 0.4
6 0.8 S1 6 3.2 1.2
walkin4 0.4 S1 6 3.2 0.6
7 0.9 S1 7 4.1 1.5
walkin5 0.3 S1 7 4.1 0.8
8 0.4 S1 8 4.5 1.2
walkin6 0.4 S1 8 4.5 0.6
9 0.6 S1 9 5.1 1.2
12 0.9 S2 1 0.9 0.9
walkin1 0.3 S2 1 0.9 0.2
13 0.5 S2 2 1.4 0.7
14 0.3 S2 3 1.7 1.0
walkin2 0.4 S2 3 1.7 0.4
15 0.7 S2 4 2.4 1.1
walkin3 0.3 S2 4 2.4 0.4
20 0.7 S2 5 3.1 1.1
walkin4 0.4 S2 5 3.1 0.5
16 0.6 S2 6 3.7 1.1
walkin5 0.3 S2 6 3.7 0.4
17 0.8 S2 7 4.5 1.2
walkin6 0.4 S2 7 4.5 0.6
19 0.3 S2 8 4.8 0.9
I tried the code below, which is a small edit of what @Ben.T posted in his answer below.
Thanks @Ben.T, full credit goes to you.
def create_u_columns(ser):
    l_index = []
    arr_ns = ser.to_numpy()
    # array for later insert
    arr_idx = np.zeros(len(ser), dtype=int)
    walkin_id = 1
    for i in range(len(arr_ns)-1):
        if arr_ns[i] > 0.8:
            # subtract (1 - walk-in no-show) from the following u_no_show values
            arr_ns[i+1:] -= (1-walkin_no_show[arr_idx])
            # record the walk-in id to add later
            arr_idx[i] = walkin_id
            walkin_id += 1
    # return a dataframe with both columns
    return pd.DataFrame({'u_cumulative': arr_ns, 'mask_idx': arr_idx}, index=ser.index)

df[['u_cumulative', 'mask_idx']] = df.groupby(['Session'])['Cumulative_no_show'].apply(create_u_columns)
# select the rows
df_toAdd = df.loc[df['mask_idx'].astype(bool), :].copy()
# replace the values as wanted
df_toAdd['No_Show'] = walkin_no_show[mask_idx]
df_toAdd['B_ID'] = 'walkin'+df_toAdd['mask_idx'].astype(str)
df_toAdd['u_cumulative'] -= 1
# add 0.5 to index for later sort
df_toAdd.index += 0.5
new_df_0_8 = pd.concat([df, df_toAdd]).sort_index()\
                .reset_index(drop=True).drop('mask_idx', axis=1)
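In addition, I would like to iterate over a list: replace the 0.8 in the condition (arr_ns[i] > 0.8) with each value in [0.8, 0.9, 1] and create three DataFrames, such as new_df_0_8, new_df_0_9 and new_df_1_0.
The only trick to consider is how the index values are incremented. Here is a solution:
walkin_no_show = [0.3, 0.4, 0.3, 0.4, 0.3]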
df = pd.DataFrame({'B_ID': [1,2,3,4,5],
'No_Show': [0.1,0.1,0.3,0.5,0.6],
'Session': ['s1','s1','s1','s2','s2'],
'slot_num': [1,2,3,1,2],
'Cumulative_no_show': [1.5, 0.4, 1.6, 0.3, 1.9]
})
df = df[['B_ID', 'No_Show', 'Session', 'slot_num', 'Cumulative_no_show']]
df['u_cumulative'] = df['Cumulative_no_show']
print(df.head())
Output:
B_ID No_Show Session slot_num Cumulative_no_show u_cumulative
0 1 0.1 s1 1 1.5 1.5
1 2 0.1 s1 2 0.4 0.4
2 3 0.3 s1 3 1.6 1.6
3 4 0.5 s2 1 0.3 0.3
4 5 0.6 s2 2 1.9 1.9
Then:
Output:
B_ID No_Show Session slot_num Cumulative_no_show u_cumulative
0 1 0.1 s1 1 1.5 1.5
1 walkin1 0.3 s1 1 1.5 0.7
2 2 0.1 s1 2 0.4 0.4
3 3 0.3 s1 3 1.6 1.6
4 walkin2 0.3 s1 3 1.6 0.7
5 4 0.5 s2 1 0.3 0.3
6 5 0.6 s2 2 1.9 1.9
7 walkin3 0.3 s2 2 1.9 0.7
I hope this helps.
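One possible way to handle the index bookkeeping when placing new rows directly below selected rows is sketched here. It is not necessarily what this answer used: the copied rows get a fractional index (original index + 0.5) so that sorting by index puts each copy right after its source row. The small frame and the 0.8 cutoff are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"B_ID": [1, 2, 3], "u_cumulative": [1.5, 0.4, 1.6]})

# Copy the rows that should be followed by a new row.
extra = df[df["u_cumulative"] > 0.8].copy()
extra["B_ID"] = ["walkin" + str(n) for n in range(1, len(extra) + 1)]

# Shift the copies half a step so they sort directly below their source rows.
extra.index = extra.index + 0.5

out = pd.concat([df, extra]).sort_index().reset_index(drop=True)
print(out)
```

Because the original index holds whole numbers, adding 0.5 never collides with an existing label, and reset_index(drop=True) restores a clean 0..n-1 index afterwards.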
IIUC, you can do it the following way:
def create_u_columns(ser, threshold_ns=0.8):
    # work on a copy so the original column is left untouched
    arr_ns = ser.to_numpy().copy()
    # array for later insert
    arr_idx = np.zeros(len(ser), dtype=int)
    walkin_id = 0  # start at 0 not 1 for list indexing
    for i in range(len(arr_ns)-1):
        if arr_ns[i] > threshold_ns:
            # subtract (1 - walk-in no-show) from the following u_no_show values
            arr_ns[i+1:] -= (1-walkin_no_show[walkin_id])  # this is slightly different
            # record the walk-in id (1-based) to add later
            arr_idx[i] = walkin_id+1
            walkin_id += 1
    # return a dataframe with both columns
    return pd.DataFrame({'u_cumulative': arr_ns, 'mask_idx': arr_idx}, index=ser.index)

# create empty dict for storing the dataframes
d_dfs = {}
# iterate over the values for the threshold
for th_ns in [0.8, 0.9, 1.0]:
    # create a copy and do the same kind of operation
    # group_keys=False keeps the original row index so the assignment aligns
    df_ = df.copy()
    df_[['u_cumulative', 'mask_idx']] = \
        df_.groupby(['Session'], group_keys=False)['Cumulative_no_show']\
           .apply(lambda x: create_u_columns(x, threshold_ns=th_ns))
    # select the rows
    df_toAdd = df_.loc[df_['mask_idx'].astype(bool), :].copy()
    # replace the values as wanted
    df_toAdd['No_Show'] = np.array(walkin_no_show)[df_toAdd.groupby('Session').cumcount()]
    df_toAdd['B_ID'] = 'walkin'+df_toAdd['mask_idx'].astype(str)
    df_toAdd['u_cumulative'] -= (1 - df_toAdd['No_Show'])
    # add 0.5 to index for later sort
    df_toAdd.index += 0.5
    d_dfs[th_ns] = pd.concat([df_, df_toAdd]).sort_index()\
                     .reset_index(drop=True).drop('mask_idx', axis=1)
Then, if you want to access the DataFrames, you can do, for example:
for th, df_ in d_dfs.items():
    print(th)
    print(df_.head(4))
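For readers who want to check the logic independently of groupby.apply, here is a self-contained sketch of the same rule written as a plain loop. The helper name add_walkins is hypothetical, and the alternating 0.3/0.4 walk-in list is an assumption based on the values shown in the question. It rounds u_cumulative to two decimals only for display.

```python
import pandas as pd

df = pd.DataFrame({
    'B_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 20, 16, 17, 19],
    'No_Show': [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    'Session': ['s1'] * 9 + ['s2'] * 8,
    'slot_num': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8],
})
df['Cumulative_no_show'] = df.groupby('Session')['No_Show'].cumsum()

# Assumption: the walk-in list alternates 0.3 and 0.4.
walkin_no_show = [0.3, 0.4] * 500


def add_walkins(df, walkin_no_show, threshold=0.8):
    """Hypothetical helper: insert walk-in rows below rows above the threshold."""
    rows = []
    for _, grp in df.groupby('Session', sort=False):
        reduction = 0.0   # total amount subtracted so far in this session
        n_walkins = 0     # walk-ins added so far in this session
        last = len(grp) - 1
        for pos, rec in enumerate(grp.to_dict('records')):
            u = rec['Cumulative_no_show'] - reduction
            rows.append({**rec, 'u_cumulative': u})
            # The last slot of a session never triggers a walk-in,
            # mirroring the range(len - 1) loop in the answer above.
            if pos < last and u > threshold:
                ns = walkin_no_show[n_walkins]
                n_walkins += 1
                reduction += 1 - ns
                rows.append({**rec, 'B_ID': f'walkin{n_walkins}',
                             'No_Show': ns, 'u_cumulative': u - (1 - ns)})
    out = pd.DataFrame(rows)
    out['u_cumulative'] = out['u_cumulative'].round(2)
    return out


# One result per threshold, keyed by the threshold value.
results = {th: add_walkins(df, walkin_no_show, th) for th in [0.8, 0.9, 1.0]}
print(results[0.8].head(6))
```

Keeping a running reduction per session avoids mutating an array in place. Note that comparisons against thresholds such as 0.9 or 1.0 are subject to the usual floating-point caveats when a value lands exactly on the boundary.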
I think you misunderstood my logic. Can you use my same input and reproduce the same expected output?
@Danish, try it now!
Can you share your output? I have added the code to create the input dataset.
The output I get is slightly different from the expected one. The code to create the input data is the one given in the question.
Yes, you are right, that was my mistake, sorry. But the output I get has NaN values in the u_cumulative column. Please share your output.
If you have time, please look into the question below. It is similar to the above question, with minor changes.
Is the above problem complicated? It is hard for me to solve.