Python: return values to a new column in a list
I have a df of sales data. I want to add two more columns that highlight important values from other columns.

Currently, I have code that returns a value to a new column whenever a sale is made. So if a subject appears in the Sales column, the corresponding value is indexed into the new column:
import pandas as pd
import numpy as np
a = 5
N = 10
df = pd.DataFrame({
    'Julie_$' : [500, 40, 10, 10, 50, 37, 30, 900, 40, 50],
    'Julie_c' : [300, 20, 70, 50, 80, 67, 50, 100, 40, 0],
    'Tom_$' : [500, 50, 10, 60, 50, 77, 30, 600, 40, 60],
    'Tom_c' : [100, 20, 40, 50, 0, 67, 90, 100, 0, 0],
    'Code' : ['nan', 'nan', 'Big', 'nan', 'nan', 'Small', 'nan', 'Big', 'nan', 'nan'],
    'Sales' : ['nan', 'nan', 'Tom', 'nan', 'nan', 'Tom', 'nan', 'Julie', 'nan', 'nan']})
# For each row with a sale, read the '<salesperson>_$' and '<salesperson>_c' columns
df['Dollars'] = df.apply(lambda row: row.get(row['Sales']+'_$') if pd.notnull(row['Sales']) else np.nan, axis=1)
df['Cents'] = df.apply(lambda row: row.get(row['Sales']+'_c') if pd.notnull(row['Sales']) else np.nan, axis=1)
Output:
Code Julie_$ Julie_c Sales Tom_$ Tom_c Dollars Cents
0 nan 500 300 nan 500 100 NaN NaN
1 nan 40 20 nan 50 20 NaN NaN
2 Big 10 70 Tom 10 40 10.0 40.0
3 nan 10 50 nan 60 50 NaN NaN
4 nan 50 80 nan 50 0 NaN NaN
5 Small 37 67 Tom 77 67 77.0 67.0
6 nan 30 50 nan 30 90 NaN NaN
7 Big 900 100 Julie 600 100 900.0 100.0
8 nan 40 40 nan 40 0 NaN NaN
9 nan 50 0 nan 60 0 NaN NaN
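The row-wise lookup used above can be sketched in isolation. This is only an illustration: the two-row frame `demo` and the helper name `lookup` are made up for this sketch and are not part of the question. Note that in the question's frame the placeholders are the string 'nan' rather than real missing values, so `pd.notnull` is true for them and the NaN in the result comes from `row.get` finding no column called 'nan_$'.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (hypothetical values, real NaN instead of the string 'nan').
demo = pd.DataFrame({
    'Tom_$': [500, 10],
    'Tom_c': [100, 40],
    'Sales': [np.nan, 'Tom'],
})

def lookup(row, suffix):
    """Return the value of the '<salesperson><suffix>' column, or NaN if there is no sale."""
    name = row['Sales']
    if pd.isnull(name):
        return np.nan
    # row.get falls back to NaN if the composed column name does not exist.
    return row.get(name + suffix, np.nan)

# Extra keyword arguments to DataFrame.apply are forwarded to the function.
demo['Dollars'] = demo.apply(lookup, axis=1, suffix='_$')
demo['Cents'] = demo.apply(lookup, axis=1, suffix='_c')
print(demo)
```

Only the second row has a salesperson, so only that row receives values from the Tom columns.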
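This works well, but I want to add another layer using the Code column. If the value in this column is Big, I want to keep returning the salesperson's values until there is a new sale. If it is Small, I don't care.

So the output would be: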
Code Julie_$ Julie_c Sales Tom_$ Tom_c Dollars Cents
0 nan 500 300 nan 500 100 NaN NaN
1 nan 40 20 nan 50 20 NaN NaN
2 Big 10 70 Tom 10 40 10.0 40.0
3 nan 10 50 nan 60 50 60.0 50.0
4 nan 50 80 nan 50 0 50.0 0.0
5 Small 37 67 Tom 77 67 77.0 67.0
6 nan 30 50 nan 30 90 NaN NaN
7 Big 900 100 Julie 600 100 900.0 100.0
8 nan 40 40 nan 40 0 40.0 40.0
9 nan 50 0 nan 60 0 50.0 0.0
I considered using a similar approach to return the values, as with the Sales column:
df['Dollars'] = df.apply(lambda row: row.get(row['Sales']+'_$') if pd.notnull(row['Sales']) else np.nan, axis=1)
df['Cents'] = df.apply(lambda row: row.get(row['Sales']+'_c') if pd.notnull(row['Sales']) else np.nan, axis=1)
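But this only works when the values are at the same index. I'm a bit confused. I'm not sure whether I should fill the data so that the Code column always has a value: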
Code
0 nan
1 nan
2 Big
3 Big
4 Big
5 Small
6 Small
7 Big
8 Big
9 Big
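Then I could select the important ones. But I am reluctant to change the original dataset.

You can replace the NaN values in the Sales column with ffill, forward filling only where a mask says that the forward-filled Code is Big or that the Code itself is Small: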
# replace the string 'nan' with np.nan first
df[['Code', 'Sales']] = df[['Code', 'Sales']].replace('nan', np.nan)
# True where the last seen Code is Big, or where the row itself is Small
mask = (df['Code'].ffill() == 'Big') | (df['Code'] == 'Small')
# forward fill the salesperson only on the masked rows
df.loc[mask, 'Sales'] = df['Sales'].ffill()
df['Dollars'] = df.apply(lambda row: row.get(row['Sales']+'_$') if pd.notnull(row['Sales']) else np.nan, axis=1)
df['Cents'] = df.apply(lambda row: row.get(row['Sales']+'_c') if pd.notnull(row['Sales']) else np.nan, axis=1)
print(df)
Code Julie_$ Julie_c Sales Tom_$ Tom_c Dollars Cents
0 NaN 500 300 NaN 500 100 NaN NaN
1 NaN 40 20 NaN 50 20 NaN NaN
2 Big 10 70 Tom 10 40 10.0 40.0
3 NaN 10 50 Tom 60 50 60.0 50.0
4 NaN 50 80 Tom 50 0 50.0 0.0
5 Small 37 67 Tom 77 67 77.0 67.0
6 NaN 30 50 NaN 30 90 NaN NaN
7 Big 900 100 Julie 600 100 900.0 100.0
8 NaN 40 40 Julie 40 0 40.0 40.0
9 NaN 50 0 Julie 60 0 50.0 0.0
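Since the question mentions not wanting to change the original dataset, the same idea can be sketched with temporary Series so that the Code and Sales columns are left as they are. This is a possible variant, not the answer's own code: the names `code`, `sales`, `seller` and `pick` are hypothetical, and the Julie columns are assumed to hold the values shown in the output tables.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Julie_$': [500, 40, 10, 10, 50, 37, 30, 900, 40, 50],
    'Julie_c': [300, 20, 70, 50, 80, 67, 50, 100, 40, 0],
    'Tom_$': [500, 50, 10, 60, 50, 77, 30, 600, 40, 60],
    'Tom_c': [100, 20, 40, 50, 0, 67, 90, 100, 0, 0],
    'Code': ['nan', 'nan', 'Big', 'nan', 'nan', 'Small', 'nan', 'Big', 'nan', 'nan'],
    'Sales': ['nan', 'nan', 'Tom', 'nan', 'nan', 'Tom', 'nan', 'Julie', 'nan', 'nan'],
})
original_sales = df['Sales'].copy()

# Temporary Series with real NaN; df['Code'] and df['Sales'] are not overwritten.
code = df['Code'].mask(df['Code'].eq('nan'))
sales = df['Sales'].mask(df['Sales'].eq('nan'))

# Rows covered by a Big code (carried forward) or by a Small code on the row itself.
mask = (code.ffill() == 'Big') | (code == 'Small')
# Salesperson to use on each row; NaN where the mask is False.
seller = sales.ffill().where(mask)

def pick(suffix):
    """Read the '<seller><suffix>' column row by row; NaN where no seller applies."""
    values = [df.at[i, s + suffix] if pd.notnull(s) else np.nan
              for i, s in seller.items()]
    return pd.Series(values, index=df.index)

df['Dollars'] = pick('_$')
df['Cents'] = pick('_c')
print(df)
```

The Dollars and Cents columns should match the output above, while Sales still contains only the three original sale rows.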
Thanks @Jezrael. Just to confirm: when Big is in the Code column, this will ffill the Sales column?
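One way to check this is to lay the intermediate steps side by side. The sketch below rebuilds only the Code and Sales columns from the example (with real NaN values) and uses a hypothetical helper frame named `steps`. Under these assumptions, Sales is carried forward on the rows that follow a Big code, a Small row keeps only its own salesperson, and the rows after a Small stay NaN.

```python
import numpy as np
import pandas as pd

code = pd.Series([np.nan, np.nan, 'Big', np.nan, np.nan,
                  'Small', np.nan, 'Big', np.nan, np.nan])
sales = pd.Series([np.nan, np.nan, 'Tom', np.nan, np.nan,
                   'Tom', np.nan, 'Julie', np.nan, np.nan])

steps = pd.DataFrame({
    'Code': code,
    'Code_ffill': code.ffill(),                           # last seen code on each row
    'mask': (code.ffill() == 'Big') | (code == 'Small'),  # rows allowed to receive a salesperson
    'Sales_ffill': sales.ffill(),                         # salesperson carried forward everywhere
})
# Keep the carried-forward salesperson only where the mask is True.
steps['Sales_result'] = steps['Sales_ffill'].where(steps['mask'])
print(steps)
```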