Python Pandas - Generate a value if one column is greater than another and neither is null
I have the following columns in my dataframe:
W1  W2   W3   W4   L1  L2   L3   L4
0   6    6    3    6   7    3    6
7   NaN  NaN  NaN  6   NaN  NaN  NaN
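I want to add four columns to this dataframe, SET1 .. SET4, which are:
- 1.0 if Wi > Li and neither is NaN
- 0.0 if Wi < Li and neither is NaN
- NaN if Wi or Li is NaN
SET1  SET2  SET3  SET4
0.0   0.0   1.0   0.0
1.0   NaN   NaN   NaN
I used the following code to apply the first two bullets, but I am struggling to handle the NaN case correctly: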
for i in range(1, 5):
    wincol = "W" + str(i)
    losecol = "L" + str(i)
    setcol = "SET" + str(i)
    matches_df[setcol] = matches_df[wincol] > matches_df[losecol]
    matches_df[setcol] = matches_df[setcol].astype(float)
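The loop above yields 0.0 where a score is missing, because any comparison against NaN evaluates to False. A minimal sketch of one possible fix, assuming the question's sample data and keeping the asker's `matches_df` name, is to mask the result with `Series.where` so that it is kept only where both scores are present:

```python
import numpy as np
import pandas as pd

# Sample data taken from the question; matches_df is the asker's variable name.
matches_df = pd.DataFrame({
    "W1": [0, 7], "W2": [6, np.nan], "W3": [6, np.nan], "W4": [3, np.nan],
    "L1": [6, 6], "L2": [7, np.nan], "L3": [3, np.nan], "L4": [6, np.nan],
})

for i in range(1, 5):
    won = matches_df[f"W{i}"]
    lost = matches_df[f"L{i}"]
    # Comparisons with NaN are False, so the plain cast gives 0.0 there.
    result = (won > lost).astype(float)
    # Keep the result only where both scores exist; everything else becomes NaN.
    matches_df[f"SET{i}"] = result.where(won.notna() & lost.notna())

print(matches_df[["SET1", "SET2", "SET3", "SET4"]])
```

This keeps the original loop structure and only adds the masking step.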
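You need startswith to pick out the W and L columns; then simply divide one block of values by the other and build the dataframe you want: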
#df=df.replace('Nan',np.nan)
#df=df.astype(float)
new_df=pd.DataFrame((df.loc[:,df.columns.str.startswith('W')].values/df.loc[:,df.columns.str.startswith('L')].values))
new_df[new_df.notnull()]=new_df.gt(1).astype(int)
new_df
Out[239]:
0 1 2 3
0 0.0 0.0 1.0 0.0
1 1.0 NaN NaN NaN
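The ratio trick treats W/L > 1 as equivalent to W > L, which holds for positive scores. As a hedged illustration of where the two can diverge, the sketch below uses hypothetical rows containing zeros (not present in the question's data): 0/0 evaluates to NaN, so a 0 versus 0 result would be reported as missing, whereas a direct comparison reports 0.0.

```python
import numpy as np
import pandas as pd

# Hypothetical scores containing zeros; the question's data has no such rows.
df = pd.DataFrame({"W1": [6.0, 0.0, 3.0], "L1": [0.0, 0.0, 6.0]})

winners = df.loc[:, df.columns.str.startswith("W")].values
losers = df.loc[:, df.columns.str.startswith("L")].values

# Ratio approach: 6/0 is inf (still counted as a win), but 0/0 is NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = pd.DataFrame(winners / losers)
by_ratio = ratio.gt(1).astype(float).where(ratio.notna())

# Direct comparison, masked only where an input is actually missing.
both_present = ~(np.isnan(winners) | np.isnan(losers))
direct = pd.DataFrame((winners > losers).astype(float)).where(both_present)

print(by_ratio[0].tolist())
print(direct[0].tolist())
```

If zero scores cannot occur in the data, the two approaches should agree.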
If the W* and L* columns are in a different order (for example ['W1', 'W3', 'W4', 'W2'] and ['L2', 'L1', 'L4', 'L3']), the following solution also works:
Demo:
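A minimal sketch of one order-independent approach (an assumption about what such a solution could look like, not the original answer's code): strip the one-letter prefix so both halves are labelled by set number, put them in the same order, and only then compare. The names `wins`, `losses` and `sets` are illustrative.

```python
import numpy as np
import pandas as pd

# Columns deliberately shuffled; the values are the question's sample data.
df = pd.DataFrame({
    "W1": [0, 7], "W3": [6, np.nan], "W4": [3, np.nan], "W2": [6, np.nan],
    "L2": [7, np.nan], "L1": [6, 6], "L4": [6, np.nan], "L3": [3, np.nan],
})

# Select each half and relabel the columns by set number only ("1", "2", ...).
wins = df.filter(regex=r"^W\d+$").rename(columns=lambda c: c[1:])
losses = df.filter(regex=r"^L\d+$").rename(columns=lambda c: c[1:])

# Put both halves in the same numeric order so columns pair up by set number.
order = sorted(wins.columns, key=int)
wins, losses = wins[order], losses[order]

# Compare, then blank out every cell where either score is missing.
sets = (wins > losses).astype(float).where(wins.notna() & losses.notna())
sets.columns = [f"SET{c}" for c in order]

print(sets)
```

Pairing by label instead of by position means the result does not depend on how the columns happen to be arranged in the frame.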
One way is to use numpy:
import numpy as np
import pandas as pd
df = pd.DataFrame({'W1': [0, 7], 'W2': [6, np.nan], 'W3': [6, np.nan], 'W4': [3, np.nan],
                   'L1': [6, 6], 'L2': [7, np.nan], 'L3': [3, np.nan], 'L4': [6, np.nan]})
# split into 2 arrays
df_L = df.loc[:, df.columns.str.startswith('L')].values
df_W = df.loc[:, df.columns.str.startswith('W')].values
# apply comparison logic
A = (df_W > df_L).astype(float)
# apply nan logic
A[np.logical_or(np.isnan(df_L), np.isnan(df_W))] = np.nan
# create dataframe
res = pd.DataFrame(A, columns=['SET'+str(i) for i in range(1, A.shape[1]+1)])
print(res)
SET1 SET2 SET3 SET4
0 0.0 0.0 1.0 0.0
1 1.0 NaN NaN NaN
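There is also numpy.select. It gives priority to the first condition that matches, so simply putting the null check first makes the logic work as you need: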
import numpy as np
for i in range(1, 5):
    # the null check comes first, so it takes priority over the comparisons
    df['SET'+str(i)] = np.select(
        [df['W'+str(i)].isnull() | df['L'+str(i)].isnull(),
         df['W'+str(i)] > df['L'+str(i)],
         df['W'+str(i)] < df['L'+str(i)]],
        [np.nan, 1, 0])
W1 W2 W3 W4 L1 L2 L3 L4 SET1 SET2 SET3 SET4
0 0 6 6 3 6 7 3 6 0.0 0.0 1.0 0.0
1 7 NaN NaN NaN 6 NaN NaN NaN 1.0 NaN NaN NaN
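One detail worth knowing about numpy.select is what happens when no condition matches, for example when Wi equals Li. The sketch below uses hypothetical data with a tie (the question does not say how ties should be scored) to show that such rows receive the `default` argument, which is 0 unless stated otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a tie in row 0 and a missing score in row 2.
df = pd.DataFrame({"W1": [6, 7, np.nan], "L1": [6, 6, 4]})

conditions = [
    df["W1"].isnull() | df["L1"].isnull(),  # checked first, so NaN takes priority
    df["W1"] > df["L1"],
    df["W1"] < df["L1"],
]
choices = [np.nan, 1, 0]

# A tie matches none of the conditions, so it receives `default`.
df["SET1_default"] = np.select(conditions, choices)                  # default=0
df["SET1_tie_nan"] = np.select(conditions, choices, default=np.nan)  # explicit default

print(df)
```

Whether a tie should be 0.0 or NaN is a choice the question leaves open; passing `default` explicitly makes that choice visible.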
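Split the columns into a MultiIndex:
n = df.set_axis(
    pd.MultiIndex.from_tuples(df.columns.map(tuple)),
    axis=1
)
n
   L              W
   1    2    3    4  1    2    3    4
0  6  7.0  3.0  6.0  0  6.0  6.0  3.0
1  6  NaN  NaN  NaN  7  NaN  NaN  NaN
A slightly more robust way of generating the MultiIndex, which keeps multi-character suffixes together (for example, W10 becomes ('W', '10') instead of three separate pieces):
n = df.set_axis(
    pd.MultiIndex.from_tuples([(a, ''.join(b)) for a, *b in df.columns]),
    axis=1
)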
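With the columns split into two levels, the W and L blocks can be selected as whole frames and compared in one step. The following is a sketch under that assumption, using the question's sample data; the names `wins`, `losses`, `sets` and `result` are illustrative and not taken from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "W1": [0, 7], "W2": [6, np.nan], "W3": [6, np.nan], "W4": [3, np.nan],
    "L1": [6, 6], "L2": [7, np.nan], "L3": [3, np.nan], "L4": [6, np.nan],
})

# Two-level columns: ("W", "1"), ..., ("L", "4").
n = df.copy()
n.columns = pd.MultiIndex.from_tuples([(c[0], c[1:]) for c in df.columns])

# Selecting a top-level key returns a frame whose columns are the set numbers.
wins, losses = n["W"], n["L"]

# Compare block against block, then mask cells where either score is missing.
sets = (wins > losses).astype(float).where(wins.notna() & losses.notna())
sets = sets.add_prefix("SET")

# Attach the new columns to the original frame.
result = df.join(sets)
print(result)
```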