Python Pandas: generate a value if a column is larger and not null

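My DataFrame has the following columns:

W1   W2   W3   W4   L1   L2   L3   L4
0    6    6    3    6    7    3    6
7    NaN  NaN  NaN  6    NaN  NaN  NaN

I want to add four columns to this DataFrame, SET1..SET4, where SETi is:

  • 1.0 if Wi > Li and neither is NaN
  • 0.0 if Wi < Li and neither is NaN
  • NaN if either Wi or Li is NaN

In the example above, the output should be:

SET1  SET2  SET3  SET4
0.0   0.0   1.0   0.0
1.0   NaN   NaN   NaN

I used the following code to apply the first two bullets, but I am having trouble handling NaN correctly: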

for i in range(1,5):
    wincol = "W" + str(i)
    losecol = "L" + str(i)
    setcol = "SET" + str(i)
    matches_df[setcol] = matches_df[wincol] > matches_df[losecol]
    matches_df[setcol] = matches_df[setcol].astype(float)
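
The loop above loses the NaN case because any comparison involving NaN evaluates to False, which then becomes 0.0. A minimal sketch of one way to keep the asker's loop and mask the missing rows afterwards is shown here. The sample data is copied from the question; the helper names both_present and won are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
matches_df = pd.DataFrame({
    "W1": [0, 7], "W2": [6, np.nan], "W3": [6, np.nan], "W4": [3, np.nan],
    "L1": [6, 6], "L2": [7, np.nan], "L3": [3, np.nan], "L4": [6, np.nan],
})

for i in range(1, 5):
    wincol, losecol, setcol = f"W{i}", f"L{i}", f"SET{i}"
    # True only where both scores are present.
    both_present = matches_df[wincol].notna() & matches_df[losecol].notna()
    # NaN > NaN is False, so the bare comparison would yield 0.0 for missing rows.
    won = (matches_df[wincol] > matches_df[losecol]).astype(float)
    # Keep the comparison where both values exist; everything else becomes NaN.
    matches_df[setcol] = won.where(both_present)

print(matches_df[["SET1", "SET2", "SET3", "SET4"]])
```

Series.where keeps values where the condition holds and writes NaN elsewhere, which matches the third bullet in the question.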

You need str.startswith to select the columns; then just divide the W values by the L values and build the desired DataFrame:

#df=df.replace('Nan',np.nan)
#df=df.astype(float)
new_df=pd.DataFrame((df.loc[:,df.columns.str.startswith('W')].values/df.loc[:,df.columns.str.startswith('L')].values))


new_df[new_df.notnull()]=new_df.gt(1).astype(int)
new_df
Out[239]: 
     0    1    2    3
0  0.0  0.0  1.0  0.0
1  1.0  NaN  NaN  NaN
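
One caveat with the ratio trick is worth checking against your own data: a 0-0 score gives 0/0, which is NaN even though neither input is missing. The sketch below uses hypothetical data (the question's sample has no such row) to compare the ratio approach with a direct comparison; the names by_ratio and by_compare are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row is a 0-0 score, which the question's sample lacks.
df = pd.DataFrame({"W1": [0.0, 7.0], "L1": [0.0, 6.0]})

w = df.loc[:, df.columns.str.startswith("W")].to_numpy()
l = df.loc[:, df.columns.str.startswith("L")].to_numpy()

# Ratio approach: 0/0 evaluates to NaN although neither input is NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = pd.DataFrame(w / l)
by_ratio = ratio.gt(1).astype(float).where(ratio.notna())

# Direct comparison, set to NaN only where an input is actually missing.
missing = np.isnan(w) | np.isnan(l)
by_compare = pd.DataFrame(np.where(missing, np.nan, (w > l).astype(float)))

print(by_ratio)
print(by_compare)
```

If zero scores cannot occur in your data, the ratio version behaves the same as the comparison.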

If the W* and L* columns are in different orders (for example ['W1', 'W3', 'W4', 'W2'] and ['L2', 'L1', 'L4', 'L3']), the following solution also works:

Demo:
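
The code for this answer is missing from this copy of the page. As a stand-in, here is a minimal sketch of one way to handle shuffled columns: strip the letter prefix so that the W and L frames share the labels "1".."4", put both in the same order, and compare. This is an assumption about the approach, not the answerer's original code, and the names w, l, cols and res are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the columns deliberately shuffled.
df = pd.DataFrame({
    "W1": [0, 7], "W3": [6, np.nan], "W4": [3, np.nan], "W2": [6, np.nan],
    "L2": [7, np.nan], "L1": [6, 6], "L4": [6, np.nan], "L3": [3, np.nan],
})

# Select each group and drop the letter prefix: "W3" -> "3", "L3" -> "3".
w = df.filter(regex=r"^W\d+$").rename(columns=lambda c: c[1:])
l = df.filter(regex=r"^L\d+$").rename(columns=lambda c: c[1:])

# Put both frames in the same numeric column order before comparing.
cols = sorted(w.columns, key=int)
w, l = w[cols], l[cols]

# 1.0 / 0.0 from the comparison, NaN wherever either side is missing.
res = w.gt(l).astype(float).where(w.notna() & l.notna()).add_prefix("SET")

print(df.join(res))
```

Because the comparison is done on matching labels rather than on position, the physical column order in df no longer matters.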


One way is to use numpy:

import numpy as np
import pandas as pd

df = pd.DataFrame({'W1': [0, 7], 'W2': [6, np.nan], 'W3': [6, np.nan], 'W4': [3, np.nan],
                   'L1': [6, 6], 'L2': [7, np.nan], 'L3': [3, np.nan], 'L4': [6, np.nan]})

# split into 2 arrays
df_L = df.loc[:, df.columns.str.startswith('L')].values
df_W = df.loc[:, df.columns.str.startswith('W')].values

# apply comparison logic
A = (df_W > df_L).astype(float)

# apply nan logic
A[np.logical_or(np.isnan(df_L), np.isnan(df_W))] = np.nan

# create dataframe
res = pd.DataFrame(A, columns=['SET'+str(i) for i in range(1, A.shape[1]+1)])

print(res)

   SET1  SET2  SET3  SET4
0   0.0   0.0   1.0   0.0
1   1.0   NaN   NaN   NaN
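
The result above gets a fresh default index, so attaching it to the original frame only lines up if df also has a default index. A small sketch of the same logic that reuses df.index and joins the SET columns back on; the row labels "m1" and "m2" are hypothetical and only there to show the alignment.

```python
import numpy as np
import pandas as pd

# Question data, with hypothetical row labels instead of 0..n-1.
df = pd.DataFrame(
    {"W1": [0, 7], "W2": [6, np.nan], "W3": [6, np.nan], "W4": [3, np.nan],
     "L1": [6, 6], "L2": [7, np.nan], "L3": [3, np.nan], "L4": [6, np.nan]},
    index=["m1", "m2"],
)

w = df.loc[:, df.columns.str.startswith("W")].to_numpy(dtype=float)
l = df.loc[:, df.columns.str.startswith("L")].to_numpy(dtype=float)

# NaN where either side is missing, otherwise 1.0 / 0.0 from the comparison.
missing = np.isnan(w) | np.isnan(l)
a = np.where(missing, np.nan, (w > l).astype(float))

# Reuse df.index so that join() aligns row by row.
res = pd.DataFrame(a, index=df.index,
                   columns=[f"SET{i}" for i in range(1, a.shape[1] + 1)])
out = df.join(res)
print(out)
```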

There is also numpy.select. It gives priority to the first condition that matches, so if you put the null check first, the logic works the way you need:

import numpy as np

for i in range(1, 5):
    w, l = df['W' + str(i)], df['L' + str(i)]
    df['SET' + str(i)] = np.select(
        [w.isnull() | l.isnull(), w > l, w < l],  # null check comes first
        [np.nan, 1, 0],
    )

   W1   W2   W3   W4  L1   L2   L3   L4  SET1 SET2 SET3 SET4
0   0    6    6    3   6    7    3    6  0.0  0.0  1.0  0.0
1   7  NaN  NaN  NaN   6  NaN  NaN  NaN  1.0  NaN  NaN  NaN
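
One detail the question leaves open is what a tie (Wi equal to Li) should produce. With numpy.select, rows that match none of the conditions receive the default, which is 0 unless you say otherwise. The sketch below uses hypothetical data containing a tie to show both behaviours; the column name SET1_tie_nan is illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is a tie, row 2 has a missing W value.
df = pd.DataFrame({"W1": [6.0, 7.0, np.nan], "L1": [6.0, 6.0, 4.0]})

w, l = df["W1"], df["L1"]
conditions = [w.isna() | l.isna(), w > l, w < l]  # first match wins
choices = [np.nan, 1.0, 0.0]

# Default of 0: a tie matches no condition and falls through to 0.0.
df["SET1"] = np.select(conditions, choices)

# Explicit default: a tie is reported as NaN instead.
df["SET1_tie_nan"] = np.select(conditions, choices, default=np.nan)

print(df)
```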
Split the columns into a MultiIndex:

n = df.set_axis(
    pd.MultiIndex.from_tuples(df.columns.map(tuple)),
    axis=1  # pandas 2.0 removed the inplace argument; a new frame is returned
)

n

   L                 W               
   1    2    3    4  1    2    3    4
0  6  7.0  3.0  6.0  0  6.0  6.0  3.0
1  6  NaN  NaN  NaN  7  NaN  NaN  NaN

A slightly more robust way to build the MultiIndex, which also handles multi-digit suffixes such as W10:

n = df.set_axis(
    pd.MultiIndex.from_tuples([(a, ''.join(b)) for a, *b in df.columns]),
    axis=1
)
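
This copy of the answer stops at building the MultiIndex and does not show the comparison step. A minimal sketch of how the split frame could be used to produce SET1..SET4 follows, assuming the question's data; the tuple construction (c[0], c[1:]) is an illustrative variant of the split, and w, l and res are hypothetical names.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "W1": [0, 7], "W2": [6, np.nan], "W3": [6, np.nan], "W4": [3, np.nan],
    "L1": [6, 6], "L2": [7, np.nan], "L3": [3, np.nan], "L4": [6, np.nan],
})

# Split "W1" into ("W", "1"), "L1" into ("L", "1"), and so on.
n = df.set_axis(
    pd.MultiIndex.from_tuples([(c[0], c[1:]) for c in df.columns]),
    axis=1,
)

# Selecting a top-level key returns a frame labelled "1".."4".
w, l = n["W"], n["L"]

# Compare matching labels, then blank out rows where either side is missing.
res = w.gt(l).astype(float).where(w.notna() & l.notna()).add_prefix("SET")
print(res)
```

Once the prefix lives in its own column level, the W and L halves can be compared as whole frames instead of looping over i.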
