Using two variables in a Python lambda

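I want to create a new column based on two variables. If (column 1 >= 0.5 or column 2 < 0.5) and (column 1 < 0.5 or column 2 >= 0.5), I want the new column's value to be Good; otherwise Bad.

I tried using a lambda with if.


Pass the row into the lambda: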

df['new column'] = df[['column 1', 'column 2']].apply(lambda row: "Good" if (row['column 1'] >= .5 or row['column 2'] < .5) and (row['column 1'] < .5 or row['column 2'] >= .5) else "Bad", axis=1)
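
The condition in the question can be read as "both columns fall on the same side of 0.5". Here is a minimal sketch, assuming the column names from the question and made-up sample data, that checks the lambda's condition against a plain equality of two boolean tests:

```python
import pandas as pd

# Made-up sample data; the column names follow the question.
df = pd.DataFrame({'column 1': [0.7, 0.7, 0.2, 0.2, 0.5],
                   'column 2': [0.9, 0.1, 0.8, 0.3, 0.5]})

# Row-wise version, as in the answer above.
row_wise = df.apply(
    lambda row: "Good"
    if (row['column 1'] >= .5 or row['column 2'] < .5)
    and (row['column 1'] < .5 or row['column 2'] >= .5)
    else "Bad",
    axis=1,
)

# With A = (column 1 >= .5) and B = (column 2 >= .5), the condition is
# (A or not B) and (not A or B), which holds exactly when A == B.
same_side = (df['column 1'] >= .5) == (df['column 2'] >= .5)
simplified = same_side.map({True: "Good", False: "Bad"})

print(row_wise.tolist())
print(simplified.tolist())
```

The two versions agree only when neither column contains NaN, since every comparison against NaN is False and "not (y >= .5)" then no longer means "y < .5".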

Try this:

import pandas as pd 

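def update_column(row):
    # 'x' and 'y' stand in for 'column 1' and 'column 2' from the question
    if (row['x'] >= .5 or row['y'] <= .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"

# axis=1 passes each row to update_column as a Series
df['new_column'] = df.apply(update_column, axis=1)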

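Use np.where (with numpy imported as np). pandas does intrinsic data alignment, which means you don't need apply or row-by-row iteration; pandas aligns the data on the index: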

df['new column'] = np.where(((df['y'] <= .5) | (df['x'] > .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df

Timing: 5.83 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
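
The inequalities in the np.where answer are not quite the ones in the question: the two differ for values that sit exactly on 0.5. The sketch below uses made-up rows and treats x as column 1 and y as column 2 (an assumption) to show where the two conditions disagree:

```python
import numpy as np
import pandas as pd

# Made-up rows, including values that sit exactly on the 0.5 boundary.
df = pd.DataFrame({'x': [0.5, 0.1, 0.9, 0.1],
                   'y': [0.7, 0.5, 0.2, 0.2]})

# Inequalities as written in the question (x is column 1, y is column 2).
as_asked = np.where(((df['x'] >= .5) | (df['y'] < .5))
                    & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')

# Inequalities as written in the np.where answer above.
as_answered = np.where(((df['y'] <= .5) | (df['x'] > .5))
                       & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')

df['as asked'] = as_asked
df['as answered'] = as_answered
print(df)
```

Only the first two rows, where x or y equals 0.5, get different labels. If the boundary matters for your data, copy the comparison operators from the question rather than from the answer.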

Why loop if there is a vectorized option? Sure, there are several different ways to solve this problem. But loops are typically much slower, and apply is much faster than a Python loop. The np.where approach is faster and more expressive. It is worth knowing these tools in the long run as well.

Complete example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 0.1, 0.1], 
                   'y': [1, 2, 0.7, 0.2], 
                   'column 3': [1, 2, 3, 4]})

df['new column'] = np.where(((df['y'] <= .5) | (df['x'] > .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df
     x    y  column 3 new column
0  1.0  1.0         1       Good
1  2.0  2.0         2       Good
2  0.1  0.7         3        Bad
3  0.1  0.2         4       Good

Timing comparison (run in IPython or Jupyter):

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'x': np.random.random(100) * 2,
                   'y': np.random.random(100) * 1})

def update_column(row):
    if (row['x'] >= .5 or row['y'] <= .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"

# %timeit is a line magic, so each timed statement stays on a single line
%timeit df['new column'] = np.where(((df['y'] <= .5) | (df['x'] > .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
%timeit df['new_column'] = df.apply(update_column, axis=1)
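
%timeit only works inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard-library timeit module can be used instead. The helper names with_where and with_apply and the repeat count are illustrative choices, not part of the original answers:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'x': np.random.random(100) * 2,
                   'y': np.random.random(100)})

def update_column(row):
    if (row['x'] >= .5 or row['y'] <= .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"

def with_where():
    # Vectorized: the whole column is evaluated in one call.
    return np.where(((df['y'] <= .5) | (df['x'] > .5))
                    & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')

def with_apply():
    # Row-wise: update_column is called once per row.
    return df.apply(update_column, axis=1)

n = 20  # number of repetitions per measurement (arbitrary)
t_where = timeit.timeit(with_where, number=n) / n
t_apply = timeit.timeit(with_apply, number=n) / n

print(f"np.where: {t_where * 1e3:.3f} ms per call")
print(f"apply:    {t_apply * 1e3:.3f} ms per call")

# Sanity check that both approaches label this data identically.
same_labels = list(with_where()) == with_apply().tolist()
print("same labels:", same_labels)
```

Absolute timings depend on the machine and the pandas version, so treat the numbers as relative only.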