python.map with a two-variable lambda

python sql lambda pandas

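I'm looking for a way to do column operations with pandas (as you would in Excel) without having to iterate over every row. I'm working with pd.DataFrame objects that may be very large, and I'd like to use functions wherever possible.
In the past I have mapped a lambda function to do things like this: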

df['a'] = df['a'].map(lambda x: int(str(int(x))[:-1])) #remove the last digit in column 'a'

Is it possible to map something like the following lambda function to emulate the SQL COALESCE function?

lambda x,y: x if x else y

Here x and y are both columns (as in the first example), and I want to use the lambda to produce a third column object.

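It sounds like you're looking for the DataFrame.apply() method. The apply method is a very general way to apply a function across the columns or rows of a DataFrame: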
In [1]: from numpy.random import randn
   ...: from pandas import DataFrame
   ...: df = DataFrame(randn(10, 3))

In [2]: df
Out[2]:
       0      1      2
0  2.848 -1.536  0.234
1 -0.652 -1.169  0.101
2  0.957 -0.642  0.961
3  1.722 -2.552 -0.517
4 -0.258  1.810  1.332
5  0.362 -1.215  0.768
6  0.949 -0.384 -0.802
7  0.782 -1.140 -2.217
8 -0.410  0.882 -0.366
9  0.240  0.632 -1.374

In [3]: def standardize(x):
   ...:     y = x - x.mean()
   ...:     sd = x.std()
   ...:     return y / sd
   ...:

In [4]: df.apply(standardize)
Out[4]:
       0      1      2
0  2.074 -0.773  0.384
1 -1.234 -0.490  0.263
2  0.286 -0.085  1.047
3  1.009 -1.555 -0.300
4 -0.862  1.801  1.385
5 -0.276 -0.526  0.871
6  0.279  0.113 -0.559
7  0.121 -0.468 -1.848
8 -1.005  1.087 -0.162
9 -0.391  0.895 -1.081

In [5]: df.apply(standardize).mean()
Out[5]:
0    8.327e-17
1    2.220e-17
2    2.220e-17
dtype: float64

In [6]: df.apply(standardize).std()
Out[6]:
0    1
1    1
2    1
dtype: float64
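
For a column-wise function as simple as this one, the same result can also be written as whole-frame arithmetic, without apply. The following is a minimal sketch; the fixed random seed and the np/pd import aliases are assumptions added for reproducibility and are not part of the original session.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 3)))


def standardize(x):
    # Same computation as the standardize function in the session above.
    return (x - x.mean()) / x.std()


via_apply = df.apply(standardize)

# df.mean() and df.std() return one value per column; the subtraction and
# division align on column labels, so every column is standardized at once.
vectorized = (df - df.mean()) / df.std()

print(np.allclose(via_apply, vectorized))  # True
```

Both forms give the same numbers; the arithmetic form avoids one Python-level function call per column.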

By default it applies the function to each column, but passing axis=1 applies it to each row instead:
In [8]: df.apply(standardize, axis=1).mean(1)
Out[8]:
0   -1.850e-17
1    7.401e-17
2   -3.701e-17
3   -2.544e-17
4    9.252e-17
5    3.701e-17
6   -3.701e-17
7   -1.110e-16
8   -3.701e-17
9    0.000e+00
dtype: float64
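
To connect this back to the two-argument lambda in the question, here is a minimal sketch of a row-wise apply over two columns. The column names x, y and z, the helper name coalesce and the sample values are made up for illustration. Note that this still calls the Python function once per row, so it is convenient rather than fast.

```python
import pandas as pd

# Hypothetical data: "x" holds 0 where a value is missing, "y" is the fallback.
df = pd.DataFrame({"x": [3, 0, 7, 0], "y": [10, 20, 30, 40]})


def coalesce(x, y):
    # Same logic as the lambda in the question: x if x else y.
    return x if x else y


# With axis=1 each row is passed as a Series, so both columns are available.
df["z"] = df.apply(lambda row: coalesce(row["x"], row["y"]), axis=1)

print(df["z"].tolist())  # [3, 20, 7, 40]
```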

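What do you want the y argument to be? The key point about map is that it passes each value to your function one at a time.

The where method of Series and DataFrame objects performs vectorized x if x else y type computations.

Thanks a lot! And what if I want an arbitrary multi-column function?

For x if x else y type computations, use DataFrame.where():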
In [16]: from numpy.random import randint
    ...: df = DataFrame(randint(6, size=(10, 3)))

In [17]: df
Out[17]:
   0  1  2
0  2  1  4
1  2  4  0
2  4  4  4
3  4  3  2
4  2  4  3
5  1  1  3
6  2  0  2
7  1  4  4
8  2  4  5
9  2  1  2

In [19]: from numpy import nan
    ...: df.where(df != 0, nan)  # the condition must be boolean
Out[19]:
   0   1   2
0  2   1   4
1  2   4 NaN
2  4   4   4
3  4   3   2
4  2   4   3
5  1   1   3
6  2 NaN   2
7  1   4   4
8  2   4   5
9  2   1   2
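
Applied to the original question, where works on a single column just as well: keep x where it is non-zero and take y otherwise. The following is a minimal sketch with made-up column names (x, y, z) and sample data. It treats 0 as "missing", mirroring the x if x else y lambda, whereas SQL COALESCE tests for NULL; for that case fillna with a second Series is probably the closer match.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "x" holds 0 where a value is missing, "y" is the fallback.
df = pd.DataFrame({"x": [3, 0, 7, 0], "y": [10, 20, 30, 40]})

# Keep x where it is non-zero, otherwise take the value from y.
# This is evaluated column-wise, with no Python-level loop over rows.
df["z"] = df["x"].where(df["x"] != 0, df["y"])
print(df["z"].tolist())  # [3, 20, 7, 40]

# NULL-style coalesce: fill missing values in one Series from another.
s = pd.Series([1.0, np.nan, 3.0])
fallback = pd.Series([9.0, 8.0, 7.0])
filled = s.fillna(fallback)
print(filled.tolist())  # [1.0, 8.0, 3.0]
```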