Can I classify the elements of df.column and create a column containing the output without iterating (Python/NumPy)?

python numpy pandas

Given this DataFrame:

import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

I want to classify the elements of column 'A' by size and create a new column holding the result, like this:

In [26]:

A['Size'] = ""
for index, row in A.iterrows():
    # Note: values in [2.4, 2.5] (and values <= 0) match no branch and stay "".
    if row['A'] >= 4:
        A.loc[index, 'Size'] = 'Big'
    if 2.5 < row['A'] < 4:
        A.loc[index, 'Size'] = 'Medium'
    if 0 < row['A'] < 2.4:
        A.loc[index, 'Size'] = 'Small'

Is there a more efficient way to do this, given that there are many columns and different thresholds for the same categories?

Thanks.

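You can use loc with a boolean mask so that each assignment touches only the rows that satisfy its condition. This is faster even for a DataFrame this small, and significantly faster for larger ones: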
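As a minimal, self-contained sketch of the mask approach (assuming the same DataFrame A and the thresholds from the question), initializing Size with an empty string first means rows that match no mask end up as '' rather than NaN:

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

# Start every row with an empty label so rows matching no condition
# keep "" instead of becoming NaN.
A['Size'] = ''

# Each .loc call writes only to the rows where its boolean mask is True.
A.loc[A['A'] >= 4, 'Size'] = 'Big'
A.loc[(A['A'] >= 2.5) & (A['A'] < 4), 'Size'] = 'Medium'
A.loc[A['A'] < 2.4, 'Size'] = 'Small'

print(A['Size'].tolist())  # ['Small', 'Small', 'Medium', 'Big', 'Big']
```

There is no Python-level loop over rows here: each comparison is evaluated on the whole column at once.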

In [60]:

%%timeit 
A['Size'] = ""
for index, row in A.iterrows():
    if row['A'] >= 4:
        A.loc[index, 'Size'] = 'Big'
    if 2.5 < row['A'] < 4:
        A.loc[index, 'Size'] = 'Medium'
    if 0 < row['A'] < 2.4:
        A.loc[index, 'Size'] = 'Small'

100 loops, best of 3: 2.31 ms per loop

In [62]:

%%timeit
A.loc[A['A'] >= 4, 'Size'] = 'Big'
A.loc[(A['A'] >= 2.5) & (A['A'] < 4), 'Size'] = 'Medium'
A.loc[A['A'] < 2.4, 'Size'] = 'Small'

100 loops, best of 3: 1.95 ms per loop
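The answer does not use it, but for the many-columns case in the question, pandas' pd.cut can bin a column against a list of edges in a single call. The following is a sketch under stated assumptions: the rules dict, the thresholds for column B, and the *_Size column names are hypothetical, chosen only for illustration. With right=False each bin is [left, right), so 4 lands in 'Big'. Because adjacent bins leave no gap, values between 2.4 and 2.5 get a label here, unlike with the original conditions.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

# Hypothetical per-column rules: (bin edges, labels). len(labels) == len(edges) - 1.
rules = {
    'A': ([-np.inf, 2.5, 4, np.inf], ['Small', 'Medium', 'Big']),
    'B': ([-np.inf, 2, 4, np.inf], ['Low', 'Mid', 'High']),
}

for col, (edges, labels) in rules.items():
    # right=False makes every bin left-closed, right-open: [left, right).
    A[col + '_Size'] = pd.cut(A[col], bins=edges, labels=labels, right=False)
```

The result columns are categorical; call .astype(str) on them if plain strings are needed.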

Update

Interestingly, for a 50,000-row DataFrame the loc method outperforms the nested np.where method: I get 4.24 ms versus 12.1 ms.

In [64]:

%%timeit
A['Size'] = np.where(A['A'] < 2.4, 'Small',
                     np.where((A['A'] >= 2.5) & (A['A'] < 4), 'Medium',
                              np.where(A['A'] >= 4, 'Big', '')))

1000 loops, best of 3: 828 µs per loop
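Nested np.where becomes hard to read as the number of categories grows. np.select takes a flat list of conditions and a matching list of labels instead. This is a swapped-in alternative, not part of the answer and not benchmarked here; it is a sketch assuming the same conditions as the nested call above, and the names conditions and choices are only illustrative.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

# Same conditions as the nested np.where, listed in priority order.
conditions = [
    A['A'] < 2.4,
    (A['A'] >= 2.5) & (A['A'] < 4),
    A['A'] >= 4,
]
choices = ['Small', 'Medium', 'Big']

# For each row, np.select takes the choice of the first True condition;
# rows where no condition holds get the default, like the innermost '' above.
A['Size'] = np.select(conditions, choices, default='')
```

Adding a category means appending one condition and one label rather than nesting another np.where.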