Python: create custom buckets for a df based on a column

python pandas jupyter-notebook

I want to add a new column with custom buckets based on the values in the price column (see the example below):

```
<400 = low
>=401 and <=1000 = medium
>1000 = expensive
```

Table

Output table

product_id  price   price_category 
 
     2       1001   high    
     4        500   medium   
     5        490   medium  
     6        200   low 
     3        429   medium  
     5        321   low

This is what I have tried so far:

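import numpy as np
from numba import njit

@njit  # compile the loop with numba; the import above was otherwise unused
def cut(arr):
    # One bucket code (1 to 5) per input value.
    bins = np.empty(arr.shape[0])
    for idx, x in enumerate(arr):
        if (x >= 0) & (x <= 50):
            bins[idx] = 1
        elif (x >= 51) & (x <= 100):
            bins[idx] = 2
        elif (x >= 101) & (x <= 250):
            bins[idx] = 3
        elif (x >= 251) & (x <= 1000):
            bins[idx] = 4
        else:
            # Anything negative, above 1000, or between two ranges lands here.
            bins[idx] = 5

    return bins

a = cut(df2['average_listings'].to_numpy())

# Map the numeric bucket codes to size labels.
conversion_dict = {1: 'S',
                   2: 'M',
                   3: 'L',
                   4: 'XL',
                   5: 'XXL'}

bins = list(map(conversion_dict.get, a))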

However, I am struggling to add the result to the main df.

A simple solution is as follows:

df.loc[df.price < 400, 'price_category'] = 'low'
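The line above only fills the low bucket. A minimal sketch of the same approach extended to the other two buckets might look like the following. The sample data is copied from the question's output table, and the thresholds are assumptions read from the question: since it leaves a price of exactly 400 undefined, this sketch puts 400 into medium.

```python
import pandas as pd

# Sample data copied from the question's output table.
df = pd.DataFrame({
    "product_id": [2, 4, 5, 6, 3, 5],
    "price": [1001, 500, 490, 200, 429, 321],
})

# Assumed thresholds: below 400 is low, 400 to 1000 inclusive is medium,
# anything above 1000 is high.
df.loc[df.price < 400, "price_category"] = "low"
df.loc[(df.price >= 400) & (df.price <= 1000), "price_category"] = "medium"
df.loc[df.price > 1000, "price_category"] = "high"

print(df)
```

Each assignment uses a boolean mask, so the three conditions must not overlap and should together cover every price, otherwise some rows stay NaN.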
pandas has its own method for this, pd.cut. Specify the right bin edges and the corresponding labels:
df['price_category'] = pd.cut(df.price, [-np.inf, 400, 1000, np.inf],
                              labels=['low', 'medium', 'high'])

   product_id  price price_category
0           2   1203           high
1           4    500         medium
2           5    490         medium
3           6    200            low
4           3    429         medium
5           5    321            low


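If you omit the labels parameter, you get the exact bins used for the data (including which side is closed, the right side by default). In this case the bins are: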
Categories (3, interval[float64]): [(-inf, 400.0] < (400.0, 1000.0] < (1000.0, inf]]
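Because the bins are right-closed by default, a price of exactly 400 lands in low. If the boundary values should go into the upper bucket instead, pd.cut accepts right=False. A small sketch comparing the two, using made-up prices chosen to sit on the edges:

```python
import numpy as np
import pandas as pd

# Illustrative prices placed on and around the bin edges.
prices = pd.Series([399, 400, 401, 1000, 1001])
edges = [-np.inf, 400, 1000, np.inf]
labels = ["low", "medium", "high"]

# Default right=True: bins are (-inf, 400], (400, 1000], (1000, inf],
# so exactly 400 is "low" and exactly 1000 is "medium".
right_closed = pd.cut(prices, edges, labels=labels)

# right=False: bins are [-inf, 400), [400, 1000), [1000, inf),
# so exactly 400 is "medium" and exactly 1000 is "high".
left_closed = pd.cut(prices, edges, labels=labels, right=False)

print(pd.DataFrame({"price": prices,
                    "right_closed": right_closed,
                    "left_closed": left_closed}))
```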

You can use:
What have you tried so far? If you want a place to start, search for how to use apply with a DataFrame.

This is not helpful at all.
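For reference, the apply route mentioned in the first comment could look roughly like this. The helper price_category is a hypothetical name, and its thresholds are assumptions taken from the question (with 400 treated as medium).

```python
import pandas as pd

# Prices copied from the question's output table.
df = pd.DataFrame({"price": [1001, 500, 490, 200, 429, 321]})

def price_category(price):
    # Hypothetical helper; thresholds assumed from the question.
    if price < 400:
        return "low"
    if price <= 1000:
        return "medium"
    return "high"

# Series.apply calls the helper once per value.
df["price_category"] = df["price"].apply(price_category)
print(df)
```

This calls a Python function per row, so on large frames pd.cut is usually the faster choice.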