Python: How can I use `pandas.cut()` to bin data based on a column other than the one being binned?

I have a pandas DataFrame that looks like this:

import pandas as pd
import numpy as np

data = {"first_column": ["item1", "item2", "item3", "item4", "item5", "item6", "item7"],
        "second_column": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"],
        "third_column": [5, 1, 8, 3, 731, 189, 9]}

df = pd.DataFrame(data)

df
     first_column second_column  third_column
0        item1          cat1             5
1        item2          cat1             1
2        item3          cat1             8
3        item4          cat2             3
4        item5          cat2           731
5        item6          cat2           189
6        item7          cat2             9
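Now, suppose I want to create a fourth column that uses pandas.cut() to show a classification of the third column. Here, I label each row according to whether the element in third_column is less than or equal to 10,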

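  • You don't need pd.cut. You can use
  • Thanks. Suppose I want more complex intervals, for example less than or equal to 1000 (le(1000)) and greater than or equal to 20 (ge(20)). How would I do that? In that case I would need pd.cut()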
    df["less_than_ten"]= pd.cut(df.third_column, [-np.inf, 10, np.inf], labels=(1,0))
    
          first_column second_column  third_column less_than_ten
    0        item1          cat1             5             1
    1        item2          cat1             1             1
    2        item3          cat1             8             1
    3        item4          cat2             3             1
    4        item5          cat2           731             0
    5        item6          cat2           189             0
    6        item7          cat2             9             1
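The first comment's suggestion is cut off in this copy, so the following is only a sketch of what the comment exchange seems to be about, not the commenter's actual code. It assumes the suggestion was a plain comparison such as Series.le, and it uses Series.between for the 20 to 1000 interval raised in the follow-up. The variable names (flag_cmp, flag_cut, in_range, band) and the "low"/"mid"/"high" labels are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "second_column": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"],
    "third_column": [5, 1, 8, 3, 731, 189, 9],
})

# Two-way split: a plain comparison yields the same 1/0 flags as the pd.cut call.
flag_cmp = df["third_column"].le(10).astype(int)
flag_cut = pd.cut(df["third_column"], [-np.inf, 10, np.inf], labels=[1, 0]).astype(int)

# Closed interval [20, 1000]: between() includes both endpoints by default.
in_range = df["third_column"].between(20, 1000).astype(int)

# Several intervals at once is where pd.cut becomes the more natural tool.
# Bins are right-closed by default: (-inf, 20], (20, 1000], (1000, inf],
# so a value of exactly 20 falls in "low" here, unlike between(20, 1000).
band = pd.cut(df["third_column"], [-np.inf, 20, 1000, np.inf],
              labels=["low", "mid", "high"])

print(pd.DataFrame({"flag_cmp": flag_cmp, "in_range": in_range, "band": band}))
```

For a single yes/no condition the comparison is usually enough; pd.cut pays off once there are three or more bands, as long as the right-closed edge behaviour is what you want.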
    
          first_column second_column  third_column less_than_ten
    0        item1          cat1             5             1
    1        item2          cat1             1             1
    2        item3          cat1             8             1
    3        item4          cat2             3             3
    4        item5          cat2           731             2
    5        item6          cat2           189             2
    6        item7          cat2             9             3
    
    m = dict(cat1=0, cat2=2)
    df.assign(less_than_ten=df.second_column.map(m) + df.third_column.le(10))
    
      first_column second_column  third_column  less_than_ten
    0        item1          cat1             5              1
    1        item2          cat1             1              1
    2        item3          cat1             8              1
    3        item4          cat2             3              3
    4        item5          cat2           731              2
    5        item6          cat2           189              2
    6        item7          cat2             9              3
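The map-based answer works because the boolean from le(10) is added to a per-category offset (0 for cat1, 2 for cat2). If pd.cut itself has to be driven by second_column, one possible approach is to give each category its own bin edges and labels and apply pd.cut group by group. This is a sketch under assumptions: the rules dictionary and its contents are hypothetical and not from the thread, chosen only so that the result matches the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "first_column": ["item1", "item2", "item3", "item4", "item5", "item6", "item7"],
    "second_column": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"],
    "third_column": [5, 1, 8, 3, 731, 189, 9],
})

# Hypothetical per-category rules: each category has its own edges and labels.
rules = {
    "cat1": {"bins": [-np.inf, 10, np.inf], "labels": [1, 0]},
    "cat2": {"bins": [-np.inf, 10, np.inf], "labels": [3, 2]},
}

# Bin each category separately, then stitch the pieces back together.
parts = []
for cat, values in df.groupby("second_column")["third_column"]:
    rule = rules[cat]
    parts.append(pd.cut(values, bins=rule["bins"], labels=rule["labels"]).astype(int))

# Assignment aligns on the index, so rows keep their original order.
df["less_than_ten"] = pd.concat(parts)

# Cross-check against the offset-plus-boolean approach.
via_map = df["second_column"].map({"cat1": 0, "cat2": 2}) + df["third_column"].le(10)

print(df)
```

Because the edges live in the rules dictionary, a category could use entirely different thresholds (say 100 instead of 10) without changing the loop. Note also that df.assign returns a new DataFrame rather than modifying df in place, so its result has to be assigned to a name to be kept.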