Python: How to use `pandas.cut()` to bin data based on a column other than the column being binned?
I have a pandas DataFrame, shown below:
import pandas as pd
import numpy as np
data = {"first_column": ["item1", "item2", "item3", "item4", "item5", "item6", "item7"],
"second_column": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"],
"third_column": [5, 1, 8, 3, 731, 189, 9]}
df = pd.DataFrame(data)
df
  first_column second_column  third_column
0        item1          cat1             5
1        item2          cat1             1
2        item3          cat1             8
3        item4          cat2             3
4        item5          cat2           731
5        item6          cat2           189
6        item7          cat2             9
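Now suppose I want to create a fourth column that uses pandas.cut() to show a classification of the third column. Here, I mark whether the element in the third column of each row is less than or equal to 10.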
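- You don't need pd.cut. You can use
- Thanks. Suppose I want more complex intervals, for example less than or equal to 1000, le(1000), and greater than or equal to 20, ge(20)? How would I do that? Would I need pd.cut() in that case?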
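For the interval asked about in the comment above (at least 20 and at most 1000), one possible sketch uses `Series.between`, which is inclusive on both ends by default. With only two outcomes, `pd.cut` is probably not needed; it tends to pay off once there are three or more bins. The column name `in_range` is made up for this illustration and does not appear in the original question.

```python
import pandas as pd

df = pd.DataFrame({"third_column": [5, 1, 8, 3, 731, 189, 9]})

# between() is inclusive on both ends by default: 20 <= x <= 1000
in_range = df["third_column"].between(20, 1000)

# Equivalent spelling with the comparison methods named in the comment
same = df["third_column"].ge(20) & df["third_column"].le(1000)

# "in_range" is a hypothetical column name; 1 means inside the interval
df["in_range"] = in_range.astype(int)
print(df)
```

Only the rows with 731 and 189 fall inside the interval, so they are the only rows flagged with 1.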
# Bins are right-inclusive: (-inf, 10] gets label 1, (10, inf] gets label 0
df["less_than_ten"] = pd.cut(df.third_column, [-np.inf, 10, np.inf], labels=(1, 0))
  first_column second_column  third_column less_than_ten
0        item1          cat1             5             1
1        item2          cat1             1             1
2        item3          cat1             8             1
3        item4          cat2             3             1
4        item5          cat2           731             0
5        item6          cat2           189             0
6        item7          cat2             9             1
  first_column second_column  third_column less_than_ten
0        item1          cat1             5             1
1        item2          cat1             1             1
2        item3          cat1             8             1
3        item4          cat2             3             3
4        item5          cat2           731             2
5        item6          cat2           189             2
6        item7          cat2             9             3
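# Base label per category: cat1 rows start at 0, cat2 rows start at 2
m = dict(cat1=0, cat2=2)
# le(10) is True (counts as 1) when third_column <= 10, so each label is base + 1 or base + 0
df.assign(less_than_ten=df.second_column.map(m) + df.third_column.le(10))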
  first_column second_column  third_column less_than_ten
0        item1          cat1             5             1
1        item2          cat1             1             1
2        item3          cat1             8             1
3        item4          cat2             3             3
4        item5          cat2           731             2
5        item6          cat2           189             2
6        item7          cat2             9             3
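The answer above works because the boolean result of `le(10)` is added to the per-category integer as 0 or 1. If more than two intervals are needed, the same idea could be extended with `pd.cut(..., labels=False)`, which returns integer bin codes instead of a boolean. The following is only a sketch under assumptions: the bin edges (10 and 100), the `offsets` mapping, and the `combined` column name are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "second_column": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"],
    "third_column": [5, 1, 8, 3, 731, 189, 9],
})

# Hypothetical bin edges: (-inf, 10], (10, 100], (100, inf]
edges = [-np.inf, 10, 100, np.inf]

# labels=False returns the integer position of each bin (0, 1, 2)
bin_code = pd.cut(df["third_column"], edges, labels=False)

# Hypothetical offsets, spaced by the number of bins so the labels
# of different categories never collide
n_bins = len(edges) - 1
offsets = {"cat1": 0, "cat2": n_bins}

df["combined"] = df["second_column"].map(offsets) + bin_code
print(df)
```

Spacing the offsets by the number of bins keeps every (category, bin) pair on a distinct integer, which is the same role the values 0 and 2 play in the two-bin answer.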