Python pandas: divide rows into 3 groups by condition
I have a DataFrame like this:
import pandas as pd
import numpy as np
user = pd.DataFrame({
    'User': ['101','101','101','102','102','101','101','102','102','102','102','102'],
    'Country': ['India','Japan','India','Brazil','Japan','UK','Austria','Japan','Singapore','UK','UK','UK'],
    'Count': [85,78,70,5,6,8,60,30,5,6,5,4],
})
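I want to sort by the Count column and assign the top 30% of rows to group 3, the next 30% to group 2, and the remaining 30% to group 1. How can I do this? My expected output is the first 4 columns. You can also see my calculation of how I split the rows 30%, 30%, 40%.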
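First sort the values, then group by User and apply a custom function that splits each group and returns the length of each part as a row of a new DataFrame. The idea comes from a perfect answer, thanks.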
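To make the splitting step concrete, here is a minimal sketch of how `np.split` behaves with cut positions at 30% and 60% of the length. The array holds the Count values of user 102 from the question, sorted in descending order; the names `rows`, `cuts`, `top`, `middle`, and `rest` are only illustrative.

```python
import numpy as np

# Count values of user 102 from the question, sorted descending.
rows = np.array([30, 6, 6, 5, 5, 5, 4])
n = len(rows)

# Cut positions at 30% and 60% of the length, truncated to whole rows.
cuts = [int(.3 * n), int(.6 * n)]

# np.split cuts the array just before each listed position.
top, middle, rest = np.split(rows, cuts)
sizes = [len(top), len(middle), len(rest)]
print(cuts, sizes)
```

Because the cut positions are truncated with `int()`, the last part absorbs whatever rows are left over, which is why 7 rows end up as 2, 2, and 3.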
For the top 30-30-30:
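user = user.sort_values(['User','Count'], ascending=[True, False])
def f(x):
    # split into 4 parts, because .3 + .3 + .3 != 1; the last part (d) is discarded
    # row positions are split instead of the frame itself, since np.split on a
    # DataFrame raises a FutureWarning in recent pandas versions
    a, b, c, d = np.split(np.arange(len(x)), [int(.3*len(x)), int(.6*len(x)), int(.9*len(x))])
    return pd.Series([len(a), len(b), len(c)], index=['30','30','30'])
# one row per user, holding the size of each part
df = user.groupby('User').apply(f)
df['sum'] = df.sum(axis=1)
print(df)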
      30  30  30  sum
User
101    1   2   1    4
102    2   2   2    6
For 30-30-40:
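user = user.sort_values(['User','Count'], ascending=[True, False])
def f(x):
    # split into 3 parts, because .3 + .3 + .4 == 1
    # row positions are split instead of the frame itself, since np.split on a
    # DataFrame raises a FutureWarning in recent pandas versions
    a, b, c = np.split(np.arange(len(x)), [int(.3*len(x)), int(.6*len(x))])
    return pd.Series([len(a), len(b), len(c)], index=['30','30','40'])
# one row per user, holding the size of each part
df = user.groupby('User').apply(f)
df['sum'] = df.sum(axis=1)
print(df)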
      30  30  40  sum
User
101    1   2   2    5
102    2   2   3    7
EDIT:
The Groups column can be created with a list comprehension:
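def f(x):
    # split the index of each user's (already sorted) rows into 30% / 30% / 40%
    a, b, c = np.split(x.index, [int(.3*len(x)), int(.6*len(x))])
    L = [a,b,c]
    # label the parts 3, 2, 1 and repeat each label once per row in its part
    return [i for i, y in zip(range(len(L),0,-1) ,L) for j in y]
user['Groups'] = user.groupby('User')['User'].transform(f)
print(user)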
   User    Country  Count  Groups
0   101      India     85       3
1   101      Japan     78       2
2   101      India     70       2
6   101    Austria     60       1
5   101         UK      8       1
7   102      Japan     30       3
4   102      Japan      6       3
9   102         UK      6       2
3   102     Brazil      5       2
8   102  Singapore      5       1
10  102         UK      5       1
11  102         UK      4       1
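As a possible alternative, the same labels can be computed without a custom function. The sketch below is not part of the original answer; it assumes the same cut points as `f` (30% and 60% of each user's row count, truncated) and uses `cumcount` for each row's position within its user. The names `pos`, `n`, `cut1`, and `cut2` are illustrative.

```python
import numpy as np
import pandas as pd

user = pd.DataFrame({
    'User': ['101','101','101','102','102','101','101','102','102','102','102','102'],
    'Country': ['India','Japan','India','Brazil','Japan','UK','Austria','Japan','Singapore','UK','UK','UK'],
    'Count': [85,78,70,5,6,8,60,30,5,6,5,4],
})
user = user.sort_values(['User', 'Count'], ascending=[True, False])

g = user.groupby('User')
pos = g.cumcount()                  # 0-based position of each row within its user
n = g['Count'].transform('size')    # number of rows belonging to that user

# Same cut points as int(.3*len(x)) and int(.6*len(x)).
cut1 = np.floor(.3 * n).astype(int)
cut2 = np.floor(.6 * n).astype(int)

# Start every row in group 3 and drop one level for each cut point it has passed.
user['Groups'] = 3 - (pos >= cut1).astype(int) - (pos >= cut2).astype(int)
print(user)
```

Rows with tied Count values may appear in a different order than in the output above, but the sequence of group labels per user is the same.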
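Try pd.cut() (the documentation is good).
How do I get the Groups column in the original dataset, based on the number of rows in each group? If you look at my desired output, there is a column named Groups.
@KumarAK How about being polite rather than demanding? jezrael is a volunteer who is giving you his free time and expertise. You did not start by saying thanks; you only complained.
Thank you for your continued support, jezrael.
@KumarAK - Added a solution for the groups.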
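The pd.cut() suggestion from the comments could look roughly like the sketch below. This is an assumption about what the commenter had in mind, not code from the thread: each row gets the fraction of its user's rows reached so far (1/n, 2/n, ..., 1), and `pd.cut` bins that fraction at 0.3 and 0.6. The name `frac` is illustrative.

```python
import pandas as pd

user = pd.DataFrame({
    'User': ['101','101','101','102','102','101','101','102','102','102','102','102'],
    'Country': ['India','Japan','India','Brazil','Japan','UK','Austria','Japan','Singapore','UK','UK','UK'],
    'Count': [85,78,70,5,6,8,60,30,5,6,5,4],
})
user = user.sort_values(['User', 'Count'], ascending=[True, False])

g = user.groupby('User')
# Fraction of the user's rows covered up to and including each row.
frac = (g.cumcount() + 1) / g['Count'].transform('size')

# Bins are right-closed: (0, .3] -> group 3, (.3, .6] -> group 2, (.6, 1] -> group 1.
user['Groups'] = pd.cut(frac, bins=[0, .3, .6, 1], labels=[3, 2, 1]).astype(int)
print(user)
```

For the data in the question this gives the same labels as the list comprehension version: a row falls in the top bin exactly when its 1-based position is at most 30% of the user's row count.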