Python: distributing values based on string counts in another DataFrame
I would like to distribute each payment across the states listed below.
Payment:
cust_id name date amount
0 A Edward 2021-01-01 3000
1 B Henry 2021-01-01 5000
2 C Ferth 2021-02-01 1000
State:
cust_id contract_id state1 state2 state3
0 A 1 Alabama Alaska Arizona
1 A 2 Indiana Alabama Nebraska
2 B 3 Alabama NaN Arizona
3 C 4 Alaska Nebraska NaN
4 C 5 NaN Maine Nebraska
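Each customer has at least one contract, and each contract covers a different set of states. Every state must be counted: when computing the ratio, a state that appears twice counts twice, and so on. The ratio is then multiplied by the amount to get the amount allocated to each state.
Desired output: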
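To make the counting rule concrete, here is a minimal plain-Python sketch for customer A only. The variable names (`contracts_a`, `amount_a`) are made up for illustration; the state lists and the amount are copied from the tables above.

```python
from collections import Counter

# States listed on customer A's two contracts (copied from the State table).
contracts_a = [
    ["Alabama", "Alaska", "Arizona"],    # contract 1
    ["Indiana", "Alabama", "Nebraska"],  # contract 2
]
amount_a = 3000  # customer A's payment

# Count every occurrence, so Alabama (on both contracts) counts twice.
counts = Counter(s for contract in contracts_a for s in contract)
total = sum(counts.values())  # number of state entries across all contracts

# Ratio = occurrences of the state / all state entries for the customer.
ratios = {s: n / total for s, n in counts.items()}
allocation = {s: amount_a * r for s, r in ratios.items()}

for s in sorted(allocation):
    print(s, round(ratios[s], 6), round(allocation[s], 2))
```

Alabama appears on two of the six entries, so it receives a third of the payment, while each of the other four states receives a sixth.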
cust_id name date state ratio amount
0 A Edward 2021-01-01 Alabama 0.333333 1000
1 A Edward 2021-01-01 Alaska 0.166667 500
2 A Edward 2021-01-01 Arizona 0.166667 500
3 A Edward 2021-01-01 Indiana 0.166667 500
4 A Edward 2021-01-01 Nebraska 0.166667 500
5 B Henry 2021-01-01 Alabama 0.500000 2500
6 B Henry 2021-01-01 Arizona 0.500000 2500
7 C Ferth 2021-02-01 Alaska 0.250000 250
8 C Ferth 2021-02-01 Nebraska 0.500000 500
9 C Ferth 2021-02-01 Maine 0.250000 250
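This can be done with df.melt, followed by df.groupby and value_counts with normalize=True. Melting flattens the state columns for each customer, and the normalized counts give each state's share of that customer's state entries. Then merge the result with the payment DataFrame and multiply the amount by that share to get the new amount:
Solution: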
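# Flatten state1..state3 into a single 'state' column, one row per contract and state.
u = (state.melt(['cust_id','contract_id'],value_name='state')
     # Share of each state among a customer's state entries (NaN is dropped by default).
     .groupby("cust_id")['state'].value_counts(normalize=True)
     .reset_index(name='ratio'))
# Attach the ratios to each payment, then scale the amount by the ratio.
out = payment.merge(u,on='cust_id')
out['new_amount'] = out['amount']*out['ratio']
print(out)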
cust_id name date amount state ratio new_amount
0 A Edward 2021-01-01 3000 Alabama 0.333333 1000.0
1 A Edward 2021-01-01 3000 Alaska 0.166667 500.0
2 A Edward 2021-01-01 3000 Arizona 0.166667 500.0
3 A Edward 2021-01-01 3000 Indiana 0.166667 500.0
4 A Edward 2021-01-01 3000 Nebraska 0.166667 500.0
5 B Henry 2021-01-01 5000 Alabama 0.500000 2500.0
6 B Henry 2021-01-01 5000 Arizona 0.500000 2500.0
7 C Ferth 2021-02-01 1000 Alaska 0.250000 250.0
8 C Ferth 2021-02-01 1000 Maine 0.250000 250.0
9 C Ferth 2021-02-01 1000 Nebraska 0.500000 500.0
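For anyone who wants to run this end to end, the following is a self-contained sketch that rebuilds both frames from the tables in the question and then reshapes the result into the column layout the question asked for. Overwriting `amount` with the allocated value and the final sanity check are my additions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Input frames rebuilt from the tables in the question.
payment = pd.DataFrame({
    "cust_id": ["A", "B", "C"],
    "name": ["Edward", "Henry", "Ferth"],
    "date": ["2021-01-01", "2021-01-01", "2021-02-01"],
    "amount": [3000, 5000, 1000],
})
state = pd.DataFrame({
    "cust_id": ["A", "A", "B", "C", "C"],
    "contract_id": [1, 2, 3, 4, 5],
    "state1": ["Alabama", "Indiana", "Alabama", "Alaska", np.nan],
    "state2": ["Alaska", "Alabama", np.nan, "Nebraska", "Maine"],
    "state3": ["Arizona", "Nebraska", "Arizona", np.nan, "Nebraska"],
})

# Same technique as the answer: melt, then normalized counts per customer.
u = (
    state.melt(["cust_id", "contract_id"], value_name="state")
    .groupby("cust_id")["state"]
    .value_counts(normalize=True)  # NaN entries are dropped by default
    .reset_index(name="ratio")
)

out = payment.merge(u, on="cust_id")

# Assumption: the desired layout keeps a single 'amount' column holding
# the allocated value, so the original amount is overwritten here.
out["amount"] = out["amount"] * out["ratio"]
out = out[["cust_id", "name", "date", "state", "ratio", "amount"]]

# Sanity check: allocations per customer should add back up to the payment.
totals = out.groupby("cust_id")["amount"].sum()
print(out)
print(totals)
```

The order of the states within each customer may differ from the desired output in the question, since it depends on how value_counts sorts its result.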
Could you please show us some code as a starting point?
Thank you very much! This is exactly the answer I was looking for!