Python: Distribute values based on string counts in another dataframe

I want to allocate payments across states as follows:

payment:

  cust_id    name       date  amount
0       A  Edward 2021-01-01    3000
1       B   Henry 2021-01-01    5000
2       C   Ferth 2021-02-01    1000
state:

  cust_id  contract_id   state1    state2    state3
0       A            1  Alabama    Alaska   Arizona
1       A            2  Indiana   Alabama  Nebraska
2       B            3  Alabama       NaN   Arizona
3       C            4   Alaska  Nebraska       NaN
4       C            5      NaN     Maine  Nebraska
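Each customer has at least one contract, and each contract covers a different set of states. Every state occurrence must be counted: when computing the ratio, a state that appears twice counts twice, and so on. The ratio is then multiplied by the amount to get the amount allocated to each state.

Expected output:

  cust_id    name       date     state     ratio  amount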
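
To make the counting rule concrete, here is a minimal sketch that rebuilds the two sample frames and works through customer A by hand. The frame names payment and state are taken from the solution code; the intermediate names (a_rows, a_states, counts, ratio, alloc) are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables in the question
payment = pd.DataFrame({
    "cust_id": ["A", "B", "C"],
    "name": ["Edward", "Henry", "Ferth"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-02-01"]),
    "amount": [3000, 5000, 1000],
})
state = pd.DataFrame({
    "cust_id": ["A", "A", "B", "C", "C"],
    "contract_id": [1, 2, 3, 4, 5],
    "state1": ["Alabama", "Indiana", "Alabama", "Alaska", np.nan],
    "state2": ["Alaska", "Alabama", np.nan, "Nebraska", "Maine"],
    "state3": ["Arizona", "Nebraska", "Arizona", np.nan, "Nebraska"],
})

# Collect every state listed on any of customer A's contracts, skipping blanks
a_rows = state[state["cust_id"] == "A"]
a_states = pd.Series(
    a_rows[["state1", "state2", "state3"]].to_numpy().ravel()
).dropna()

# A state that appears on two contracts is counted twice
counts = a_states.value_counts()
ratio = counts / counts.sum()

# Allocate customer A's payment in proportion to those counts
a_amount = payment.loc[payment["cust_id"] == "A", "amount"].iloc[0]
alloc = ratio * a_amount
print(alloc)
```

Customer A has six state entries in total and Alabama appears twice, so Alabama receives 2/6 of 3000 (1000) and each of the other four states receives 1/6 (500).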
0       A  Edward 2021-01-01   Alabama  0.333333    1000
1       A  Edward 2021-01-01    Alaska  0.166667     500
2       A  Edward 2021-01-01   Arizona  0.166667     500
3       A  Edward 2021-01-01   Indiana  0.166667     500
4       A  Edward 2021-01-01  Nebraska  0.166667     500
5       B   Henry 2021-01-01   Alabama  0.500000    2500
6       B   Henry 2021-01-01   Arizona  0.500000    2500
7       C   Ferth 2021-02-01    Alaska  0.250000     250
8       C   Ferth 2021-02-01  Nebraska  0.500000     500
9       C   Ferth 2021-02-01     Maine  0.250000     250

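This can be done with df.melt, followed by df.groupby and value_counts with normalize=True. That flattens the states for each customer and gives each state's percentage share based on how many times it occurs. Then merge the result with the payment dataframe, and finally multiply amount by the percentage share to get the new amount: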

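Solution:

# Reshape the state columns to long form, then compute each state's share per customer
u = (state.melt(['cust_id','contract_id'],value_name='state')
    .groupby("cust_id")['state'].value_counts(normalize=True)
    .reset_index(name='ratio'))

# Attach the shares to the payments and scale the amount by each share
out = payment.merge(u,on='cust_id')
out['new_amount'] = out['amount']*out['ratio']
print(out)

Output:
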
  cust_id    name        date  amount     state     ratio  new_amount
0       A  Edward  2021-01-01    3000   Alabama  0.333333      1000.0
1       A  Edward  2021-01-01    3000    Alaska  0.166667       500.0
2       A  Edward  2021-01-01    3000   Arizona  0.166667       500.0
3       A  Edward  2021-01-01    3000   Indiana  0.166667       500.0
4       A  Edward  2021-01-01    3000  Nebraska  0.166667       500.0
5       B   Henry  2021-01-01    5000   Alabama  0.500000      2500.0
6       B   Henry  2021-01-01    5000   Arizona  0.500000      2500.0
7       C   Ferth  2021-02-01    1000    Alaska  0.250000       250.0
8       C   Ferth  2021-02-01    1000     Maine  0.250000       250.0
9       C   Ferth  2021-02-01    1000  Nebraska  0.500000       500.0
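
For anyone who wants to run the answer end to end, the following is a self-contained sketch that splits the chain into its three steps and adds a check that the allocated amounts add back up to each customer's payment. The sample frames are rebuilt from the question's tables; the names long, totals and expected are assumptions introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables in the question
payment = pd.DataFrame({
    "cust_id": ["A", "B", "C"],
    "name": ["Edward", "Henry", "Ferth"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-02-01"]),
    "amount": [3000, 5000, 1000],
})
state = pd.DataFrame({
    "cust_id": ["A", "A", "B", "C", "C"],
    "contract_id": [1, 2, 3, 4, 5],
    "state1": ["Alabama", "Indiana", "Alabama", "Alaska", np.nan],
    "state2": ["Alaska", "Alabama", np.nan, "Nebraska", "Maine"],
    "state3": ["Arizona", "Nebraska", "Arizona", np.nan, "Nebraska"],
})

# Step 1: wide to long, one row per (contract, state column)
long = state.melt(id_vars=["cust_id", "contract_id"], value_name="state")

# Step 2: share of each state within a customer; missing states are
# skipped because value_counts drops NaN by default
u = (long.groupby("cust_id")["state"]
         .value_counts(normalize=True)
         .reset_index(name="ratio"))

# Step 3: attach the shares to the payments and scale the amount
out = payment.merge(u, on="cust_id")
out["new_amount"] = out["amount"] * out["ratio"]

# Sanity check: the allocations per customer sum to the original payment
totals = out.groupby("cust_id")["new_amount"].sum()
expected = payment.set_index("cust_id")["amount"]
assert np.allclose(totals, expected.loc[totals.index])
print(out)
```

Because the shares within each customer sum to 1, the check should pass for any input in which every paying customer has at least one non-missing state.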

Could you show us some code as a starting point?
Thank you very much! This is the answer I was looking for!