Python: crosstab over multiple columns
I have a dataframe with a name, a day, and a location. For each name-day-location triple, I want to know what percentage of the rows with that name-day have that location.
In code, I start with df and am looking for expected:
import pandas as pd

df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)
print(df)

expected = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left", "row_percent": 50.0},
        {"name": "Alice", "day": "friday", "location": "right", "row_percent": 50.0},
        {"name": "Bob", "day": "monday", "location": "left", "row_percent": 100.0},
    ]
).set_index(['name', 'day'])
print(expected)
This prints:
In [13]: df
Out[13]:
      day location   name
0  friday     left  Alice
1  friday    right  Alice
2  monday     left    Bob

In [12]: expected
Out[12]:
             location  row_percent
name  day
Alice friday     left         50.0
      friday    right         50.0
Bob   monday     left        100.0
Use groupby and value_counts:
df.groupby(['name', 'day']).location.value_counts(normalize=True).mul(100)
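To make it clearer what normalize=True is doing here, the same percentages can be computed by hand: count the rows for each triple, then divide by the total for the name-day pair. This is only a sketch for illustration; the names counts, totals and manual are not from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)

# Number of rows for each (name, day, location) triple.
counts = df.groupby(["name", "day", "location"]).size()

# Number of rows for each (name, day) pair, repeated for every triple in that pair.
totals = counts.groupby(level=["name", "day"]).transform("sum")

# Share of each location within its (name, day) group, as a percentage.
manual = counts.div(totals).mul(100).rename("row_percent")
print(manual)
```

The result is a Series indexed by name, day and location, holding the same values that the value_counts one-liner produces.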
A bit more cleanup to get the desired output:
out = (df.groupby(['name', 'day']).location.value_counts(normalize=True).mul(100)
         .rename('row_percent').reset_index(2))
             location  row_percent
name  day
Alice friday     left         50.0
      friday    right         50.0
Bob   monday     left        100.0
out == expected
              location  row_percent
name  day
Alice friday      True         True
      friday      True         True
Bob   monday      True         True
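Since the question title mentions a crosstab, here is a possible alternative sketch using pd.crosstab with normalize="index", which divides each row by its row total. It is not part of the original answer. The names wide and long_form are illustrative. Note that the wide table also holds 0.0 for combinations that never occur (Bob on monday at right), so those are filtered out when going back to the long layout.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {"name": "Alice", "day": "friday", "location": "left"},
        {"name": "Alice", "day": "friday", "location": "right"},
        {"name": "Bob", "day": "monday", "location": "left"},
    ]
)

# One row per (name, day), one column per location; each row sums to 100.
wide = pd.crosstab(
    index=[df["name"], df["day"]],
    columns=df["location"],
    normalize="index",
).mul(100)
print(wide)

# Back to one row per triple, dropping triples that do not occur in df.
long_form = wide.reset_index().melt(
    id_vars=["name", "day"], var_name="location", value_name="row_percent"
)
long_form = (
    long_form[long_form["row_percent"] > 0]
    .set_index(["name", "day"])
    .sort_index()
)
print(long_form)
```

The groupby/value_counts version is shorter when only the long layout is needed; the crosstab version may be handier if the wide table is what you want to look at.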