In Pandas, how do I calculate the relative probabilities of column values given the values of another column?
I have two data frames, vehicles and casualties, each of which has a common column, Accident_Index:
import pandas as pd
vehicles = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 3, 4, 4],
                         'Vehicle_Type': ['car', 'car', 'motorcyle', 'car', 'car', 'car', 'car'],
                         'Sex_Driver': ['male', 'female', 'male', 'female', 'female', 'male', 'male']})
casualties = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 4],
                           'Casualty_Severity': ['fatal', 'serious', 'fatal', 'light', 'fatal']})
For ease of visualization, here is vehicles:
   Accident_Index Sex_Driver Vehicle_Type
0               1       male          car
1               1     female          car
2               2       male    motorcyle
3               3     female          car
4               3     female          car
5               4       male          car
6               4       male          car
and here is casualties:
   Accident_Index Casualty_Severity
0               1             fatal
1               1           serious
2               2             fatal
3               3             light
4               4             fatal
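I would like to calculate how many times more likely accidents involving male drivers are to result in a fatality than accidents involving female drivers.
So far, I have come up with the following solution: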
dfm = casualties.merge(vehicles, on='Accident_Index')
dfm_cars = dfm.loc[dfm.Vehicle_Type == 'car']
dfm_cars_fatal_male = dfm_cars.isin({'Casualty_Severity': ['fatal'], 'Sex_Driver': ['male']})
male_driver_involved_in_fatal_car_accident = (dfm_cars_fatal_male['Casualty_Severity'] & dfm_cars_fatal_male['Sex_Driver']).sum()
dfm_cars_fatal_female = dfm_cars.isin({'Casualty_Severity': ['fatal'], 'Sex_Driver': ['female']})
female_driver_involved_in_fatal_car_accident = (dfm_cars_fatal_female['Casualty_Severity'] & dfm_cars_fatal_female['Sex_Driver']).sum()
print(male_driver_involved_in_fatal_car_accident / female_driver_involved_in_fatal_car_accident)
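In this case the answer is 3, because there are two fatal car accidents: one involving one male and one female driver, and the other involving two male drivers. That gives three male-driver rows against one female-driver row.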
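To make that count easier to check, here is a minimal sketch that tabulates the merged rows by driver sex and casualty severity. Using `pd.crosstab` is a choice made for this illustration, not part of the original solution; the data is the same as above.

```python
import pandas as pd

vehicles = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 3, 4, 4],
                         'Vehicle_Type': ['car', 'car', 'motorcyle', 'car', 'car', 'car', 'car'],
                         'Sex_Driver': ['male', 'female', 'male', 'female', 'female', 'male', 'male']})
casualties = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 4],
                           'Casualty_Severity': ['fatal', 'serious', 'fatal', 'light', 'fatal']})

# Each merged row is one (casualty, vehicle) pair from the same accident.
dfm = casualties.merge(vehicles, on='Accident_Index')
dfm_cars = dfm[dfm['Vehicle_Type'] == 'car']

# Rows: driver sex; columns: casualty severity; cells: number of pairs.
counts = pd.crosstab(dfm_cars['Sex_Driver'], dfm_cars['Casualty_Severity'])
print(counts)

# Ratio of male to female rows in the 'fatal' column.
ratio = counts.loc['male', 'fatal'] / counts.loc['female', 'fatal']
print(ratio)  # 3.0
```

The 'fatal' column holds 3 for male and 1 for female, which is where the ratio of 3 comes from.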
However, this code does not seem particularly concise. How can I refactor it?
IIUC, you can use merge + query + groupby:
g = casualties.merge(vehicles, on='Accident_Index')\
.query("Vehicle_Type == 'car' and Casualty_Severity == 'fatal'")\
.groupby('Sex_Driver').Sex_Driver.count()
g / g.sum()
Sex_Driver
female 0.25
male 0.75
Name: Sex_Driver, dtype: float64
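The output above gives shares rather than the "how many times more likely" figure the question asks for. As a small sketch, that ratio can be read off the same grouped counts; the variable names `shares` and `relative` are introduced here for illustration only.

```python
import pandas as pd

vehicles = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 3, 4, 4],
                         'Vehicle_Type': ['car', 'car', 'motorcyle', 'car', 'car', 'car', 'car'],
                         'Sex_Driver': ['male', 'female', 'male', 'female', 'female', 'male', 'male']})
casualties = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 4],
                           'Casualty_Severity': ['fatal', 'serious', 'fatal', 'light', 'fatal']})

# Same chain as in the answer, written with parentheses instead of backslashes.
g = (casualties.merge(vehicles, on='Accident_Index')
     .query("Vehicle_Type == 'car' and Casualty_Severity == 'fatal'")
     .groupby('Sex_Driver')['Sex_Driver'].count())

shares = g / g.sum()               # female 0.25, male 0.75
relative = g['male'] / g['female'] # male count divided by female count
print(relative)  # 3.0
```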
To make this simpler, you can use variables in the query:
vehicle = 'car'
severity = 'fatal'
You can then rewrite the query step as:
query("Vehicle_Type == @vehicle and Casualty_Severity == @severity")
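As a sketch of how the parameterized query might be wrapped up, the chain can be placed in a function whose arguments feed the `@` references. The function name `driver_sex_shares` and its default arguments are hypothetical and not part of the original answer.

```python
import pandas as pd

def driver_sex_shares(casualties, vehicles, vehicle='car', severity='fatal'):
    """Hypothetical helper: share of merged rows per driver sex
    for one vehicle type and one casualty severity."""
    g = (casualties.merge(vehicles, on='Accident_Index')
         # @vehicle and @severity refer to this function's local variables.
         .query("Vehicle_Type == @vehicle and Casualty_Severity == @severity")
         .groupby('Sex_Driver')['Sex_Driver'].count())
    return g / g.sum()

vehicles = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 3, 4, 4],
                         'Vehicle_Type': ['car', 'car', 'motorcyle', 'car', 'car', 'car', 'car'],
                         'Sex_Driver': ['male', 'female', 'male', 'female', 'female', 'male', 'male']})
casualties = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 4],
                           'Casualty_Severity': ['fatal', 'serious', 'fatal', 'light', 'fatal']})

fatal_shares = driver_sex_shares(casualties, vehicles)
serious_shares = driver_sex_shares(casualties, vehicles, severity='serious')
print(fatal_shares)
print(serious_shares)
```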
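This makes the code easier to reuse if you want to (say) put it in a function and test it against various input combinations.
This is a bit ambiguous. Two identical accident indices with different casualty severities?
Bharath, this means that there were two casualties in a single accident involving two vehicles (for example, the driver of each vehicle).
How do you map them in the vehicles data? Is the first 1 fatal and the second 1 serious? Something like that, because there will be duplicate rows when merging. Is that OK for you?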
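The duplicate rows raised in the comments can be seen directly by comparing row counts before and after the merge. The second half of this sketch counts distinct accidents per driver sex instead of casualty-vehicle pairs; that is only one possible reading of the question, assumed here for illustration, and it gives a different ratio.

```python
import pandas as pd

vehicles = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 3, 4, 4],
                         'Vehicle_Type': ['car', 'car', 'motorcyle', 'car', 'car', 'car', 'car'],
                         'Sex_Driver': ['male', 'female', 'male', 'female', 'female', 'male', 'male']})
casualties = pd.DataFrame({'Accident_Index': [1, 1, 2, 3, 4],
                           'Casualty_Severity': ['fatal', 'serious', 'fatal', 'light', 'fatal']})

# The merge pairs every casualty with every vehicle in the same accident,
# so accident 1 (2 casualties, 2 vehicles) alone produces 4 rows.
dfm = casualties.merge(vehicles, on='Accident_Index')
print(len(casualties), len(vehicles), len(dfm))  # 5 7 9

# Assumed alternative: count each fatal car accident once per driver sex.
fatal_cars = dfm.query("Vehicle_Type == 'car' and Casualty_Severity == 'fatal'")
accidents_per_sex = fatal_cars.groupby('Sex_Driver')['Accident_Index'].nunique()
print(accidents_per_sex)
```

Counted this way, accident 4 with its two male drivers contributes one accident rather than two rows, so the male-to-female ratio is 2 instead of 3. Which count is appropriate depends on what the analysis is meant to measure.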