Python: count by conditions applied to the same column in a table

python pandas

This is my dataframe:

acc_index    veh_count    veh_type
001             1            1
002             2            1
002             2            2
003             2            1
003             2            2
004             1            1
005             2            1
005             2            3
006             1            2
007             2            1
007             2            2
008             2            1
008             2            1
009             3            1
009             3            1
009             3            2

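acc_index is unique for each accident.

veh_count shows how many vehicles were involved in an accident.

veh_type shows the type of each vehicle involved in the accident (1 = bicycle, 2 = car, 3 = bus).

What I want to do is count the number of accidents between a car and a bicycle (that is, accidents where, for the same acc_index, both veh_type=1 and veh_type=2 occur). Even if more cars or bicycles are involved, I still want to count it as one accident. How can I do that?

I tried the code below, but it gives me the count of all accidents that involve a car or a bicycle. I only want the accidents between the two.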

df[(df['veh_count'] >=2) & (df.veh_type.isin(['1','2']))].groupby(['acc_index', 'veh_count', 'veh_type']).count()

I would like to get something like the following, showing the matching rows of the dataframe as well and not just the total:

acc_index    veh_count    veh_type     count
002             2            1           
002             2            2
                           count         1
003             2            1
003             2            2
                           count         1
007             2            1
007             2            2
                           count         1
009             3            1
009             3            1
009             3            2
                           count         1
                        total_count      4

If you have a better solution or idea, I would appreciate it.

IIUC, you can flag the veh_type values of interest and then group by acc_index:

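(df.assign(bike=df.veh_type.eq(1),
           car=df.veh_type.eq(2))  # change 2 to correct type
   [['acc_index','bike','car']]
   .groupby('acc_index')
   .any()               # per accident: is at least one bike / one car involved?
   .all(axis=1).sum()   # both flags True -> count the accident once
)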

Output:

4
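To try the count end to end, here is a minimal, self-contained sketch that rebuilds the sample table from the question. It assumes acc_index and veh_type are stored as integers; the names has_bike, has_car, flags, and n_between are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question (assumption: integer columns).
df = pd.DataFrame({
    "acc_index": [1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9],
    "veh_count": [1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 3, 3, 3],
    "veh_type":  [1, 1, 2, 1, 2, 1, 1, 3, 2, 1, 2, 1, 1, 1, 1, 2],
})

# One row per accident, with a flag for each vehicle type of interest.
flags = (
    df.assign(has_bike=df["veh_type"].eq(1), has_car=df["veh_type"].eq(2))
      .groupby("acc_index")[["has_bike", "has_car"]]
      .any()
)

# An accident counts once when both flags are True.
n_between = int(flags.all(axis=1).sum())
print(n_between)  # 4
```

Because the flags are reduced with any() per accident, extra bicycles or cars in the same accident do not inflate the count.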

Update: if you need all the rows:

s = (df.assign(car=df.veh_type.eq(1),
          bike=df.veh_type.eq(2))  # change 2 to correct type
   [['acc_index','car','bike']]
   .groupby('acc_index')
   .any()
   .all(axis=1)
)

df[df['acc_index'].map(s)]

Output:

    acc_index  veh_count  veh_type
1           2          2         1
2           2          2         2
3           3          2         1
4           3          2         2
9           7          2         1
10          7          2         2
13          9          3         1
14          9          3         1
15          9          3         2
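The same rows can also be selected with GroupBy.filter, which keeps every row of a group when the supplied function returns True. This is a different technique from the map-based mask in the answer, sketched here on the question's sample data under the assumption that the columns hold integers; the names between and total_count are illustrative.

```python
import pandas as pd

# Sample data copied from the question (assumption: integer columns).
df = pd.DataFrame({
    "acc_index": [1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9],
    "veh_count": [1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 3, 3, 3],
    "veh_type":  [1, 1, 2, 1, 2, 1, 1, 3, 2, 1, 2, 1, 1, 1, 1, 2],
})

# Keep only accidents whose vehicle types include both 1 (bicycle) and 2 (car).
between = df.groupby("acc_index").filter(
    lambda g: {1, 2} <= set(g["veh_type"])
)

# Each accident is counted once, however many rows it has.
total_count = between["acc_index"].nunique()

print(between)
print("total_count:", total_count)
```

filter calls the Python function once per group, so on large tables the vectorised mask from the answer is likely to be faster.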

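Great, thanks! If I also wanted to list the dataframe rows for all 4 of those accidents, how would I do that?

Another way to do it:

pd.get_dummies(df.veh_type)[[1, 2]].groupby(df.acc_index).any().all(axis=1).sum()

Or:

df.groupby('acc_index')['veh_type'].apply(lambda g: not ({1, 2} - {*g})).sum()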
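The set expression in the second one-liner may be easier to follow in isolation. Subtracting the vehicle types seen in one accident from the required set {1, 2} leaves an empty set exactly when both types are present, and not turns an empty set into True. The helper name has_all below is hypothetical and used only for illustration.

```python
def has_all(required, seen):
    """Return True when every required vehicle type appears in `seen`."""
    missing = set(required) - set(seen)  # types that were not observed
    return not missing                   # an empty set is falsy


print(has_all({1, 2}, [1, 1, 2]))  # True: bicycle and car both present
print(has_all({1, 2}, [1, 3]))     # False: no car
print(has_all({1, 2}, [1, 1]))     # False: two bicycles, no car
```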

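@piRSquared Thanks for the code, both versions work! The second one is much faster. Do you know the answer to my comment above? Also, if I wanted to count and list all accidents involving a bicycle or a car, how would I do that?

How about using @piRSquared's approach with transform?

df[df.groupby('acc_index')['veh_type'].transform(lambda g: not ({1, 2} - {*g}))]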
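A runnable sketch of the transform idea from the comment above, using the question's sample data and assuming integer columns. transform broadcasts the per-accident True/False back onto every row of that accident, so the result can be used directly as a row mask. The explicit astype(bool) is a defensive assumption in case a given pandas version returns a non-boolean dtype; the names mask and rows are illustrative.

```python
import pandas as pd

# Sample data copied from the question (assumption: integer columns).
df = pd.DataFrame({
    "acc_index": [1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8, 8, 9, 9, 9],
    "veh_count": [1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 3, 3, 3],
    "veh_type":  [1, 1, 2, 1, 2, 1, 1, 3, 2, 1, 2, 1, 1, 1, 1, 2],
})

# True on every row of an accident that involves both a bicycle and a car.
mask = (
    df.groupby("acc_index")["veh_type"]
      .transform(lambda g: not ({1, 2} - set(g)))
      .astype(bool)
)

rows = df[mask]
print(rows)
print("accidents:", rows["acc_index"].nunique())
```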