Python - Pandas: aggregating variable-length lists into a tidy dataset

I have the following DataFrame column, where each row is a string of comma-separated event names:

0                                              event_1
1                                          other_event
2    other_event, other_event, other_event, other_e...
3    event_3, other_event, other_event, other_event...
4                              some_event, other_event
5    event_1, event_5, some_event, some_event, some...
6                        event_5, event_6, other_event
7                                              event_1
I want to split each row, count the occurrences of each event name, and create a tidy dataset like this:

+---+--------+------------+--------+-----------+--------+--------+
|id |event_1 |other_event |event_3 |some_event |event_5 |event_6 |
+---+--------+------------+--------+-----------+--------+--------+
|0  |1       |0           |0       |0          |0       |0       |
+---+--------+------------+--------+-----------+--------+--------+
|1  |0       |1           |0       |0          |0       |0       |
+---+--------+------------+--------+-----------+--------+--------+
|2  |0       |4           |0       |0          |0       |0       |
+---+--------+------------+--------+-----------+--------+--------+
|3  |0       |3           |1       |0          |0       |0       |
+---+--------+------------+--------+-----------+--------+--------+
|4  |0       |1           |0       |1          |0       |0       |
+---+--------+------------+--------+-----------+--------+--------+
|5  |1       |0           |0       |3          |1       |0       |
+---+--------+------------+--------+-----------+--------+--------+
|6  |0       |1           |0       |0          |1       |1       |
+---+--------+------------+--------+-----------+--------+--------+
|7  |1       |0           |0       |0          |0       |0       |
+---+--------+------------+--------+-----------+--------+--------+

I have tried df["events_array"].str.split(","), but I got stuck. Any help would be appreciated :)

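The first idea is to use Counter in a list comprehension to get a list of dictionaries, pass it to the DataFrame constructor, replace the missing values with 0, and convert to integers: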

from collections import Counter

import pandas as pd

# one Counter per row: event name -> number of occurrences in that row
df = pd.DataFrame([Counter(x.split(", ")) for x in df["events_array"]]).fillna(0).astype(int)
print(df)
   event_1  other_event  event_3  some_event  event_5  event_6
0        1            0        0           0        0        0
1        0            1        0           0        0        0
2        0            4        0           0        0        0
3        0            3        1           0        0        0
4        0            1        0           1        0        0
5        1            0        0           3        1        0
6        0            1        0           0        1        1
7        1            0        0           0        0        0
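
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the Counter approach. The sample strings are an assumption: the rows in the question are truncated, so they were reconstructed to match the counts in the expected-output table, and the variable names `counts` and `tidy` are made up for illustration.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data, reconstructed from the expected-output table
# (the original rows are truncated in the question).
df = pd.DataFrame({
    "events_array": [
        "event_1",
        "other_event",
        "other_event, other_event, other_event, other_event",
        "event_3, other_event, other_event, other_event",
        "some_event, other_event",
        "event_1, event_5, some_event, some_event, some_event",
        "event_5, event_6, other_event",
        "event_1",
    ]
})

# One Counter per row: maps each event name to how often it occurs in that row.
counts = [Counter(row.split(", ")) for row in df["events_array"]]

# Build a frame from the list of dict-like Counters. Events absent from a row
# come out as NaN, so fill them with 0 and cast back to int.
tidy = pd.DataFrame(counts).fillna(0).astype(int)
print(tidy)
```

Note that the separator is ", " (comma plus space). Splitting on "," alone, as in the question, would leave a leading space on every event name after the first and produce separate columns for "other_event" and " other_event".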
Alternatively, create a DataFrame with str.split and expand=True, then count the values in each row with value_counts inside apply:

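# Series.value_counts per row; the top-level pd.value_counts is deprecated in pandas >= 2.1
df = (df["events_array"].str.split(', ', expand=True)
                        .apply(lambda row: row.value_counts(), axis=1)
                        .fillna(0)
                        .astype(int)
                        )
print(df)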
   event_1  event_3  event_5  event_6  other_event  some_event
0        1        0        0        0            0           0
1        0        0        0        0            1           0
2        0        0        0        0            4           0
3        0        1        0        0            3           0
4        0        0        0        0            1           1
5        1        0        1        0            0           3
6        0        0        1        1            1           0
7        1        0        0        0            0           0
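
A third option, not part of the answer above, is to explode the split lists into one row per event and count per original row. This is only a sketch under assumptions: it needs a pandas version that has Series.explode (0.25 or later), the sample strings are reconstructed from the expected-output table, and the names `s`, `exploded` and `tidy` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the expected-output table.
s = pd.Series(
    [
        "event_1",
        "other_event",
        "other_event, other_event, other_event, other_event",
        "event_3, other_event, other_event, other_event",
        "some_event, other_event",
        "event_1, event_5, some_event, some_event, some_event",
        "event_5, event_6, other_event",
        "event_1",
    ],
    name="events_array",
)

# Split into lists, then give each list element its own row.
# The original row label is repeated in the index.
exploded = s.str.split(", ").explode()

# Count event names within each original row, then pivot the event names
# into columns; combinations that never occur are filled with 0.
tidy = exploded.groupby(level=0).value_counts().unstack(fill_value=0)
print(tidy)
```

Because the zeros are filled in during unstack, no fillna/astype step is needed here. The columns come out sorted by event name, as in the second approach.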

Thanks, this is exactly what I was looking for :) Could you also take a look here: ?