Python: counting the values inside set-typed cells of a column
I have a DataFrame like the following:
import pandas as pd

df = pd.DataFrame({'bus': [{268}, {23, 200, 268}, {24}, {24}, {200, 268}],
                   'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                               "Routing", "Timing Problem"]})
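The 'bus' column holds bus numbers, and the 'problem' column holds the complaint made about those buses. Any row in the 'bus' column can contain one or more bus numbers.
I am trying to count how often each bus number appears and to find the most common complaint, that is, the most frequently reported bus numbers and the complaint most often made about each of them.
However, because the values are sets, I could not use Counter.
The output could look like this: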
df2 = pd.DataFrame({'busses':["268","24","200","23"],
'ComplainFrequency':["3" ,"2" , "2","1"]})
First, flatten the sets into a new DataFrame:
df1 = pd.DataFrame([(c, b) for a, b in zip(df['bus'], df['problem']) for c in a],
columns=['bus','problem'])
print (df1)
   bus         problem
0  268   Driver Issues
1  200   Driver Issues
2  268   Driver Issues
3   23   Driver Issues
4   24  Timing Problem
5   24         Routing
6  200  Timing Problem
7  268  Timing Problem
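As an alternative to the list comprehension, the same flattening can be sketched with `DataFrame.explode`, assuming pandas 0.25 or newer. Each set is converted to a sorted list first, because sets are unordered and the row order would otherwise depend on set iteration order. The sample data is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'bus': [{268}, {23, 200, 268}, {24}, {24}, {200, 268}],
    'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                "Routing", "Timing Problem"],
})

# Turn every set into a sorted list so the result is reproducible,
# then give each list element its own row.
df1 = (df.assign(bus=df['bus'].map(sorted))
         .explode('bus')
         .reset_index(drop=True))
print(df1)
```

The exploded 'bus' column has object dtype; `df1['bus'].astype(int)` converts it back to integers if that is needed.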
If the sets instead hold comma-separated strings, a double flattening is needed:
df = pd.DataFrame({'bus':[{'268'},{'23,200,268'},{'24'},{'24'},{'200,268'}],
'problem':["Driver Issues" ,"Driver Issues" , "Timing Problem",
"Routing","Timing Problem"]})
print (df)
            bus         problem
0         {268}   Driver Issues
1  {23,200,268}   Driver Issues
2          {24}  Timing Problem
3          {24}         Routing
4     {200,268}  Timing Problem
df1 = pd.DataFrame([(d, b) for a, b in zip(df['bus'], df['problem'])
for c in a
for d in c.split(',')],
columns=['bus','problem'])
print (df1)
   bus         problem
0  268   Driver Issues
1   23   Driver Issues
2  200   Driver Issues
3  268   Driver Issues
4   24  Timing Problem
5   24         Routing
6  200  Timing Problem
7  268  Timing Problem
Then use groupby with size:
df2 = df1.groupby('bus')['problem'].size().reset_index(name='ComplainFrequency')
print (df2)
   bus  ComplainFrequency
0  200                  2
1   23                  1
2   24                  2
3  268                  3
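The question mentions that Counter could not be applied to the set column. If only the per-bus counts are needed, one possible sketch is to chain the sets together with `itertools.chain.from_iterable` before counting. It uses the integer sets from the question and borrows the column names 'busses' and 'ComplainFrequency' from the desired output shown there.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'bus': [{268}, {23, 200, 268}, {24}, {24}, {200, 268}],
    'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                "Routing", "Timing Problem"],
})

# chain.from_iterable walks through every set in the column and yields
# the individual bus numbers, which Counter can then tally.
counts = Counter(chain.from_iterable(df['bus']))

# most_common() returns (bus, count) pairs with the highest count first.
df2 = pd.DataFrame(counts.most_common(),
                   columns=['busses', 'ComplainFrequency'])
print(df2)
```

Unlike the groupby result, this table is ordered by frequency rather than by bus number. The relative order of buses with equal counts is not guaranteed.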
df3 = df1.groupby(['bus','problem']).size().reset_index(name='Coplains')
print (df3)
   bus         problem  Coplains
0  200   Driver Issues         1
1  200  Timing Problem         1
2   23   Driver Issues         1
3   24         Routing         1
4   24  Timing Problem         1
5  268   Driver Issues         2
6  268  Timing Problem         1
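The question also asks for the most common complaint per bus. A minimal sketch of one way to get it: count each (bus, problem) pair, sort so the largest count comes first within each bus, and keep the first row per bus. The column name 'Complaints' and the alphabetical tie-break are assumptions made for this illustration, not part of the original answer.

```python
import pandas as pd

# Flattened data: one row per (bus, complaint) pair.
df1 = pd.DataFrame({
    'bus': ['268', '23', '200', '268', '24', '24', '200', '268'],
    'problem': ["Driver Issues", "Driver Issues", "Driver Issues",
                "Driver Issues", "Timing Problem", "Routing",
                "Timing Problem", "Timing Problem"],
})

pair_counts = (df1.groupby(['bus', 'problem'])
                  .size()
                  .reset_index(name='Complaints'))

# Within each bus, put the highest count first; ties are broken
# alphabetically by complaint so the result is deterministic.
top = (pair_counts
       .sort_values(['bus', 'Complaints', 'problem'],
                    ascending=[True, False, True])
       .drop_duplicates('bus')
       .reset_index(drop=True))
print(top)
```

For bus '268' this keeps "Driver Issues" (2 complaints). Buses '200' and '24' each have two complaints tied at 1, so the row that is kept depends entirely on the tie-break rule chosen.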