Python: Counting values inside set-type cells of a column


I have a DataFrame like the one below:

import pandas as pd

df = pd.DataFrame({'bus': [{268}, {23, 200, 268}, {24}, {24}, {200, 268}],
                   'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                               "Routing", "Timing Problem"]})
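The 'bus' column holds bus numbers, and the 'problem' column holds the complaint filed about those buses. Any row in the 'bus' column can contain one or more bus numbers.

I am trying to count how many times each bus number appears, along with its most frequent problem or complaint. In other words, I want to find the most frequently reported bus numbers and the most common complaint for each.

Because the values are sets, I could not use Counter on the column.

The output could look like this: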

df2 = pd.DataFrame({'busses':["268","24","200","23"],
'ComplainFrequency':["3" ,"2" , "2","1"]})


First flatten the sets into a new DataFrame:

df1 = pd.DataFrame([(c, b) for a, b in zip(df['bus'], df['problem']) for c in a], 
                    columns=['bus','problem'])
print (df1)
   bus         problem
0  268   Driver Issues
1  200   Driver Issues
2  268   Driver Issues
3   23   Driver Issues
4   24  Timing Problem
5   24         Routing
6  200  Timing Problem
7  268  Timing Problem
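
As an alternative to the list comprehension, the same flattening can be sketched with `DataFrame.explode`, assuming pandas 0.25 or later. This is not the method used in the answer. Converting each set to a sorted list first is an added assumption, made only so that the row order is deterministic.

```python
import pandas as pd

df = pd.DataFrame({
    'bus': [{268}, {23, 200, 268}, {24}, {24}, {200, 268}],
    'problem': ["Driver Issues", "Driver Issues", "Timing Problem",
                "Routing", "Timing Problem"],
})

# Turn each set into a sorted list so the exploded row order is deterministic
# (sets have no guaranteed iteration order).
df1 = (df.assign(bus=df['bus'].apply(sorted))
         .explode('bus')
         .reset_index(drop=True))

# explode leaves the column with object dtype; cast back to integers.
df1['bus'] = df1['bus'].astype(int)
print(df1)
```

Each bus number ends up on its own row, paired with the problem from its original row, which is the same shape the comprehension produces.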
If the sets hold strings of comma-separated values, a second level of flattening is needed:

df = pd.DataFrame({'bus':[{'268'},{'23,200,268'},{'24'},{'24'},{'200,268'}], 
                   'problem':["Driver Issues" ,"Driver Issues" , "Timing Problem",
                              "Routing","Timing Problem"]})

print (df)
            bus         problem
0         {268}   Driver Issues
1  {23,200,268}   Driver Issues
2          {24}  Timing Problem
3          {24}         Routing
4     {200,268}  Timing Problem

df1 = pd.DataFrame([(d, b) for a, b in zip(df['bus'], df['problem']) 
                           for c in a 
                           for d in c.split(',')], 
                    columns=['bus','problem'])

print (df1)
   bus         problem
0  268   Driver Issues
1   23   Driver Issues
2  200   Driver Issues
3  268   Driver Issues
4   24  Timing Problem
5   24         Routing
6  200  Timing Problem
7  268  Timing Problem
Then use groupby with size:


df2 = df1.groupby('bus')['problem'].size().reset_index(name='ComplainFrequency')
print (df2)
   bus  ComplainFrequency
0  200                  2
1   23                  1
2   24                  2
3  268                  3
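
The output requested in the question lists the most frequently reported buses first, whereas `groupby` sorts by the bus label. A minimal sketch of that ordering follows, assuming the flattened frame with string bus numbers shown earlier (rebuilt inline here so the snippet runs on its own):

```python
import pandas as pd

# Flattened frame, one row per (bus, problem) pair, as produced by the flattening step.
df1 = pd.DataFrame({
    'bus': ['268', '23', '200', '268', '24', '24', '200', '268'],
    'problem': ["Driver Issues", "Driver Issues", "Driver Issues", "Driver Issues",
                "Timing Problem", "Routing", "Timing Problem", "Timing Problem"],
})

# Count rows per bus, then order by that count, largest first.
df2 = (df1.groupby('bus').size()
          .reset_index(name='ComplainFrequency')
          .sort_values('ComplainFrequency', ascending=False)
          .reset_index(drop=True))
print(df2)
```

Bus 268 comes first with 3 complaints. Buses with equal counts (200 and 24 here) may appear in either order unless a secondary sort key is added.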

df3 = df1.groupby(['bus','problem']).size().reset_index(name='Complaints')
print (df3)
   bus         problem  Complaints
0  200   Driver Issues           1
1  200  Timing Problem           1
2   23   Driver Issues           1
3   24         Routing           1
4   24  Timing Problem           1
5  268   Driver Issues           2
6  268  Timing Problem           1
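
The question also asks for the most common complaint per bus, which the per-pair counts above do not single out. One possible way to reduce them to a single row per bus is sketched below. The column name `count` and the variable `top` are illustrative choices, and the flattened frame is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Flattened frame, one row per (bus, problem) pair.
df1 = pd.DataFrame({
    'bus': ['268', '23', '200', '268', '24', '24', '200', '268'],
    'problem': ["Driver Issues", "Driver Issues", "Driver Issues", "Driver Issues",
                "Timing Problem", "Routing", "Timing Problem", "Timing Problem"],
})

# Count each (bus, problem) pair.
pair_counts = df1.groupby(['bus', 'problem']).size().reset_index(name='count')

# Put the largest counts first, then keep only the first row seen for each bus.
# When several problems tie for a bus, one of the tied problems is kept.
top = (pair_counts.sort_values('count', ascending=False)
                  .drop_duplicates('bus')
                  .sort_values('bus')
                  .reset_index(drop=True))
print(top)
```

For bus 268 this keeps "Driver Issues" (2 complaints). The other buses have only tied counts of 1 in this sample, so the problem shown for them is just one of the candidates.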