How do I run a function on the results of a groupby in Python?

I am using this code to count the different Quality values for each user in each cluster:

>>> for name, group in df.groupby(["Cluster_id", "User"]):
...     print('group name:', name)
...     print('group rows:')
...     print(group)
...     print('counts of Quality values:')
...     print(group["Quality"].value_counts())
...     input()  # pause until Enter is pressed before showing the next group
...
The output I currently get is:

group rows:
                tag                       user                    quality  cluster
676    black fabric  http://steve.nl/user_1002          usefulness-useful        1
708      blond wood  http://steve.nl/user_1002          usefulness-useful        1
709      blond wood  http://steve.nl/user_1002    problematic-misspelling        1
1410         eames?  http://steve.nl/user_1002      usefulness-not_useful        1
1411         eames?  http://steve.nl/user_1002  problematic-misperception        1
3649  rocking chair  http://steve.nl/user_1002          usefulness-useful        1
3650  rocking chair  http://steve.nl/user_1002  problematic-misperception        1
counts of Quality Values:
usefulness-useful            3
problematic-misperception    2
usefulness-not_useful        1
problematic-misspelling      1
What I want to do now is check a condition on each row, namely:

if quality == 'usefulness-useful':
    good = good + 1
else:
    bad = bad + 1
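I tried saving this output:

counts of Quality Values:
usefulness-useful            3
problematic-misperception    2
usefulness-not_useful        1
problematic-misspelling      1

in a variable and iterating over it line by line, but that did not work. Can anyone suggest how to perform calculations on particular rows?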

Once you have a group, you can use the .iterrows() method to iterate over it row by row. It gives you the row index and the row itself:

In [33]: for row_number, row in group.iterrows():
   ....:     print(row_number)
   ....:     print(row)
   ....:     
676
Tag                        black fabric
User          http://steve.nl/user_1002
Quality               usefulness-useful
Cluster_id                            1
Name: 676
708
Tag                          blond wood
User          http://steve.nl/user_1002
Quality               usefulness-useful
Cluster_id                            1
Name: 708
[etc]
Each of these rows can be indexed like a dictionary, for example:

In [48]: row
Out[48]: 
Tag                       rocking chair
User          http://steve.nl/user_1002
Quality       problematic-misperception
Cluster_id                            1
Name: 3650

In [49]: row["User"]
Out[49]: 'http://steve.nl/user_1002'

In [50]: row["Tag"]
Out[50]: 'rocking chair'
So you can write your loop like this:

good = 0
bad = 0
for row_number, row in group.iterrows():
    if row['Quality'] == 'usefulness-useful':
        good += 1
    else:
        bad += 1
print('good', good, 'bad', bad)

good 3 bad 4
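If that makes sense to you, it is a perfectly good approach. Another way is to compute the result directly from the counts in the Quality column: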

In [54]: counts = group["Quality"].value_counts()

In [55]: counts
Out[55]: 
usefulness-useful            3
problematic-misperception    2
usefulness-not_useful        1
problematic-misspelling      1

In [56]: counts['usefulness-useful']
Out[56]: 3
Since bad = total - good, we have:

In [57]: counts.sum() - counts['usefulness-useful']
Out[57]: 4
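
The same good/bad arithmetic can also be applied to every (Cluster_id, User) group at once instead of one group at a time. The following is only a sketch under assumptions: the DataFrame reuses the seven rows shown in the question and adds one made-up row for a hypothetical second user (user_2000, tag "oak table"), and the column names follow the answer's output (Tag, User, Quality, Cluster_id).

```python
import pandas as pd

# Illustrative data: the seven rows from the question plus one made-up row
# for a hypothetical second user, so that there are two groups to summarize.
df = pd.DataFrame({
    "Tag": ["black fabric", "blond wood", "blond wood", "eames?", "eames?",
            "rocking chair", "rocking chair", "oak table"],
    "User": ["http://steve.nl/user_1002"] * 7 + ["http://steve.nl/user_2000"],
    "Quality": ["usefulness-useful", "usefulness-useful",
                "problematic-misspelling", "usefulness-not_useful",
                "problematic-misperception", "usefulness-useful",
                "problematic-misperception", "problematic-misspelling"],
    "Cluster_id": [1] * 8,
})

# Boolean mask: True where the row counts as "good".
is_good = df["Quality"] == "usefulness-useful"

# Turn the mask into 0/1 columns and sum them within each group.
summary = (
    df.assign(good=is_good.astype(int), bad=(~is_good).astype(int))
      .groupby(["Cluster_id", "User"])[["good", "bad"]]
      .sum()
      .reset_index()
)
print(summary)
```

For user_1002 this gives good 3 and bad 4, matching the loop above. A group with no 'usefulness-useful' rows simply gets good 0, so no missing-key check is needed with this approach.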

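Could you add the output of df.head() to your question, so that others can use your data to troubleshoot?

Hi, thank you very much. The first approach works very well. But in the second approach I think there should be a check such as if exists counts['usefulness-useful'], otherwise the calculation raises an error. Could you also tell me how to write the output as a row in a csv file? For example: cluster, user, bad, good

@user1992696: Ah, good catch. You can use counts.get('usefulness-useful', 0) instead, which does the same thing but gives 0 if there is no value associated with the key 'usefulness-useful'.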
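
To make the last two comments concrete, here is a small sketch of the counts.get fallback combined with writing one CSV row per group. The group below is made up (a hypothetical user_2000 in cluster 2 with no 'usefulness-useful' rows), and io.StringIO stands in for a real file; the file name output.csv mentioned in the comment is only an assumption.

```python
import csv
import io

import pandas as pd

# Hypothetical group that contains no 'usefulness-useful' rows at all.
group = pd.DataFrame({
    "Cluster_id": [2, 2],
    "User": ["http://steve.nl/user_2000"] * 2,  # made-up user
    "Quality": ["problematic-misspelling", "usefulness-not_useful"],
})

counts = group["Quality"].value_counts()

# counts['usefulness-useful'] would raise KeyError here; .get falls back to 0.
good = int(counts.get("usefulness-useful", 0))
bad = int(counts.sum()) - good

# Write a header and one row for this group. An in-memory buffer is used so
# the sketch has no side effects; with a real file you would use something
# like open("output.csv", "w", newline="") instead (file name is an assumption).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["cluster", "user", "bad", "good"])
writer.writerow([2, "http://steve.nl/user_2000", bad, good])
print(buffer.getvalue())
```

Inside the groupby loop from the question, the cluster and user values for each row would come from the name tuple, and writer.writerow would be called once per group.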