Python: Pearson algorithm from Programming Collective Intelligence still not working


python algorithm


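When I run my code to compute the Pearson correlation coefficient, the function (pasted below) stubbornly returns 0.

Following the advice given in earlier questions about this problem (see #1 and #2 below), I made sure the function performs floating-point arithmetic, but that did not help. I would appreciate some guidance.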

    from __future__ import division
    from math import sqrt

    def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
       si={}
       for item in prefs[p1]:
          if item in prefs[p2]: si[item]=1


          # Find the number of elements
          n=float(len(si))


          # if they are no ratings in common, return 0
          if n==0: return 0


          # Add up all the preferences
          sum1=float(sum([prefs[p1][it] for it in si]))
          sum2=float(sum([prefs[p2][it] for it in si]))

          # Sum up the squares
          sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
          sum2Sq=sum([pow(prefs[p2][it],2) for it in si])

          # Sum up the products
          pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])


          # Calculate Pearson score
          num=pSum-(1.0*sum1*sum2/n)
          den=sqrt((sum1Sq-1.0*pow(sum1,2)/n)*(sum2Sq-1.0*pow(sum2,2)/n))
          if den==0: return 0

          r=num/den

          return r

My data set:

    # A dictionary of movie critics and their ratings of a small
    # set of movies

    critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
          'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
          'The Night Listener': 3.0},
         'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
          'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 3.5},
         'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
          'Superman Returns': 3.5, 'The Night Listener': 4.0},
         'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
          'The Night Listener': 4.5, 'Superman Returns': 4.0,
          'You, Me and Dupree': 2.5},
         'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 2.0},
         'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
         'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

Other similar questions:

Link #1:
Link #2:

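Thanks to everyone's help in the comments, I found the problem. Just kidding: there were a lot of problems. In the end I noticed that the for loop was never closed off (on line 6); everything after it needs to be dedented so that it runs once the loop has finished rather than inside it. At one stage near the end I frantically wrapped everything in float, sorry about that; in any case, you do want floats. Before that, the code was not calling keys() on the critics dictionaries, so I added it. The Pearson coefficient calculation also looked wrong to me, so I reworked it (I have a bachelor's degree in math). The example tested with Gene Seymour and Lisa Rose now works correctly. Anyway, save this as pearson.py, or under some other name: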

from __future__ import division, print_function
from math import sqrt

def sim_pearson(prefs,p1,p2):
   # Get the list of mutually rated items
   si={}
   for item in prefs[p1].keys():
      if item in prefs[p2].keys():
         si[item]=1

   # The loop ends here; everything below runs once, after si is complete

   # Find the number of elements
   n=float(len(si))

   # If there are no ratings in common, return 0
   if n==0:
      print('n=0')
      return 0

   # Add up all the preferences
   sum1=float(sum([prefs[p1][it] for it in si.keys()]))
   sum2=float(sum([prefs[p2][it] for it in si.keys()]))
   print('sum1=', sum1, 'sum2=', sum2)

   # Sum up the squares
   sum1Sq=float(sum([pow(prefs[p1][it],2) for it in si.keys()]))
   sum2Sq=float(sum([pow(prefs[p2][it],2) for it in si.keys()]))
   print('sum1s=', sum1Sq, 'sum2s=', sum2Sq)

   # Sum up the products
   pSum=float(sum([prefs[p1][it]*prefs[p2][it] for it in si.keys()]))

   # Calculate Pearson score: covariance over the product of the standard deviations
   num=(pSum/n)-(1.0*sum1*sum2/pow(n,2))
   den=sqrt(((sum1Sq/n)-float(pow(sum1,2))/float(pow(n,2)))*((sum2Sq/n)-float(pow(sum2,2))/float(pow(n,2))))
   if den==0:
      print('den=0')
      return 0

   r=num/den

   return r

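critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
   'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
   'The Night Listener': 3.0},
 'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
  'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
  'You, Me and Dupree': 3.5},
 'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
  'Superman Returns': 3.5, 'The Night Listener': 4.0},
 'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
  'The Night Listener': 4.5, 'Superman Returns': 4.0,
  'You, Me and Dupree': 2.5},
 'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
  'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
  'You, Me and Dupree': 2.0},
 'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
  'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
 'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}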

Then type:

import pearson
pearson.sim_pearson(pearson.critics, list(pearson.critics.keys())[1], list(pearson.critics.keys())[2])

Or simply:

import pearson
pearson.sim_pearson(pearson.critics, 'Lisa Rose', 'Gene Seymour')

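Let me know if you have any trouble using it. I left in the print statements I used for troubleshooting so you can see how I worked it out, but they are obviously not needed.

If you run into more problems with this book that you cannot solve with help from SO, email me at raphael[at]postacle.com and I should be able to get back to you. I downloaded it a while ago too, I have just been a bit lazy ;)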
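As a minimal sketch of why the function kept returning 0, assuming (as described above) that the statistics were being computed inside the for loop: on the first pass at most one shared item has been collected, so either n is 0 or both variance terms are zero and the den==0 guard fires. The helper name pearson_from_items is hypothetical and not part of the original post; the two lists hold the Lisa Rose and Gene Seymour ratings from the data set in matching item order.

```python
from math import sqrt

def pearson_from_items(x, y):
    """Pearson correlation of two equal-length rating lists; 0 if undefined."""
    n = len(x)
    if n == 0:
        return 0
    sum_x, sum_y = sum(x), sum(y)
    sum_x_sq = sum(v * v for v in x)
    sum_y_sq = sum(v * v for v in y)
    p_sum = sum(a * b for a, b in zip(x, y))
    num = p_sum - sum_x * sum_y / n
    den = sqrt((sum_x_sq - sum_x ** 2 / n) * (sum_y_sq - sum_y ** 2 / n))
    return 0 if den == 0 else num / den

# What the function sees when it computes after a single loop pass:
# one shared item, so each "sum of squares minus squared sum" term is 0.
one_item = pearson_from_items([2.5], [3.0])

# What it should see: all six shared items, collected before computing.
lisa = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
gene = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]
all_items = pearson_from_items(lisa, gene)

print(one_item, all_items)
```

With a single data point there is no spread to correlate, so returning 0 is the guard doing its job; the fix is only a matter of where the computation sits relative to the loop.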

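@rofls is right:

For loop: my for loop was a problem.
Floats: some terms needed to be converted from int to float.
Keys: this is something I would definitely have missed at first; @rofls caught it.
Pearson coefficient: I tightened up the denominator of the Pearson coefficient so that it lines up with the expression on the page; it is the last expression under the "Mathematical properties" section.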
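For reference, here is a sketch of how the sum-based forms of the coefficient relate to each other. The notation is introduced here for illustration and is not from the original post: x_i and y_i are the two critics' ratings of the n shared items. Dividing the numerator and the denominator of the first form by n gives the second, so both yield the same value of r whenever the denominator is nonzero.

```latex
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\;
          \sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}
  = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}
         {\sqrt{\left(\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right)
                \left(\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}\right)}}
```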

The code works now. I tried different combinations of inputs.

    from __future__ import division, print_function
    from math import sqrt

    def sim_pearson(prefs,p1,p2):
        # Get the list of mutually rated items
        si={}
        for item in prefs[p1].keys():
            if item in prefs[p2].keys():
                print('item=', item)
                si[item]=1

        # Find the number of elements
        n=float(len(si))
        print('n=', n)

        # If there are no ratings in common, return 0
        if n==0:
            print('n=0')
            return 0

        # Add up all the preferences
        sum1=float(sum([prefs[p1][it] for it in si.keys()]))
        sum2=float(sum([prefs[p2][it] for it in si.keys()]))
        print('sum1=', sum1, 'sum2=', sum2)

        # Sum up the squares
        sum1Sq=float(sum([pow(prefs[p1][it],2) for it in si.keys()]))
        sum2Sq=float(sum([pow(prefs[p2][it],2) for it in si.keys()]))
        print('sum1s=', sum1Sq, 'sum2s=', sum2Sq)

        # Sum up the products
        pSum=float(sum([prefs[p1][it]*prefs[p2][it] for it in si.keys()]))
        print('pSum=', pSum)

        # Calculate Pearson score
        num=(n*pSum)-(1.0*sum1*sum2)
        print('num=', num)
        den1=sqrt((n*sum1Sq)-float(pow(sum1,2)))
        print('den1=', den1)
        den2=sqrt((n*sum2Sq)-float(pow(sum2,2)))
        print('den2=', den2)
        den=1.0*den1*den2

        # Same indentation as the rest of the function body
        if den==0:
            print('den=0')
            return 0

        r=num/den
        return r

    critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
          'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
          'The Night Listener': 3.0},
         'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
          'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 3.5},
         'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
          'Superman Returns': 3.5, 'The Night Listener': 4.0},
         'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
          'The Night Listener': 4.5, 'Superman Returns': 4.0,
          'You, Me and Dupree': 2.5},
         'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
          'You, Me and Dupree': 2.0},
         'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
          'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
         'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

    # Done

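Please include a sample data set. Presumably there are data sets for which the result should be zero. You may also want to include links to the earlier questions to give more detailed context.

Yes, please provide the prefs used in the for loop on line 3. If it is empty, or if prefs[p1] is empty, then len(si) will be 0, and therefore n will be 0 as well.

It takes a list of two-tuples and uses some formulas I found somewhere online; it gave me the right answers when I used it for homework. It might be useful.

@BrianCain I just updated my post to include the data set. Thank you very much for your help!

Your code worked for me with the Lisa Rose/Gene Seymour combination, but at that point it still gave me error messages for other combinations. For the input pearson.sim_pearson(pearson.critics, 'Toby', 'Gene Seymour'), the error message looked like KeyError: 'Lady in the Water'. That said, after rewriting the denominator another way and fixing the for loop problem, things started running much more smoothly. Thanks again for your time!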
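The KeyError mentioned in the last comment is what happens when a rating is looked up for an item that only one of the two critics has rated. The following is a minimal sketch, not the code from either answer, that restricts every lookup to the shared items; the names sim_pearson_compact and sample are hypothetical, and the ratings are copied from the data set in the question.

```python
from math import sqrt

def sim_pearson_compact(prefs, p1, p2):
    """Pearson correlation over the items both p1 and p2 have rated."""
    # Only items present in BOTH dictionaries, so no lookup can raise KeyError
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0

    x = [prefs[p1][item] for item in shared]
    y = [prefs[p2][item] for item in shared]

    sum_x, sum_y = sum(x), sum(y)
    sum_x_sq = sum(v * v for v in x)
    sum_y_sq = sum(v * v for v in y)
    p_sum = sum(a * b for a, b in zip(x, y))

    num = n * p_sum - sum_x * sum_y
    var_x = n * sum_x_sq - sum_x ** 2
    var_y = n * sum_y_sq - sum_y ** 2
    # No spread in one of the rating lists: correlation is undefined, return 0
    if var_x <= 0 or var_y <= 0:
        return 0
    return num / (sqrt(var_x) * sqrt(var_y))

sample = {
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
}

r = sim_pearson_compact(sample, 'Toby', 'Gene Seymour')
print(r)
```

Swapping the two critic names should give the same value, since the set of shared items and the formula are both symmetric.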