Python - scipy pdist between columns in a DataFrame of dictionaries

python dictionary pandas dataframe

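I am writing a program that computes the Euclidean distance between movie reviews. I want to compute the distance between one given reviewer and another given reviewer, and also between one given reviewer and all the others. My data is a dictionary of dictionaries that I load into a DataFrame, like this: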

{
    'Nancy Pollock': {
        'Lawrence of Arabia': 2.5,
        'Gravity': 3.5,
        'The Godfather': 3.0,
        'Prometheus': 3.5,
        'For a Few Dollars More': 2.5,
        'The Guns of Navarone': 3.0
    },
    'Jack Holmes': {
        'Lawrence of Arabia': 3.0,
        'Gravity': 3.5,
        'The Godfather': 1.5,
        'Prometheus': 5.0,
        'The Guns of Navarone': 3.0,
        'For a Few Dollars More': 3.5
    },
    'Mary Doyle': {
        'Lawrence of Arabia': 2.5,
        'Gravity': 3.0,
        'Prometheus': 3.5,
        'The Guns of Navarone': 4.0
    },
    'Doug Redpath': {
        'Gravity': 3.5,
        'The Godfather': 3.0,
        'The Guns of Navarone': 4.5,
        'Prometheus': 4.0,
        'For a Few Dollars More': 2.5
    },
    'Jill Brown': {
        'Lawrence of Arabia': 3.0,
        'Gravity': 4.0,
        'The Godfather': 2.0,
        'Prometheus': 3.0,
        'The Guns of Navarone': 3.0,
        'For a Few Dollars More': 2.0
    },
    'Trevor Chappell': {
        'Lawrence of Arabia': 3.0,
        'Gravity': 4.0,
        'The Guns of Navarone': 3.0,
        'Prometheus': 5.0,
        'For a Few Dollars More': 3.5
    },
    'Peter': {
        'Gravity': 4.5,
        'For a Few Dollars More': 1.0,
        'Prometheus': 4.0
    }
}

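I am fairly lost here, but what I want to know is how to write a function that converts each dictionary into a format that pdist can use. After that I can work out how to iterate over the results. The code I have so far is: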

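import ast

import pandas as pd
from scipy.spatial.distance import pdist, squareform

# reviews.txt contains the dictionary literal shown above.
# ast.literal_eval parses it without executing arbitrary code, unlike eval.
with open("reviews.txt") as f:
    d = ast.literal_eval(f.read())

df = pd.DataFrame(d)  # rows are movies, columns are reviewers, missing ratings are NaN
print(df)

def getSimilarity():
    # Presumably meant for: one given reviewer against another given reviewer.
    EcDist = pd.DataFrame(index=df.index)  # container for results
    movieArray = df.values
    # some way of turning it into a format pdist can use
    EcDist = pdist  # etc. (placeholder, the actual pdist call is still missing)
    return EcDist

def getSimilarities():
    # Presumably meant for: one given reviewer against all the others.
    EcDist2 = pd.DataFrame(index=df.index)
    movieArrays = df.values
    # some way of turning it into a format pdist can use
    EcDist2 = pdist  # etc. (placeholder, the actual pdist call is still missing)
    return EcDist2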
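
For what it's worth, here is a minimal sketch of one way to get the data into a shape pdist accepts. pdist expects a 2-D array with one row per observation, while pd.DataFrame(d) puts the reviewers in columns, so the frame has to be transposed. pdist also has no special handling for NaN with the Euclidean metric, so this sketch assumes it is acceptable to keep only the movies that every reviewer rated. The function name reviewer_distances and the three-reviewer subset are made up for illustration and are not part of the question's code.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Subset of the ratings from the question, inlined so the sketch runs without reviews.txt.
d = {
    'Nancy Pollock': {'Lawrence of Arabia': 2.5, 'Gravity': 3.5, 'Prometheus': 3.5},
    'Jack Holmes': {'Lawrence of Arabia': 3.0, 'Gravity': 3.5, 'Prometheus': 5.0},
    'Peter': {'Gravity': 4.5, 'Prometheus': 4.0},
}

def reviewer_distances(ratings):
    """Return a reviewer-by-reviewer DataFrame of Euclidean distances (hypothetical helper)."""
    df = pd.DataFrame(ratings)        # rows: movies, columns: reviewers
    complete = df.dropna()            # assumption: keep only movies rated by everyone
    observations = complete.T.values  # one row per reviewer, as pdist expects
    condensed = pdist(observations, metric='euclidean')
    # squareform expands the condensed vector into a symmetric matrix
    return pd.DataFrame(squareform(condensed), index=df.columns, columns=df.columns)

dist = reviewer_distances(d)
print(dist)
```

A single pair can then be looked up with dist.loc['Nancy Pollock', 'Jack Holmes'], and one reviewer against all the others with dist['Peter'].drop('Peter'). Dropping incomplete movies throws away a lot of data when many reviewers skipped something, so this may or may not suit the real data set.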

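Try to tighten this up a bit: for example, give us three reviews and a sample of the output format you want from those three.

OK, I'll take a look when I get home. Thanks a lot.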
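
In the spirit of the three-review example requested above, here is a small sketch using three reviewers taken from the question's data. It is not the asker's code. It assumes the distance between two reviewers should be taken over only the movies that both of them rated, and it does not use pdist, because that per-pair filtering is easier with plain NumPy. The helper names distance_between and distances_from are hypothetical.

```python
import numpy as np
import pandas as pd

# Three reviewers copied from the question's data.
reviews = {
    'Mary Doyle': {'Lawrence of Arabia': 2.5, 'Gravity': 3.0,
                   'Prometheus': 3.5, 'The Guns of Navarone': 4.0},
    'Doug Redpath': {'Gravity': 3.5, 'The Godfather': 3.0, 'The Guns of Navarone': 4.5,
                     'Prometheus': 4.0, 'For a Few Dollars More': 2.5},
    'Peter': {'Gravity': 4.5, 'For a Few Dollars More': 1.0, 'Prometheus': 4.0},
}
df = pd.DataFrame(reviews)  # rows: movies, columns: reviewers, NaN where unrated

def distance_between(df, a, b):
    """Euclidean distance between reviewers a and b over the movies both rated."""
    shared = df[[a, b]].dropna()  # drop movies that either reviewer skipped
    if shared.empty:
        return float('nan')       # no movie in common, distance is undefined
    return float(np.linalg.norm(shared[a].to_numpy() - shared[b].to_numpy()))

def distances_from(df, reviewer):
    """Distances from one reviewer to every other reviewer, as a Series."""
    others = [name for name in df.columns if name != reviewer]
    return pd.Series({name: distance_between(df, reviewer, name) for name in others},
                     name=reviewer)

print(distance_between(df, 'Mary Doyle', 'Doug Redpath'))
print(distances_from(df, 'Peter'))
```

Note that each pair here is compared over a different number of shared movies, so the resulting distances are not strictly comparable with one another. Whether that matters depends on what the similarity scores are used for.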