Python: passing JSON to an array


I am trying to write a cosine similarity method that parses the following JSON and computes the similarity between two users:

{
"Rajan":
{
    "Inception": 2.5,
    "Pulp Fiction": 3.5,
    "Anger Management": 3.0,
    "Fracture": 3.5,
    "Serendipity": 2.5,
    "Jerry Maguire": 3.0
},
"Rinku":
{
    "Inception": 3.0,
    "Pulp Fiction": 3.5,
    "Anger Management": 1.5,
    "Fracture": 5.0,
    "Jerry Maguire": 3.0,
    "Serendipity": 3.5
}
}
However, I am having trouble parsing the JSON entries into arrays so that I can compute the cosine similarity.

import os
from sys import platform
import json
import numpy as np


class Similarity:

    def check_user_exist(self, dataset, user1, user2, algorithm):
        # check user in dataset
        if user1 not in dataset: raise Exception('User ' + user1 + ' not in dataset.')
        if user2 not in dataset: raise Exception('User ' + user2 + ' not in dataset.')

        rated_by_both = {item: 1 for item in dataset[user1] if item in dataset[user2]}
        if len(rated_by_both) == 0: return 0
        num_ratings = len(rated_by_both)

        if algorithm == 'euclidean_distance':
            return self.euclidean_distance(dataset, user1, user2)
        elif algorithm == 'cosine_similarity':
            return self.cosine_similarity(dataset, user1, user2)

    def cosine_similarity(self, dataset, user1, user2):
        """ return cosine similarity between two lists """
        for item in dataset[user1]:
            print dataset[user1][item]
        array_user1 = np.array(item for item in dataset[user1][item])
        array_user2 = np.array(item for item in dataset[user2])
        dot_product = np.dot(array_user1, array_user2)
        norm_user1 = np.linalg.norm(array_user1)
        norm_user2 = np.linalg.norm(array_user2)
        return dot_product / (norm_user1 * norm_user2)

if __name__ == '__main__':
    path = os.path.dirname(os.getcwd())

    filename = path + '/data_files/movie_ratings.json' \
        if platform == 'linux' or platform == 'linux2' \
        else path + '\\data_files\\movie_ratings.json'

    with open(filename, 'r') as f: data = json.loads(f.read())
    user1 = 'Rajan'
    user2 = 'Rinku'
    measures = Similarity()
    print('\nCosine similarity:')
    print(measures.check_user_exist(data, user1, user2, "cosine_similarity"))
Currently, my cosine_similarity method throws the following error at
np.array(item for item in dataset[user1][item])

TypeError: 'float' object is not iterable

While debugging with
for item in dataset[user1]: print dataset[user1][item]
I get the following output:

3.5
3.0
3.0
3.5
2.5
2.5

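These are basically the movie ratings given by the first user. How can I parse the JSON dictionary into an array so that I can perform the cosine similarity test?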

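According to your print statement,
dataset[user1][item]
is a float, and a float cannot be iterated over. In the
cosine_similarity
function, the line
array_user1 = np.array(item for item in dataset[user1][item])
should use

np.array(item for item in dataset[user1])
instead of

np.array(item for item in dataset[user1][item])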
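
Note that iterating over dataset[user1] yields the movie titles rather than the ratings, and np.array does not consume a generator (it wraps it in a 0-d object array), so the corrected line alone is unlikely to give a usable vector. A minimal sketch of one way to build aligned rating vectors with plain dicts and NumPy follows. It assumes the intent is to compare only the movies both users have rated; the standalone function signature (no self) is a simplification for illustration, not the asker's original method.

```python
import numpy as np

def cosine_similarity(dataset, user1, user2):
    """Cosine similarity over the items that both users have rated."""
    # Sort the shared titles so both vectors use the same item order.
    shared = sorted(set(dataset[user1]) & set(dataset[user2]))
    if not shared:
        return 0.0
    # A list comprehension (not a bare generator) gives np.array real values.
    v1 = np.array([dataset[user1][item] for item in shared], dtype=float)
    v2 = np.array([dataset[user2][item] for item in shared], dtype=float)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(np.dot(v1, v2) / denom) if denom else 0.0

data = {
    "Rajan": {"Inception": 2.5, "Pulp Fiction": 3.5, "Anger Management": 3.0,
              "Fracture": 3.5, "Serendipity": 2.5, "Jerry Maguire": 3.0},
    "Rinku": {"Inception": 3.0, "Pulp Fiction": 3.5, "Anger Management": 1.5,
              "Fracture": 5.0, "Jerry Maguire": 3.0, "Serendipity": 3.5},
}

print(cosine_similarity(data, "Rajan", "Rinku"))  # roughly 0.9606
```

Looking ratings up by title, instead of relying on dict order, matters here because the two users in the sample JSON list "Serendipity" and "Jerry Maguire" in different orders.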

If you are willing to use pandas, the overhead of accessing the data reduces to a simple key lookup on a DataFrame. For example,

import pandas as pd
import numpy as np

def cosine_similarity(dataset, user1, user2):
    """ return cosine similarity between two lists """
    dot_product = np.dot(dataset[user1], dataset[user2])
    norm_user1 = np.linalg.norm(dataset[user1])
    norm_user2 = np.linalg.norm(dataset[user2])
    return dot_product / (norm_user1 * norm_user2)

data = {
    "Rajan": {
        "Inception": 2.5,
        "Pulp Fiction": 3.5,
        "Anger Management": 3.0,
        "Fracture": 3.5,
        "Serendipity": 2.5,
        "Jerry Maguire": 3.0
    },
    "Rinku": {
        "Inception": 3.0,
        "Pulp Fiction": 3.5,
        "Anger Management": 1.5,
        "Fracture": 5.0,
        "Jerry Maguire": 3.0,
        "Serendipity": 3.5
    }
}

df = pd.DataFrame(data)

sim = cosine_similarity(df, 'Rajan', 'Rinku')
print(sim)
Output:

0.9606463013980241
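
The pandas version above works because both users rated the same six movies, so the DataFrame has no gaps. If one user has not rated a movie, the DataFrame constructor fills that cell with NaN, and the dot product would be expected to come out as NaN too. A hedged sketch of one way to handle that, restricting the computation to movies both users rated, is shown below. The function name cosine_similarity_shared and the users "A" and "B" are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

def cosine_similarity_shared(df, user1, user2):
    """Cosine similarity using only the rows (movies) rated by both users."""
    # dropna() removes any movie where either user has no rating.
    both = df[[user1, user2]].dropna()
    if both.empty:
        return 0.0
    a = both[user1].to_numpy()
    b = both[user2].to_numpy()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical ratings: user "B" has not rated "Serendipity".
ratings = {
    "A": {"Inception": 2.0, "Fracture": 4.0, "Serendipity": 1.0},
    "B": {"Inception": 1.0, "Fracture": 2.0},
}

df = pd.DataFrame(ratings)
sim = cosine_similarity_shared(df, "A", "B")
print(sim)
```

On the two shared movies the rating vectors are (2.0, 4.0) and (1.0, 2.0), which point in the same direction, so the similarity should be 1.0 up to floating-point rounding.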

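Use .keys() to iterate over each key in the dict:

for item in dataset[user1].keys():
    print dataset[user1][item]

If you want an array containing one user's numeric values, use
np.array(list(data['Rajan'].values()))

I strongly recommend pandas for this. This should be the accepted answer.

Adding a dependency on pandas just to avoid a self-inflicted mistake like this is silly (however nice pandas is to use).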
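
One caveat with the values() suggestion above: the array follows each dict's own key order (insertion order on Python 3.7+, arbitrary on older versions), and the two users in the question list "Jerry Maguire" and "Serendipity" in different orders. The sketch below uses a two-movie excerpt of the question's data to show the mismatch and one way to align the vectors by title. The variable names are illustrative only.

```python
import numpy as np

# Two-movie excerpt of the question's data, keeping each user's original order.
rajan = {"Serendipity": 2.5, "Jerry Maguire": 3.0}
rinku = {"Jerry Maguire": 3.0, "Serendipity": 3.5}

# values() follows each dict's own order, so position 0 is a different movie
# for each user and the vectors are not comparable element by element.
naive_rajan = np.array(list(rajan.values()))
naive_rinku = np.array(list(rinku.values()))

# Looking ratings up through one shared, sorted list of titles aligns them.
titles = sorted(rajan)  # ['Jerry Maguire', 'Serendipity']
aligned_rajan = np.array([rajan[t] for t in titles])
aligned_rinku = np.array([rinku[t] for t in titles])

print(naive_rajan, naive_rinku)
print(aligned_rajan, aligned_rinku)
```

The pandas DataFrame approach avoids this issue because the constructor aligns the inner dicts on their keys.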