Python: parsing JSON into an array
I am trying to write a cosine similarity method that parses the following JSON and computes the similarity between two users:
{
    "Rajan": {
        "Inception": 2.5,
        "Pulp Fiction": 3.5,
        "Anger Management": 3.0,
        "Fracture": 3.5,
        "Serendipity": 2.5,
        "Jerry Maguire": 3.0
    },
    "Rinku": {
        "Inception": 3.0,
        "Pulp Fiction": 3.5,
        "Anger Management": 1.5,
        "Fracture": 5.0,
        "Jerry Maguire": 3.0,
        "Serendipity": 3.5
    }
}
However, I am having trouble parsing the JSON entries into arrays so that I can compute the cosine similarity.
import os
from sys import platform
import json
import numpy as np


# Class wrapper implied by the use of self and Similarity() below.
class Similarity(object):

    def check_user_exist(self, dataset, user1, user2, algorithm):
        # check user in dataset
        if user1 not in dataset: raise Exception('User ' + user1 + ' not in dataset.')
        if user2 not in dataset: raise Exception('User ' + user2 + ' not in dataset.')
        rated_by_both = {item: 1 for item in dataset[user1] if item in dataset[user2]}
        if len(rated_by_both) == 0: return 0
        num_ratings = len(rated_by_both)
        if algorithm == 'euclidean_distance':
            return self.euclidean_distance(dataset, user1, user2)
        elif algorithm == 'cosine_similarity':
            return self.cosine_similarity(dataset, user1, user2)

    def cosine_similarity(self, dataset, user1, user2):
        """ return cosine similarity between two lists """
        for item in dataset[user1]:
            print(dataset[user1][item])
        # The next line is the one that raises the TypeError described below.
        array_user1 = np.array(item for item in dataset[user1][item])
        array_user2 = np.array(item for item in dataset[user2])
        dot_product = np.dot(array_user1, array_user2)
        norm_user1 = np.linalg.norm(array_user1)
        norm_user2 = np.linalg.norm(array_user2)
        return dot_product / (norm_user1 * norm_user2)


if __name__ == '__main__':
    path = os.path.dirname(os.getcwd())
    filename = path + '/data_files/movie_ratings.json' \
        if platform == 'linux' or platform == 'linux2' \
        else path + '\\data_files\\movie_ratings.json'
    with open(filename, 'r') as f: data = json.loads(f.read())
    user1 = 'Rajan'
    user2 = 'Rinku'
    measures = Similarity()
    print('\nCosine similarity:')
    print(measures.check_user_exist(data, user1, user2, "cosine_similarity"))
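Currently, the following error is thrown at np.array(item for item in dataset[user1][item]) in my cosine_similarity method:
TypeError: 'float' object is not iterable
While debugging with for item in dataset[user1]: print(dataset[user1][item])
I get the following output: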
3.5
3.0
3.0
3.5
2.5
2.5
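These are basically the movie ratings given by the first user. How can I parse the JSON dictionary into an array so that I can run the cosine similarity test?
According to your print statement, dataset[user1][item] is a float, and you cannot iterate over a float. In the cosine_similarity function, at the line array_user1 = np.array(item for item in dataset[user1][item]), it should be
np.array(item for item in dataset[user1])
instead of
np.array(item for item in dataset[user1][item])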
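One caveat on the suggestion above: iterating over dataset[user1] yields the movie titles (the keys), and passing a bare generator to np.array produces a 0-dimensional object array rather than an array of numbers. A minimal sketch of one way around both issues follows, assuming the goal is to compare only the movies both users rated. The helper name ratings_to_arrays is hypothetical and not part of the original post. Sorting the shared titles also matters here, because the two users list "Serendipity" and "Jerry Maguire" in a different order.

```python
import json

import numpy as np


def ratings_to_arrays(dataset, user1, user2):
    """Hypothetical helper: aligned rating arrays over movies both users rated."""
    # Use a fixed (sorted) order of shared titles so position i refers to the
    # same movie in both arrays, regardless of the key order in the JSON.
    shared = sorted(set(dataset[user1]) & set(dataset[user2]))
    a = np.array([dataset[user1][title] for title in shared], dtype=float)
    b = np.array([dataset[user2][title] for title in shared], dtype=float)
    return a, b


def cosine_similarity(dataset, user1, user2):
    """Cosine similarity of two users' ratings; 0.0 if it is undefined."""
    a, b = ratings_to_arrays(dataset, user1, user2)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if a.size == 0 or denom == 0:
        return 0.0
    return float(np.dot(a, b) / denom)


raw = """
{
    "Rajan": {"Inception": 2.5, "Pulp Fiction": 3.5, "Anger Management": 3.0,
              "Fracture": 3.5, "Serendipity": 2.5, "Jerry Maguire": 3.0},
    "Rinku": {"Inception": 3.0, "Pulp Fiction": 3.5, "Anger Management": 1.5,
              "Fracture": 5.0, "Jerry Maguire": 3.0, "Serendipity": 3.5}
}
"""
data = json.loads(raw)
print(cosine_similarity(data, "Rajan", "Rinku"))  # roughly 0.96
```

Building the arrays from a list comprehension (not a generator) gives NumPy a sequence it can turn into a 1-D float array, which is what np.dot and np.linalg.norm expect.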
If you are willing to use pandas, the overhead of accessing the data is reduced to a simple key lookup on a DataFrame. For example,
import pandas as pd
import numpy as np


def cosine_similarity(dataset, user1, user2):
    """ return cosine similarity between two lists """
    dot_product = np.dot(dataset[user1], dataset[user2])
    norm_user1 = np.linalg.norm(dataset[user1])
    norm_user2 = np.linalg.norm(dataset[user2])
    return dot_product / (norm_user1 * norm_user2)


data = {
    "Rajan": {
        "Inception": 2.5,
        "Pulp Fiction": 3.5,
        "Anger Management": 3.0,
        "Fracture": 3.5,
        "Serendipity": 2.5,
        "Jerry Maguire": 3.0
    },
    "Rinku": {
        "Inception": 3.0,
        "Pulp Fiction": 3.5,
        "Anger Management": 1.5,
        "Fracture": 5.0,
        "Jerry Maguire": 3.0,
        "Serendipity": 3.5
    }
}

df = pd.DataFrame(data)
sim = cosine_similarity(df, 'Rajan', 'Rinku')
print(sim)
Output:
0.9606463013980241
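The pandas version works on this data because both users rated the same six movies, so the DataFrame rows (movie titles) line up with no gaps. As a rough sketch of what could be done when that is not the case, the example below drops the rows where either user has no rating before computing the similarity. The user "Asha" and her ratings are made up purely for illustration and do not appear in the original data.

```python
import numpy as np
import pandas as pd

# Illustrative data only: "Asha" is a hypothetical user who has not rated "Fracture".
data = {
    "Rajan": {"Inception": 2.5, "Pulp Fiction": 3.5, "Fracture": 3.5},
    "Asha": {"Pulp Fiction": 4.0, "Inception": 3.0},
}

# Rows are movie titles, columns are users; a missing rating becomes NaN.
df = pd.DataFrame(data)

# Keep only the movies that both users rated.
common = df[["Rajan", "Asha"]].dropna()

a = common["Rajan"].to_numpy()
b = common["Asha"].to_numpy()
sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(sim)
```

Without the dropna() step, the NaN for the unrated movie would propagate through np.dot and np.linalg.norm and the result would be NaN.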
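Use .keys() to go over each key in the dict:
for item in dataset[user1].keys():
    print(dataset[user1][item])
If you want an array containing one user's numeric values, use np.array(list(data['Rajan'].values())).
Strongly recommend pandas for this. This should be the accepted answer.
Adding a dependency on pandas just to avoid this kind of self-inflicted error is silly (no matter how nice pandas is to use).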