Python: creating a co-occurrence list from a JSON file
I'm trying to match up actors who appear in the same movie and put them in a list where each element is a pair of actors. Here is the basic outline of the JSON file:
[
  {
    "id": "1234567",
    "name": "Dwayne Johnson",
    "born": "1970-12-12",
    "movies": [
      {
        "id": "345678",
        "title": "Fast and furious 7",
        "role": "actor",
        "year": 2017,
        "kind": "movie"
      },
      {
        "id": "345678",
        "title": "blah blah",
        "role": "actor",
        "year": 2020,
        "kind": "movie"
      }
    ]
  },
  {
    "id": "7844682",
    "name": "Nicole Kidman",
    "born": "1970-12-12",
    "movies": [
      {
        "id": "10161886",
        "title": "The Prom",
        "role": "actress",
        "year": 2020,
        "kind": "movie"
      },
      {
        "id": "345678",
        "title": "blah blah",
        "role": "actress",
        "year": 2020,
        "kind": "movie"
      },
And so on, for about 70,000 lines. The output I want looks like:
[ ('Dwayne Johnson', 'Nicole Kidman'), ('Dwayne Johnson', 'Jason Statham') ...... etc. ]
Dwayne Johnson and Nicole Kidman are both in "blah blah", so they form a pair.
I've tried several times, but this is all I really have to show:
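import json, itertools

fin = open("actors.json", "r")
data = json.load(fin)
fin.close()

for actor in data:
    # actor["name"] is a string, so combinations() pairs up its characters,
    # and actor_pairs is overwritten on every iteration.
    actor_pairs = list(itertools.combinations(actor["name"], r=2))
print(actor_pairs)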
But this only prints every combination of letters in the name of the last actor in the file:
[ ('N', 'i'), ('N', 'c')....etc. ]
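A minimal illustration of why this happens. `itertools.combinations` pairs up the items of whatever iterable it receives, and iterating over a string yields its characters. Passing a list of names yields pairs of names instead. The names below are sample values only. Pairing every name with every other name still lacks the check that the two actors actually share a movie.

```python
import itertools

# A string is an iterable of characters, so combinations() pairs up letters.
letter_pairs = list(itertools.combinations("Nic", r=2))
print(letter_pairs)  # [('N', 'i'), ('N', 'c'), ('i', 'c')]

# A list of names is an iterable of strings, so combinations() pairs up whole names.
names = ["Dwayne Johnson", "Nicole Kidman", "Jason Statham"]
name_pairs = list(itertools.combinations(names, r=2))
print(name_pairs)
```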
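I'm a bit lost as to what to do. Do I need more nested for loops, or something like that?
I tried the code below with the dataset you provided above. It is very unperformant, but it seems to work. I could make it more efficient, but can you test this code with all of your data?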
data = [
    {
        "id": "1234567",
        "name": "Dwayne Johnson",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "345678",
                "title": "Fast and furious 7",
                "role": "actor",
                "year": 2017,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actor",
                "year": 2020,
                "kind": "movie"
            }
        ]
    },
    {
        "id": "7844682",
        "name": "Nicole Kidman",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "10161886",
                "title": "The Prom",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            }
        ]
    }
]
movie_actor_list = []
for d in data:
    for movie in d['movies']:
        movie_actor_list.append((d['name'], movie['title']))
final_list = []
for name1, movie1 in movie_actor_list:
    for name2, movie2 in movie_actor_list:
        if movie1 == movie2 and name1 != name2 and (name2, name1) not in final_list:
            final_list.append((name1, name2))
print(final_list)
Output:
[('Dwayne Johnson', 'Nicole Kidman')]
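The answer above compares every (actor, movie) tuple with every other one. That is quadratic in the number of tuples, and each `in final_list` check is a linear scan on top of it. A common way to speed this up is to build an inverted index that maps each movie to its cast, and then pair actors only within each movie. The sketch below assumes, as the comments suggest, that movie `id`s are unique. `co_star_pairs` is a hypothetical helper name, and the sample records mirror the question's data.

```python
import itertools
from collections import defaultdict

# Sample records in the same shape as the question's JSON (trimmed to the fields used).
data = [
    {"id": "1234567", "name": "Dwayne Johnson",
     "movies": [{"id": "345678", "title": "Fast and furious 7"},
                {"id": "345678", "title": "blah blah"}]},
    {"id": "7844682", "name": "Nicole Kidman",
     "movies": [{"id": "10161886", "title": "The Prom"},
                {"id": "345678", "title": "blah blah"}]},
]


def co_star_pairs(actors):
    """Return one (name, name) pair for every movie two actors share."""
    # Inverted index: movie id -> set of actor names appearing in it.
    cast_by_movie = defaultdict(set)
    for actor in actors:
        for movie in actor["movies"]:
            cast_by_movie[movie["id"]].add(actor["name"])

    pairs = []
    for cast in cast_by_movie.values():
        # Sorting fixes the order within each pair, so (A, B) and (B, A) never both appear.
        pairs.extend(itertools.combinations(sorted(cast), r=2))
    return pairs


print(co_star_pairs(data))  # [('Dwayne Johnson', 'Nicole Kidman')]
```

A pair is emitted once per shared movie, so two actors who share three films produce three identical tuples. Those repeats are what a later per-pair count would need.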
Another, one-line solution:
actor_pairs = [(actor1, actor2) for (actor1, movies1), (actor2, movies2) in itertools.combinations([(actor['name'], {movie['id'] for movie in actor['movies']}) for actor in data], r=2) if len(movies1 & movies2) > 0]
Broken down into steps, the solution is:
import itertools

# Create a list of tuples where each tuple contains the actor's name and a set of the ids of all the movies they worked on
actors_and_movies = [(actor['name'], {movie['id'] for movie in actor['movies']}) for actor in data]
# Create all 2-element combinations of the previous list
all_combinations = itertools.combinations(actors_and_movies, r=2)
# Keep a pair of actors only if the intersection of their movie sets is not empty
actor_pairs = [(actor1, actor2) for (actor1, movies1), (actor2, movies2) in all_combinations if movies1 & movies2]
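Because `movies1 & movies2` is the set of movie ids the two actors share, its size is the number of films they have in common. The variation below keeps that count next to each pair. It uses hypothetical sample actors ("Actor A" and so on) and movie ids ("m1" and so on) in the question's data shape.

```python
import itertools

# Hypothetical sample data in the question's shape; only "name" and movie "id" are used.
data = [
    {"name": "Actor A", "movies": [{"id": "m1"}, {"id": "m2"}]},
    {"name": "Actor B", "movies": [{"id": "m1"}, {"id": "m2"}, {"id": "m3"}]},
    {"name": "Actor C", "movies": [{"id": "m3"}]},
]

# Same first two steps: (name, set of movie ids), then all 2-element combinations.
actors_and_movies = [(actor["name"], {movie["id"] for movie in actor["movies"]}) for actor in data]
all_combinations = itertools.combinations(actors_and_movies, r=2)

# Keep the size of the intersection, i.e. how many movies the two actors share.
pair_counts = [
    (actor1, actor2, len(movies1 & movies2))
    for (actor1, movies1), (actor2, movies2) in all_combinations
    if movies1 & movies2
]
print(pair_counts)  # [('Actor A', 'Actor B', 2), ('Actor B', 'Actor C', 1)]
```

Like the original, this compares every pair of actors, so n actors produce n(n-1)/2 combinations to check.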
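I don't think name or title are unique keys, and they shouldn't be used that way.
@Jon I'm not using them as keys; they're tuples.
Right, but the tuples are made of non-unique values. The actor and movie dicts have an id key that looks unique. Instead of movie_actor_list.append((d['name'], movie['title'])) it should be movie_actor_list.append((d['id'], movie['id'])).
Thank you so much, it works perfectly. The "non-unique values" part doesn't matter in my case, because the number of times actors appear as a pair will matter later.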
Thanks, I ended up modifying my own code to include both yours and Gauri's. Thank you so much for your help :)
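Here is a sketch of the comment thread's suggestion applied to the first answer. It pairs on actor and movie `id`s and looks names up only for display, so two different actors who happen to share a name stay distinct. It also counts how often each pair occurs, since the asker says that count matters later. The sample records mirror the question's data, and `id_to_name` and `pair_counts` are illustrative names. The nested loop keeps the first answer's quadratic structure.

```python
from collections import Counter

# Sample records mirroring the question's JSON, trimmed to the fields used here.
data = [
    {"id": "1234567", "name": "Dwayne Johnson",
     "movies": [{"id": "345678", "title": "Fast and furious 7"},
                {"id": "345678", "title": "blah blah"}]},
    {"id": "7844682", "name": "Nicole Kidman",
     "movies": [{"id": "10161886", "title": "The Prom"},
                {"id": "345678", "title": "blah blah"}]},
]

# Names are looked up by actor id only for display.
id_to_name = {actor["id"]: actor["name"] for actor in data}

# (actor id, movie id) tuples; the set drops repeated entries, sorting fixes the order.
movie_actor_list = sorted({(actor["id"], movie["id"]) for actor in data for movie in actor["movies"]})

pair_counts = Counter()
for i, (actor1, movie1) in enumerate(movie_actor_list):
    # Only look at later tuples, so each unordered pair of tuples is compared once.
    for actor2, movie2 in movie_actor_list[i + 1:]:
        if movie1 == movie2 and actor1 != actor2:
            pair_counts[(actor1, actor2)] += 1

# Counts stay keyed by id; names are attached only at the end.
named_pairs = [(id_to_name[a], id_to_name[b], n) for (a, b), n in pair_counts.items()]
print(named_pairs)  # [('Dwayne Johnson', 'Nicole Kidman', 1)]
```

Because the list is sorted by actor id, the lower id always comes first in each key, so (A, B) and (B, A) are never counted separately.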