Python: create a co-occurrence list from a JSON file


python json


I'm trying to match up actors who appear in the same movie and put them into a list where each element is a pair of actors.

Here's the basic outline of the JSON file:

[
    {
        "id": "1234567",
        "name": "Dwayne Johnson",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "345678",
                "title": "Fast and furious 7",
                "role": "actor",
                "year": 2017,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actor",
                "year": 2020,
                "kind": "movie"
            }
        ]
    },
    {
        "id": "7844682",
        "name": "Nicole Kidman",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "10161886",
                "title": "The Prom",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            },

...and so on, for about 70,000 lines. The output I want looks like:

[ ('Dwayne Johnson', 'Nicole Kidman'), ('Dwayne Johnson', 'Jason Statham') ...... etc. ]

So Dwayne Johnson and Nicole Kidman are both in "blah blah", which makes them a pair.

I've tried this several times, but this is all I really have to show for it:

import json, itertools

with open("actors.json", "r") as fin:
    data = json.load(fin)

for actor in data:
    actor_pairs = list(itertools.combinations(actor["name"], r=2))
print(actor_pairs)

But this only prints every combination of letters in the name of the last actor in the file:

[ ('N', 'i'), ('N', 'c')....etc. ]
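The letter pairs have two causes. First, `itertools.combinations` iterates over whatever it is given, and iterating a string yields its characters. Second, `actor_pairs` is reassigned on every pass of the loop, so the `print` after the loop only shows the last actor's result. Here is a quick check using a short illustrative string; the names list is made up for the demo:

```python
import itertools

# Iterating a str yields single characters, so combinations() pairs up letters.
letter_pairs = list(itertools.combinations("Nic", r=2))
print(letter_pairs)  # [('N', 'i'), ('N', 'c'), ('i', 'c')]

# Passing a list of names pairs whole names instead (illustrative names only).
names = ["Dwayne Johnson", "Nicole Kidman", "Jason Statham"]
name_pairs = list(itertools.combinations(names, r=2))
print(name_pairs)
```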

I'm a bit lost and don't know what to do next. Do I need more nested for loops, or something like that?
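One way to avoid comparing every actor with every other actor is to invert the data first. Build a mapping from each movie `id` to the names of the actors in it, then take `itertools.combinations` of each movie's cast. This is a sketch of that grouping technique, not code from the answers in this thread. The `data` list is a trimmed copy of the sample above (names and movie ids only), and `cast_by_movie` is a hypothetical name:

```python
import itertools
from collections import defaultdict

# Trimmed copy of the question's sample data: only names and movie ids.
data = [
    {"name": "Dwayne Johnson", "movies": [{"id": "345678"}, {"id": "345678"}]},
    {"name": "Nicole Kidman", "movies": [{"id": "10161886"}, {"id": "345678"}]},
]

# Invert the data: movie id -> list of actor names in that movie.
cast_by_movie = defaultdict(list)
for actor in data:
    # A set, so an actor listed twice for the same movie id is only added once.
    for movie_id in {movie["id"] for movie in actor["movies"]}:
        cast_by_movie[movie_id].append(actor["name"])

# Pair up the cast of each movie; a pair sharing several movies appears once per movie.
actor_pairs = [
    pair
    for cast in cast_by_movie.values()
    for pair in itertools.combinations(cast, r=2)
]
print(actor_pairs)  # [('Dwayne Johnson', 'Nicole Kidman')]
```

Each movie's cast keeps the order in which actors appear in `data`. As a result, the same movie never produces both (A, B) and (B, A).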

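I tried the code below with the dataset you provided above. It's far from performant, but it seems to work.

I could make it more efficient, but could you test this code on your full data?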

data = [
    {
        "id": "1234567",
        "name": "Dwayne Johnson",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "345678",
                "title": "Fast and furious 7",
                "role": "actor",
                "year": 2017,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actor",
                "year": 2020,
                "kind": "movie"
            },
        ]
    },
    {
        "id": "7844682",
        "name": "Nicole Kidman",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "10161886",
                "title": "The Prom",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            }
        ]

    }
]
movie_actor_list = []
for d in data:
    for movie in d['movies']:
        movie_actor_list.append((d['name'], movie['title']))

final_list = []
for name1, movie1 in movie_actor_list:
    for name2, movie2 in movie_actor_list:
        if movie1 == movie2 and name1 != name2 and (name2, name1) not in final_list:
            final_list.append((name1, name2))

print(final_list)

Output:

[('Dwayne Johnson', 'Nicole Kidman')]
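Most of the cost in this loop is `(name2, name1) in final_list`, which scans the whole list on every check. Below is a minimal sketch of the same nested scan that uses a set of unordered pairs for that check. The `seen` set is an addition, not part of the answer, and unlike the original it also drops repeated pairs when two actors share more than one movie. The double loop itself is still quadratic in the number of (actor, movie) entries. The data is a trimmed copy of the sample (names and titles only):

```python
# Trimmed copy of the sample data: only names and titles.
data = [
    {"name": "Dwayne Johnson", "movies": [{"title": "Fast and furious 7"}, {"title": "blah blah"}]},
    {"name": "Nicole Kidman", "movies": [{"title": "The Prom"}, {"title": "blah blah"}]},
]

movie_actor_list = [(d["name"], movie["title"]) for d in data for movie in d["movies"]]

final_list = []
seen = set()  # unordered pairs already recorded; set lookup is O(1) on average
for name1, movie1 in movie_actor_list:
    for name2, movie2 in movie_actor_list:
        key = frozenset((name1, name2))
        if movie1 == movie2 and name1 != name2 and key not in seen:
            seen.add(key)
            final_list.append((name1, name2))

print(final_list)  # [('Dwayne Johnson', 'Nicole Kidman')]
```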


Another one-line solution:

actor_pairs = [(actor1, actor2) for (actor1, movies1), (actor2, movies2) in itertools.combinations([(actor['name'], {movie['id'] for movie in actor['movies']}) for actor in data], r=2) if len(movies1 & movies2) > 0]

The solution breaks down into the following steps:

import itertools

# Create a list of tuples where each tuple contains the actor's name and a set of the ids of all the movies they worked on
actors_and_movies = [(actor['name'], {movie['id'] for movie in actor['movies']}) for actor in data]
# Create all 2-element combinations of the previous list
all_combinations = itertools.combinations(actors_and_movies, r=2)
# Keep a pair of actors if the intersection of their movie-id sets is not empty
actor_pairs = [(actor1, actor2) for (actor1, movies1), (actor2, movies2) in all_combinations if movies1 & movies2]
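An empty set is falsy, so `if movies1 & movies2` in the last step behaves the same as `len(movies1 & movies2) > 0` in the one-liner. The variant below swaps in `set.isdisjoint`, which answers "no shared id?" without building the intersection set; it is not the answer's code. It runs on a trimmed copy of the sample data (names and movie ids only). Pairing every actor with every other actor remains quadratic in the number of actors:

```python
import itertools

# Trimmed copy of the question's sample data: only names and movie ids.
data = [
    {"name": "Dwayne Johnson", "movies": [{"id": "345678"}, {"id": "345678"}]},
    {"name": "Nicole Kidman", "movies": [{"id": "10161886"}, {"id": "345678"}]},
]

actors_and_movies = [(actor["name"], {movie["id"] for movie in actor["movies"]}) for actor in data]

# isdisjoint() is True when the two sets share no element, so negate it.
actor_pairs = [
    (actor1, actor2)
    for (actor1, movies1), (actor2, movies2) in itertools.combinations(actors_and_movies, r=2)
    if not movies1.isdisjoint(movies2)
]
print(actor_pairs)  # [('Dwayne Johnson', 'Nicole Kidman')]
```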


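I don't think `name` or `title` are unique keys, and they shouldn't be used that way.

@Jon I'm not using them as keys, they're tuples.

Right, but the tuples are made of non-unique values. The `actor` and `movie` dicts have what looks like a unique `id` key. Instead of

movie_actor_list.append((d['name'], movie['title']))

it should be

movie_actor_list.append((d['id'], movie['id']))

Thank you so much, that works really well. The "non-unique values" part doesn't matter in my case, because the number of times they appear as a pair will be important later.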
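The asker says the number of times a pair appears will matter later. One way to count shared movies while keeping ids as the identity, as the comment suggests, is a `collections.Counter` keyed by sorted actor-`id` pairs. This is a sketch, not code from the thread. `names_by_id`, `cast_by_movie` and `pair_counts` are hypothetical names, and the data is a trimmed copy of the sample:

```python
import itertools
from collections import Counter, defaultdict

# Trimmed copy of the sample data: actor ids, names and movie ids only.
data = [
    {"id": "1234567", "name": "Dwayne Johnson", "movies": [{"id": "345678"}, {"id": "345678"}]},
    {"id": "7844682", "name": "Nicole Kidman", "movies": [{"id": "10161886"}, {"id": "345678"}]},
]

# Map actor ids back to display names for the final output.
names_by_id = {actor["id"]: actor["name"] for actor in data}

# Movie id -> set of actor ids; the set ignores a movie listed twice for one actor.
cast_by_movie = defaultdict(set)
for actor in data:
    for movie in actor["movies"]:
        cast_by_movie[movie["id"]].add(actor["id"])

# Count each unordered actor pair once per shared movie (sorting fixes the pair order).
pair_counts = Counter(
    pair
    for cast in cast_by_movie.values()
    for pair in itertools.combinations(sorted(cast), r=2)
)

for (id1, id2), count in pair_counts.items():
    print(names_by_id[id1], names_by_id[id2], count)  # Dwayne Johnson Nicole Kidman 1
```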

name

或

title

是唯一的键，不应该这样使用。@Jon我没有将它们用作键，它们是元组。对，但它是由非唯一值组成的。

actor

和

movie

dicts有一个看起来唯一的

id

键。不是

movie\u actor\u list.append（（d['name'，movie['title'））

它应该是

movie\u actor\u list.append（（d['id'，movie['id'））

Thanks, I ended up modifying my own code to include both yours and Gauri's. Thanks a lot for your help :)