Python: Create a co-occurrence list from a JSON file - Fatal编程技术网

I'm trying to match up actors who appear in the same movie and put them into a list where each element is a pair of actors.

Here is the basic outline of the JSON file:

[
    {
        "id": "1234567",
        "name": "Dwayne Johnson",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "345678",
                "title": "Fast and furious 7",
                "role": "actor",
                "year": 2017,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actor",
                "year": 2020,
                "kind": "movie"
            }
        ]
    },
    {
        "id": "7844682",
        "name": "Nicole Kidman",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "10161886",
                "title": "The Prom",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            },
...and so on, for about 70,000 lines. The output I want looks like:

[ ('Dwayne Johnson', 'Nicole Kidman'), ('Dwayne Johnson', 'Jason Statham') ...... etc. ]
So Dwayne Johnson and Nicole Kidman are both in "blah blah", which makes them a pair.

I've tried several times, but this is all I really have to show:

import json, itertools

fin = open("actors.json","r")
data = json.load(fin)
fin.close()

for actor in data:
        actor_pairs = list( itertools.combinations(actor["name"], r=2))
print(actor_pairs)
But this just prints every two-letter combination from the name of the last actor in the file:

[ ('N', 'i'), ('N', 'c')....etc. ]

I'm a bit lost and don't know what to do next. Do I need more nested for loops, or something like that?
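Two things produce the letter pairs above: itertools.combinations() treats a string as a sequence of characters, and actor_pairs is overwritten on every pass of the loop, so only the last actor's result reaches print(). The short sketch below uses a few hypothetical names (not read from actors.json) to illustrate both points. Note that pairing a plain list of names pairs every actor with every other actor, so the names still have to be grouped by movie first.

```python
import itertools

# Hypothetical sample name, standing in for one actor["name"] value
name = "Nicole Kidman"

# A str is an iterable of characters, so combinations() pairs up letters.
letter_pairs = list(itertools.combinations(name, r=2))
print(letter_pairs[:2])  # [('N', 'i'), ('N', 'c')]

# A list of names makes combinations() pair up whole names instead
# (every name with every other name, regardless of shared movies).
names = ["Dwayne Johnson", "Nicole Kidman", "Jason Statham"]
name_pairs = list(itertools.combinations(names, r=2))
print(name_pairs)

# Reassigning inside a loop keeps only the last iteration's result,
# which is why only the final actor's letters were printed.
for n in names:
    last_pairs = list(itertools.combinations(n, r=2))
print(last_pairs[0])  # ('J', 'a')
```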

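I tried the code below with the dataset you provided. It is very inefficient code, but it seems to work.

I could make it more efficient, but could you first test this code on all of your data?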

data = [
    {
        "id": "1234567",
        "name": "Dwayne Johnson",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "345678",
                "title": "Fast and furious 7",
                "role": "actor",
                "year": 2017,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actor",
                "year": 2020,
                "kind": "movie"
            },
        ]
    },
    {
        "id": "7844682",
        "name": "Nicole Kidman",
        "born": "1970-12-12",
        "movies": [
            {
                "id": "10161886",
                "title": "The Prom",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            },
            {
                "id": "345678",
                "title": "blah blah",
                "role": "actress",
                "year": 2020,
                "kind": "movie"
            }
        ]

    }
]
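movie_actor_list = []
# Flatten the data into one (actor name, movie title) tuple per credit
for d in data:
    for movie in d['movies']:
        movie_actor_list.append((d['name'], movie['title']))

final_list = []
# Compare every credit with every other credit (quadratic, hence slow on large files)
for name1, movie1 in movie_actor_list:
    for name2, movie2 in movie_actor_list:
        # Same movie, different actors, and the reversed pair was not already added
        if movie1 == movie2 and name1 != name2 and (name2, name1) not in final_list:
            final_list.append((name1, name2))

print(final_list)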
Output:

[('Dwayne Johnson', 'Nicole Kidman')]
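The nested loops above compare every (name, title) tuple with every other one, and the "not in final_list" check scans a growing list, so the cost climbs steeply on a file of about 70,000 lines. One common alternative, sketched here assuming each movie's "id" identifies it uniquely, is an inverted index: group actor names by movie id in a dict, then take combinations only within each movie's cast. The helper name co_star_pairs and the trimmed sample data are illustrative, not part of the original answer. Unlike the loop above, this version reports each pair once even if the two actors share several movies.

```python
import itertools
from collections import defaultdict

# Compact copy of the question's sample records (only the fields used here),
# including the repeated movie id "345678" from the question.
data = [
    {"id": "1234567", "name": "Dwayne Johnson",
     "movies": [{"id": "345678", "title": "Fast and furious 7"},
                {"id": "345678", "title": "blah blah"}]},
    {"id": "7844682", "name": "Nicole Kidman",
     "movies": [{"id": "10161886", "title": "The Prom"},
                {"id": "345678", "title": "blah blah"}]},
]

def co_star_pairs(actors):
    """Return each pair of actor names sharing at least one movie id, once."""
    # Inverted index: movie id -> names of the actors credited on it
    cast_by_movie = defaultdict(set)
    for actor in actors:
        for movie in actor["movies"]:
            cast_by_movie[movie["id"]].add(actor["name"])

    pairs = set()
    for cast in cast_by_movie.values():
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) collapse
        pairs.update(itertools.combinations(sorted(cast), 2))
    return sorted(pairs)

print(co_star_pairs(data))  # [('Dwayne Johnson', 'Nicole Kidman')]
```

The work now grows with the size of each movie's cast rather than with the square of the total number of credits.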

Another approach, as a one-liner:

actor_pairs = [(actor1, actor2) for (actor1, movies1), (actor2, movies2) in itertools.combinations([(actor['name'], {movie['id'] for movie in actor['movies']}) for actor in data], r=2) if len(movies1 & movies2) > 0]
Here is the same solution broken into steps:

import itertools

# Create a list of tuples, each holding an actor's name and the set of ids of all the movies they worked on
actors_and_movies = [(actor['name'], {movie['id'] for movie in actor['movies']}) for actor in data]
# Create all 2-actor combinations of that list
all_combinations = itertools.combinations(actors_and_movies, r=2)
# Keep a pair of actors if the intersection of their movie-id sets is not empty
actor_pairs = [(actor1, actor2) for (actor1, movies1), (actor2, movies2) in all_combinations if movies1 & movies2]
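The filter in the last step builds the full intersection set just to check whether it is empty. A minimal variant of the same pairing logic uses set.isdisjoint(), which in CPython can return as soon as it finds a shared id. The function name overlapping_pairs and the trimmed sample data are hypothetical; the logic is otherwise the answer's own.

```python
import itertools

# Compact copy of the question's sample records (only the fields used here)
data = [
    {"id": "1234567", "name": "Dwayne Johnson",
     "movies": [{"id": "345678", "title": "Fast and furious 7"},
                {"id": "345678", "title": "blah blah"}]},
    {"id": "7844682", "name": "Nicole Kidman",
     "movies": [{"id": "10161886", "title": "The Prom"},
                {"id": "345678", "title": "blah blah"}]},
]

def overlapping_pairs(actors):
    """Pairs of actor names whose sets of movie ids overlap."""
    actors_and_movies = [(a["name"], {m["id"] for m in a["movies"]}) for a in actors]
    return [
        (name1, name2)
        for (name1, movies1), (name2, movies2) in itertools.combinations(actors_and_movies, r=2)
        # isdisjoint() is True when the sets share nothing, so negate it
        if not movies1.isdisjoint(movies2)
    ]

print(overlapping_pairs(data))  # [('Dwayne Johnson', 'Nicole Kidman')]
```

This still compares every pair of actors, so it keeps the quadratic shape of the original; it only makes each comparison cheaper.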

I don't think name and title are unique keys, and they shouldn't be used that way.

@Jon I'm not using them as keys; they're tuples.

Right, but they're made of non-unique values. The actor and movie dicts have an id key that looks unique. Instead of

movie_actor_list.append((d['name'], movie['title']))

it should be

movie_actor_list.append((d['id'], movie['id']))

Thanks a lot, this works really well. The "non-unique values" part doesn't matter much in my case, because the number of times they appear as a pair will be important later.

Thanks, I ended up modifying my own code to include both yours and Gauri's. Thanks a lot for your help :)
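The comments suggest keying on id instead of name and title, and the asker notes that the number of times two actors appear together will matter later. A minimal sketch combining both ideas, assuming actor and movie ids are unique: build a movie-to-cast index from actor ids, count shared movies per pair with collections.Counter, and map ids back to names only for display. co_star_counts is a hypothetical helper name, and the sample data is trimmed from the question.

```python
import itertools
from collections import Counter, defaultdict

# Compact copy of the question's sample records (only the fields used here)
data = [
    {"id": "1234567", "name": "Dwayne Johnson",
     "movies": [{"id": "345678", "title": "Fast and furious 7"},
                {"id": "345678", "title": "blah blah"}]},
    {"id": "7844682", "name": "Nicole Kidman",
     "movies": [{"id": "10161886", "title": "The Prom"},
                {"id": "345678", "title": "blah blah"}]},
]

def co_star_counts(actors):
    """Map each (actor id, actor id) pair to the number of movie ids they share."""
    cast_by_movie = defaultdict(set)  # movie id -> actor ids credited on it
    for actor in actors:
        for movie in actor["movies"]:
            cast_by_movie[movie["id"]].add(actor["id"])

    counts = Counter()
    for cast in cast_by_movie.values():
        # Each shared movie adds 1 to every pair in its cast
        counts.update(itertools.combinations(sorted(cast), 2))
    return counts

# Translate ids back to names only when printing
names = {actor["id"]: actor["name"] for actor in data}
for (id1, id2), n in co_star_counts(data).most_common():
    print(names[id1], names[id2], n)  # Dwayne Johnson Nicole Kidman 1
```

Because the cast sets are keyed by movie id, two different movies that happen to share a title stay separate, and most_common() lists the most frequent co-star pairs first.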