How to build a term-document index in Python
I have 16,000 records from the IMDb dataset:
Movie_Name Synops
Alien Predator ['great','17th', 'abigail', 'by', 'century', 'is']
Shark Exorcist ['demonic', 'devil', 'great', 'hell', 'holy', 'nun']
Jurassic Shark ['abandoned', 'an', 'and', 'beautiful', 'abigail',]
"great": Alien Predator,Shark Exorcist
"17th" :Alien Predator
"abigail":Alien Predator,Jurassic Shark
.....
I want to build a term-document mapping like the one above for every word in the Synops column, but I don't know how.
First, put the records into a dictionary (or JSON), for example:
dataset = {
"Alien Predator":['great','17th', 'abigail', 'by', 'century', 'is'],
"Shark Exorcist":['demonic', 'devil', 'great', 'hell', 'holy', 'nun'],
"Jurassic Shark":['abandoned', 'an', 'and', 'beautiful', 'abigail',],
}
From there you can easily look up the movies that contain a given word:
search_word = "great"
d = [movie for movie, synops in dataset.items() if search_word in synops]
This returns ['Alien Predator', 'Shark Exorcist'].
You can add the matches to a dictionary to build up the full result:
final_dict = {}
final_dict[search_word] = d
That should give you:
>>> final_dict
{'great': ['Alien Predator', 'Shark Exorcist']}
You can now do the same for every word with a for loop over your list of keywords, and finish the task yourself.
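As a rough sketch of that loop, assuming the `dataset` dictionary from above and a hypothetical `keywords` list (not part of the original question), it might look like this:

```python
dataset = {
    "Alien Predator": ['great', '17th', 'abigail', 'by', 'century', 'is'],
    "Shark Exorcist": ['demonic', 'devil', 'great', 'hell', 'holy', 'nun'],
    "Jurassic Shark": ['abandoned', 'an', 'and', 'beautiful', 'abigail'],
}

# Hypothetical list of words to look up; replace with your own.
keywords = ["great", "17th", "abigail"]

final_dict = {}
for search_word in keywords:
    # Collect every movie whose synopsis word list contains the keyword.
    final_dict[search_word] = [
        movie for movie, synops in dataset.items() if search_word in synops
    ]

print(final_dict)
```

Note that this scans every synopsis once per keyword, so with 16,000 records and a large vocabulary it can get slow. Looping over the records once and appending to each word's list avoids that.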
data = {
"Alien Predator": ['great','17th', 'abigail', 'by', 'century', 'is'],
"Shark Exorcist": ['demonic', 'devil', 'great', 'hell', 'holy', 'nun'],
"Jurassic Shark": ['abandoned', 'an', 'and', 'beautiful', 'abigail',]
}
result = {}
for movie_name, keywords in data.items():
    for keyword in keywords:
        result.setdefault(keyword, []).append(movie_name)
print(result)
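Result (line breaks added for clarity):
{'great': ['Alien Predator', 'Shark Exorcist'],
 '17th': ['Alien Predator'],
 'abigail': ['Alien Predator', 'Jurassic Shark'],
 'by': ['Alien Predator'],
 'century': ['Alien Predator'],
 'is': ['Alien Predator'],
 'demonic': ['Shark Exorcist'],
 'devil': ['Shark Exorcist'],
 'hell': ['Shark Exorcist'],
 'holy': ['Shark Exorcist'],
 'nun': ['Shark Exorcist'],
 'abandoned': ['Jurassic Shark'],
 'an': ['Jurassic Shark'],
 'and': ['Jurassic Shark'],
 'beautiful': ['Jurassic Shark']}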
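What is the representation of the dataset? Is it a dictionary with movie names as keys and synops as values?
It is an Excel file with two columns (movie name, synops).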
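Since the data lives in a two-column Excel file, the Synops cells may come back as text such as "['great', '17th']" rather than as real lists. The sketch below is one possible way to handle that. It assumes rows arrive as (movie name, synops) pairs, and the function name `build_inverted_index` and the file name in the comment are hypothetical, not from the original posts.

```python
import ast
from collections import defaultdict


def build_inverted_index(rows):
    """Map each term to the sorted list of movie names whose synopsis contains it."""
    index = defaultdict(set)
    for movie_name, synops in rows:
        # Assumption: a spreadsheet cell may hold the list as text,
        # e.g. "['great', '17th']", so parse it back into a real list.
        if isinstance(synops, str):
            synops = ast.literal_eval(synops)
        for term in synops:
            # A set keeps each movie once even if a term repeats in a synopsis.
            index[term].add(movie_name)
    return {term: sorted(movies) for term, movies in index.items()}


# Stand-in rows. With pandas they could come from something like
#   df = pd.read_excel("movies.xlsx")          # hypothetical file name
#   rows = zip(df["Movie_Name"], df["Synops"])
rows = [
    ("Alien Predator", "['great', '17th', 'abigail', 'by', 'century', 'is']"),
    ("Shark Exorcist", "['demonic', 'devil', 'great', 'hell', 'holy', 'nun']"),
    ("Jurassic Shark", ['abandoned', 'an', 'and', 'beautiful', 'abigail']),
]

index = build_inverted_index(rows)
print(index["great"])
print(index["abigail"])
```

Sorting the movie names is only for stable, readable output. Drop it if insertion order matters more.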