How can I find the unique list items in a Python DataFrame?


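I have a dataset containing movie titles along with the genres each movie belongs to. Each movie can belong to more than one genre. For the whole dataset, I want to find the total number of unique genres that exist.

I can't use df.unique(), because each cell of the genres column holds a list of genres rather than a single value.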

movieId title   genres
0   1   Toy Story (1995)    Adventure|Animation|Children|Comedy|Fantasy
1   2   Jumanji (1995)  Adventure|Children|Fantasy
2   3   Grumpier Old Men (1995) Comedy|Romance
3   4   Waiting to Exhale (1995)    Comedy|Drama|Romance
4   5   Father of the Bride Part II (1995)  Comedy
5   6   Heat (1995) Action|Crime|Thriller
6   7   Sabrina (1995)  Comedy|Romance
7   8   Tom and Huck (1995) Adventure|Children
8   9   Sudden Death (1995) Action
9   10  GoldenEye (1995)    Action|Adventure|Thriller
10  11  American President, The (1995)  Comedy|Drama|Romance
11  12  Dracula: Dead and Loving It (1995)  Comedy|Horror
12  13  Balto (1995)    Adventure|Animation|Children
13  14  Nixon (1995)    Drama
14  15  Cutthroat Island (1995) Action|Adventure|Romance
15  16  Casino (1995)   Crime|Drama
16  17  Sense and Sensibility (1995)    Drama|Romance
17  18  Four Rooms (1995)   Comedy
18  19  Ace Ventura: When Nature Calls (1995)   Comedy
19  20  Money Train (1995)  Action|Comedy|Crime|Drama|Thriller
20  21  Get Shorty (1995)   Comedy|Crime|Thriller
21  22  Copycat (1995)  Crime|Drama|Horror|Mystery|Thriller
22  23  Assassins (1995)    Action|Crime|Thriller
23  24  Powder (1995)   Drama|Sci-Fi
24  25  Leaving Las Vegas (1995)    Drama|Romance
25  26  Othello (1995)  Drama
26  27  Now and Then (1995) Children|Drama
27  28  Persuasion (1995)   Drama|Romance
28  29  City of Lost Children, The (Cité des enfants p...   
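This is the movie dataset.

Under the genres column, I want to split a value such as Action|Comedy|Crime|Drama|Thriller into Action, Comedy, Crime, Drama, Thriller.


Also, for the whole dataset, which is now a DataFrame, I want to find the unique genres.

You can do it as follows: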

import pandas as pd

df = pd.DataFrame({'title':['Toy Story (1995)','Jumanji (1995)','Grumpier Old Men (1995)'],
                            'genres':['Adventure|Animation|Children|Comedy|Fantasy','Adventure|Children|Fantasy','Comedy|Romance']})


# Split every genre string on '|' and collect the pieces into a set to drop duplicates
a = list(set([y for x in df['genres'] for y in x.split('|')]))
print(a)
Output (the order may vary, because sets are unordered):

['Animation', 'Comedy', 'Children', 'Fantasy', 'Adventure', 'Romance']
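
The same result can also be reached with pandas string methods alone. The following is a minimal sketch, assuming a pandas version that provides Series.explode (0.25 or later); the three-row sample frame is illustrative and not the full dataset from the question.

```python
import pandas as pd

# Small sample mirroring the question's data (three rows only).
df = pd.DataFrame({
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)'],
    'genres': ['Adventure|Animation|Children|Comedy|Fantasy',
               'Adventure|Children|Fantasy',
               'Comedy|Romance'],
})

# Split each string into a list, then give every genre its own row.
all_genres = df['genres'].str.split('|').explode()

# Sorted only to get a stable, readable order.
unique_genres = sorted(all_genres.unique())
n_unique = all_genres.nunique()

print(unique_genres)
print(n_unique)
```

Here nunique() answers the "total number of unique genres" part of the question directly, without building a Python list first.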

Try the following approach:

import functools
import operator

# Split each genre string on "|" to get one list of genres per movie
temp = df.genres.str.split("|").tolist()

# Flatten the list of lists, then build a set to keep only the unique genres;
# len(unique_genres) then gives the number of unique genres
unique_genres = set(functools.reduce(operator.concat, temp))
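
One possible reason this approach can feel slow on a large dataset is that functools.reduce with operator.concat builds a new, longer list at every step. A hedged alternative is itertools.chain.from_iterable, which walks the nested lists once. In the sketch below, temp is a hand-written stand-in for the list of lists produced above, not real output from the dataset.

```python
from itertools import chain

# Stand-in for df.genres.str.split("|").tolist() from the answer above.
temp = [
    ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
    ['Adventure', 'Children', 'Fantasy'],
    ['Comedy', 'Romance'],
]

# chain.from_iterable yields every genre lazily; set() keeps one copy of each.
unique_genres = set(chain.from_iterable(temp))

print(len(unique_genres))
```
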

Try the following:

import pandas as pd

df = pd.read_csv('movies.csv')
# Turn each "A|B|C" string into a list of genres
df['genres'] = df['genres'].apply(lambda x: x.strip().split('|'))
# Number of genres for each movie
df['count'] = df['genres'].apply(len)
print(df)

Output:

   movieId  ...                                             genres count
0        1  ...  [Adventure, Animation, Children, Comedy, Fantasy]     5
1        2  ...                     [Adventure, Children, Fantasy]     3
2        3  ...                                  [Comedy, Romance]     2
3        4  ...                           [Comedy, Drama, Romance]     3
4        5  ...                                           [Comedy]     1
5        6  ...                          [Action, Crime, Thriller]     3
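
The count column above gives the number of genres per movie, not the unique genres across the dataset. Once the genres column holds lists, the unique set can be derived from it. This is a rough sketch on an illustrative three-row frame (not the movies.csv file, which is not reproduced here); the per-genre tally via value_counts is an extra that the answer itself does not include.

```python
import pandas as pd

# Illustrative sample; the real data would come from movies.csv.
df = pd.DataFrame({
    'genres': ['Adventure|Animation|Children|Comedy|Fantasy',
               'Adventure|Children|Fantasy',
               'Comedy|Romance'],
})
df['genres'] = df['genres'].apply(lambda x: x.strip().split('|'))

# Union of all per-movie lists gives the unique genres.
unique_genres = set().union(*df['genres'])

# Optional: how many movies carry each genre.
genre_counts = df['genres'].explode().value_counts()

print(len(unique_genres))
print(genre_counts)
```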

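Have you tried collecting all the genre values into one array first and then calling .unique() on it?
No, not yet. I'm very new to Python, so I'm not familiar with it. I'll give it a try.
I tried it and it does work. It just takes a while to run. Thanks.
Glad it worked! Anyway, AkshayNevrekar's answer seems better.
This works well too. But the answer from ashish14 seems to give the result faster. Thanks anyway!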