Python: duplicates when appending strings from a DataFrame with a common column value to a list


python pandas


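Beginner here. I'm trying to pull the names of Toronto neighborhoods out of a DataFrame based on the cluster value I assigned to each of them. Instead of a list of 3 unique items, I get 2363 items.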

Neigh_List = []
for n in toronto_merged['Cluster Labels']:
    if n == 7:
        x = toronto_merged['Neighborhood']
        Neigh_List.append(x) if x not in Neigh_List else None

Neigh_List

[0                                                                                                Parkwoods
 1                                                                                                Parkwoods
 2                                                                                         Victoria Village
 3                                                                                         Victoria Village
 4                                                                                         Victoria Village
                                                        ...                                                
 2359    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 2360    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 2361    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 2362    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 2363    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 Name: Neighborhood, Length: 2364, dtype: object]

Have you tried using pandas' own tools? Select all rows where the cluster label equals 7, then take the unique neighborhoods.


...
Neigh_List = toronto_merged.loc[lambda d: d['Cluster Labels'].eq(7)]['Neighborhood'].unique().tolist()
# Instead of .unique() you can also use .drop_duplicates(), which may be faster (worth timing on your own data)
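To make the steps in that one-liner visible, here is a minimal sketch on a made-up DataFrame. Only the column names come from the question; the rows are invented for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged; the rows are invented for illustration.
toronto_merged = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village", "Mimico NW", "Mimico NW"],
    "Cluster Labels": [7, 7, 3, 7, 7],
})

# Step 1: boolean mask, True for every row that belongs to cluster 7.
mask = toronto_merged["Cluster Labels"].eq(7)

# Step 2: keep only those rows and take the Neighborhood column.
in_cluster = toronto_merged.loc[mask, "Neighborhood"]

# Step 3a: .unique() gives the distinct values in order of first appearance.
via_unique = list(in_cluster.unique())

# Step 3b: .drop_duplicates() gives a Series with the repeated values removed.
via_drop = in_cluster.drop_duplicates().tolist()

print(via_unique)
print(via_drop)
```

Both variants end up with the same two names here. The difference is mainly in the intermediate type: `.drop_duplicates()` keeps a Series with its original index, which can be handy if you want to keep chaining pandas methods.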

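As a general rule, for larger datasets (~1000+ rows) you should avoid looping over a pandas DataFrame, because pandas' built-in vectorized functions are usually faster.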
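As a rough illustration of the two styles, the sketch below builds the same list once with an explicit loop and once with a vectorized selection. The DataFrame contents are invented. Using `zip` to pair each label with the neighborhood on the same row is one possible way to write the loop, not something taken from the original post.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged; the rows are invented for illustration.
toronto_merged = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village", "Mimico NW", "Mimico NW"],
    "Cluster Labels": [7, 7, 3, 7, 7],
})

# Loop version: zip pairs each row's label with that same row's neighborhood,
# so only names from cluster 7 are considered.
looped = []
for label, name in zip(toronto_merged["Cluster Labels"], toronto_merged["Neighborhood"]):
    if label == 7 and name not in looped:
        looped.append(name)

# Vectorized version: filter with a boolean mask, then deduplicate.
vectorized = list(
    toronto_merged.loc[toronto_merged["Cluster Labels"] == 7, "Neighborhood"].unique()
)

print(looped)
print(vectorized)
```

Note that the loop has to read the label and the name from the same row. Indexing `toronto_merged['Neighborhood']` inside the loop returns the entire column rather than a single name.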

You can try the following:

neigh_list = list(toronto_merged.loc[toronto_merged['Cluster Labels'] == 7]['Neighborhood'].unique())

Also, if you want to avoid duplicates in a list, you can use a Python set.

Alternatively, use a set comprehension:

unique_elements = {unique_item for unique_item in some_iterable}
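A small sketch of the set-based options on a plain list of invented names. The last variant, `dict.fromkeys`, is an addition that does not appear in the answer above; it is included because a set does not guarantee any particular order, while dictionary keys keep insertion order in Python 3.7+.

```python
# Invented sample values, for illustration only.
names = ["Parkwoods", "Parkwoods", "Victoria Village", "Parkwoods"]

# set() drops duplicates but does not promise any ordering.
unique_set = set(names)

# The set comprehension from the answer builds the same set.
unique_comprehension = {name for name in names}

# dict.fromkeys keeps the first occurrence of each value, in original order.
unique_ordered = list(dict.fromkeys(names))

print(unique_set == unique_comprehension)
print(unique_ordered)
```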

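That's what I pointed out ;)

Yes, you were just ahead of me :) (I also sneakily made a few edits and fixed some mistakes after posting.)

Surprisingly (in a good way), being the first to answer is often harder than answering the question itself. You did a bit more, though ;)

Using this suggestion (which I had come across before, but didn't understand until I applied it here), my code now looks like the following. Thanks a lot.

neigh_list = set()
for a in toronto_merged['Cluster Labels']:
    if a == 7:
        for x in toronto_merged['Neighborhood']:
            neigh_list.add(x)
neigh_list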
