Python: How to separate the images in a folder based on the image-number column of a DataFrame?

python python-3.x pandas

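I have a DataFrame of shape (868, 3483), where 868 is the total number of images I have and 3481 is the number of pixels in each image. Each row represents one image, and the image number is in the img column. I applied unsupervised learning and stored each image's cluster assignment in the cluster column.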

img  cluster  0    1    2    3    4    5    6    7
0    3        1.0  1.0  1.0  1.0  1.0  1.0  1.0
1    2        1.0  1.0  1.0  1.0  1.0  1.0  1.0
2    3        1.0  1.0  1.0  1.0  1.0  1.0  1.0
3    1        1.0  1.0  1.0  1.0  1.0  1.0  1.0
5    3        1.0  1.0  1.0  1.0  1.0  1.0  1.0
6    3        1.0  1.0  1.0  1.0  1.0  1.0  1.0
8    3        1.0  1.0  1.0  1.0  1.0  1.0  1.0
9    3        1.0  1.0  1.0  1.0  1.0  1.0  1.0
10   2        1.0  1.0  1.0  1.0  1.0  1.0  1.0
11   2        1.0  1.0  1.0  1.0  1.0  1.0  1.0
13   3        1.0  1.0  1.0  1.0  1.0  1.0  1.0
15   1        1.0  1.0  1.0  1.0  1.0  1.0  1.0

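I have a folder whose images are named to match the img column. Now I want to separate these images according to the cluster each one belongs to.

For example, images 0, 2, 5, 6, 8, 9 and 13 belong to cluster 3, so I want to move them into a subfolder named "cluster3", and likewise for cluster 1 and cluster 2.

Is there a simple way to do this?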

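You can move the files with Python's os module (or the shutil module, as Dennis commented). As far as I can tell, only the img and cluster columns matter here.

dictionary = df.set_index("img")["cluster"].to_dict()

This returns a dictionary in which each key is an image number and each value is that image's cluster, which doubles as its folder. I'm not sure how many clusters there are, but the os functions can create as many folders and subfolders as needed, as follows: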

import os

# Destination root: one subfolder per cluster is created under it
fp = "path/to/save/images/clusters/"
os.makedirs(fp, exist_ok=True)

# Create one folder per distinct cluster label, e.g. cluster1, cluster2, cluster3
allClusters = list(set(df["cluster"]))
for x in allClusters:
    os.makedirs(fp + "cluster" + str(x), exist_ok=True)
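As a minimal sketch of the folder-creation step, the snippet below takes a dictionary of the same shape as the one returned by the set_index/to_dict call and creates one folder per distinct cluster. The helper name make_cluster_folders, the sample mapping and the temporary directory are illustrative assumptions, not part of the original answer. It uses os.makedirs with exist_ok=True so that re-running it does not fail on folders that already exist.

```python
import os
import tempfile


def make_cluster_folders(mapping, root):
    """Create root/cluster<label> for each distinct cluster label in mapping.

    Hypothetical helper for illustration; returns {label: folder path}.
    """
    folders = {}
    for label in sorted(set(mapping.values())):
        path = os.path.join(root, "cluster" + str(label))
        os.makedirs(path, exist_ok=True)  # no error if the folder exists
        folders[label] = path
    return folders


# Same shape as df.set_index("img")["cluster"].to_dict(): image number -> cluster
demo_mapping = {0: 3, 1: 2, 2: 3, 3: 1, 5: 3}

demo_root = tempfile.mkdtemp()  # throwaway destination folder
demo_folders = make_cluster_folders(demo_mapping, demo_root)
print(sorted(os.listdir(demo_root)))
```

Building the paths with os.path.join rather than string concatenation avoids depending on a trailing slash in the root path.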

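You can then move each file to its corresponding folder. (I'm not sure what each file is called, but for now I assume the names are img1.png, img2.png, and so on.) For your problem, I suggest renaming the img column (or using another column, and setting the index to that column in the line that calls set_index).

This should do the job. Let me know if there are any errors.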

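You can read the img number and its cluster label, then use the shutil library to move the images to the desired folders.

@AkashTripuramallu I'm not sure what you mean. In the code, I wrote it so that you can declare the file path? I suggest using one column of the DataFrame to hold all the file names (with paths) and using that column as the index for the dictionary (see above):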

df.set_index(image_name)[clusternames].to_dict()

That way, when you go on to run os.rename, you can rename the file stored in each dictionary key to a file inside the folder named "cluster" + str(cluster_num).
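A rough sketch of that path-keyed idea follows. It only computes where each file would go and does not touch the file system. The helper name destination_for, the sample paths and the "clusters" destination root are hypothetical. Taking os.path.basename of the source keeps the source folder from being repeated inside the destination path.

```python
import os


def destination_for(src_path, cluster_num, dest_root):
    """Return where src_path should go: dest_root/cluster<N>/<original file name>."""
    folder = "cluster" + str(cluster_num)
    return os.path.join(dest_root, folder, os.path.basename(src_path))


# Hypothetical path-keyed dictionary: full source path -> cluster number
path_to_cluster = {
    os.path.join("images", "0.png"): 3,
    os.path.join("images", "1.png"): 2,
}

for src, cluster_num in path_to_cluster.items():
    print(src, "->", destination_for(src, cluster_num, "clusters"))
```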

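If this answers your question, please mark it as correct.

Any suggestions?

@AkashTripuramallu Oops, that's because Windows machines use backslashes instead of forward slashes. Please change my code accordingly.

I actually tried that, but it shows the same error. Any idea why?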

# Build the lookup: each key is an image number from the img column,
# and each value is the cluster (and therefore the folder) it belongs to.
# The cluster folders are created beforehand (see above).
dictionary = df.set_index("img")["cluster"].to_dict()
for x in dictionary:
    # Source file: assumes the images are named <img number>.png
    # and all live in one folder
    name = str(x) + ".png"
    filename = "path/to/images/" + name

    # Move the image into its cluster folder, keeping the original file name
    os.rename(filename, fp + "cluster" + str(dictionary[x]) + "/" + name)
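Putting the pieces together, here is a hedged end-to-end sketch using shutil.move, as suggested in the comments, and os.path.join, which sidesteps the slash versus backslash issue on Windows. The function name move_images_by_cluster, the ".png" extension and the demo folders are assumptions for illustration. The mapping argument is expected to look like the result of df.set_index("img")["cluster"].to_dict().

```python
import os
import shutil
import tempfile


def move_images_by_cluster(mapping, src_dir, dest_root, ext=".png"):
    """Move src_dir/<img><ext> into dest_root/cluster<label>/ for each mapping entry.

    Returns the image numbers whose files were not found, instead of raising.
    """
    missing = []
    for img, label in mapping.items():
        name = str(img) + ext
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            missing.append(img)
            continue
        folder = os.path.join(dest_root, "cluster" + str(label))
        os.makedirs(folder, exist_ok=True)
        shutil.move(src, os.path.join(folder, name))
    return missing


# Demo on throwaway folders with empty placeholder files
demo_src = tempfile.mkdtemp()
demo_dest = tempfile.mkdtemp()
for n in (0, 1, 2):
    open(os.path.join(demo_src, str(n) + ".png"), "w").close()

# Image 3 has no file on purpose, to show how missing files are reported
demo_missing = move_images_by_cluster({0: 3, 1: 2, 2: 3, 3: 1}, demo_src, demo_dest)
print(sorted(os.listdir(os.path.join(demo_dest, "cluster3"))), demo_missing)
```

If the originals should stay in place, shutil.copy2 could be swapped in for shutil.move.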