Google Cloud Platform: generating a CSV import file for AutoML Vision from an existing bucket


I already have a GCloud storage bucket, organized by label as follows:

gs://my_bucket/dataset/label1/
gs://my_bucket/dataset/label2/
...
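Each label folder contains photos. I want to generate the required CSV, but with hundreds of photos in each folder I don't know how to do it programmatically. The CSV file should look like this: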

gs://my_bucket/dataset/label1/photo1.jpg,label1
gs://my_bucket/dataset/label1/photo12.jpg,label1
gs://my_bucket/dataset/label2/photo7.jpg,label2
...

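You need to list every file in the dataset folder with its full path, then parse each path to get the name of the folder that contains the file, since in your example that folder name is the label you want to use. This can be done in several ways. I'll include two examples that you can base your code on:

gsutil has a command for listing bucket contents (gsutil ls), and its output can then be parsed with a bash script: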

# Define the bucket path and the local CSV file name
bucket_path="gs://buckbuckbuckbuck/dataset/"
filename="labels_csv_bash.csv"
: > "$filename" # Create the file, or empty it if it already exists

old_ifs=$IFS
IFS=$'\n' # Split the gsutil output on newlines only

# List every .jpg file under the bucket path. ** matches recursively.
for i in $(gsutil ls "${bucket_path}**.jpg")
do
        # Split the URI on / and take the second field from the end (the folder name).
        label=$(echo "$i" | rev | cut -d'/' -f2 | rev)
        # No space after the comma, matching the format in the question
        echo "$i,$label" >> "$filename"
done

IFS=$old_ifs # Restore the original value

gsutil cp "$filename" "$bucket_path"
You can also use the Cloud Storage client libraries, which are available for several languages. Here is an example using Python:

# Imports the Google Cloud client library
from google.cloud import storage

# Instantiates a client
storage_client = storage.Client()

# The name of the existing bucket and the folder that holds the label folders
bucket_name = 'my_bucket'
path_in_bucket = 'dataset'

# The trailing slash keeps the prefix from matching siblings such as dataset2/
blobs = storage_client.list_blobs(bucket_name, prefix=path_in_bucket + '/')

# Reading blobs, parsing information and creating the csv file
filename = 'labels_csv_python.csv'
with open(filename, 'w') as f:
    for blob in blobs:
        if blob.name.endswith('.jpg'):
            # Object names always use /, so build the URI without os.path.join
            bucket_path = 'gs://' + bucket_name + '/' + blob.name
            label = blob.name.split('/')[-2]
            # No space after the comma, matching the format in the question
            f.write(bucket_path + ',' + label + '\n')

# Uploading csv file to the bucket
bucket = storage_client.get_bucket(bucket_name)
destination_blob_name = path_in_bucket + '/' + filename
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(filename)
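
The parsing step can also be separated from the bucket access, which makes it possible to check it locally without credentials. The following is only a sketch under a few assumptions: the helper names label_rows and rows_to_csv are hypothetical (they are not part of any Google library), the extension list is a guess at what the dataset contains, and the label is taken to be the immediate parent folder of each object, as in the question.

```python
import csv
import io
import posixpath


def label_rows(bucket_name, blob_names, extensions=(".jpg", ".jpeg", ".png")):
    """Yield (gs_uri, label) pairs; the label is the object's parent folder."""
    wanted = tuple(ext.lower() for ext in extensions)
    for name in blob_names:
        if not name.lower().endswith(wanted):
            continue  # skip folder placeholders and non-image objects
        # Object names always use "/", so posixpath is safe on any OS
        label = posixpath.basename(posixpath.dirname(name))
        if not label:
            continue  # object sits at the bucket root, so there is no label
        yield f"gs://{bucket_name}/{name}", label


def rows_to_csv(rows):
    """Serialize the pairs as CSV text with no space after the comma."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

With the client from the example above, the object names could be supplied as (blob.name for blob in storage_client.list_blobs(bucket_name, prefix='dataset/')), and the returned text written to the CSV file before uploading it.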

For anyone who, like me, is looking for a way to create a .csv file for batch processing in Google AutoML and doesn't need the label column:

# Define the bucket path and the local CSV file name
bucket_path="gs://YOUR_BUCKET/FOLDER/"
filename="THE_FILENAME_YOU_WANT.csv"
: > "$filename" # Create the file, or empty it if it already exists

old_ifs=$IFS
IFS=$'\n' # Split the gsutil output on newlines only

# List every file with your extension under the bucket path.
# Change **.png in the next line to **.your_extension. ** matches recursively.
for i in $(gsutil ls "${bucket_path}**.png")
do
       echo "$i" >> "$filename"
done

IFS=$old_ifs # Restore the original value

gsutil cp "$filename" "$bucket_path"
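
For readers who prefer Python, the label-free list can be produced in a similar way. This is a minimal sketch, not a definitive implementation: batch_csv is a hypothetical helper name, and the object names are assumed to come from a listing such as list_blobs in the Cloud Storage client library.

```python
def batch_csv(bucket_name, blob_names, extension=".png"):
    """Return one gs:// URI per line for objects with the given extension."""
    wanted = extension.lower()
    uris = [
        f"gs://{bucket_name}/{name}"
        for name in blob_names
        if name.lower().endswith(wanted)  # case-insensitive extension match
    ]
    return "".join(uri + "\n" for uri in uris)
```

The returned text can be written to a local file and uploaded with gsutil cp, as in the script above.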
