What is the equivalent of an ordinal encoder in TensorFlow?

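My dataset has one particular feature that contains categorical strings. The values belong to
['a','ae','e','i','u']

However, I want to map these strings to numbers. Note that I am using a TensorFlow dataset (tf.data).

Here is my sample code: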

import os

import tensorflow as tf

data_dir = "C:/Users/user/Documents/vowels/"

# I have data collected from 13 different subjects. Each time the data is recorded is considered one trial. In total we have 6 trials per subject.
# In this case, I used the first 5 trials for training and the 6th for testing/validation.
subjects_nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

trial_nums_train = [1, 2, 3, 4, 5]
trial_nums_test = [6]

paths_train = [data_dir + 'Col3/*/*_{}_trail_{}.png'.format(i, j) for i in subjects_nums for j in trial_nums_train]
paths_test = [data_dir + 'Col3/*/*_{}_trail_{}.png'.format(i, j) for i in subjects_nums for j in trial_nums_test]

list_ds_train = tf.data.Dataset.list_files(paths_train)
list_ds_test = tf.data.Dataset.list_files(paths_test)

# Here in this case, I did the conversion manually, on purpose for now. However, what if I don't know all the categories, or if I have 10s of them. I would like to convert the strings into numbers automatically.
def get_label(file_path):
    # convert the path to a list of path components
    parts = tf.strings.split(file_path, os.path.sep)
    # The second to last is the class-directory
    char = tf.strings.split(parts[-2], "_")[1]

    tensor = char
    if tensor == 'a':
        return 0
    elif tensor == 'ae':
        return 1
    elif tensor == 'e':
        return 2
    elif tensor == 'i':
        return 3
    else:
        return 4

def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # No resizing is applied here; the image keeps its original size.
    return img


def process_path(file_path):
    label = get_label(file_path)
    # load the raw data from the file as a string
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img, label

# Use Dataset.map to create a dataset of image, label pairs:
# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.
AUTOTUNE = tf.data.experimental.AUTOTUNE
labeled_ds_train = list_ds_train.map(process_path, num_parallel_calls=AUTOTUNE)
labeled_ds_test = list_ds_test.map(process_path, num_parallel_calls=AUTOTUNE)

labeled_ds_train = labeled_ds_train.cache().shuffle(buffer_size=1000).batch(32).prefetch(AUTOTUNE)
labeled_ds_test = labeled_ds_test.cache().batch(32).prefetch(AUTOTUNE)
Then I check what the dataset contains:

for image, label in labeled_ds_train.take(1):
    print("Image shape: ", image.numpy().shape)
    print("Label: ", label.numpy())
I get:

Image shape:  (32, 130, 267, 3)
Label:  [b'ae' b'u' b'a' b'e' b'i' b'ae' b'i' b'e' b'e' b'a' b'i' b'a' b'i' b'a' b'i' b'u' b'u' b'ae' b'e' b'a' b'e' b'ae' b'a' b'i' b'i' b'e' b'ae' b'i' b'i' b'e' b'e' b'i']
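I would like a simple way to convert the strings to numbers, either on the fly or automatically.

How can this be done?

Also, for context: I have a root folder named vowels, which contains the subfolders Col3 and Col4. Each of these contains one subfolder per vowel: a, ae, e, i and u. The images are stored in those vowel subfolders. Image names look like subject{}trial{}.png; the first placeholder is the subject number and the second is that subject's trial number.


Any help is much appreciated.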
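With a layout like the one described above, the list of categories does not have to be typed by hand: it could be collected from the class-folder names before the pipeline is built. The following is only a minimal sketch in plain Python. The folder names such as `vowel_a`, the example paths, and the helpers `label_from_path` and `build_vocab` are hypothetical. It assumes forward-slash paths and that the label is the part of the folder name after the underscore, as in `get_label`.

```python
from pathlib import PurePosixPath


def label_from_path(path):
    """Return the label encoded in the class folder (the file's parent directory)."""
    folder = PurePosixPath(path).parent.name  # e.g. "vowel_ae" (hypothetical name)
    return folder.split("_")[1]               # part after the underscore


def build_vocab(paths):
    """Map every distinct label to an integer index.

    Labels are sorted so the mapping is the same on every run.
    """
    labels = sorted({label_from_path(p) for p in paths})
    return {label: index for index, label in enumerate(labels)}


# Hypothetical file paths that follow the layout described in the question.
example_paths = [
    "vowels/Col3/vowel_u/x_1_trail_1.png",
    "vowels/Col3/vowel_a/x_1_trail_2.png",
    "vowels/Col3/vowel_ae/x_2_trail_1.png",
    "vowels/Col3/vowel_i/x_2_trail_3.png",
    "vowels/Col3/vowel_e/x_3_trail_1.png",
]

vocab = build_vocab(example_paths)
print(vocab)
```

Sorting the labels makes the indices independent of the order in which files are listed. For these five vowels it happens to give the same numbering as the manual if/elif chain.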

Could you also share the code that defines labeled_ds_train/test?
@thushv89, I have added all the code above. Thanks a lot.
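
One possible way to do the conversion inside the tf.data pipeline is a lookup table, which plays the role of an ordinal encoder and replaces the if/elif chain. This is a hedged sketch, not a tested drop-in for the code above. It assumes the vocabulary is known when the table is built, that paths use "/" as the separator, and that class folders are named like `vowel_a`; the folder names and example paths are hypothetical.

```python
import tensorflow as tf

# Assumed vocabulary; it could also be collected from the folder names.
vocab = ["a", "ae", "e", "i", "u"]

# Static string -> int64 table. Strings missing from the vocabulary map to -1.
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(vocab),
        values=tf.constant(list(range(len(vocab))), dtype=tf.int64),
    ),
    default_value=-1,
)


def get_label(file_path):
    # Split the path into components; "/" is assumed here for portability.
    parts = tf.strings.split(file_path, "/")
    # The second-to-last component is the class folder, e.g. "vowel_ae".
    char = tf.strings.split(parts[-2], "_")[1]
    # Look the string up in the table instead of branching on each value.
    return table.lookup(char)


# Hypothetical paths, used only to exercise the label function.
paths = tf.data.Dataset.from_tensor_slices([
    "vowels/Col3/vowel_a/x_1_trail_1.png",
    "vowels/Col3/vowel_ae/x_1_trail_2.png",
    "vowels/Col3/vowel_u/x_2_trail_1.png",
])

labels = [int(x) for x in paths.map(get_label).as_numpy_iterator()]
print(labels)
```

Because the lookup is a TensorFlow op, it runs inside `Dataset.map` and does not depend on Python control flow over tensors. A default of -1 makes unexpected folder names easy to spot.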