Python: applying a function to a feature, "Tensor is unhashable. Instead, use tensor.ref() as the key"

I am trying to apply an encoding function to a TensorFlow dataset, but I am having some trouble with the correct way to reference the feature column. The function works as expected when there is only a single input:

import pandas as pd
import tensorflow as tf
import tensorflow_datasets as tfds
from collections import Counter
from tensorflow.keras.preprocessing.sequence import pad_sequences


text = ["I played it a while but it was alright. The steam was a bit of trouble."
        " The more they move these game to steam the more of a hard time I have"
        " activating and playing a game. But in spite of that it was fun, I "
        "liked it. Now I am looking forward to anno 2205 I really want to "
        "play my way to the moon.",
        "This game is a bit hard to get the hang of, but when you do it's great."]


df = pd.DataFrame({"text": text})

dataset = (
    tf.data.Dataset.from_tensor_slices(
        tf.cast(df.text.values, tf.string)))

tokenizer = tfds.features.text.Tokenizer()

lowercase = True
vocabulary = Counter()
for text in dataset:
    if lowercase:
        text = tf.strings.lower(text)
    tokens = tokenizer.tokenize(text.numpy())
    vocabulary.update(tokens)

vocab_size = 5000
vocabulary, _ = zip(*vocabulary.most_common(vocab_size))

max_len = 15
max_sent = 5
encoder = tfds.features.text.TokenTextEncoder(vocabulary,
                                              lowercase=True,
                                              tokenizer=tokenizer)

def encode(text):
    sent_list = []
    sents = tf.strings.split(text, sep=". ").numpy()
    if max_sent:
        sents = sents[:max_sent]
    for sent in sents:
        text_encoded = encoder.encode(sent.decode())
        if max_len:
            text_encoded = text_encoded[:max_len]
            sent_list.append(pad_sequences([text_encoded], max_len))
    if len(sent_list) < 5:
        sent_list.append([tf.zeros(max_len) for _ in range(5 - len(sent_list))])
    return tf.concat(sent_list, axis=0)

def encode_pyfn(text):
    [text_encoded] = tf.py_function(encode, inp=[text], Tout=[tf.int32])
    return text_encoded

dataset = dataset.map(encode_pyfn).batch(batch_size=2)

next(iter(dataset))
When I apply it to a feature, however, it raises the following error:

TypeError: in user code:

    <ipython-input-9-30172a796c2e>:69 encode_pyfn  *
        features['text'] = tf.py_function(encode, inp=features[text], Tout=[tf.int32])
    /Users/username/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:823 __hash__
        raise TypeError("Tensor is unhashable. "

    TypeError: Tensor is unhashable. Instead, use tensor.ref() as the key.

What is the correct way to apply the function to a single feature?
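
For reference, the message comes from tf.Tensor.__hash__: in TF 2.x a tensor cannot be hashed, so indexing a Python dict with a tensor key (as in inp=features[text] in the traceback above) fails. A minimal sketch, assuming TensorFlow 2.x, with names chosen only for illustration:

import tensorflow as tf

features = {"text": tf.constant("some example sentence")}
key = tf.constant("text")   # a string *tensor*, not the Python string "text"

try:
    features[key]           # the dict lookup calls hash(key) on a Tensor
except TypeError as err:
    print(err)              # Tensor is unhashable. Instead, use tensor.ref() as the key.

# features["text"] works, because the key is a plain Python string.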

You probably meant to write features['text'] (i.e., add the single quotes) instead of inp=features[text] :)

Thanks for taking a look! Making that edit changes the error message to: TypeError: Expected list for 'input' argument to 'EagerPyFunc' Op, not Tensor("args_2:0", shape=(), dtype=string). I tried unbatching (a suggestion from another question), but the error persists. Any ideas?

inp should be a list of tensors, so it should be inp=[features['text']]. Also, since you are not using unbatch here, you are working with a batch of samples, so make sure you take that into account when implementing the encode function. @scribbles, is your problem solved now? If not, could you share executable code that reproduces the issue? Thanks.
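
Putting the comments together, a minimal sketch of the corrected wrapper, assuming the real dataset yields a dictionary with a 'text' feature, as the traceback suggests (that version of the code is not shown in the post):

def encode_pyfn(features):
    # Use the string key 'text' (not a tensor) and pass inp as a *list* of tensors.
    [text_encoded] = tf.py_function(encode,
                                    inp=[features['text']],
                                    Tout=[tf.int32])
    features['text'] = text_encoded
    return features

dataset = dataset.map(encode_pyfn).batch(batch_size=2)

As noted in the comments, if the dataset is already batched at this point, encode would also have to handle a batch of strings rather than a single example.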