Python 3.x: Using TransformDataset without AnalyzeAndTransformDataset

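I am trying to use TensorFlow Transform, and I want to serialize an entire pipeline made up of several transformations. Suppose one of them needs no fitting (for example, feature interactions between numeric columns). I would like to apply TransformDataset directly to the preprocessing function I have already defined. However, this does not seem to be possible.

If I run code like this: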

import pprint
import tempfile

import apache_beam as beam
import pandas as pd
import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

NUMERIC_FEATURE_KEYS = ['a', 'b', 'c']
impute_dictionary = dict(b=1.0, c=0.0)

RAW_DATA_FEATURE_SPEC = dict([(name, tf.io.FixedLenFeature([], tf.float32)) for name in NUMERIC_FEATURE_KEYS])
RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC))


def interaction_fn(inputs):
    outputs = inputs.copy()
    new_numeric_feature_keys = []

    for i in range(len(NUMERIC_FEATURE_KEYS)):
        for j in range(i, len(NUMERIC_FEATURE_KEYS)):
            if i == j:
                outputs[f'{NUMERIC_FEATURE_KEYS[i]}_squared'] = outputs[NUMERIC_FEATURE_KEYS[i]] * outputs[NUMERIC_FEATURE_KEYS[i]]
                new_numeric_feature_keys.append(f'{NUMERIC_FEATURE_KEYS[i]}_squared')
            else:
                outputs[f'{NUMERIC_FEATURE_KEYS[i]}_{NUMERIC_FEATURE_KEYS[j]}'] = outputs[NUMERIC_FEATURE_KEYS[i]] * outputs[ NUMERIC_FEATURE_KEYS[j]]
                new_numeric_feature_keys.append(f'{NUMERIC_FEATURE_KEYS[i]}_{NUMERIC_FEATURE_KEYS[j]}')

    NUMERIC_FEATURE_KEYS.extend(new_numeric_feature_keys)

    return outputs


if __name__ == '__main__':
    temp = tempfile.gettempdir()

    data = pd.DataFrame(dict(
        a=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        b=[1.0, 1.0, 1.0, 2.0, 0.0, 1.0],
        c=[0.9, 2.0, 1.0, 0.0, 0.0, 0.0]
    ))

    data.to_parquet('data_no_nans.parquet')

    x = {}
    for col in data.columns:
        x[col] = tf.constant(data[col], dtype=tf.float32, name=col)

    with beam.Pipeline() as pipeline:
        with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
            raw_data = pipeline | 'ReadTrainData' >> beam.io.ReadFromParquet('data_no_nans.parquet')
            raw_dataset = (raw_data, RAW_DATA_METADATA)
            transformed_data, _ = (raw_data, interaction_fn) | tft_beam.TransformDataset()

            transformed_data | beam.Map(pprint.pprint)  

I get the following error:

2020-02-11 15:49:37.025525: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-11 15:49:37.132944: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f87ddda6d30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-11 15:49:37.132959: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:Tensorflow version (2.1.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
WARNING:tensorflow:Tensorflow version (2.1.0) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
Traceback (most recent call last):
  File "/Users/andrea.marchini/Hackathon/tfx_test/foo.py", line 56, in <module>
    transformed_data, _ = (raw_data, interaction_fn) | tft_beam.TransformDataset()
  File "/Users/andrea.marchini/.local/share/virtualenvs/tfx_test-jg7eSsGQ/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 482, in __ror__
    pvalueish, pvalues = self._extract_input_pvalues(left)
  File "/Users/andrea.marchini/.local/share/virtualenvs/tfx_test-jg7eSsGQ/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 908, in _extract_input_pvalues
    dataset_and_transform_fn)
TypeError: cannot unpack non-iterable PCollection object

Is TransformDataset meant to be applied only to the result of an AnalyzeAndTransformDataset?

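You could try the following:

transformed_data = (raw_dataset, interaction_fn) | tft_beam.TransformDataset()

I think it is trying to unpack raw_data, which does not contain the metadata. Also, TransformDataset returns only one variable, not two.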
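
The unpacking failure can be reproduced without Beam or TensorFlow Transform. The sketch below is an illustration only: FakePCollection and extract_inputs are hypothetical stand-ins, and the nested shape ((data, metadata), (transform_fn, output_metadata)) is an assumption inferred from the traceback line that ends in dataset_and_transform_fn). It is not the library's code.

```python
class FakePCollection:
    """Hypothetical stand-in for a Beam PCollection: deliberately not iterable."""


def extract_inputs(dataset_and_transform_fn):
    # Assumed input shape: ((data, metadata), (transform_fn, output_metadata)).
    # Each half of the pair must itself be unpackable into two items.
    (data, _metadata), (transform_fn, _output_metadata) = dataset_and_transform_fn
    return data, transform_fn


def interaction_fn(inputs):
    # Placeholder for the preprocessing function from the question.
    return inputs


def try_extract(arg):
    # Report whether the nested unpacking succeeds for a given input.
    try:
        extract_inputs(arg)
        return "ok"
    except TypeError as err:
        return f"TypeError: {err}"


raw_data = FakePCollection()
raw_dataset = (raw_data, "RAW_DATA_METADATA")  # data paired with its metadata

# Bare data in the first slot: the first half cannot be unpacked.
print(try_extract((raw_data, interaction_fn)))
# Data plus metadata, but a plain function in the second slot.
print(try_extract((raw_dataset, interaction_fn)))
# Both halves are pairs, so the unpacking goes through.
print(try_extract((raw_dataset, ("transform_fn_placeholder", "output_metadata"))))
```

Under this assumption, passing raw_dataset instead of raw_data gets past the first unpacking, but a bare Python function in the second position would still not unpack. In TensorFlow Transform the second element is normally the transform_fn produced by AnalyzeDataset (or returned by AnalyzeAndTransformDataset), not the preprocessing function itself.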
