Different MSE in TensorFlow federated training and evaluation


I am implementing a regression model in TensorFlow Federated. I started from a simple Keras model from a tutorial.

I changed the model to use federated learning. Here is my model:

import pandas as pd
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from sklearn.preprocessing import StandardScaler
import tensorflow_federated as tff

dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")

column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)

df = raw_dataset.copy()
df = df.dropna()
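# One simulated federated client per value of 'Origin'.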
dfs = [x for _, x in df.groupby('Origin')]


datasets = []
targets = []
for dataframe in dfs:
    target = dataframe.pop('MPG')

    # Standardize each client's features independently.
    standard_scaler_x = StandardScaler(with_mean=True, with_std=True)
    normalized_values = standard_scaler_x.fit_transform(dataframe.values)

    dataset = tf.data.Dataset.from_tensor_slices(
        {'x': normalized_values, 'y': target.values})
    train_dataset = dataset.shuffle(len(dataframe)).repeat(10).batch(20)
    test_dataset = dataset.shuffle(len(dataframe)).batch(1)  # built but unused below
    datasets.append(train_dataset)


def build_model():
  model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[7]),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
  ])
  return model


import collections


model = build_model()

# from_compiled_keras_model needs a sample batch to infer the model's
# input types and shapes.
sample_batch = tf.nest.map_structure(
    lambda x: x.numpy(), iter(datasets[0]).next())

def loss_fn_Federated(y_true, y_pred):
    return tf.reduce_mean(tf.keras.losses.MSE(y_true, y_pred))

def create_tff_model():
  keras_model_clone = tf.keras.models.clone_model(model)
#   adam = keras.optimizers.Adam()
  adam = tf.keras.optimizers.SGD(0.002)
  keras_model_clone.compile(optimizer=adam, loss='mse', metrics=[tf.keras.metrics.MeanSquaredError()])
  return tff.learning.from_compiled_keras_model(keras_model_clone, sample_batch)

print("Create averaging process")
# This command builds all the TensorFlow graphs and serializes them: 
iterative_process = tff.learning.build_federated_averaging_process(model_fn=create_tff_model)

print("Initzialize averaging process")
state = iterative_process.initialize()

print("Start iterations")
for _ in range(10):
  state, metrics = iterative_process.next(state, datasets)
  print('metrics={}'.format(metrics))

I am confused why, after 10 iterations, the MSE from evaluation is higher on the training set, while the iterative process reports a smaller MSE. What am I doing wrong? Is something hidden in the implementation of federated learning in TensorFlow? Can someone explain this to me?

You have actually stumbled on a genuinely interesting phenomenon in federated learning. In particular, the question to ask here is: how are the training metrics computed?

Training metrics are generally computed during local training, so they are computed as the clients are fitting their local data. In TFF they are computed before each local step is taken, during the forward-pass call. If you imagine the extreme case in which metrics are computed only at the end of each client's round of training, it becomes clear that the clients are reporting metrics that indicate how well they fit their own local data.

Federated learning, however, must produce a single global model at the end of each round of federated averaging: the local models are averaged together in parameter space. In general it is not obvious how to interpret such a step intuitively; an average of nonlinear models in parameter space does not give the average of their predictions, or anything like it.
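As a toy illustration of that last point (a hypothetical two-client example, not taken from the question above), average two one-parameter ReLU models in parameter space:

import numpy as np

def f(w, x):
    # A one-parameter "model": ReLU(w * x).
    return np.maximum(0.0, w * x)

x = 1.0
# Client A fit w = 1.0, client B fit w = -1.0.
mean_of_predictions = (f(1.0, x) + f(-1.0, x)) / 2  # 0.5
prediction_of_mean = f((1.0 + -1.0) / 2, x)         # 0.0

The averaged model predicts 0.0, not the average prediction 0.5.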

Federated evaluation takes this averaged model and runs local evaluation on every client, without fitting the local data at all. So if your client datasets have quite different distributions, you should expect the metrics returned from federated evaluation to differ substantially from those returned from a round of federated training: federated averaging reports metrics gathered in the course of fitting the local data, while federated evaluation reports metrics gathered after all of these locally trained models have been averaged.

Indeed, if you interleave calls to the iterative process's next function and the evaluation function, you will see a pattern like the one below.
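A minimal sketch of that interleaving (assuming the evaluation computation is built with tff.learning.build_federated_evaluation; this matches the evaluation calls shown further down, but is a reconstruction rather than code from the question):

evaluation = tff.learning.build_federated_evaluation(create_tff_model)

state = iterative_process.initialize()
for _ in range(10):
  # One round of federated averaging: train metrics are gathered while
  # each client is fitting its own local data.
  state, train_metrics = iterative_process.next(state, datasets)
  print('train metrics={}'.format(train_metrics))
  # Evaluate the freshly averaged global model; no local fitting happens here.
  eval_metrics = evaluation(state.model, datasets)
  print('eval metrics={}'.format(eval_metrics))

Running that loop prints something like: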

train metrics=<mean_squared_error=88.22489929199219,loss=88.6319351196289>
eval metrics=<mean_squared_error=33.69473648071289,loss=33.55160140991211>
train metrics=<mean_squared_error=8.873666763305664,loss=8.882776260375977>
eval metrics=<mean_squared_error=29.235883712768555,loss=29.13833236694336>
train metrics=<mean_squared_error=7.932246208190918,loss=7.918393611907959>
eval metrics=<mean_squared_error=27.9038028717041,loss=27.866817474365234>
train metrics=<mean_squared_error=7.573018550872803,loss=7.576478958129883>
eval metrics=<mean_squared_error=27.600923538208008,loss=27.561887741088867>
train metrics=<mean_squared_error=7.228050708770752,loss=7.224897861480713>
eval metrics=<mean_squared_error=27.46322250366211,loss=27.36537742614746>
train metrics=<mean_squared_error=7.049572944641113,loss=7.03688907623291>
eval metrics=<mean_squared_error=26.755760192871094,loss=26.719152450561523>
train metrics=<mean_squared_error=6.983217716217041,loss=6.954374313354492>
eval metrics=<mean_squared_error=26.756895065307617,loss=26.647253036499023>
train metrics=<mean_squared_error=6.909178256988525,loss=6.923810005187988>
eval metrics=<mean_squared_error=27.047882080078125,loss=26.86684799194336>
train metrics=<mean_squared_error=6.8190460205078125,loss=6.79202938079834>
eval metrics=<mean_squared_error=26.209386825561523,loss=26.10053062438965>
train metrics=<mean_squared_error=6.7200140953063965,loss=6.737307071685791>
eval metrics=<mean_squared_error=26.682661056518555,loss=26.64984703063965>
If you then run the evaluation on each client dataset individually:

eval_metrics = evaluation(state.model, [datasets[0]])
print('eval metrics on 0th dataset={}'.format(eval_metrics))
eval_metrics = evaluation(state.model, [datasets[1]])
print('eval metrics on 1st dataset={}'.format(eval_metrics))
eval_metrics = evaluation(state.model, [datasets[2]])
print('eval metrics on 2nd dataset={}'.format(eval_metrics))

you will see results like this:

eval metrics on 0th dataset=<mean_squared_error=9.426984786987305,loss=9.431192398071289>
eval metrics on 1st dataset=<mean_squared_error=34.96992111206055,loss=34.96992492675781>
eval metrics on 2nd dataset=<mean_squared_error=72.94075775146484,loss=72.88787841796875>
So you can see that the performance of the averaged model differs significantly across the three datasets.

One final note: you may notice that the overall result of the evaluation function is not the average of the three per-dataset losses. This is because the evaluation function uses example weighting rather than client weighting; that is, clients with more data carry more weight in the average.
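A sketch of that aggregation (the per-client example counts, counts below, are hypothetical; they are not printed anywhere above):

def example_weighted_mse(mses, counts):
    # Each client's MSE is weighted by its number of examples, so clients
    # with more data dominate the aggregate.
    return sum(m * n for m, n in zip(mses, counts)) / sum(counts)

# e.g. example_weighted_mse([9.43, 34.97, 72.94], counts) with the three
# per-dataset MSEs shown above.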


Hope this helps!

Thanks for your reply, Keith; that helps. I suspected it was implemented this way but wanted confirmation. I have one more question: how does the local model get its updates from the global model? Judging from the results above, it cannot simply be replaced by the global model.

The local model is replaced by the global model at the beginning of each new round; that model then serves as the starting point for local training. It is updated by the local training loop, and it is the metrics of that training loop that the training metrics report. It could be interesting to imagine a different notion of state on the clients; if you want to experiment here, try reading and forking the script above.
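A schematic of one federated averaging round, to make this concrete (a sketch of the FedAvg structure with a no-op stand-in for local training, not the actual TFF internals):

import numpy as np

def local_sgd(weights, data, lr=0.002):
    # Stand-in for the client's local training loop; in FedAvg this is
    # where the train metrics are gathered. No-op to keep the sketch runnable.
    return weights

def fedavg_round(global_weights, client_datasets):
    client_weights, client_sizes = [], []
    for data in client_datasets:
        local = np.copy(global_weights)  # each round starts from the global model
        local = local_sgd(local, data)   # local training updates this copy
        client_weights.append(local)
        client_sizes.append(len(data))
    # Example-weighted average in parameter space yields the next global model.
    return np.average(np.stack(client_weights), axis=0,
                      weights=np.array(client_sizes, dtype=float))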