Simple TensorFlow computation not reproducible on different systems (macOS, Colab, Azure)

azure tensorflow google-colaboratory

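I am investigating the reproducibility of TensorFlow code on my macOS machine, on Google Colab, and on Azure with Docker. I understand that I can set a graph-level seed and an operation-level seed. I am using eager mode (so no parallelism optimization) and no GPUs. I draw 100x100 random samples from the unit normal distribution and compute their mean and standard deviation.

The test code below verifies that I am not using a GPU, that I am using TensorFlow 1.12.0 or the preview of TensorFlow 2, and that the tensor is Float32. It also checks the first element of the random tensor (which has a different value depending on whether I set only the graph-level seed or also the operation-level seed), and the mean and standard deviation of the draws. I also set NumPy's random seed, although I do not use it here: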

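import numpy as np
import tensorflow as tf


def tf_1():
    """Returns True if TensorFlow is version 1."""
    return tf.__version__.startswith("1")


def format_number(n):
    """Returns the number formatted as a string with 12 decimal places."""
    return "%1.12f" % n


def set_top_level_seeds():
    """Sets the TensorFlow graph-level seed and the NumPy seed."""
    if tf_1():
        tf.set_random_seed(0)
    else:
        tf.random.set_seed(0)
    np.random.seed(0)


def generate_random_numbers(op_seed=None):
    """Returns random normal draws."""
    if op_seed:
        t = tf.random.normal([100, 100], seed=op_seed)
    else:
        t = tf.random.normal([100, 100])
    return t


def generate_random_number_stats_str(op_seed=None):
    """Returns the mean and standard deviation of random normal draws."""
    t = generate_random_numbers(op_seed=op_seed)
    mean = tf.reduce_mean(t)
    sdev = tf.sqrt(tf.reduce_mean(tf.square(t - mean)))
    return [format_number(n) for n in (mean, sdev)]


def generate_random_number_1_seed():
    """Returns a single random number with the graph-level seed only."""
    set_top_level_seeds()
    num = generate_random_numbers()[0, 0]
    return num


def generate_random_number_2_seeds():
    """Returns a single random number with graph-level and operation-level seeds."""
    set_top_level_seeds()
    num = generate_random_numbers(op_seed=1)[0, 0]
    return num


def generate_stats_1_seed():
    """Returns the mean and standard deviation with the graph-level seed only."""
    set_top_level_seeds()
    return generate_random_number_stats_str()


def generate_stats_2_seeds():
    """Returns the mean and standard deviation with graph and operation seeds."""
    set_top_level_seeds()
    return generate_random_number_stats_str(op_seed=1)


class Tests(tf.test.TestCase):
    """Run tests for reproducibility of TensorFlow."""

    def test_gpu(self):
        self.assertEqual(False, tf.test.is_gpu_available())

    def test_version(self):
        self.assertTrue(tf.__version__ == "1.12.0" or
                        tf.__version__.startswith("2.0.0-dev2019"))

    def test_type(self):
        # Compare the dtype of the tensor, not the tensor itself.
        num_type = generate_random_number_1_seed().dtype
        self.assertEqual(num_type, tf.float32)

    def test_eager_execution(self):
        self.assertEqual(True, tf.executing_eagerly())

    def test_random_number_1_seed(self):
        num_str = format_number(generate_random_number_1_seed())
        self.assertEqual(num_str, "1.511062622070")

    def test_random_number_2_seeds(self):
        num_str = format_number(generate_random_number_2_seeds())
        self.assertEqual(num_str, "0.680345416069")

    def test_arithmetic_1_seed(self):
        m, s = generate_stats_1_seed()
        if tf_1():
            self.assertEqual(m, "-0.008264393546")
            self.assertEqual(s, "0.995371103287")
        else:
            self.assertEqual(m, "-0.008264398202")
            self.assertEqual(s, "0.995371103287")

    def test_arithmetic_2_seeds(self):
        m, s = generate_stats_2_seeds()
        if tf_1():
            self.assertEqual(m, "0.000620653736")
            self.assertEqual(s, "0.997191190720")
        else:
            self.assertEqual(m, "0.000620646286")
            self.assertEqual(s, "0.997191071510")


if __name__ == '__main__':
    tf.reset_default_graph()
    if tf_1():
        tf.enable_eager_execution()
    tf.logging.set_verbosity(tf.logging.ERROR)
    tf.test.main()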

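On my local machine, the tests pass with TensorFlow 1.12.0 or the preview version of TensorFlow 2, in a virtual environment where I installed TensorFlow with

pip install tensorflow==1.12.0

or

pip install tf-nightly-2.0-preview

Note that the first random draw is the same in both versions, so I presume that all random numbers are the same, yet the mean and standard deviation differ after 9 decimal places. So TensorFlow implements the computations slightly differently in different versions.

On Google Colab, I replace the last command with

import unittest; unittest.main(argv=['first-arg-is-ignored'], exit=False)

All tests but one pass: same random numbers, same mean and standard deviation with the graph-level seed. The test that fails is the arithmetic of the mean with both the graph-level seed and the operation-level seed, with a difference starting at the ninth decimal place: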
.F.......
======================================================================
FAIL: test_arithmetic_2_seeds (__main__.Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-7-16d0afebf95f>", line 109, in test_arithmetic_2_seeds
    self.assertEqual(m, "0.000620653736")
AssertionError: '0.000620654086' != '0.000620653736'
- 0.000620654086
?           ^^^
+ 0.000620653736
?           ^^^


----------------------------------------------------------------------
Ran 9 tests in 0.023s

FAILED (failures=1)
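
A difference that only appears around the ninth decimal place is the kind of discrepancy that float32 rounding can produce when the same numbers are added in a different order. The sketch below illustrates this with NumPy only; it is an assumption about a plausible cause, not a diagnosis of what TensorFlow does internally. The names `float32_sum` and `chunked_mean` are hypothetical helpers introduced for the illustration.

```python
import numpy as np

# Fixed NumPy seed so the input values are identical on every run.
rng = np.random.RandomState(0)
values = rng.standard_normal(100 * 100).astype(np.float32)


def float32_sum(xs):
    """Add float32 values strictly left to right, rounding after every step."""
    total = np.float32(0.0)
    for x in xs:
        total = np.float32(total + x)
    return total


def chunked_mean(xs, n_chunks):
    """Sum each chunk separately, then combine the partial sums (hypothetical helper)."""
    partial_sums = [float32_sum(chunk) for chunk in np.array_split(xs, n_chunks)]
    return float32_sum(partial_sums) / np.float32(xs.size)


mean_sequential = chunked_mean(values, 1)   # one long left-to-right sum
mean_chunked = chunked_mean(values, 8)      # eight partial sums, then combined
mean_float64 = values.astype(np.float64).mean()  # higher-precision reference

for label, m in [("1 chunk", mean_sequential),
                 ("8 chunks", mean_chunked),
                 ("float64 reference", mean_float64)]:
    print("%-18s %1.12f" % (label, m))
```

The inputs are bit-for-bit identical in every variant; only the grouping of the additions changes. Any difference between the printed means therefore comes from rounding alone, which is consistent with identical random draws and slightly different statistics.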

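On Azure, the arithmetic tests fail in both cases, with the graph-level seed only and with both the graph-level and operation-level seeds: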
FF.......
======================================================================
FAIL: test_arithmetic_1_seed (__main__.Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 99, in test_arithmetic_1_seed
    self.assertEqual(m, "-0.008264393546")
AssertionError: '-0.008264395408' != '-0.008264393546'
- -0.008264395408
?              ^^
+ -0.008264393546
?            +  ^


======================================================================
FAIL: test_arithmetic_2_seeds (__main__.Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 109, in test_arithmetic_2_seeds
    self.assertEqual(m, "0.000620653736")
AssertionError: '0.000620655250' != '0.000620653736'
- 0.000620655250
+ 0.000620653736


----------------------------------------------------------------------
Ran 9 tests in 0.016s

FAILED (failures=2)
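
If bit-exact equality across systems turns out not to be achievable, one option is to compare against a tolerance instead of comparing 12-digit strings. The sketch below uses the values reported in this question; the tolerance `ATOL` and the helper `close_enough` are assumptions made for the illustration, chosen near float32 machine epsilon (about 1.2e-7), and are not part of the original test suite.

```python
import numpy as np

# (expected, actual) means taken from the failing assertions in this question.
pairs = [
    ("-0.008264393546", "-0.008264395408"),
    ("0.000620653736", "0.000620655250"),
    ("0.000620653736", "0.000620654086"),
]

# Hypothetical absolute tolerance, roughly float32 machine epsilon.
ATOL = 1e-7


def close_enough(expected, actual, atol=ATOL):
    """Return True if the two values agree within an absolute tolerance."""
    return bool(np.isclose(float(actual), float(expected), rtol=0.0, atol=atol))


for expected, actual in pairs:
    print(expected, actual, close_enough(expected, actual))

# A genuinely different random draw is still rejected by the same check.
different_draw = close_enough("1.511062622070", "-1.398459434509")
print("different draw accepted:", different_draw)
```

An absolute tolerance is used here because the means are close to zero, where a relative tolerance would be very strict. Inside a `tf.test.TestCase`, `self.assertAllClose` offers a comparable check.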

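When the tests fail on Google Colab or Azure, they fail with consistent actual values for the mean, so I believe the problem is not some other random seed that I could set.

To determine whether the problem is TensorFlow's implementation on different systems, I tested a different TensorFlow image on Azure (tensorflow/tensorflow:latest, without the -py3 tag), and the random numbers with the top-level seed are also different: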
FF..F....
======================================================================
FAIL: test_arithmetic_1_seed (__main__.Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 99, in test_arithmetic_1_seed
    self.assertEqual(m, "-0.008264393546")
AssertionError: '0.001101632486' != '-0.008264393546'

======================================================================
FAIL: test_arithmetic_2_seeds (__main__.Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 109, in test_arithmetic_2_seeds
    self.assertEqual(m, "0.000620653736")
AssertionError: '0.000620655250' != '0.000620653736'

======================================================================
FAIL: test_random_number_1_seed (__main__.Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 89, in test_random_number_1_seed
    self.assertEqual(num_str, "1.511062622070")
AssertionError: '-1.398459434509' != '1.511062622070'

----------------------------------------------------------------------
Ran 9 tests in 0.015s
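
When results differ between images, it can help to record what differs between the environments next to the test output. The sketch below collects a few such details using only the standard library and NumPy; `environment_fingerprint` is a hypothetical helper, and the choice of fields is an assumption about what is likely to matter, not a confirmed list of causes.

```python
import platform
import sys

import numpy as np


def environment_fingerprint():
    """Collect details that commonly differ between machines and Docker images."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "numpy": np.__version__,
        # tf.__version__ could be added here in the same way.
    }


for key, value in sorted(environment_fingerprint().items()):
    print("%-15s %s" % (key, value))
```

Printing this on macOS, Colab, and each Azure image gives a side-by-side view of interpreter, platform, and library versions to check before attributing a difference to TensorFlow itself.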

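How can I ensure the reproducibility of TensorFlow computations on different systems?

The precision of floating-point calculations will depend on the library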