Keras distributed training with multiple GPUs in Python - Segmentation fault (core dumped)


I am trying to run the official distributed Keras example on two nodes, each with one GPU. On the first node I run
TF_CONFIG='{"cluster": {"worker": ["ip1:2222", "ip2:2222"]}, "task": {"index": 0, "type": "worker"}}' python3 test.py
and on the second node I run
TF_CONFIG='{"cluster": {"worker": ["ip1:2222", "ip2:2222"]}, "task": {"index": 1, "type": "worker"}}' python3 test.py
When I print
device_lib.list_local_devices()
both nodes detect their GPU, but I get the errors shown below. When I run the script on each node individually, without
TF_CONFIG
, everything works fine. Any idea what is going wrong?
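As an aside, the nested quotes in TF_CONFIG are easy to get wrong on the shell command line (a missing bracket or quote produces invalid JSON that TensorFlow may reject or misparse). A minimal sketch of building the same value programmatically with json.dumps, using the worker addresses from the commands above as placeholders:

```python
import json
import os

def make_tf_config(workers, index):
    """Build the TF_CONFIG JSON string for one worker in the cluster."""
    return json.dumps({
        "cluster": {"worker": workers},
        "task": {"index": index, "type": "worker"},
    })

# Configuration for the first of the two workers (task index 0).
cfg = make_tf_config(["ip1:2222", "ip2:2222"], index=0)
os.environ["TF_CONFIG"] = cfg  # must be set before TensorFlow reads it
print(cfg)
```

Setting os.environ["TF_CONFIG"] inside the script (before the strategy is created) is equivalent to prefixing the variable on the command line, and guarantees the JSON is well-formed.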

Node 1:

2019-11-13 18:20:00.974896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:worker/replica:0/task:0/device:GPU:0 with 7658 MB memory) -> physical GPU (device: 0, pci bus id: 0000:84:00.0, compute capability: 3.5)
2019-11-13 18:20:00.977161: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:250] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2222, 1 -> ip2:2222}
2019-11-13 18:20:00.981865: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:365] Started server with target: grpc://localhost:2222
Segmentation fault (core dumped)
Node 2:

2019-11-13 18:20:04.121540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:worker/replica:0/task:1/device:GPU:0 with 7659 MB memory) -> physical GPU (device: 0, pci bus id: 0000:84:00.0, compute capability: 3.5)
2019-11-13 18:20:04.123868: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:250] Initialize GrpcChannelCache for job worker -> {0 -> ip1:2222, 1 -> localhost:2222}
2019-11-13 18:20:04.129259: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:365] Started server with target: grpc://localhost:2222
Segmentation fault (core dumped)