Building TensorFlow: including MPI headers from outside the Bazel root


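I want to build TensorFlow 1.3 (not 1.13) on Ubuntu 16.04 with MPI support (instead of the default gRPC). I installed the libopenmpi-dev package from the Ubuntu repositories. When running the configure script, I supplied /usr/lib/openmpi as the MPI toolkit directory.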

I start the build with the following command:

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
However, there is a header inclusion problem:

  • The file tensorflow/contrib/mpi/mpi_utils.cc includes tensorflow/contrib/mpi/mpi_utils.h
  • mpi_utils.h includes third_party/mpi/mpi.h
  • mpi.h is a symlink to /usr/lib/openmpi/include/mpi.h
  • This actual mpi.h contains a line that includes mpicxx.h
  • mpicxx.h lives in the folder /usr/lib/openmpi/include/openmpi/ompi/mpi/cxx/, which is not on the include path
  • I "fixed" this by creating a symlink to the right folder:

    $ ln -s /usr/lib/openmpi/include/openmpi third_party/mpi/openmpi
    
    mpicxx.h is now found, but it in turn wants to include mpi.h, and that fails because mpi.h is not in the same folder:

    $ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
    WARNING: /home/arno/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:exporter': No longer supported. Switch to SavedModel immediately.
    WARNING: /home/arno/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:gc': No longer supported. Switch to SavedModel immediately.
    INFO: Found 1 target...
    ERROR: /home/arno/tensorflow/tensorflow/contrib/mpi/BUILD:60:1: C++ compilation of rule '//tensorflow/contrib/mpi:mpi_rendezvous_mgr' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 151 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
    In file included from ./third_party/mpi/mpi.h:2673:0,
                     from ./tensorflow/contrib/mpi/mpi_utils.h:27,
                     from ./tensorflow/contrib/mpi/mpi_rendezvous_mgr.h:33,
                     from tensorflow/contrib/mpi/mpi_rendezvous_mgr.cc:18:
    ./third_party/mpi/openmpi/ompi/mpi/cxx/mpicxx.h:35:17: fatal error: mpi.h: No such file or directory
    compilation terminated.
    Target //tensorflow/tools/pip_package:build_pip_package failed to build
    Use --verbose_failures to see the command lines of failed build steps.
    ERROR: /home/arno/tensorflow/tensorflow/tools/pip_package/BUILD:134:1 C++ compilation of rule '//tensorflow/contrib/mpi:mpi_rendezvous_mgr' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 151 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
    INFO: Elapsed time: 6.668s, Critical Path: 4.98s
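
The failure above can be reproduced outside Bazel with stub headers. The following is only a sketch under stated assumptions: every path is a stand-in created in a temporary directory, the stub headers are invented for illustration, and the quoted form of the include inside the stub mpicxx.h is an assumption about the real header. It shows that, seen through the symlinks, mpicxx.h has no mpi.h next to it, so a bare mpi.h include can only be resolved through the include search path.

```shell
#!/bin/sh
# Scratch reproduction of the header layout. All paths are stand-ins created
# in a temporary directory; nothing here touches a real Open MPI installation.
set -eu
work=$(mktemp -d)
sys="$work/sys/include"   # plays the role of /usr/lib/openmpi/include
ws="$work/ws"             # plays the role of the TensorFlow checkout
mkdir -p "$sys/openmpi/ompi/mpi/cxx" "$ws/third_party/mpi"

# Stub mpi.h: pulls in the C++ bindings by a path relative to itself.
printf '%s\n' '#ifndef STUB_MPI_H' '#define STUB_MPI_H' \
  '#include "openmpi/ompi/mpi/cxx/mpicxx.h"' '#endif' > "$sys/mpi.h"
# Stub mpicxx.h: includes mpi.h by bare name (quoted form is an assumption).
printf '%s\n' '#ifndef STUB_MPICXX_H' '#define STUB_MPICXX_H' \
  '#include "mpi.h"' '#endif' > "$sys/openmpi/ompi/mpi/cxx/mpicxx.h"

# The two symlinks described in the question.
ln -s "$sys/mpi.h" "$ws/third_party/mpi/mpi.h"
ln -s "$sys/openmpi" "$ws/third_party/mpi/openmpi"
printf '%s\n' '#include "third_party/mpi/mpi.h"' > "$ws/user.cc"

# Seen through the symlinks, is there an mpi.h next to mpicxx.h?
cxx_dir="$ws/third_party/mpi/openmpi/ompi/mpi/cxx"
if [ -e "$cxx_dir/mpi.h" ]; then sibling=yes; else sibling=no; fi
echo "mpi.h next to mpicxx.h: $sibling"

# Optional: if a C++ compiler is installed, compile once without and once
# with the directory that holds the mpi.h symlink on the search path.
if command -v c++ >/dev/null 2>&1; then
  (
    cd "$ws"
    c++ -fsyntax-only -iquote . user.cc 2>/dev/null \
      && echo "without -I: ok" || echo "without -I: failed"
    c++ -fsyntax-only -iquote . -I third_party/mpi user.cc 2>/dev/null \
      && echo "with -I third_party/mpi: ok" || echo "with -I third_party/mpi: failed"
  )
fi
rm -rf "$work"
```

This only illustrates the compiler's lookup rule. It says nothing about Bazel's own check for undeclared headers, which is a separate hurdle.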
    
    I also tried manually adding the header path to GCC's include path with the following command:

    $ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --copt='-I/usr/lib/openmpi/include'
    
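    …but then I get errors because the headers included from /usr/lib/openmpi/include/openmpi/ompi/mpi/cxx are not declared in Bazel's configuration files. I cannot declare them to Bazel, because it does not accept absolute paths.

    I cannot find the right way to make this build work. I am new to Bazel. From what I have read, the build directory should be "self-contained", i.e. contain all necessary header and source files, yet the TensorFlow repository violates this by adding symlinks in third_party/mpi that point to /usr/lib/…. Changing the TensorFlow version is not an option.

    How can I build TensorFlow 1.3 with OpenMPI support?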

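    Edit: adding the -s option to the bazel build command, as suggested in the comments, gives more verbose output, but I cannot tell which compiler is being used. I think these are the relevant parts: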

    >>>>> # //tensorflow/core/kernels:transpose_functor [action 'Compiling tensorflow/core/kernels/transpose_functor_cpu.cc']
    (cd /home/arno/.cache/bazel/_bazel_arno/e7d941e3336cbc1a05a122432422a066/execroot/tensorflow && \
      exec env - \
        CUDA_TOOLKIT_PATH=/usr/local/cuda \
        CUDNN_INSTALL_PATH=/usr/local/lib \
        GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/arno/bin \
        PYTHON_BIN_PATH=/usr/bin/python3 \
        PYTHON_LIB_PATH=/usr/lib/python3/dist-packages \
        TF_CUDA_CLANG=0 \
        TF_CUDA_COMPUTE_CAPABILITIES=6.2 \
        TF_CUDA_VERSION=8.0 \
        TF_CUDNN_VERSION=6 \
        TF_NEED_CUDA=1 \
        TF_NEED_OPENCL=0 \
      external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-march=native' '-std=c++11' '-march=native' -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/transpose_functor/tensorflow/core/kernels/transpose_functor_cpu.pic.d '-frandom-seed=bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/transpose_functor/tensorflow/core/kernels/transpose_functor_cpu.pic.o' -fPIC -DEIGEN_MPL2_ONLY -DTENSORFLOW_USE_JEMALLOC -DSNAPPY -DTENSORFLOW_USE_MPI -iquote . -iquote bazel-out/local_linux-py3-opt/genfiles -iquote external/jemalloc -iquote bazel-out/local_linux-py3-opt/genfiles/external/jemalloc -iquote external/bazel_tools -iquote bazel-out/local_linux-py3-opt/genfiles/external/bazel_tools -iquote external/protobuf -iquote bazel-out/local_linux-py3-opt/genfiles/external/protobuf -iquote external/eigen_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_sycl -iquote external/gif_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/local_linux-py3-opt/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/local_linux-py3-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/local_linux-py3-opt/genfiles/external/fft2d -iquote external/highwayhash -iquote bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote 
bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -iquote external/snappy -iquote bazel-out/local_linux-py3-opt/genfiles/external/snappy -iquote external/local_config_cuda -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda -isystem external/jemalloc/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/jemalloc/include -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/protobuf/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/protobuf/src -isystem external/eigen_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -isystem external/gif_archive/lib -isystem bazel-out/local_linux-py3-opt/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda/cuda/include -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions '-DGOOGLE_CUDA=1' -msse3 -pthread '-DGOOGLE_CUDA=1' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c tensorflow/core/kernels/transpose_functor_cpu.cc -o bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/transpose_functor/tensorflow/core/kernels/transpose_functor_cpu.pic.o)
    ERROR: /home/arno/tensorflow/tensorflow/contrib/mpi/BUILD:48:1: C++ compilation of rule '//tensorflow/contrib/mpi:mpi_utils' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 131 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
    In file included from ./third_party/mpi/mpi.h:2673:0,
                     from ./tensorflow/contrib/mpi/mpi_utils.h:27,
                     from tensorflow/contrib/mpi/mpi_utils.cc:18:
    ./third_party/mpi/openmpi/ompi/mpi/cxx/mpicxx.h:35:17: fatal error: mpi.h: No such file or directory
    compilation terminated.
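
The environment block printed by -s above lists GCC_HOST_COMPILER_PATH=/usr/bin/gcc, which may be the most direct hint about the host compiler that the crosstool wrapper ends up calling. Below is a small sketch for pulling that value out of a long -s log. The function name host_compiler_from_log is made up for illustration, and the sketch assumes the log contains the variable in the same NAME=value form as shown above.

```shell
#!/bin/sh
# host_compiler_from_log is a hypothetical helper: it reads `bazel build -s`
# output on stdin and prints each distinct GCC_HOST_COMPILER_PATH value.
host_compiler_from_log() {
  sed -n 's/.*GCC_HOST_COMPILER_PATH=\([^ ]*\).*/\1/p' | sort -u
}

# Example call (not executed here):
#   bazel build -s --config=opt --config=cuda \
#     //tensorflow/tools/pip_package:build_pip_package 2>&1 | host_compiler_from_log
```

If the value printed is a plain gcc path, that would be consistent with the gcc processes seen during the build.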
    

    The workaround was to build and install MVAPICH from source (with /usr/local as the MPI toolkit path). The problem only occurs with OpenMPI.

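    "I supplied /usr/lib/openmp …" Is that a typo? OpenMP has no real connection to OpenMPI.

    It was a typo, I will fix it. I supplied /usr/lib/openmpi.

    Try adding the -s flag to the bazel build command, as in bazel build -s …, to show the actual commands being invoked.

    When building MPI applications, you must compile with the mpicc / mpic++ compiler wrappers, precisely to avoid library-specific header tricks like this one. You can force bazel to use the MPI compiler wrappers through the CC=mpicc and CXX=mpicxx environment variables. Either export them in the terminal where you run bazel, or put them in front of the bazel invocation, like CC=mpicc CXX=mpicxx bazel build ….

    I have added the output of bazel build -s … to my post. I tried setting CC and CXX, and even using --action_env=CC --action_env=CXX, but bazel does not seem to take them into account: running ps -e from another terminal during the build shows several gcc processes active, but no mpicc.

    Glad you figured it out. I spent a lot of time trying to convince bazel to acknowledge an openmpi installation.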
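
To see what the MPI compiler wrappers mentioned in the comments would add, they can be asked to print their flags instead of compiling. This is a hedged sketch: Open MPI's wrappers accept --showme:compile and MPICH-derived wrappers (MVAPICH among them) accept -show, but the helper name show_mpi_flags is hypothetical and the output depends entirely on the installed MPI.

```shell
#!/bin/sh
# show_mpi_flags is a hypothetical helper, not part of any MPI distribution.
show_mpi_flags() {
  wrapper=$1
  if ! command -v "$wrapper" >/dev/null 2>&1; then
    echo "$wrapper: not found" >&2
    return 1
  fi
  # Open MPI wrappers print their compile flags with --showme:compile;
  # MPICH-derived wrappers print the full command line with -show.
  "$wrapper" --showme:compile 2>/dev/null || "$wrapper" -show
}

# Example calls (output depends on the installed MPI):
#   show_mpi_flags mpicc
#   show_mpi_flags mpicxx
```

Comparing the printed include directories with the paths Bazel passes (visible in the -s output) shows which directories the plain gcc invocation is missing.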