How to compile TensorFlow with SSE4.2 and AVX instructions?


This is the message I get when running a script to check that TensorFlow is working:

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

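I noticed that it mentions SSE4.2 and AVX.

What are SSE4.2 and AVX?

How do SSE4.2 and AVX improve CPU computations for TensorFlow tasks?

How do I make TensorFlow compile using the two libraries?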

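These are SIMD vector instruction set extensions for x86 CPUs.

Using vector instructions is faster for many tasks; machine learning is one such task.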

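Quote:

To be compatible with as wide a range of machines as possible, TensorFlow defaults to only using SSE4.1 SIMD instructions on x86 machines. Most modern PCs and Macs support more advanced instructions, so if you are building a binary that you will only run on your own machine, you can enable these instructions by adding

--copt=-march=native

to your bazel build command.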

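I ran into the same problem. It seems that Yaroslav Bulatov's suggestion does not cover SSE4.2 support; adding

--copt=-msse4.2

is enough. In the end, I built successfully with

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

without getting any warnings or errors.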

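Probably the best choice for any system is:

bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

(Update: the build scripts may be dropping -march=native, possibly because it contains an =.)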

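-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high-performance number crunching, you should build 64-bit binaries.)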

I am not sure whether TensorFlow defaults to -O2 or -O3. gcc -O3 enables full optimization including auto-vectorization, but that can sometimes make code slower.

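What this does: --copt passes an option directly to gcc for compiling C and C++ files (but not for linking, so you need a different option for cross-file link-time optimization).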
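To make the --copt mechanism concrete, here is a minimal Python sketch that assembles the bazel command line from a list of gcc flags, one --copt per flag. The helper name bazel_build_args and the PIP_TARGET constant are assumptions for illustration only; the build target and the other options are the ones used in the commands above. The script only prints the command and does not run bazel.

```python
# Build target taken from the commands quoted in this thread.
PIP_TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def bazel_build_args(gcc_flags, cuda=False):
    """Return the argv list for a bazel build that forwards each gcc flag via --copt.

    bazel_build_args is a hypothetical helper, not part of bazel or TensorFlow.
    """
    args = ["bazel", "build", "-c", "opt"]
    # Each gcc option needs its own --copt=<flag> argument.
    args += ["--copt=" + flag for flag in gcc_flags]
    if cuda:
        args.append("--config=cuda")
    args += ["-k", PIP_TARGET]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(bazel_build_args(["-march=native"], cuda=True)))
```

Passing the resulting list to subprocess.run would start the build, but printing it first makes it easy to compare against a command known to work.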

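gcc for x86-64 defaults to using only SSE2 or older SIMD instructions, so the resulting binaries run on any x86-64 system. (See the gcc documentation.) That is not what you want. You want a binary that takes advantage of every instruction your CPU can run, because you are only running the binary on the system where you built it.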

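-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine.) Also, if you are using a CPU that does not support one of these options (like FMA), using -mfma would produce a binary that faults with illegal instructions.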
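One way to avoid passing an option your CPU does not support is to check the CPU's feature flags first. The following is a minimal sketch that assumes a Linux system exposing /proc/cpuinfo; the mapping table and the function names are illustrative assumptions, not part of TensorFlow or gcc.

```python
# Assumed mapping from Linux /proc/cpuinfo feature names to the gcc options
# discussed above. Extend it if you care about other extensions.
CPUINFO_TO_GCC = {
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
    "avx512f": "-mavx512f",
}


def cpu_flags(cpuinfo_text):
    """Return the feature names from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_gcc_options(cpuinfo_text):
    """Return the gcc options from CPUINFO_TO_GCC that this CPU supports."""
    flags = cpu_flags(cpuinfo_text)
    return [opt for name, opt in CPUINFO_TO_GCC.items() if name in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(" ".join(supported_gcc_options(f.read())))
    except OSError:
        print("/proc/cpuinfo not available (this sketch is Linux only)")
```

With -march=native the compiler performs this detection itself; the script is only useful if you want to list the flags by hand.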

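TensorFlow's ./configure defaults to enabling -march=native, so using it should avoid the need to specify compiler options manually.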

-march=native enables -mtune=native, so the compiler also tunes for your CPU on matters such as which sequence of AVX instructions is best for unaligned loads.

This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)

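Let me answer your third question first:

If you want to run a self-compiled version inside a conda env, you can. These are the general instructions I follow to get tensorflow installed on my system, with a few extra notes. Note: this build was for an AMD A10-7850 running Ubuntu 16.04 LTS (check which instructions your CPU supports; it may differ). I use Python 3.5 inside my conda env. Credit goes to the tensorflow source install page and to the answers provided above.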

git clone https://github.com/tensorflow/tensorflow 
# Install Bazel
# https://bazel.build/versions/master/docs/install.html
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
# Create your virtual env with conda.
source activate YOUR_ENV
pip install six numpy wheel packaging appdirs
# Follow the configure instructions at:
# https://www.tensorflow.org/install/install_sources
# Build as shown below. Note: check which instructions your CPU
# supports. Also, if resources are limited, consider adding the
# flag --local_resources 2048,.5,1.0 . This limits how much RAM and
# other local resources are used, but increases the compile time.
bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2  -k //tensorflow/tools/pip_package:build_pip_package
# Create the wheel like so:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# Inside your conda env:
pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
# Then install the rest of your stack
pip install keras jupyter  # etc.

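Regarding your second question:

In my opinion, a self-compiled version with optimizations is well worth the effort. On my particular setup, calculations that used to take 560-600 seconds now take only about 300 seconds! Although the exact numbers will vary, I think you can generally expect a speedup of about 35-50% on your own setup.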
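For reference, the timings quoted above can be turned into percentages with a few lines of Python. This is only arithmetic on the numbers in this answer (560-600 seconds before, about 300 seconds after); the function names are made up for the example.

```python
def time_reduction(before_s, after_s):
    """Fraction of the original runtime that was saved."""
    return (before_s - after_s) / before_s


def speedup_factor(before_s, after_s):
    """How many times faster the new run is."""
    return before_s / after_s


if __name__ == "__main__":
    for before in (560, 600):
        print(
            f"{before}s -> 300s: "
            f"{time_reduction(before, 300):.0%} less time, "
            f"{speedup_factor(before, 300):.2f}x faster"
        )
```

For this one machine, that works out to roughly 46-50% less runtime, or a factor of about 1.9 to 2.0.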

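Lastly, your first question:

A lot of answers have already been provided above. To summarize: AVX, SSE4.1, SSE4.2, and FMA are different kinds of extended instruction sets on x86 CPUs. Many of them contain optimized instructions for processing matrix or vector operations.

I will highlight my own misconception, hopefully to save you some time: SSE4.2 is not a newer version of the instructions that supersedes SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).

In the context of tensorflow compilation, if your computer supports AVX2 and AVX, as well as SSE4.1 and SSE4.2, you should pass the optimization flags for all of them. Do not do what I did and use only SSE4.2, thinking that it is newer and should supersede SSE4.1. That is clearly wrong! I had to recompile because of it, which cost me a good 40 minutes.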

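Let's start by explaining why you see these warnings in the first place.

Most probably you did not install TF from source and instead used something like
pip install tensorflow
That means you installed binaries pre-built by someone else, which were not optimized for your architecture. The warnings tell you exactly this: something is available on your architecture, but it will not be used because the binary was not compiled with it. Here is the relevant part of the documentation:
TensorFlow checks on startup whether it has been compiled with the optimizations available on the CPU. If the optimizations are not included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not included.
The good thing is that most probably you just want to learn or experiment with TF, so everything will work properly and you do not have to worry about it.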
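If you want to know which instruction sets a pre-built binary is missing, you can pull them out of the startup log. This is a minimal sketch that assumes the warning wording shown in the question ("wasn't compiled to use ... instructions"); other TensorFlow versions may phrase the message differently, and the function name is hypothetical.

```python
import re

# Pattern based on the cpu_feature_guard warnings quoted in the question.
WARNING_RE = re.compile(r"wasn't compiled to use (\S+) instructions")


def missing_instruction_sets(log_text):
    """Return the instruction sets named in cpu_feature_guard warnings."""
    return WARNING_RE.findall(log_text)


if __name__ == "__main__":
    sample = (
        "W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow "
        "library wasn't compiled to use SSE4.2 instructions, but these are "
        "available on your machine and could speed up CPU computations.\n"
        "W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow "
        "library wasn't compiled to use AVX instructions, but these are "
        "available on your machine and could speed up CPU computations.\n"
    )
    print(missing_instruction_sets(sample))
```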

What are SSE4.2 and AVX?
Wikipedia has good explanations of SSE4.2 and AVX. You do not need this knowledge to be good at machine learning.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Do you wish to download MKL LIB from the web? [Y/n] Y
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
No XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] N
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N] N
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow

-mavx -mavx2 -mfma -msse4.2

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow

# The repo defaults to the master development branch. You can also check out a release branch to build:
git checkout r2.0

# Configure the build => use the line below on a Windows machine
python ./configure.py

# Configure the build => use the line below on a Linux/macOS machine
./configure
# This script prompts you for the location of TensorFlow dependencies and asks for additional build configuration options.

# Build the TensorFlow package
# CPU support
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

# GPU support
bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package