为什么要花很长时间才能在Alpine Linux上安装Pandas
我注意到在Docker容器中安装Pandas和Numpy(它的依赖项)需要更长的时间,使用的是基本操作系统Alpine vs.CentOS或Debian。我在下面创建了一个小测试来演示时间差。除了更新和下载构建依赖项以安装Pandas和Numpy所需的几秒钟之外,为什么setup.py比Debian安装花费的时间多70倍 是否有任何方法可以使用Alpine作为基础映像加快安装速度,或者是否有其他与Alpine大小相当的基础映像更好地用于Pandas和Numpy之类的软件包 Dockerfile.debian为什么要花很长时间才能在Alpine Linux上安装Pandas,pandas,numpy,docker,alpine,Pandas,Numpy,Docker,Alpine,我注意到在Docker容器中安装Pandas和Numpy(它的依赖项)需要更长的时间,使用的是基本操作系统Alpine vs.CentOS或Debian。我在下面创建了一个小测试来演示时间差。除了更新和下载构建依赖项以安装Pandas和Numpy所需的几秒钟之外,为什么setup.py比Debian安装花费的时间多70倍 是否有任何方法可以使用Alpine作为基础映像加快安装速度,或者是否有其他与Alpine大小相当的基础映像更好地用于Pandas和Numpy之类的软件包 Dockerfile.
FROM python:3.6.4-slim-jessie
RUN pip install pandas
用Pandas&Numpy打造Debian形象:
[PandasDockerTest] time docker build -t debian-pandas -f Dockerfile.debian . --no-cache
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM python:3.6.4-slim-jessie
---> 43431c5410f3
Step 2/2 : RUN pip install pandas
---> Running in 2e4c030f8051
Collecting pandas
Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
Collecting numpy>=1.9.0 (from pandas)
Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
Collecting pytz>=2011k (from pandas)
Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
Collecting python-dateutil>=2 (from pandas)
Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting six>=1.5 (from python-dateutil>=2->pandas)
Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: numpy, pytz, six, python-dateutil, pandas
Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0
Removing intermediate container 2e4c030f8051
---> a71e1c314897
Successfully built a71e1c314897
Successfully tagged debian-pandas:latest
docker build -t debian-pandas -f Dockerfile.debian . --no-cache 0.07s user 0.06s system 0% cpu 13.605 total
[PandasDockerTest] time docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache
Sending build context to Docker daemon 16.9kB
Step 1/3 : FROM python:3.6.4-alpine3.7
---> 4b00a94b6f26
Step 2/3 : RUN apk --update add --no-cache g++
---> Running in 4b0c32551e3f
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/17) Upgrading musl (1.1.18-r2 -> 1.1.18-r3)
(2/17) Installing libgcc (6.4.0-r5)
(3/17) Installing libstdc++ (6.4.0-r5)
(4/17) Installing binutils-libs (2.28-r3)
(5/17) Installing binutils (2.28-r3)
(6/17) Installing gmp (6.1.2-r1)
(7/17) Installing isl (0.18-r0)
(8/17) Installing libgomp (6.4.0-r5)
(9/17) Installing libatomic (6.4.0-r5)
(10/17) Installing pkgconf (1.3.10-r0)
(11/17) Installing mpfr3 (3.1.5-r1)
(12/17) Installing mpc1 (1.0.3-r1)
(13/17) Installing gcc (6.4.0-r5)
(14/17) Installing musl-dev (1.1.18-r3)
(15/17) Installing libc-dev (0.7.1-r0)
(16/17) Installing g++ (6.4.0-r5)
(17/17) Upgrading musl-utils (1.1.18-r2 -> 1.1.18-r3)
Executing busybox-1.27.2-r7.trigger
OK: 184 MiB in 50 packages
Removing intermediate container 4b0c32551e3f
---> be26c3bf4e42
Step 3/3 : RUN pip install pandas
---> Running in 36f6024e5e2d
Collecting pandas
Downloading pandas-0.22.0.tar.gz (11.3MB)
Collecting python-dateutil>=2 (from pandas)
Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting pytz>=2011k (from pandas)
Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
Collecting numpy>=1.9.0 (from pandas)
Downloading numpy-1.14.1.zip (4.9MB)
Collecting six>=1.5 (from python-dateutil>=2->pandas)
Downloading six-1.11.0-py2.py3-none-any.whl
Building wheels for collected packages: pandas, numpy
Running setup.py bdist_wheel for pandas: started
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/e8/ed/46/0596b51014f3cc49259e52dff9824e1c6fe352048a2656fc92
Running setup.py bdist_wheel for numpy: started
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/9d/cd/e1/4d418b16ea662e512349ef193ed9d9ff473af715110798c984
Successfully built pandas numpy
Installing collected packages: six, python-dateutil, pytz, numpy, pandas
Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0
Removing intermediate container 36f6024e5e2d
---> a93c59e6a106
Successfully built a93c59e6a106
Successfully tagged alpine-pandas:latest
docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache 0.54s user 0.33s system 0% cpu 16:08.47 total
Dockerfile.alpine
FROM python:3.6.4-alpine3.7
RUN apk --update add --no-cache g++
RUN pip install pandas
用熊猫和努比塑造阿尔卑斯山形象:
[PandasDockerTest] time docker build -t debian-pandas -f Dockerfile.debian . --no-cache
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM python:3.6.4-slim-jessie
---> 43431c5410f3
Step 2/2 : RUN pip install pandas
---> Running in 2e4c030f8051
Collecting pandas
Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
Collecting numpy>=1.9.0 (from pandas)
Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
Collecting pytz>=2011k (from pandas)
Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
Collecting python-dateutil>=2 (from pandas)
Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting six>=1.5 (from python-dateutil>=2->pandas)
Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: numpy, pytz, six, python-dateutil, pandas
Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0
Removing intermediate container 2e4c030f8051
---> a71e1c314897
Successfully built a71e1c314897
Successfully tagged debian-pandas:latest
docker build -t debian-pandas -f Dockerfile.debian . --no-cache 0.07s user 0.06s system 0% cpu 13.605 total
[PandasDockerTest] time docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache
Sending build context to Docker daemon 16.9kB
Step 1/3 : FROM python:3.6.4-alpine3.7
---> 4b00a94b6f26
Step 2/3 : RUN apk --update add --no-cache g++
---> Running in 4b0c32551e3f
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/17) Upgrading musl (1.1.18-r2 -> 1.1.18-r3)
(2/17) Installing libgcc (6.4.0-r5)
(3/17) Installing libstdc++ (6.4.0-r5)
(4/17) Installing binutils-libs (2.28-r3)
(5/17) Installing binutils (2.28-r3)
(6/17) Installing gmp (6.1.2-r1)
(7/17) Installing isl (0.18-r0)
(8/17) Installing libgomp (6.4.0-r5)
(9/17) Installing libatomic (6.4.0-r5)
(10/17) Installing pkgconf (1.3.10-r0)
(11/17) Installing mpfr3 (3.1.5-r1)
(12/17) Installing mpc1 (1.0.3-r1)
(13/17) Installing gcc (6.4.0-r5)
(14/17) Installing musl-dev (1.1.18-r3)
(15/17) Installing libc-dev (0.7.1-r0)
(16/17) Installing g++ (6.4.0-r5)
(17/17) Upgrading musl-utils (1.1.18-r2 -> 1.1.18-r3)
Executing busybox-1.27.2-r7.trigger
OK: 184 MiB in 50 packages
Removing intermediate container 4b0c32551e3f
---> be26c3bf4e42
Step 3/3 : RUN pip install pandas
---> Running in 36f6024e5e2d
Collecting pandas
Downloading pandas-0.22.0.tar.gz (11.3MB)
Collecting python-dateutil>=2 (from pandas)
Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting pytz>=2011k (from pandas)
Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
Collecting numpy>=1.9.0 (from pandas)
Downloading numpy-1.14.1.zip (4.9MB)
Collecting six>=1.5 (from python-dateutil>=2->pandas)
Downloading six-1.11.0-py2.py3-none-any.whl
Building wheels for collected packages: pandas, numpy
Running setup.py bdist_wheel for pandas: started
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/e8/ed/46/0596b51014f3cc49259e52dff9824e1c6fe352048a2656fc92
Running setup.py bdist_wheel for numpy: started
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/9d/cd/e1/4d418b16ea662e512349ef193ed9d9ff473af715110798c984
Successfully built pandas numpy
Installing collected packages: six, python-dateutil, pytz, numpy, pandas
Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0
Removing intermediate container 36f6024e5e2d
---> a93c59e6a106
Successfully built a93c59e6a106
Successfully tagged alpine-pandas:latest
docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache 0.54s user 0.33s system 0% cpu 16:08.47 total
基于Debian的映像仅使用
python pip
安装格式的包。whl
格式:
Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
WHL格式是一种比每次都从源代码重新构建Python软件更快、更可靠的安装Python软件的方法。WHL文件只需移动到要安装的目标系统上的正确位置,而源发行版需要在安装之前执行生成步骤
基于Alpine平台的图像中不支持车轮组件pandas
和numpy
。这就是为什么在构建过程中使用python-pip
安装它们时,我们总是从alpine中的源文件编译它们:
Downloading pandas-0.22.0.tar.gz (11.3MB)
Downloading numpy-1.14.1.zip (4.9MB)
在图像构建过程中,我们可以看到以下容器内部:
/ # ps aux
PID USER TIME COMMAND
1 root 0:00 /bin/sh -c pip install pandas
7 root 0:04 {pip} /usr/local/bin/python /usr/local/bin/pip install pandas
21 root 0:07 /usr/local/bin/python -c import setuptools, tokenize;__file__='/tmp/pip-build-en29h0ak/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n
496 root 0:00 sh
660 root 0:00 /bin/sh -c gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/pri
661 root 0:00 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/private -Inump
662 root 0:00 /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1 -quiet -I build/src.linux-x86_64-3.6/numpy/core/src/private -I numpy/core/include -I build/src.linux-x86_64-3.6/numpy/core/includ
663 root 0:00 ps aux
如果我们稍微修改一下Dockerfile
:
FROM python:3.6.4-alpine3.7
RUN apk add --no-cache g++ wget
RUN wget https://pypi.python.org/packages/da/c6/0936bc5814b429fddb5d6252566fe73a3e40372e6ceaf87de3dec1326f28/pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
我们得到以下错误:
Step 4/4 : RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
---> Running in 0faea63e2bda
pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.
The command '/bin/sh -c pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl' returned a non-zero code: 1
不幸的是,在Alpine映像上安装熊猫的唯一方法是等待构建完成
当然,如果您想将Alpine图像与CI中的熊猫一起使用,最好的方法是编译一次,将其推送到任何注册表,并根据需要将其用作基本图像
编辑:
如果您想将Alpine图像与熊猫一起使用,您可以拉取我的docker图像。它是一个python映像,在Alpine平台上预编译了
pandas
。它应该可以节省您的时间。回答:截至2020年3月9日,对于Python3,它仍然没有强>
以下是完整的工作摘要文件:
FROM python:3.7-alpine
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-numpy py3-pandas@testing
RUN echo "http://dl-8.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories \
&& apk update \
&& apk add py3-numpy py3-pandas
FROM python:3.8-alpine
RUN echo "@community http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories \
&& apk add py3-pandas@community
ENV PYTHONPATH="/usr/lib/python3.8/site-packages"
构建对python和alpine的确切版本号非常敏感-这些错误似乎会引发Max Levy的错误so:libpython3.7m.so.1.0(缺失)
-但以上这些现在对我来说确实有效
我更新的Dockerfile可在
[早期更新:] 回答:没有强> 在任何阿尔卑斯Dockerfile中,您都可以简单地执行以下操作*
RUN apk add py2-numpy@community py2-scipy@community py-pandas@edge
这是因为numpy
、scipy
和现在的pandas
都是在alpine
上预构建的:
避免每次重建或使用Docker层的一种方法是使用预构建的本地Alpine Linux/.apk
包,例如
您可以一次性构建这些.apk
s,并在Docker文件中的任何位置使用它们:)
这也省去了你必须在事情发生之前将所有其他东西都烘焙到Docker映像中的麻烦,也就是说,你可以灵活地预构建任何你喜欢的Docker映像
PS我已经在上面放了一个Dockerfile存根,大致显示了如何构建图像。其中包括重要步骤(*):
注意
查看@jtlz2答案及其最新更新
FROM python:3.6.4-alpine3.7
RUN apk --update add --no-cache g++
RUN pip install pandas
过时的
因此,py3 pandas&py3 numpy软件包被移动到了测试alpine存储库中,因此,您可以通过在Dockerfile中添加以下行来下载它:
FROM python:3.7-alpine
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-numpy py3-pandas@testing
RUN echo "http://dl-8.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories \
&& apk update \
&& apk add py3-numpy py3-pandas
FROM python:3.8-alpine
RUN echo "@community http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories \
&& apk add py3-pandas@community
ENV PYTHONPATH="/usr/lib/python3.8/site-packages"
希望它能帮助别人强>
高山套餐链接:-
- 阿尔卑斯山仓库
我将把这些答案集中在一个答案中,并添加一个我认为遗漏的细节。某些python库,特别是优化的数学库和数据库,在alpine上构建需要很长时间,原因是这些库的pip轮子包括从c/c++预编译的二进制文件,并链接到
gnu libc(glibc)
,这是一组常见的c标准库。Debian、Fedora、CentOS都(通常)使用glibc
,但alpine为了保持轻量级,使用musl libc
。构建在glibc
系统上的c/c++二进制文件将无法在没有glibc
的系统上运行,而musl
也是如此
Pip首先查找具有正确二进制文件的控制盘,如果找不到,则尝试从c/c++源代码编译二进制文件,并将它们与musl链接。在许多情况下,除非您拥有来自python3dev
的python头文件,或者像make
这样的构建工具,否则这甚至不起作用
正如其他人所提到的,现在的一线希望是,有社区提供的带有适当二进制文件的apk
包,使用这些包将为您节省构建二进制文件的过程(有时会很长)
事实上,您可以在alpine上从纯python
.whl
安装,但是,在撰写本文时,由于musl/gnu
问题,不支持alpine的二进制发行版。这里真正诚实的建议是,切换到基于Debian的映像,然后您的所有问题都将消失
alpineforpython应用程序不能很好地工作
下面是我的dockerfile
的一个示例:
在这种情况下,python:3.7.6-buster
更合适,此外,操作系统中不需要任何额外的依赖项
请阅读一篇有用的最新文章:
不要将Alpine Linux用于Python映像
除非您想要大大降低构建速度、更大的映像、更多的工作以及潜在的隐藏bug,否则您应该避免使用Alpine Linux