How can I persist an R workspace across multiple CMD statements in a Dockerfile?
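I am building a Docker container that accepts input data from a client and then sources an R script, which runs an analysis on the given data and writes three plots as PDFs to the working directory. The problem I ran into is that the Docker engine does not allow two CMD statements in a Dockerfile. I need to read in the user's data at run time so that the dataset can change from user to user. Then, once the user's data has been read in as a table object in the R workspace, the R script needs to be sourced. What happens is that the data is read in fine, but an error message says that the second CMD line, the one that sources the R script, cannot find the data that was just read in. I understand this is because each line of a Dockerfile is executed separately at build time, but I don't know how to get around it. I have looked into volumes, using supervisor, and the possibility of using multiple containers. I could also write a Python script that generates my R script dynamically, but that would probably still need two CMD lines. I have not found an example that fits my situation. Does anyone know how to solve this?
Here is my R script, which creates three plots from the data frame "x":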
library(iq)
norm_data <- iq::preprocess(x, median_normalization = FALSE, pdf_out = NULL)
protein_list <- iq::create_protein_list(norm_data)
# basic protein plot
pdf(file = "Protein P00366.pdf")
iq::plot_protein(protein_list$P00366, main = "Protein P00366", split = NULL)
dev.off()
protein_table <- iq::create_protein_table(protein_list)
#MaxLFQ plot
pdf(file = "MaxLFQ quantification of P00366.pdf")
iq::plot_protein(rbind(protein_list$P00366,
                       MaxLFQ = iq::maxLFQ(protein_list$P00366)$estimate),
                 main = "MaxLFQ quantification of P00366",
                 col = c(rep("gray", nrow(protein_list$P00366)), "green"),
                 split = NULL)
dev.off()
# ground truth
MaxLFQ_estimate <- iq::maxLFQ(protein_list$P12799)$estimate
ground_truth <- log2(rep(c(200, 125.99, 79.37, 50, 4, 2.52, 1.59, 1), each = 3))
ground_truth <- ground_truth - mean(ground_truth) + mean(MaxLFQ_estimate)
#ground truth plot
pdf(file = "P12799 MaxLFQ versus groundtruth.pdf")
iq::plot_protein(rbind(MaxLFQ = MaxLFQ_estimate,
                       Groundtruth = ground_truth),
                 main = "P12799 - MaxLFQ versus groundtruth",
                 split = 0.75,
                 col = c("green", "gold"))
dev.off()
When I try to run it, I get the following error message from the R workspace:
> source('iqTest.R')
Error in iq::preprocess(x, median_normalization = FALSE, pdf_out = NULL) :
object 'x' not found
Calls: source -> withVisible -> eval -> eval -> <Anonymous>
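If you want to configure the data path via an environment variable, I suggest reading that variable inside the script with Sys.getenv(). This also lets you use Rscript instead of R -e "source…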
Here is what worked for me:
script.R
cat(Sys.getenv('SCRIPT'), '\n');
cat(Sys.getenv('DATA'), '\n')
Dockerfile
FROM r-base:latest
ENV SCRIPT="script.R"
ENV DATA="data.csv"
WORKDIR /workspace
CMD R -q -e "source('$SCRIPT')"
# alternative: CMD Rscript $SCRIPT
Usage
daniel@nuest /tmp/stackoverflow []$ docker build --tag stackoverflow .
Sending build context to Docker daemon 4.608kB
Step 1/5 : FROM r-base:latest
---> 46edce0e80af
Step 2/5 : ENV SCRIPT="script.R"
---> Using cache
---> 8f26d34d9c0a
Step 3/5 : ENV DATA="data.csv"
---> Using cache
---> 16c83c16a4c8
Step 4/5 : WORKDIR /workspace
---> Running in fce8619af30b
Removing intermediate container fce8619af30b
---> a8278f609d9a
Step 5/5 : CMD R -q -e "source('$SCRIPT')"
---> Running in 765bafeb8681
Removing intermediate container 765bafeb8681
---> ff7d7b09dffb
Successfully built ff7d7b09dffb
Successfully tagged stackoverflow:latest
daniel@nuest /tmp/stackoverflow []$ docker run --rm -it -v $(pwd):/workspace stackoverflow
> source('script.R')
script.R
data.csv
>
>
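A Dockerfile keeps only its last CMD, and every R or Rscript invocation starts with an empty workspace, so the object x has to be created in the same R process that sources the analysis. The following is a minimal sketch of that idea, not the answer author's code: the file name run_analysis.R, the function run_analysis(), and the assumption that the input is a CSV file are all hypothetical; SCRIPT and DATA are the environment variables from the Dockerfile above.

```r
# run_analysis.R (hypothetical file name): one entry point that reads the
# data and sources the analysis script inside the same R process.

run_analysis <- function(data_path = Sys.getenv("DATA", unset = "data.csv"),
                         script = Sys.getenv("SCRIPT", unset = "iqTest.R")) {
  if (!file.exists(data_path)) {
    stop("Input file not found: ", data_path)
  }
  if (!file.exists(script)) {
    stop("Analysis script not found: ", script)
  }

  # Assumption: the user's data is a CSV file; swap in another reader
  # (e.g. read.delim) if the real input uses a different format.
  x <- read.csv(data_path, stringsAsFactors = FALSE)

  # local = TRUE evaluates the sourced script in this function's environment,
  # so the script can see the object `x` created above.
  source(script, local = TRUE)

  invisible(x)
}
```

If the file ends with a call to run_analysis(), a single CMD Rscript run_analysis.R is enough, and the dataset can be changed per run by overriding the variable, for example with docker run -e DATA=other.csv.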
Alternatively, you could pass the data path to the script file as a command-line argument.
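Passing the path as an argument could look roughly like this. It is only a sketch: the helper name resolve_data_path() and the fallback to the DATA environment variable are assumptions made for illustration, while commandArgs(trailingOnly = TRUE) is the base R call that returns the arguments following the script name.

```r
# Hypothetical helper: take the data path from the first command-line
# argument and fall back to the DATA environment variable if none is given.
resolve_data_path <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) >= 1L && nzchar(args[[1]])) {
    return(args[[1]])
  }
  path <- Sys.getenv("DATA", unset = "")
  if (!nzchar(path)) {
    stop("No data path given: pass it as an argument or set DATA")
  }
  path
}
```

A script that calls resolve_data_path() at the top could then be started as Rscript script.R data.csv, either as the image's CMD or by appending that command after the image name in docker run to override the default.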
By the way: for reproducibility, it is better to pin a specific R version instead of using :latest.