Docker (HDFS, Spark, Shiny R)
I have 3 containers on the same network: a Hadoop container, a Spark container, and a Shiny R container. I want to read a folder on HDFS from my Shiny app. If Hadoop, Spark, and Shiny R were on the same server (no Docker containers), I could use:
system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)
With Docker containers, where Hadoop and Shiny R run in separate containers, I cannot do this:
system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)
because they are isolated from each other.
Do you know how I can do this?
I tried using the invoke functions from sparklyr, but it doesn't work:
> library(sparklyr)
>
> conf = spark_config()
>
> sc <- spark_connect(master = "local[*]", config = conf)
Re-using existing Spark connection to local[*]
>
> hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")
>
> path <- 'hdfs://namenode:9000/user/root/input2/'
>
> spath <- sparklyr::invoke_new(sc, 'org.apache.hadoop.fs.Path', path)
> spath
<jobj[30]>
org.apache.hadoop.fs.Path
hdfs://namenode:9000/user/root/input2
> fs <- invoke_static(sc, "org.apache.hadoop.fs.FileSystem", "get", hconf)
> fs
<jobj[32]>
org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.LocalFileSystem@788cf1b0
> lls <- invoke(fs, "globStatus", spath)
Error: java.lang.IllegalArgumentException: Wrong FS: hdfs://namenode:9000/user/root/input2, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
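For reference, the "Wrong FS ... expected: file:///" error arises because `FileSystem.get(conf)` returns the filesystem for `fs.defaultFS`, which is still `file:///` in a local Spark session, so a LocalFileSystem is asked to resolve an `hdfs://` path. A sketch of two possible workarounds (untested; assumes the namenode is reachable at hdfs://namenode:9000):

```r
library(sparklyr)

sc <- spark_connect(master = "local[*]")
hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")

# Option 1: make HDFS the default filesystem before requesting it
invoke(hconf, "set", "fs.defaultFS", "hdfs://namenode:9000")

# Option 2: let the Path resolve its own matching filesystem
spath <- invoke_new(sc, "org.apache.hadoop.fs.Path",
                    "hdfs://namenode:9000/user/root/input2/")
fs <- invoke(spath, "getFileSystem", hconf)  # DistributedFileSystem, not LocalFileSystem
lls <- invoke(fs, "globStatus", spath)
```

`Path.getFileSystem(Configuration)` picks the filesystem implementation from the path's scheme, which is why option 2 avoids touching the default configuration.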
I solved this with /var/run/docker.sock: I mounted the Docker socket into the Shiny container. So I changed my docker-compose; my shiny service is now:
shiny:
  image: anaid/shiny:1.1
  volumes:
    - 'shiny_logs:/var/log/shiny-server'
    - '/var/run/docker.sock:/var/run/docker.sock'
  ports:
    - "3838:3838"
My full docker-compose is:
version: "2"
services:
  namenode:
    image: anaid/hadoop-namenode:1.1
    container_name: namenode
    volumes:
      - hadoop_namenode:/hadoop/dfs/name
      - hadoop_namenode_files:/hadoop/dfs/files
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    ports:
      - 9899:9870
  datanode:
    image: anaid/hadoop-datanode:1.1
    container_name: datanode
    depends_on:
      - namenode
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    volumes:
      - hadoop_datanode1:/hadoop/dfs/data
      - hadoop_namenode_files1:/hadoop/dfs/files
    env_file:
      - ./hadoop.env
  mongodb:
    image: mongo
    container_name: mongodb
    ports:
      - "27020:27017"
  shiny:
    image: anaid/shiny:1.1
    volumes:
      - 'shiny_logs:/var/log/shiny-server'
      - /Users/anaid/Docker/hadoop_spark/hadoop-spark-master/shiny:/srv/shiny-server/
      - '/var/run/docker.sock:/var/run/docker.sock'
    ports:
      - "3838:3838"
  nodemanager:
    image: anaid/hadoop-nodemanager:1.1
    container_name: nodemanager
    depends_on:
      - namenode
      - datanode
    env_file:
      - ./hadoop.env
  historyserver:
    image: anaid/hadoop-historyserver:1.1
    container_name: historyserver
    depends_on:
      - namenode
      - datanode
    volumes:
      - hadoop_historyserver:/hadoop/yarn/timeline
    env_file:
      - ./hadoop.env
  spark-master:
    image: anaid/spark-master:1.1
    container_name: spark-master
    ports:
      - "9090:8080"
      - "7077:7077"
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
    environment:
      - "SPARK_LOCAL_IP=spark-master"
  spark-worker-1:
    image: anaid/spark-worker:1.1
    container_name: spark-worker-1
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=30G
      - SPARK_DRIVER_MEMORY=15G
      - SPARK_EXECUTOR_MEMORY=15G
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
    ports:
      - "8083:8081"
  spark-worker-2:
    image: anaid/spark-worker:1.1
    container_name: spark-worker-2
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=30G
      - SPARK_DRIVER_MEMORY=15G
      - SPARK_EXECUTOR_MEMORY=15G
    volumes:
      - ./apps:/opt/spark-apps
      - ./data:/opt/spark-data
    ports:
      - "8084:8081"
volumes:
  hadoop_namenode:
  hadoop_datanode1:
  hadoop_namenode_files:
  hadoop_namenode_files1:
  hadoop_historyserver:
  shiny_logs:
  mongo-config:
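With the socket mounted, a quick sanity check (service and container names as in the compose file above) is to run the Docker CLI from inside the shiny container; the mounted socket makes it talk to the host daemon:

```shell
# From the host: open a shell inside the shiny container
docker-compose exec shiny bash

# Inside it: drive the host daemon to list HDFS via the namenode container
docker exec namenode hdfs dfs -ls /
```

Note this requires the docker CLI to be installed in the shiny image, which is what the Dockerfile below takes care of.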
Then I had to install Docker inside my Shiny container, so I added the commands to the Dockerfile.
My Shiny Dockerfile is:
# get shiny serves plus tidyverse packages image
FROM rocker/shiny:3.6.1
# system libraries of general use
RUN apt-get update && apt-get install -y \
sudo
# Anaid added for V8 and sparklyr library
RUN apt-get install -y \
r-cran-xml \
openjdk-8-jdk \
libv8-dev \
libxml2 \
libxml2-dev \
libssl-dev \
libcurl4-openssl-dev \
libcairo2-dev \
libsasl2-dev \
libssl-dev \
vim
RUN sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg2 \
software-properties-common
# For docker inside the container
# Add Docker’s official GPG key:
RUN curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
RUN sudo apt-key fingerprint 0EBFCD88
RUN sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/debian \
$(lsb_release -cs) \
stable"
RUN sudo apt-get update
# Install Docker Engine, pinned to the Debian stretch build
RUN sudo apt-get install -y \
    docker-ce=5:19.03.2~3-0~debian-stretch \
    docker-ce-cli=5:19.03.2~3-0~debian-stretch \
    containerd.io
# Download and install library. They are saved here /usr/local/lib/R/site-library
RUN R -e "install.packages(c('shiny', 'Rcpp' ,'pillar', 'git2r', 'compiler', 'dbplyr', 'r2d3', 'base64enc', 'devtools', 'zeallot', 'digest', 'jsonlite', 'tibble', 'pkgconfig', 'rlang', 'DBI', 'cli', 'rstudioapi', 'yaml', 'parallel', 'withr', 'dplyr', 'httr', 'generics', 'htmlwidgets', 'vctrs', 'askpass', 'rprojroot', 'tidyselect', 'glue', 'forge', 'R6', 'fansi', 'purrr', 'magrittr', 'backports', 'htmltools', 'ellipsis', 'assertthat', 'config', 'utf8', 'openssl', 'crayon', 'shinydashboard', 'BBmisc', 'ggfortify', 'cluster', 'stringr', 'DT', 'plotly', 'ggplot2', 'shinyjs', 'dplyr', 'stats', 'graphics', 'grDevices', 'utils', 'datasets', 'methods', 'base', 'XML', 'data.table', 'jsonlite', 'yaml'))"
RUN R -e "install.packages(c('devtools', 'XML', 'data.table', 'jsonlite', 'yaml', 'rlist', 'V8', 'sparklyr'), repos='http://cran.rstudio.com/')"
RUN R -e "install.packages(c('lattice', 'nlme', 'broom', 'sparklyr', 'shinyalert', 'mongolite', 'jtools'), repos='http://cran.rstudio.com/')"
## create directories
## RUN mkdir -p /myScripts
## copy files
## COPY /myScripts/installMissingPkgs.R /myScripts/installMissingPkgs.R
## COPY /myScripts/packageList /myScripts/packageList
## install R-packages
## RUN Rscript /myScripts/installMissingPkgs.R
# copy the app to the image
COPY app.R /srv/shiny-server/
# select port
EXPOSE 3838
# allow permission
RUN sudo chown -R shiny:shiny /srv/shiny-server
# run app
CMD ["/usr/bin/shiny-server.sh"]
Then I had a problem using R's system() function in the app. This is the error:
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/namenode/json: dial unix /var/run/docker.sock: connect: permission denied
Warning in system(paste0("docker exec -it namenode hdfs dfs -ls ", dir), :
running command 'docker exec -it namenode hdfs dfs -ls /' had status 1
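The permission error means the user running the Shiny app lacks rights on the mounted socket. One blunt but common workaround (a sketch, not a hardened fix; `<shiny-container>` is a placeholder for the actual container name, and opening the socket up effectively gives any user in the container root-equivalent control of the host Docker daemon):

```shell
# Run from the host against the running shiny container
docker exec -u root <shiny-container> chmod 666 /var/run/docker.sock
```

A less permissive alternative is to add the app user to a group whose GID matches the socket's group on the host.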
I solved this by doing the following (in the Shiny container). I added USER=root in my app (note the working command also drops the -it flags, since there is no TTY inside Shiny Server):
system("USER=root")
system("docker exec namenode hdfs dfs -ls /", intern = TRUE)
The code of my simple app using system():
library(shiny)
library(tools)
library(stringi)

ui <- fluidPage(
  h3(textOutput("system"))
)

server <- function(input, output, session) {
  rv <- reactiveValues(syst = NULL)

  observe({
    # pwd
    # docker ps working
    system("USER=root")
    rv$syst <- paste(system("docker exec namenode hdfs dfs -ls /", intern = TRUE),
                     system("ls", intern = TRUE))
  })

  output$system <- renderText({
    rv$syst
  })
}

shinyApp(ui, server)