Docker (HDFS, Spark, Shiny R)


I have 3 containers on the same network: a Hadoop container, a Spark container, and a Shiny R container.

I want to read a folder on HDFS from my Shiny app. If Hadoop, Spark, and Shiny R are on the same server (no Docker containers), I can use:

system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)
If I use Docker containers, with Hadoop and Shiny R in different containers, I cannot do this:

system(paste0("hdfs dfs -ls ", "/"), intern = TRUE)
because they are separate containers.

Do you know how I can do it?

I tried to use the invoke functions from sparklyr, but it does not work:

> library(sparklyr)
>
> conf = spark_config()
>
> sc <- spark_connect(master = "local[*]", config = conf)
Re-using existing Spark connection to local[*]
>
> hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")
>
> path <- 'hdfs://namenode:9000/user/root/input2/'
>
> spath <- sparklyr::invoke_new(sc, 'org.apache.hadoop.fs.Path', path)
> spath
<jobj[30]>
  org.apache.hadoop.fs.Path
  hdfs://namenode:9000/user/root/input2
> fs <- invoke_static(sc, "org.apache.hadoop.fs.FileSystem", "get",  hconf)
> fs
<jobj[32]>
  org.apache.hadoop.fs.LocalFileSystem
  org.apache.hadoop.fs.LocalFileSystem@788cf1b0
> lls <- invoke(fs, "globStatus", spath)
Error: java.lang.IllegalArgumentException: Wrong FS: hdfs://namenode:9000/user/root/input2, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
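The `Wrong FS ... expected: file:///` error happens because `FileSystem.get(hconf)` returns the filesystem for the configured default (`fs.defaultFS`), and without a Hadoop configuration on the classpath that default falls back to the local filesystem. A sketch of a workaround (untested assumption: the namenode is reachable as `hdfs://namenode:9000` from the Spark container, as in the path above) is to set `fs.defaultFS` explicitly, or to ask the `Path` object for its own filesystem instead of the default one:

```r
library(sparklyr)

sc    <- spark_connect(master = "local[*]")
hconf <- sc %>% spark_context() %>% invoke("hadoopConfiguration")

# Point the Hadoop client at the namenode container explicitly.
invoke(hconf, "set", "fs.defaultFS", "hdfs://namenode:9000")

path  <- "hdfs://namenode:9000/user/root/input2/"
spath <- invoke_new(sc, "org.apache.hadoop.fs.Path", path)

# Alternatively, derive the filesystem from the Path itself, so it is an
# HDFS DistributedFileSystem rather than the LocalFileSystem default:
fs  <- invoke(spath, "getFileSystem", hconf)
lls <- invoke(fs, "globStatus", spath)
```

This only works if the Spark session can actually resolve and reach `namenode:9000` on the Docker network.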
I solved this problem using /var/run/docker.sock. So I changed my docker-compose. My shiny service is:


      shiny:
        image: anaid/shiny:1.1
        volumes:
          - 'shiny_logs:/var/log/shiny-server'
          - '/var/run/docker.sock:/var/run/docker.sock'
        ports:
          - "3838:3838"

My full docker-compose is:


    version: "2"

    services:
      namenode:
        image: anaid/hadoop-namenode:1.1
        container_name: namenode
        volumes:
          - hadoop_namenode:/hadoop/dfs/name
          - hadoop_namenode_files:/hadoop/dfs/files
        environment:
          - CLUSTER_NAME=test
        env_file:
          - ./hadoop.env
        ports:
          - 9899:9870

      datanode:
        image: anaid/hadoop-datanode:1.1
        container_name: datanode
        depends_on:
          - namenode
        environment:
          SERVICE_PRECONDITION: "namenode:9870"
        volumes:
          - hadoop_datanode1:/hadoop/dfs/data
          - hadoop_namenode_files1:/hadoop/dfs/files
        env_file:
          - ./hadoop.env      

      mongodb:
        image: mongo
        container_name: mongodb
        ports:
          - "27020:27017"

      shiny:
        image: anaid/shiny:1.1
        volumes:
          - 'shiny_logs:/var/log/shiny-server'
          - /Users/anaid/Docker/hadoop_spark/hadoop-spark-master/shiny:/srv/shiny-server/
          - '/var/run/docker.sock:/var/run/docker.sock'
        ports:
          - "3838:3838"

      nodemanager:
        image: anaid/hadoop-nodemanager:1.1
        container_name: nodemanager
        depends_on:
          - namenode
          - datanode
        env_file:
          - ./hadoop.env

      historyserver:
        image: anaid/hadoop-historyserver:1.1
        container_name: historyserver
        depends_on:
          - namenode
          - datanode
        volumes:
          - hadoop_historyserver:/hadoop/yarn/timeline
        env_file:
          - ./hadoop.env

      spark-master:
        image: anaid/spark-master:1.1
        container_name: spark-master
        ports:
          - "9090:8080"
          - "7077:7077"
        volumes:
           - ./apps:/opt/spark-apps
           - ./data:/opt/spark-data
        environment:
          - "SPARK_LOCAL_IP=spark-master"

      spark-worker-1:
        image: anaid/spark-worker:1.1
        container_name: spark-worker-1
        depends_on:
          - spark-master
        environment:
          - SPARK_MASTER=spark://spark-master:7077
          - SPARK_WORKER_CORES=1
          - SPARK_WORKER_MEMORY=30G
          - SPARK_DRIVER_MEMORY=15G
          - SPARK_EXECUTOR_MEMORY=15G
        volumes:
           - ./apps:/opt/spark-apps
           - ./data:/opt/spark-data
        ports:
          - "8083:8081"

      spark-worker-2:
        image: anaid/spark-worker:1.1
        container_name: spark-worker-2
        depends_on:
          - spark-master
        environment:
          - SPARK_MASTER=spark://spark-master:7077
          - SPARK_WORKER_CORES=1
          - SPARK_WORKER_MEMORY=30G
          - SPARK_DRIVER_MEMORY=15G
          - SPARK_EXECUTOR_MEMORY=15G
        volumes:
           - ./apps:/opt/spark-apps
           - ./data:/opt/spark-data
        ports:
          - "8084:8081"

    volumes:
      hadoop_namenode:
      hadoop_datanode1:
      hadoop_namenode_files:
      hadoop_namenode_files1:
      hadoop_historyserver:
      shiny_logs:
      mongo-config:

Then I had to install Docker inside my Shiny container. I added the commands to the Dockerfile. My Shiny Dockerfile is:

# get shiny server plus tidyverse packages image
FROM rocker/shiny:3.6.1

# system libraries of general use
RUN apt-get update && apt-get install -y \
    sudo


#  Anaid added for V8 and sparklyr library
RUN apt-get install -y \ 
        r-cran-xml \
        openjdk-8-jdk \
        libv8-dev \ 
        libxml2 \ 
        libxml2-dev \ 
        libssl-dev \
        libcurl4-openssl-dev \
        libcairo2-dev \
        libsasl2-dev \
        libssl-dev \
        vim 

RUN sudo apt-get install -y \ 
         apt-transport-https \
         ca-certificates \
         curl \
         gnupg2 \
         software-properties-common

# For docker inside the container
# Add Docker’s official GPG key:
RUN curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -

RUN sudo apt-key fingerprint 0EBFCD88

RUN sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/debian \
   $(lsb_release -cs) \
   stable"

RUN sudo apt-get update

# Install the latest version of Docker Engine  
RUN sudo apt-get install -y \ 
        docker-ce \ 
        docker-ce-cli \ 
        containerd.io

RUN sudo apt-get install -y \  
        docker-ce=5:19.03.2~3-0~debian-stretch \ 
        docker-ce-cli=5:19.03.2~3-0~debian-stretch \ 
        containerd.io

# Download and install library. They are saved here /usr/local/lib/R/site-library

RUN R -e "install.packages(c('shiny', 'Rcpp' ,'pillar', 'git2r', 'compiler',  'dbplyr',   'r2d3', 'base64enc', 'devtools',    'zeallot',    'digest',    'jsonlite',  'tibble',     'pkgconfig',  'rlang',   'DBI',   'cli',   'rstudioapi',   'yaml',   'arallel',   'withr',   'dplyr',   'httr_1.4.0',       'generics',   'htmlwidgets',   'vctrs',   'askpass',   'rprojroot',   'tidyselect',   'glue',   'forge',   'R6',   'fansi',   'purrr',   'magrittr',   'backports',   'htmltools',   'ellipsis',   'assertthat',   'config',   'utf8',   'openssl',   'crayon', 'shinydashboard',  'BBmisc', 'ggfortify', 'cluster','stringr', 'DT', 'plotly', 'ggplot2', 'shinyjs', 'dplyr', 'stats', 'graphics', 'grDevices', 'utils', 'datasets', 'methods', 'base', 'Rtools', 'XML', 'data.table', 'jsonlite', 'yaml'))"

RUN R -e "install.packages(c('devtools', 'XML', 'data.table', 'jsonlite', 'yaml', 'rlist', 'V8', 'sparklyr'), repos='http://cran.rstudio.com/')"

RUN R -e "install.packages(c('lattice', 'nlme', 'broom', 'sparklyr', 'shinyalert', 'mongolite', 'jtools'), repos='http://cran.rstudio.com/')"

## create directories
## RUN mkdir -p /myScripts

## copy files
## COPY /myScripts/installMissingPkgs.R /myScripts/installMissingPkgs.R
## COPY /myScripts/packageList /myScripts/packageList

## install R-packages
## RUN Rscript /myScripts/installMissingPkgs.R

# copy the app to the image
COPY app.R /srv/shiny-server/

# select port
EXPOSE 3838

# allow permission
RUN sudo chown -R shiny:shiny /srv/shiny-server

# run app
CMD ["/usr/bin/shiny-server.sh"]

Then I had some problems using R's system() function in the app. This is the error:


    Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/namenode/json: dial unix /var/run/docker.sock: connect: permission denied
    Warning in system(paste0("docker exec -it namenode hdfs dfs -ls ", dir),  :
      running command 'docker exec -it namenode hdfs dfs -ls /' had status 1

I solved this problem by doing the following (in the Shiny container):

Then, I added USER=root in my app:


    system("USER=root")
    system("docker exec namenode hdfs dfs -ls /", intern = TRUE)

The code of my simple app using system():


library(shiny)
library(tools)
library(stringi)

ui <- fluidPage(

  h3(textOutput("system"))

)

server <- function(input, output, session) {

  rv <- reactiveValues(syst = NULL)

  observe({
    # pwd
    # docker ps working
      system("USER=root")
      rv$syst <- paste(system("docker exec namenode hdfs dfs -ls /", intern = TRUE), system("ls", intern = TRUE) ) 
    })

  output$system <- renderText({ 
    rv$syst
  })
}

shinyApp(ui, server)
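A side note on the USER=root trick above: system() starts a fresh shell for each call, so an assignment made in one system() call does not carry over to the next. If the variable is actually needed by the command, a sketch (same docker exec call as above) is to pass the environment on the call that uses it:

```r
# system("USER=root") only sets the variable in a short-lived subshell.
# system2() can hand the environment directly to the command instead:
out <- system2("docker",
               args   = c("exec", "namenode", "hdfs", "dfs", "-ls", "/"),
               env    = "USER=root",
               stdout = TRUE)
```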