Key file distribution in a Hadoop cluster


I want to send a large number of files from HDFS to Google Storage (GS), so I would like to use the distcp command for this:

hadoop distcp -libjars <full path to connector jar> -m <number of mappers> hdfs://<host>:<port(default 8020)>/<hdfs path> gs://<bucket name>/

I also need to specify the *.p12 key file in core-site.xml in order to access GS, and this file has to be distributed to every node in the cluster.

<property>
    <name>google.cloud.auth.service.account.keyfile</name>
    <value>/opt/hadoop/conf/gcskey.p12</value>
</property>

I don't want to do this by hand. What is the best practice for distributing the key file?

There is a generic option for this:

-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
The command would be:

hadoop distcp -libjars <full path to connector jar> -files /etc/hadoop/conf/gcskey.p12 -m <number of mappers> hdfs://<host>:<port(default 8020)>/<hdfs path> gs://<bucket name>/

Note 1: In this case the key path (google.cloud.auth.service.account.keyfile) in core-site.xml has to be set as in the example below, that is, as the bare file name of the key.

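Note 2: The .p12 key file must also be present in the current directory you launch the command from, because Hadoop checks the path from core-site.xml at startup.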

<property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>
        The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2.
    </description>
</property>
<property>
    <name>fs.gs.project.id</name>
    <value>google project id</value>
    <description>Google Project Id</description>
</property>
<property>
    <name>google.cloud.auth.service.account.enable</name>
    <value>true</value>
</property>
<property>
    <name>google.cloud.auth.service.account.email</name>
    <value>google service account email</value>
    <description>Project service account email</description>
</property>
<property>
    <name>google.cloud.auth.service.account.keyfile</name>
    <value>gcskey.p12</value>
    <description>Local path to .p12 key at each node</description>
</property>
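
The command and the two notes can be combined into a small wrapper. The following is only a minimal sketch, not a tested deployment script: the helper names stage_key and build_distcp_cmd are hypothetical, and it assumes the relative keyfile value gcskey.p12 shown in the configuration. It copies the key into the current directory (Note 2) and prints the distcp command line that ships the key with -files, so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: helper names and all example values are hypothetical.

# Copy the key into the current directory (if it is not already there)
# and print its bare file name, which is what core-site.xml refers to.
stage_key() {
    key_src=$1
    key_name=$(basename "$key_src")
    [ -e "./$key_name" ] || cp "$key_src" "./$key_name"
    printf '%s\n' "$key_name"
}

# Print the distcp command line with the key shipped via -files.
# Arguments: connector jar, key file, number of mappers, source, destination.
build_distcp_cmd() {
    printf 'hadoop distcp -libjars %s -files %s -m %s %s %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

A possible use is to run key=$(stage_key /etc/hadoop/conf/gcskey.p12) and then pass "$key" as the second argument to build_distcp_cmd, together with your own jar path, mapper count, HDFS source and gs:// destination. Printing the command instead of executing it is a deliberate choice here, since the placeholders have to be filled in per cluster.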
