Secrets management in an AWS EMR PySpark job
I have an EMR PySpark job that needs to access an S3 bucket owned by a third party. The PySpark job is stored at s3://mybucket/job.py and is submitted as a step:
{
    "Name": "Process promo_regs",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster", "s3://mybucket/job.py"]
    }
}
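For reference, a step like this can be launched from Python with `run_job_flow`. Everything in the sketch below besides the step itself (release label, instance types, the default EMR role names) is a hypothetical configuration, not the poster's actual setup:

```python
# Step definition from above; note "--deploy-mode" and "cluster" must be
# separate elements of the Args list, and the script path uses "/".
STEP = {
    "Name": "Process promo_regs",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": [
            "spark-submit", "--master", "yarn",
            "--deploy-mode", "cluster",
            "s3://mybucket/job.py",
        ],
    },
}


def launch_cluster_with_step():
    """Launch a transient cluster that runs the step and shuts down.

    All settings here are illustrative defaults. JobFlowRole is the EC2
    instance profile: it is what boto3 calls made inside job.py will
    authenticate as, so IAM permissions for the job belong on that role.
    """
    import boto3  # imported here so STEP can be inspected without boto3 installed

    emr = boto3.client("emr", region_name="us-east-1")
    return emr.run_job_flow(
        Name="promo-regs",
        ReleaseLabel="emr-6.15.0",
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[STEP],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
```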
In job.py I configure a boto3 S3 client:
from pyspark.sql import SparkSession
import boto3

# How to inject these securely?
env = {
    'AWS_ACCESS_KEY_ID': '',
    'AWS_SECRET_ACCESS_KEY': '',
    'AWS_REGION_NAME': ''
}

s3 = boto3.client(
    's3',
    aws_access_key_id=env['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=env['AWS_SECRET_ACCESS_KEY'],
    region_name=env['AWS_REGION_NAME'],
)

spark = (SparkSession
         .builder
         .appName("Test processing dummy data")
         .getOrCreate())
What are my options for injecting the access keys into the script securely? I'm launching the cluster and submitting the job with boto3.client('emr').run_job_flow(), in case that matters.

I can think of two approaches:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": "arn:aws:secretsmanager:us-east-1:<account-no>:secret:<Secret prefix if you have any>*",
            "Effect": "Allow",
            "Sid": "VisualEditor0"
        }
    ]
}
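With a policy like this attached to the cluster's EC2 instance profile, job.py can pull the third party's keys from Secrets Manager at runtime instead of embedding them. A minimal sketch, assuming a hypothetical secret name and that the secret value is a JSON string whose keys match the `env` dict above:

```python
import json


def client_kwargs_from_secret(secret_string, default_region="us-east-1"):
    """Map a Secrets Manager JSON string to boto3.client('s3') kwargs.

    The key names inside the secret are an assumption for illustration;
    match them to however the secret was actually stored.
    """
    secret = json.loads(secret_string)
    return {
        "aws_access_key_id": secret["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": secret["AWS_SECRET_ACCESS_KEY"],
        "region_name": secret.get("AWS_REGION_NAME", default_region),
    }


def third_party_s3_client(secret_id="third-party-s3-creds"):
    """Build the S3 client inside job.py.

    Runs on the EMR nodes and authenticates to Secrets Manager via the
    cluster's instance profile, so no credentials ship with job.py.
    `secret_id` is a hypothetical name.
    """
    import boto3  # imported here so the helper above stays importable without boto3

    sm = boto3.client("secretsmanager")
    raw = sm.get_secret_value(SecretId=secret_id)["SecretString"]
    return boto3.client("s3", **client_kwargs_from_secret(raw))
```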
Thanks, both make sense! We'll see which one the third party will accept. Here is a link to