
In Apache Spark, how to set worker/executor environment variables?

Tags: amazon-web-services, amazon-s3, apache-spark, distributed-computing

My Spark program on EMR keeps hitting the following error:

Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:421)
    at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:128)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:397)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:334)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:942)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2148)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2075)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1093)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:548)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:172)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at org.apache.hadoop.fs.s3native.$Proxy8.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:341)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
I did some research and found that this authentication can be disabled in low-security situations by setting the variable:

com.amazonaws.sdk.disableCertChecking=true
But I can only set it with spark-submit.sh --conf, which only affects the driver, while most of the errors occur on the workers.

Is there a way to propagate it to the workers?

Thanks a lot.

Just found this in the Spark configuration documentation:

spark.executorEnv.[EnvironmentVariableName]

Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.


So in your case, I would set the Spark configuration option spark.executorEnv.com.amazonaws.sdk.disableCertChecking to true and see if that helps.
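
For example, a minimal submit-time sketch of this suggestion (main.py stands in for your own application entry point):

spark-submit --conf spark.executorEnv.com.amazonaws.sdk.disableCertChecking=true main.py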

Adding a bit more to the existing answer.

import pyspark

def get_spark_context(app_name):
    # configuration
    conf = pyspark.SparkConf()
    conf.set('spark.app.name', app_name)
    # application-specific settings:
    # set an environment value for the executors
    # (must be set before the context is created to take effect)
    conf.set('spark.executorEnv.SOME_ENVIRONMENT_VALUE', 'I_AM_PRESENT')
    # initialize and return
    sc = pyspark.SparkContext.getOrCreate(conf=conf)
    return pyspark.SQLContext(sparkContext=sc)
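
A quick usage sketch (the application name is just illustrative):

sql_context = get_spark_context('my-app')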
The SOME_ENVIRONMENT_VALUE environment variable will now be available in the executors/workers.

Inside your Spark application, you can access it like this:

import os
some_environment_value = os.environ.get('SOME_ENVIRONMENT_VALUE')

Building on the other answers, here is a complete example (PySpark 2.4.1). In this example, I force all workers to spawn only one thread per core for the Intel MKL kernel library:

import pyspark

conf = pyspark.conf.SparkConf().setAll([
    ('spark.executorEnv.OMP_NUM_THREADS', '1'),
    ('spark.workerEnv.OMP_NUM_THREADS', '1'),
    ('spark.executorEnv.OPENBLAS_NUM_THREADS', '1'),
    ('spark.workerEnv.OPENBLAS_NUM_THREADS', '1'),
    ('spark.executorEnv.MKL_NUM_THREADS', '1'),
    ('spark.workerEnv.MKL_NUM_THREADS', '1'),
])

spark = pyspark.sql.SparkSession.builder.config(conf=conf).getOrCreate()

# print current PySpark configuration to be sure
print("Current PySpark settings: ", spark.sparkContext._conf.getAll())

For Spark 2.4, @Amit Kushwaha's method did not work for me.

I tested:

1. cluster mode
2. client mode

Neither of the above could set the environment variable into the executor's system (that is, it could not be read with os.environ.get('DEBUG')).


The only way is to read it from spark.conf. Submit like this:

spark-submit --conf spark.DEBUG=1 main.py
Read the variable:

DEBUG = spark.conf.get('spark.DEBUG')
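
Put together, a minimal sketch (note that custom --conf keys must start with spark., otherwise spark-submit ignores them; spark.DEBUG is just an example name):

# main.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# reads the value passed via --conf spark.DEBUG=1; '0' is the fallback
debug = spark.conf.get('spark.DEBUG', '0')
print('DEBUG =', debug)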

Thanks! I just found this in the official documentation; not sure why it was so easily overlooked before. This saved me some headaches, thanks. I needed the same thing, but on the workers' side: going to see if there is..