Python: GridSearchCV with spark-sklearn throws a multiprocessing.pool.MaybeEncodingError
I am trying to parallelize a GridSearchCV run using the spark-sklearn module. I am using an EMR cluster with the following configuration:

Master node - c4.4xlarge (30 GB, 8 cores / 16 vCPUs)
Slave nodes (3) - c4.8xlarge (60 GB, 18 cores / 36 vCPUs)

Edit: I can't paste the full code, but it is fairly simple:
from sklearn.model_selection import train_test_split
from spark_sklearn import GridSearchCV as sp_GridSearchCV

# sc is the SparkContext; clf is an sklearn Pipeline, parameters the param grid
grid_search = sp_GridSearchCV(sc, clf, parameters, n_jobs=-1, cv=cv, scoring='f1_macro', verbose=verbose)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, stratify=y)
grid_search.fit(X_train, y_train)
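spark_sklearn's GridSearchCV mirrors scikit-learn's GridSearchCV API, with the SparkContext as an extra first argument so the cross-validation fits are farmed out to executors. As a minimal, self-contained sketch of the same call pattern (plain scikit-learn, no SparkContext needed; the pipeline, parameter grid, and toy data here are illustrative stand-ins, not the actual `clf`/`parameters` from the question):

```python
# Local sketch of the grid-search pattern above, using plain scikit-learn.
# With spark_sklearn, the GridSearchCV call would instead be
# sp_GridSearchCV(sc, clf, parameters, ...) and the fits run on executors.
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny synthetic text dataset, just to exercise the API.
X = ["good movie", "bad film", "great plot", "terrible acting"] * 30
y = [1, 0, 1, 0] * 30

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("lr", LogisticRegression(solver="liblinear")),
])
parameters = {"lr__C": [0.1, 1.0]}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

grid_search = GridSearchCV(clf, parameters, cv=3, scoring="f1_macro", n_jobs=1)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
```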
I am running the following spark-submit:
spark-submit --executor-cores 7 --num-executors 4 --executor-memory 9G --master yarn --deploy-mode client <program_name.py>
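For context, the resources requested by these flags are only a fraction of the cluster's stated capacity. A quick back-of-envelope check (numbers taken from the question; YARN and OS overhead ignored for simplicity):

```python
# Resources requested by the spark-submit flags above.
num_executors = 4
executor_cores = 7
executor_memory_gb = 9

# Stated slave-node capacity (3 x c4.8xlarge).
slaves = 3
vcpus_per_slave = 36
mem_per_slave_gb = 60

requested_cores = num_executors * executor_cores    # 28 cores
requested_mem = num_executors * executor_memory_gb  # 36 GB

available_cores = slaves * vcpus_per_slave          # 108 vCPUs
available_mem = slaves * mem_per_slave_gb           # 180 GB

print(requested_cores, "/", available_cores, "cores")
print(requested_mem, "/", available_mem, "GB")
```

Note that `n_jobs=-1` is also passed through to the estimator, so each PySpark worker process may additionally fork joblib subprocesses on top of the executor cores.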
Data size: 6,000 rows of text data, nothing more — it is not large.
But I keep running into the error below, which originates from standard sklearn grid-search code.
Has anyone run into this error before, or does anyone know what I am doing wrong?
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/11/__spark_libs__7590692825730242151.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/11/30 22:53:46 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 18282@ip-10-17-36-80
17/11/30 22:53:46 INFO SignalUtils: Registered signal handler for TERM
17/11/30 22:53:46 INFO SignalUtils: Registered signal handler for HUP
17/11/30 22:53:46 INFO SignalUtils: Registered signal handler for INT
17/11/30 22:53:47 INFO SecurityManager: Changing view acls to: yarn,hadoop
17/11/30 22:53:47 INFO SecurityManager: Changing modify acls to: yarn,hadoop
17/11/30 22:53:47 INFO SecurityManager: Changing view acls groups to:
17/11/30 22:53:47 INFO SecurityManager: Changing modify acls groups to:
17/11/30 22:53:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
17/11/30 22:53:47 INFO TransportClientFactory: Successfully created connection to /10.17.36.61:36396 after 53 ms (0 ms spent in bootstraps)
17/11/30 22:53:47 INFO SecurityManager: Changing view acls to: yarn,hadoop
17/11/30 22:53:47 INFO SecurityManager: Changing modify acls to: yarn,hadoop
17/11/30 22:53:47 INFO SecurityManager: Changing view acls groups to:
17/11/30 22:53:47 INFO SecurityManager: Changing modify acls groups to:
17/11/30 22:53:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
17/11/30 22:53:47 INFO TransportClientFactory: Successfully created connection to /10.17.36.61:36396 after 0 ms (0 ms spent in bootstraps)
17/11/30 22:53:47 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/blockmgr-9390d584-49ab-407b-ab0d-c98f5cae1bfe
17/11/30 22:53:47 INFO MemoryStore: MemoryStore started with capacity 7.5 GB
17/11/30 22:53:47 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.17.36.61:36396
17/11/30 22:53:47 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
17/11/30 22:53:47 INFO Executor: Starting executor ID 1 on host ip-10-17-36-80.us-west-2.compute.internal
17/11/30 22:53:48 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44955.
17/11/30 22:53:48 INFO NettyBlockTransferService: Server created on ip-10-17-36-80.us-west-2.compute.internal:44955
17/11/30 22:53:48 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/11/30 22:53:48 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, ip-10-17-36-80.us-west-2.compute.internal, 44955, None)
17/11/30 22:53:48 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, ip-10-17-36-80.us-west-2.compute.internal, 44955, None)
17/11/30 22:53:48 INFO BlockManager: external shuffle service port = 7337
17/11/30 22:53:48 INFO BlockManager: Registering executor with local external shuffle service.
17/11/30 22:53:48 INFO TransportClientFactory: Successfully created connection to ip-10-17-36-80.us-west-2.compute.internal/10.17.36.80:7337 after 0 ms (0 ms spent in bootstraps)
17/11/30 22:53:48 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, ip-10-17-36-80.us-west-2.compute.internal, 44955, None)
17/11/30 22:53:50 INFO CoarseGrainedExecutorBackend: Got assigned task 5
17/11/30 22:53:50 INFO CoarseGrainedExecutorBackend: Got assigned task 14
17/11/30 22:53:50 INFO CoarseGrainedExecutorBackend: Got assigned task 23
17/11/30 22:53:50 INFO CoarseGrainedExecutorBackend: Got assigned task 32
17/11/30 22:53:50 INFO CoarseGrainedExecutorBackend: Got assigned task 41
17/11/30 22:53:50 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
17/11/30 22:53:50 INFO Executor: Running task 23.0 in stage 0.0 (TID 23)
17/11/30 22:53:50 INFO Executor: Running task 14.0 in stage 0.0 (TID 14)
17/11/30 22:53:50 INFO Executor: Running task 32.0 in stage 0.0 (TID 32)
17/11/30 22:53:50 INFO Executor: Running task 41.0 in stage 0.0 (TID 41)
17/11/30 22:53:50 INFO Executor: Fetching spark://10.17.36.61:36396/files/main.py with timestamp 1512082429416
17/11/30 22:53:50 INFO TransportClientFactory: Successfully created connection to /10.17.36.61:36396 after 1 ms (0 ms spent in bootstraps)
17/11/30 22:53:50 INFO Utils: Fetching spark://10.17.36.61:36396/files/main.py to /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/spark-2016730f-a9a7-42f6-af43-e214e17a799c/fetchFileTemp5747466536720143705.tmp
17/11/30 22:53:50 INFO Utils: Copying /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/spark-2016730f-a9a7-42f6-af43-e214e17a799c/-7964204541512082429416_cache to /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/container_1512081909079_0001_01_000002/./main.py
17/11/30 22:53:50 INFO Executor: Fetching spark://10.17.36.61:36396/files/config.py with timestamp 1512082429407
17/11/30 22:53:50 INFO Utils: Fetching spark://10.17.36.61:36396/files/config.py to /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/spark-2016730f-a9a7-42f6-af43-e214e17a799c/fetchFileTemp1456561651760694351.tmp
17/11/30 22:53:50 INFO Utils: Copying /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/spark-2016730f-a9a7-42f6-af43-e214e17a799c/-13639922071512082429407_cache to /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/container_1512081909079_0001_01_000002/./config.py
17/11/30 22:53:50 INFO Executor: Fetching spark://10.17.36.61:36396/files/helper.py with timestamp 1512082429375
17/11/30 22:53:50 INFO Utils: Fetching spark://10.17.36.61:36396/files/helper.py to /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/spark-2016730f-a9a7-42f6-af43-e214e17a799c/fetchFileTemp3268157159972473452.tmp
17/11/30 22:53:50 INFO Utils: Copying /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/spark-2016730f-a9a7-42f6-af43-e214e17a799c/7433209651512082429375_cache to /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/container_1512081909079_0001_01_000002/./helper.py
17/11/30 22:53:50 INFO Executor: Fetching spark://10.17.36.61:36396/files/transformers.py with timestamp 1512082429412
17/11/30 22:53:50 INFO Utils: Fetching spark://10.17.36.61:36396/files/transformers.py to /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/spark-2016730f-a9a7-42f6-af43-e214e17a799c/fetchFileTemp5944429394790398969.tmp
17/11/30 22:53:50 INFO Utils: Copying /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/spark-2016730f-a9a7-42f6-af43-e214e17a799c/-3422273991512082429412_cache to /mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/container_1512081909079_0001_01_000002/./transformers.py
17/11/30 22:53:50 INFO TorrentBroadcast: Started reading broadcast variable 2
17/11/30 22:53:50 INFO TransportClientFactory: Successfully created connection to /10.17.36.61:42897 after 1 ms (0 ms spent in bootstraps)
17/11/30 22:53:50 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 9.9 KB, free 7.5 GB)
17/11/30 22:53:50 INFO TorrentBroadcast: Reading broadcast variable 2 took 99 ms
17/11/30 22:53:50 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 14.3 KB, free 7.5 GB)
17/11/30 22:53:51 INFO TorrentBroadcast: Started reading broadcast variable 0
17/11/30 22:53:51 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 226.1 KB, free 7.5 GB)
17/11/30 22:53:51 INFO TorrentBroadcast: Reading broadcast variable 0 took 9 ms
17/11/30 22:53:51 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 368.0 B, free 7.5 GB)
17/11/30 22:53:51 INFO TorrentBroadcast: Started reading broadcast variable 1
17/11/30 22:53:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 6.7 KB, free 7.5 GB)
17/11/30 22:53:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 6 ms
17/11/30 22:53:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 368.0 B, free 7.5 GB)
/usr/local/lib64/python3.5/site-packages/sklearn/linear_model/logistic.py:1228: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = -1.
" = {}.".format(self.n_jobs))
17/11/30 23:35:37 ERROR Executor: Exception in task 32.0 in stage 0.0 (TID 32)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/container_1512081909079_0001_01_000002/pyspark.zip/pyspark/worker.py", line 177, in main
process()
File "/mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/container_1512081909079_0001_01_000002/pyspark.zip/pyspark/worker.py", line 172, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/mnt/yarn/usercache/hadoop/appcache/application_1512081909079_0001/container_1512081909079_0001_01_000002/pyspark.zip/pyspark/serializers.py", line 268, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/usr/local/lib/python3.5/site-packages/spark_sklearn/grid_search.py", line 319, in fun
return_parameters=True, error_score=error_score)
File "/usr/local/lib64/python3.5/site-packages/sklearn/model_selection/_validation.py", line 437, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/usr/local/lib64/python3.5/site-packages/sklearn/pipeline.py", line 257, in fit
Xt, fit_params = self._fit(X, y, **fit_params)
File "/usr/local/lib64/python3.5/site-packages/sklearn/pipeline.py", line 222, in _fit
**fit_params_steps[name])
File "/usr/local/lib64/python3.5/site-packages/sklearn/externals/joblib/memory.py", line 362, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib64/python3.5/site-packages/sklearn/pipeline.py", line 589, in _fit_transform_one
res = transformer.fit_transform(X, y, **fit_params)
File "/usr/local/lib64/python3.5/site-packages/sklearn/pipeline.py", line 746, in fit_transform
for name, trans, weight in self._iter())
File "/usr/local/lib64/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 789, in __call__
self.retrieve()
File "/usr/local/lib64/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 699, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/usr/lib64/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[(array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]]), Pipeline(memory=None,
steps=[('features', CustomFeatures()), ('tfidf', TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
lowercase=True, max_df=1.0, max_features=None, min_df=1,
ngram_range=(1, 2), norm='l2', prepr...ac3c18>), ('kbest', SelectPercentile(percentile=100, score_func=<function chi2 at 0x7f38fbf23f28>))]))]'. Reason: 'MemoryError()'
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)