
Amazon web services: Glue - Error occurred while calling getDynamicFrame

Tags: amazon-web-services, apache-spark, pyspark, apache-spark-sql, aws-glue

I am using Glue to transfer data from a table in the Glue catalog to a table in an RDS instance. Below is the snippet used to connect to the Glue catalog table:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
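# Read the source table from the Glue Data Catalog as a DynamicFrame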
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "dev", table_name = "tbl", transformation_ctx = "datasource0")
............
job.commit()
Note that the Glue catalog table does contain data, which has even been verified through Athena. But I repeatedly get the following error:

File "script_2019-05-16-16-17-26.py", line 20, in <module>
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "dev", table_name = "tbl", transformation_ctx = "datasource0")
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 570, in from_catalog
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/PyGlue.zip/awsglue/context.py", line 138, in create_dynamic_frame_from_catalog
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o63.getDynamicFrame.
: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:540)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:374)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:187)
at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:68)
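
The stack trace fails inside the ORC footer reader (ReaderImpl.extractMetaInfoFromFooter), which typically throws IndexOutOfBoundsException when the S3 location behind the catalog table contains objects that are not valid ORC files, for example zero-byte placeholder files or files written in a different format than the table definition claims. As a quick check, one can bypass the catalog and probe the underlying path directly; a minimal sketch, where the bucket and prefix s3://my-bucket/dev/tbl/ are hypothetical stand-ins for the table's actual location:

import boto3
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# 1) Bypass the catalog and read the backing S3 path directly as ORC;
#    this reproduces the footer error if any object under the prefix is bad
df = spark.read.orc("s3://my-bucket/dev/tbl/")
df.show(5)

# 2) Look for zero-byte objects under the prefix -- empty files are a
#    common trigger for IndexOutOfBoundsException in extractMetaInfoFromFooter
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="dev/tbl/")
for obj in resp.get("Contents", []):
    if obj["Size"] == 0:
        print("Zero-byte object:", obj["Key"])

If the direct ORC read fails on the same files, the problem lies in the data itself rather than in Glue or the job configuration.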

The IAM role for the Glue job has policies attached for S3FullAccess, GlueFullAccess, and CloudWatchLogFullAccess.

I ran into a similar problem when connecting to RDS; the solution is here "". AWS Glue supports one connection per job or development endpoint. If you specify multiple connections in a job, AWS Glue uses only the first connection.
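
For reference, once the source DynamicFrame is read, writing it to the RDS table through a single catalog connection looks roughly like the sketch below; the connection name "rds-connection" and the target database/table names are hypothetical placeholders:

# Write the frame read earlier through one JDBC catalog connection;
# "rds-connection", "target_db" and "target_tbl" are placeholder names
datasink = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = datasource0,
    catalog_connection = "rds-connection",
    connection_options = {"dbtable": "target_tbl", "database": "target_db"},
    transformation_ctx = "datasink")

Keeping the job down to this single connection avoids the behavior described above, where any additional connections listed on the job are silently ignored.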