
Python: AWS Glue error when converting a DataFrame to a DynamicFrame

Tags: python, pyspark, apache-spark-sql, pyspark-sql, aws-glue

Below is my code. I create a new DataFrame from the result set of a left join between two other DataFrames, and then try to convert it to a DynamicFrame.

from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import sha2, concat_ws, col

dfs = sqlContext.read.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("query", "SELECT hashkey AS hash FROM randomtable").load()

#Source
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "test", table_name = "randomtable", transformation_ctx = "datasource0")

#add hash value over all columns of each row
df = datasource0.toDF()
df.cache()
df = df.withColumn("hashkey", sha2(concat_ws("||", *df.columns), 256))

#drop dupes
df1 = df.dropDuplicates(subset=['hashkey'])

#read incremental data: keep rows whose hashkey has no match in Snowflake
inc = df1.join(dfs, df1["hashkey"] == dfs["hash"], how='left').filter(col('hash').isNull())

#convert back to a Glue DynamicFrame
datasource1 = DynamicFrame.fromDF(inc, glueContext, "datasource1")
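As an aside, the left join followed by the isNull filter is the usual anti-join pattern: it keeps the rows of df1 whose hashkey does not exist in Snowflake. On Spark 2.x the same incremental set can be written directly as a left anti join; a minimal equivalent sketch, assuming the same df1 and dfs as above:

#equivalent anti-join: rows of df1 with no matching hash in dfs
inc = df1.join(dfs, df1["hashkey"] == dfs["hash"], how='left_anti')

This also avoids carrying the all-null hash column from dfs into the result.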
Below is the error I get when trying to convert the DataFrame to a DynamicFrame:

    datasource1 = DynamicFrame.fromDF(inc, glueContext, "datasource1")
  File "/mnt/yarn/usercache/root/appcache/application_1560272525947_0002/container_1560272525947_0002_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 150, in fromDF
  File "/mnt/yarn/usercache/root/appcache/application_1560272525947_0002/container_1560272525947_0002_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/mnt/yarn/usercache/root/appcache/application_1560272525947_0002/container_1560272525947_0002_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/mnt/yarn/usercache/root/appcache/application_1560272525947_0002/container_1560272525947_0002_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.amazonaws.services.glue.DynamicFrame.apply.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.AttributeReference.<init>(Ljava/lang/String;Lorg/apache/spark/sql/types/DataType;ZLorg/apache/spark/sql/types/Metadata;Lorg/apache/spark/sql/catalyst/expressions/ExprId;Lscala/collection/Seq;)V
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryHelper$$anonfun$8.apply(QueryHelper.scala:66)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryHelper$$anonfun$8.apply(QueryHelper.scala:65)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryHelper.<init>(QueryHelper.scala:64)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.SourceQuery.<init>(SnowflakeQuery.scala:100)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.net$snowflake$spark$snowflake$pushdowns$querygeneration$QueryBuilder$$generateQueries(QueryBuilder.scala:98)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.liftedTree1$1(QueryBuilder.scala:63)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.treeRoot$lzycompute(QueryBuilder.scala:61)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.treeRoot(QueryBuilder.scala:60)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.tryBuild$lzycompute(QueryBuilder.scala:34)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.tryBuild(QueryBuilder.scala:33)
	at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder$.getRDDFromPlan(QueryBuilder.scala:179)
	at net.snowflake.spark.snowflake.pushdowns.SnowflakeStrategy.buildQueryRDD(SnowflakeStrategy.scala:42)
	at net.snowflake.spark.snowflake.pushdowns.SnowflakeStrategy.apply(SnowflakeStrategy.scala:24)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryP
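The NoSuchMethodError on AttributeReference.<init> is thrown from the Snowflake connector's pushdown query generation (net.snowflake.spark.snowflake.pushdowns), which typically means the spark-snowflake JAR was built against a different Spark version than the one the Glue job runs: the Catalyst constructor signature it expects does not exist at runtime. The error only surfaces at DynamicFrame.fromDF because that call is what first forces Spark to plan the join, as the QueryPlanner frames in the trace show. One commonly suggested workaround is to disable query pushdown for the session so Spark executes the join itself instead of generating a Snowflake query. Below is a minimal sketch of that idea, not a verified fix; it assumes sc is the job's SparkContext and that the installed connector exposes SnowflakeConnectorUtils.disablePushdownSession:

#sketch: turn off Snowflake query pushdown for this session (assumes the
#version-mismatched pushdown code path is what triggers the error)
sc._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.disablePushdownSession(
    sc._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate()
)

#then re-run the join and the conversion as before
inc = df1.join(dfs, df1["hashkey"] == dfs["hash"], how='left').filter(col('hash').isNull())
datasource1 = DynamicFrame.fromDF(inc, glueContext, "datasource1")

The more durable fix would be to align versions, i.e. supply the spark-snowflake artifact built for the exact Spark version your Glue release ships.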