Apache Spark: Avro deserialization fails when a column is derived with sum, but succeeds when the same column is derived with count. The serialized data is in Kafka.

Here is my SQL, which works:

    select hostnetworkid, roamertype, carrierid, total_failure, total_count,
           date_format(timestamp(unix_timestamp(window.start)),\"yyyyMMdd\") as eventdate,
           date_format(timestamp(unix_timestamp(window.start)),\"HH:mm\") as start
    from (select hostnetworkid, roamertype, carrierid,
                 window(event_timestamp, '#window'),
                 count(case when status=1 then 1 else 0 end) as total_failure,
                 count(*) as total_count
          from #kpi
          group by hostnetworkid, roamertype, carrierid, window(event_timestamp, '#window'))

Here is the SQL that throws an ArrayIndexOutOfBoundsException in Avro:

    select hostnetworkid, roamertype, carrierid, total_failure, total_count,
           date_format(timestamp(unix_timestamp(window.start)),\"yyyyMMdd\") as eventdate,
           date_format(timestamp(unix_timestamp(window.start)),\"HH:mm\") as start
    from (select hostnetworkid, roamertype, carrierid,
                 window(event_timestamp, '#window'),
                 sum(case when status=1 then 1 else 0 end) as total_failure,
                 count(*) as total_count
          from #kpi
          group by hostnetworkid, roamertype, carrierid, window(event_timestamp, '#window'))

Can anyone help? Why does deserialization with the Avro schema below work for count but not for sum? Here is my Avro schema file:

    {"type":"record","name":"MapKpi7","namespace":"com.mobileum",
     "fields":[
       {"name":"hostnetworkid","type":["int","null"]},
       {"name":"roamertype","type":["int","null"]},
       {"name":"carrierid","type":["int","null"]},
       {"name":"total_failure","type":"long"},
       {"name":"total_count","type":"long"},
       {"name":"eventdate","type":["string","null"]},
       {"name":"start","type":["string","null"]}
     ]}

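
When a hand-written reader schema has to line up with data produced elsewhere, it can help to list which fields accept null and which do not. The following is a minimal sketch using only the Python standard library. It assumes the schema's leading `"record"` is meant as `"type":"record"`; `nullable_fields` is a hypothetical helper name, not part of Avro or Spark.

```python
import json

# The schema from the question, assuming the record type is declared
# as "type":"record" (assumption: the posted text omits the "type" key).
SCHEMA_JSON = """
{"type":"record","name":"MapKpi7","namespace":"com.mobileum",
 "fields":[{"name":"hostnetworkid","type":["int","null"]},
           {"name":"roamertype","type":["int","null"]},
           {"name":"carrierid","type":["int","null"]},
           {"name":"total_failure","type":"long"},
           {"name":"total_count","type":"long"},
           {"name":"eventdate","type":["string","null"]},
           {"name":"start","type":["string","null"]}]}
"""


def nullable_fields(schema_json: str) -> dict:
    """Map each field name to True if its type is a union containing "null"."""
    schema = json.loads(schema_json)
    result = {}
    for field in schema["fields"]:
        field_type = field["type"]
        # In Avro JSON, a union is written as a list of branch types.
        result[field["name"]] = isinstance(field_type, list) and "null" in field_type
    return result


if __name__ == "__main__":
    for name, nullable in nullable_fields(SCHEMA_JSON).items():
        print(f"{name}: {'nullable union' if nullable else 'plain type'}")
```

The resulting list can then be compared by eye against the nullability of the columns in the query result.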
Here is the stack trace:

java.lang.ArrayIndexOutOfBoundsException: 3
        at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:402)
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)

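Solved by defining total_failure in the schema as a union: {"name":"total_failure","type":["long","null"]} instead of {"name":"total_failure","type":"long"}. A likely reason: in Spark, sum() yields a nullable column while count() yields a non-nullable one, so the sum result is serialized as a nullable union and no longer matches a plain long in the reader schema.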
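
To illustrate why a plain `long` in the reader schema could end in an out-of-range union index, here is a minimal sketch that hand-rolls the relevant parts of Avro's binary encoding (zigzag varints for int and long, a branch index before each union value, a length prefix for strings). It does not use the Avro library, and it assumes the writer encoded total_failure as the union `["long","null"]`. The helper names (`write_row`, `read_row`, `Reader`) and the sample values are hypothetical.

```python
def zigzag_encode(n: int) -> bytes:
    """Encode an integer as an Avro zigzag varint (used for int and long)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


class Reader:
    """Tiny sequential decoder over a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read_long(self) -> int:
        shift, result = 0, 0
        while True:
            b = self.data[self.pos]
            self.pos += 1
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                break
            shift += 7
        return (result >> 1) ^ -(result & 1)  # undo zigzag

    def read_union_index(self, branches: int) -> int:
        idx = self.read_long()
        if not 0 <= idx < branches:
            raise IndexError(idx)  # analogous to ArrayIndexOutOfBoundsException
        return idx

    def read_string(self) -> str:
        length = self.read_long()
        raw = self.data[self.pos:self.pos + length]
        self.pos += length
        return raw.decode("utf-8")


def write_row(total_failure: int, total_count: int) -> bytes:
    """Writer side: total_failure is a union (assumed), total_count a plain long."""
    buf = bytearray()
    for key in (1, 2, 3):  # hostnetworkid, roamertype, carrierid
        buf += zigzag_encode(0) + zigzag_encode(key)  # branch 0 ("int"), then value
    buf += zigzag_encode(0) + zigzag_encode(total_failure)  # ["long","null"], branch 0
    buf += zigzag_encode(total_count)  # plain long, no branch index
    for text in ("20200101", "10:00"):  # eventdate, start
        raw = text.encode("utf-8")
        buf += zigzag_encode(0) + zigzag_encode(len(raw)) + raw
    return bytes(buf)


def read_row(data: bytes, total_failure_is_union: bool) -> dict:
    """Reader side: decode strictly according to the reader's idea of the schema."""
    r = Reader(data)
    row = {}
    for name in ("hostnetworkid", "roamertype", "carrierid"):
        r.read_union_index(2)
        row[name] = r.read_long()
    if total_failure_is_union:
        r.read_union_index(2)
    row["total_failure"] = r.read_long()
    row["total_count"] = r.read_long()
    for name in ("eventdate", "start"):
        r.read_union_index(2)
        row[name] = r.read_string()
    return row


if __name__ == "__main__":
    data = write_row(total_failure=5, total_count=3)
    print(read_row(data, total_failure_is_union=True))
    try:
        read_row(data, total_failure_is_union=False)
    except IndexError as err:
        print("union branch index out of range:", err)
```

With the plain `long` reader, the branch index byte is consumed as total_failure, the real total_failure is read as total_count, and the real total_count (3 in this sample) is then taken as the union index for eventdate. That resembles the `ArrayIndexOutOfBoundsException: 3` in the trace, although the source does not confirm that this is exactly what happened.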