Writing a Dataset to Kafka from Java Spark, with KryoSerializer enabled
I want to write a Dataset as JSON to a Kafka topic. I have a Dataset of objects, which I convert to a Dataset of Strings where each String holds a JSON object, and I write that to the topic. Everything was written fine before, but after adding one field the job started throwing an exception. I tried to wire up KryoSerializer, but could not get it to work.

SparkConf:
new SparkConf()
.setMaster("local[*]")
.set("spark.executor.memory", "2G")
.set("spark.driver.memory", "2G")
.set("spark.sql.shuffle.partitions", "20")
.set("spark.files.maxPartitionBytes", "64000000")
.set("spark.kryo.registrationRequired", "true")
.set("spark.serializer", KryoSerializer.class.getCanonicalName())
.set("es.batch.size.entries", "1500")
.set("spark.kryo.registrator", "net.***.core.configuration.CustomKryoRegistrator")
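A hedged side note on the configuration above: the keys below are standard Spark options that are commonly tuned together with `spark.serializer`, but the values here are illustrative guesses, not a confirmed fix for this workload. If Kryo's serialization buffer overflows on large rows, raising its limits is a typical first step when experimenting with KryoSerializer.

```java
// Hypothetical additions to the SparkConf chain above (values are illustrative):
// these control the per-object Kryo serialization buffer and its hard cap.
new SparkConf()
    .set("spark.kryoserializer.buffer", "64m")       // initial buffer size
    .set("spark.kryoserializer.buffer.max", "512m")  // cap before Kryo fails
```

Note that `spark.kryo.registrationRequired=true` will also make any serialization of an unregistered class fail loudly, which can help narrow down what is actually being serialized with Kryo.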
The custom registrator:
public void registerClasses(Kryo kryo) {
kryo.register(StructType[].class);
kryo.register(StructType.class);
kryo.register(StructField[].class);
kryo.register(StructField.class);
kryo.register(IntegerType$.class);
kryo.register(Metadata.class);
kryo.register(StringType$.class);
kryo.register(LongType$.class);
kryo.register(BooleanType$.class);
kryo.register(ArrayType.class);
kryo.register(BooleanWritable.class);
kryo.register(ByteWritable.class);
kryo.register(DoubleWritable.class);
kryo.register(FloatWritable.class);
kryo.register(IntWritable.class);
kryo.register(LongWritable.class);
kryo.register(NullWritable.class);
kryo.register(ArrayWritable.class);
kryo.register(Text.class);
kryo.register(CounterObject.class);
kryo.register(ViewabilityObject.class);
kryo.register(ViewabilityObjectCH.class);
kryo.register(ViewabilityAggregatedObjectCH.class);
}
The exception:
ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 11)
java.lang.NegativeArraySizeException
at org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:297)
at org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:1214)
at org.apache.spark.sql.catalyst.json.JacksonGenerator$$anonfun$org$apache$spark$sql$catalyst$json$JacksonGenerator$$makeWriter$9.apply(JacksonGenerator.scala:112)
at org.apache.spark.sql.catalyst.json.JacksonGenerator$$anonfun$org$apache$spark$sql$catalyst$json$JacksonGenerator$$makeWriter$9.apply(JacksonGenerator.scala:111)
at org.apache.spark.sql.catalyst.json.JacksonGenerator.org$apache$spark$sql$catalyst$json$JacksonGenerator$$writeFields(JacksonGenerator.scala:176)
at org.apache.spark.sql.catalyst.json.JacksonGenerator$$anonfun$write$1.apply$mcV$sp(JacksonGenerator.scala:228)
at org.apache.spark.sql.catalyst.json.JacksonGenerator.org$apache$spark$sql$catalyst$json$JacksonGenerator$$writeObject(JacksonGenerator.scala:165)
at org.apache.spark.sql.catalyst.json.JacksonGenerator.write(JacksonGenerator.scala:228)
at org.apache.spark.sql.Dataset$$anonfun$toJSON$1$$anon$1.next(Dataset.scala:3203)
at org.apache.spark.sql.Dataset$$anonfun$toJSON$1$$anon$1.next(Dataset.scala:3200)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.kafka010.KafkaWriteTask.execute(KafkaWriteTask.scala:45)
at org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$1.apply$mcV$sp(KafkaWriter.scala:89)
at org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$1.apply(KafkaWriter.scala:89)
at org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1$$anonfun$apply$1.apply(KafkaWriter.scala:89)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1.apply(KafkaWriter.scala:89)
at org.apache.spark.sql.kafka010.KafkaWriter$$anonfun$write$1.apply(KafkaWriter.scala:87)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Update:
`numBytes` comes out negative inside this method, and it is not clear where the value comes from:
public byte[] getBytes() {
// avoid copy if `base` is `byte[]`
if (offset == BYTE_ARRAY_OFFSET && base instanceof byte[]
&& ((byte[]) base).length == numBytes) {
return (byte[]) base;
} else {
byte[] bytes = new byte[numBytes];
copyMemory(base, offset, bytes, BYTE_ARRAY_OFFSET, numBytes);
return bytes;
}
}
Debugger values:
this = Method threw 'java.lang.NegativeArraySizeException' exception. Cannot evaluate org.apache.spark.unsafe.types.UTF8String.toString()
numBytes = -84627042
offset = 378
((byte[]) base).length = 2424
base
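To make the failure mode concrete: my understanding (an assumption about Spark internals, not stated in the question) is that UnsafeRow encodes each variable-length field as a single long, offset in the upper 32 bits and length in the lower 32 bits. If that word gets corrupted, e.g. by a serializer mismatch, the decoded length can be negative, and the `new byte[numBytes]` allocation in `UTF8String.getBytes()` then throws exactly the exception seen above. A minimal sketch:

```java
// Sketch of the suspected failure mode, using the values from the debugger.
public class OffsetAndSizeDemo {

    // Pack offset and size into one long, as UnsafeRow's writer is assumed to do.
    static long pack(int offset, int size) {
        return ((long) offset << 32) | (size & 0xFFFFFFFFL);
    }

    // Decode the length from the lower 32 bits.
    static int decodeSize(long offsetAndSize) {
        return (int) offsetAndSize;
    }

    public static void main(String[] args) {
        long good = pack(378, 2424);           // healthy offset/length pair
        System.out.println(decodeSize(good));  // prints 2424

        long corrupted = pack(378, -84627042); // the corrupted length observed
        int numBytes = decodeSize(corrupted);
        try {
            byte[] bytes = new byte[numBytes]; // same allocation as getBytes()
            System.out.println("allocated " + bytes.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException for numBytes=" + numBytes);
        }
    }
}
```

So the negative `numBytes` is a symptom of the row bytes being wrong, not the cause; the corruption happens upstream, before `getBytes()` is reached.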