Apache Kafka + Spark Streaming: spark.streaming.kafka.maxRatePerPartition is being violated


I have a streaming application that has been running for about a week. I set spark.streaming.kafka.maxRatePerPartition to limit the rate at which records are consumed.

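As I understand it, each batch should effectively receive at most

batch interval (seconds) * spark.streaming.kafka.maxRatePerPartition * number of partitions

records. This worked fine for several days, with the load frequently hitting the maximum. Then, after a series of batches with 0 records, the application suddenly exceeded the maximum by about 50%.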
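The per-batch upper bound described above is simple arithmetic, and a small helper makes it easy to check whether an observed batch size really exceeds it. This is only a sketch: the function names and the numbers (a 10 second batch interval, a rate of 1000, 8 partitions) are illustrative assumptions, not values from the original application.

```python
def max_records_per_batch(batch_interval_s: float,
                          max_rate_per_partition: int,
                          num_partitions: int) -> int:
    """Expected ceiling on records in one batch under the rate limit:
    batch interval (s) * maxRatePerPartition * number of partitions."""
    return int(batch_interval_s * max_rate_per_partition * num_partitions)


def overshoot_ratio(observed_records: int, limit: int) -> float:
    """Fraction by which a batch exceeded the limit (0.5 means 50% over).
    Returns 0.0 when the batch stayed within the limit."""
    return max(0.0, observed_records / limit - 1.0)


# Hypothetical values, chosen only to illustrate the calculation.
limit = max_records_per_batch(batch_interval_s=10,
                              max_rate_per_partition=1000,
                              num_partitions=8)
print(limit)
print(overshoot_ratio(120_000, limit))
```

Comparing the batch sizes shown in the streaming UI against this ceiling is one way to confirm that the limit was actually exceeded rather than misread.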
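So, a few questions:
Under what conditions can Spark/Kafka ignore spark.streaming.kafka.maxRatePerPartition?
Would backpressure help mitigate this problem? I have a hard time understanding backpressure: if you have a maximum rate set by spark.streaming.kafka.maxRatePerPartition, you shouldn't get peaks that you aren't prepared for.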
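For reference, the two settings mentioned in the question can be set together. The Spark configuration documentation describes the rate chosen by backpressure as upper-bounded by spark.streaming.kafka.maxRatePerPartition when that value is set, so the two are complementary rather than alternatives. The sketch below only assembles the configuration keys into spark-submit arguments; the helper names and the rate value are assumptions for illustration, and it does not by itself explain the overshoot described above.

```python
def rate_limit_conf(max_rate_per_partition: int,
                    backpressure: bool = True) -> dict:
    """Spark properties for a hard per-partition cap plus backpressure.
    Both keys are standard Spark Streaming configuration properties."""
    return {
        "spark.streaming.kafka.maxRatePerPartition": str(max_rate_per_partition),
        "spark.streaming.backpressure.enabled": str(backpressure).lower(),
    }


def to_submit_args(conf: dict) -> list:
    """Render a properties dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical rate value; substitute the one used by the application.
print(" ".join(to_submit_args(rate_limit_conf(1000))))
```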

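I have exactly the same problem. Did you find out how to prevent it?

I never found out why this is possible, but the problem went away after tuning: I switched to more executors, each with fewer cores and less memory, which made the job generally faster and more stable. I also set up a detection mechanism that calls the YARN API to kill the job and restart it when the job runs into problems.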
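The comment does not show how the kill step was implemented. One possible way, sketched here under assumptions, is the YARN ResourceManager REST API, which accepts a PUT on an application's state resource with the body {"state": "KILLED"}. The ResourceManager address and application id below are hypothetical placeholders, and the detection and restart logic is left out because the comment gives no details about it.

```python
import json
import urllib.request


def build_kill_request(rm_url: str, app_id: str) -> urllib.request.Request:
    """Build (but do not send) a request asking YARN to kill an application."""
    url = f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}/state"
    body = json.dumps({"state": "KILLED"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical ResourceManager address and application id.
request = build_kill_request("http://rm.example.com:8088",
                             "application_1_0001")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the monitoring script easy to test without a live cluster.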