Apache Kafka, August 2019: Kafka consumer lag programmatically (Apache Kafka, Spring Kafka, Kafka Python)


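Is there any way to find the lag of a Kafka consumer programmatically? I don't want to install an external Kafka Manager tool and check a dashboard.

We can list all the consumer groups and check the lag for each group.

Currently we do have a command to check the lag, but it requires the relative path to where Kafka is installed.

Spring Kafka, kafka-python, the Kafka admin client, or JMX: is there any way to write code that finds the lag?

We were careless and did not monitor the process. The consumer ended up in a zombie state and the lag reached 50,000, which caused a lot of confusion.

We only think about these cases once a problem appears. We were monitoring the script, but did not know it could end up as a zombie process.

Any ideas are very welcome.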

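Yes. We can get the consumer lag with kafka-python. I'm not sure this is the best way to do it, but it works.

Currently we supply our consumer groups manually. You can also get the consumers from kafka-python, but it only lists active consumers, so a consumer that is down may not appear in the list.

First, establish the client connection: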

from kafka import BrokerConnection
from kafka.protocol.commit import *
import socket

# This takes only one broker at a time. To use multiple brokers, loop over them, passing each broker's IP and port.

def establish_broker_connection(server, port, group):
    '''
    Client Connection to each broker for getting consumer offset info
    '''
    bc = BrokerConnection(server, port, socket.AF_INET)
    bc.connect_blocking()
    # Passing None as the topic list requests offsets for every topic the group has committed to
    fetch_offset_request = OffsetFetchRequest_v3(group, None)
    future = bc.send(fetch_offset_request)
    # Return both so the caller can wait on the future using this connection
    return future, bc
Next, we need the current offset for each topic the consumer group subscribes to. Pass in the future and bc from above.

from kafka import SimpleClient
from kafka.common import OffsetRequestPayload

# Comma-separated Kafka brokers, e.g. "broker1:port1,broker2:port2" (placeholder value, replace with your own)
BOOTSTRAP_SERVERS = 'localhost:9092'

def _get_client_connection():
    '''
    Client Connection to the cluster for getting topic info
    '''
    client = SimpleClient(BOOTSTRAP_SERVERS)
    return client

def get_latest_offset_for_topic(topic):
    '''
    To get the latest offset of every partition of a topic
    '''
    client = _get_client_connection()
    partitions = client.topic_partitions[topic]
    # time=-1 asks for the latest offset, max_offsets=1 for a single value
    offset_requests = [OffsetRequestPayload(topic, p, -1, 1) for p in partitions.keys()]
    offsets_responses = client.send_offset_request(offset_requests)
    client.close()
    # Map partition id -> latest offset
    return {resp.partition: resp.offsets[0] for resp in offsets_responses}

def get_current_offset_for_consumer_group(future, bc):
    '''
    Get current offset info for a consumer group
    '''
    while not future.is_done:
        for resp, f in bc.recv():
            f.success(resp)

    # future.value.topics is a list of (topic, [(partition, offset, metadata, error_code), ...])
    for topic in future.value.topics:
        latest_offsets = get_latest_offset_for_topic(topic[0])
        for partition in topic[1]:
            offset_difference = latest_offsets[partition[0]] - partition[1]
            print("topic: {} partition: {} lag: {}".format(topic[0], partition[0], offset_difference))
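offset_difference is the difference between the last offset produced to the topic and the last offset (or message) consumed by your consumer.

If you do not get a current offset for the consumer on some topic, your consumer is probably down.


So if the offset difference exceeds the threshold you want, or if you get empty offsets for your consumer, you can raise an alert or send an email.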
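
The alerting rule described above could be sketched as follows. This is a minimal illustration, not part of the original answer: `find_lag_alerts`, the shape of the `lags` dictionary, and the sample numbers are all assumptions, and the actual delivery of the alert (mail, pager) is left out.

```python
def find_lag_alerts(lags, threshold):
    """Return alert messages for partitions that lag too far or have no offset.

    `lags` is a hypothetical mapping of (topic, partition) -> lag, where None
    means no committed offset was returned for that partition.
    """
    alerts = []
    for (topic, partition), lag in sorted(lags.items()):
        if lag is None:
            # No offset at all: the consumer may be down.
            alerts.append("{}[{}]: no committed offset, consumer may be down".format(topic, partition))
        elif lag > threshold:
            alerts.append("{}[{}]: lag {} exceeds threshold {}".format(topic, partition, lag, threshold))
    return alerts


if __name__ == "__main__":
    # Made-up sample values, only to show the call.
    sample = {("events", 0): 12, ("events", 1): 50000, ("events", 2): None}
    for message in find_lag_alerts(sample, threshold=1000):
        print(message)
```

Treating a missing offset as its own alert, rather than as zero lag, keeps a dead consumer from looking healthy.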

The Java client exposes the lag of its consumers over JMX; in this example we have 5 partitions.


Spring Boot can publish these to Micrometer.

You can get these with kafka-python. Run this on each broker, or loop over the list of brokers, and it will give you the lag for all topic partitions.

import socket

from kafka import BrokerConnection, KafkaConsumer, TopicPartition
# Aliased to the name used below; the original snippet did not show its imports
from kafka.coordinator.protocol import ConsumerProtocolMemberAssignment as MemberAssignment
from kafka.protocol.admin import DescribeGroupsRequest_v1, ListGroupsRequest_v1

BOOTSTRAP_SERVERS = '{}'.format(socket.gethostbyname(socket.gethostname()))
client = BrokerConnection(BOOTSTRAP_SERVERS, 9092, socket.AF_INET)
client.connect_blocking()

# Ask this broker for the groups it knows about
list_groups_request = ListGroupsRequest_v1()
future = client.send(list_groups_request)
while not future.is_done:
    for resp, f in client.recv():
        f.success(resp)

for group in future.value.groups:
    if group[1] != 'consumer':
        continue
    # Describe the group to learn its members and their assignments
    describe_request = DescribeGroupsRequest_v1(groups=[group[0]])
    describe_future = client.send(describe_request)
    while not describe_future.is_done:
        for resp, f in client.recv():
            f.success(resp)
    (error_code, group_id, state, protocol_type, protocol, members) = describe_future.value.groups[0]
    for member in members:
        (member_id, client_id, client_host, member_metadata, member_assignment) = member
        member_topics_assignment = [
            topic for (topic, partitions) in MemberAssignment.decode(member_assignment).assignment
        ]

        for topic in member_topics_assignment:
            consumer = KafkaConsumer(
                bootstrap_servers=BOOTSTRAP_SERVERS,
                group_id=group[0],
                enable_auto_commit=False
            )
            consumer.topics()  # fetch topic metadata before asking for partitions

            for p in consumer.partitions_for_topic(topic):
                tp = TopicPartition(topic, p)
                consumer.assign([tp])
                committed = consumer.committed(tp)
                consumer.seek_to_end(tp)
                last_offset = consumer.position(tp)
                if last_offset is not None and committed is not None:
                    lag = last_offset - committed
                    print("group: {} topic:{} partition: {} lag: {}".format(group[0], topic, p, lag))

            consumer.close(autocommit=False)

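I'm writing the code in Scala, but using only the native Java API from KafkaConsumer and KafkaProducer.

You only need to know the name of the consumer group and the topics. You can avoid predefining the topics, but then you will only get the lag for consumer groups that exist and are in the Stable state, not rebalancing, which can be a problem for alerting.
So all you really need to know and use is:

  • KafkaConsumer.committed
    - returns the latest committed offset for a topic partition
  • KafkaConsumer.assign
    - do not use subscribe, because it causes a consumer group rebalance. You definitely don't want your monitoring process to affect the topic being monitored
  • KafkaConsumer.endOffsets
    - returns the latest produced offset
  • Consumer group lag
    - is the difference between the latest committed offset and the latest produced offset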
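
For readers using kafka-python instead of the Java client, the same three calls could be sketched as below. This is an illustration under assumptions rather than the answer's own code: `compute_lag` and `consumer_group_lag` are hypothetical helper names, and the sketch assumes kafka-python's `KafkaConsumer.assign`, `end_offsets` and `committed` methods, which mirror the Java ones listed above.

```python
def compute_lag(end_offsets, committed_offsets):
    """Per-partition lag = latest produced offset - latest committed offset.

    A partition without a committed offset maps to None instead of a number.
    """
    return {
        tp: (None if committed_offsets.get(tp) is None else end - committed_offsets[tp])
        for tp, end in end_offsets.items()
    }


def consumer_group_lag(bootstrap_servers, group_id, topics):
    """Hypothetical helper: lag of `group_id` on every partition of `topics`."""
    # Imported here so compute_lag() stays usable without kafka-python installed.
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers,
                             group_id=group_id,
                             enable_auto_commit=False)
    try:
        partitions = [TopicPartition(topic, p)
                      for topic in topics
                      for p in (consumer.partitions_for_topic(topic) or [])]
        # assign() instead of subscribe(): the monitor never joins the group,
        # so it does not trigger a rebalance.
        consumer.assign(partitions)
        end_offsets = consumer.end_offsets(partitions)
        committed = {tp: consumer.committed(tp) for tp in partitions}
        return compute_lag(end_offsets, committed)
    finally:
        consumer.close(autocommit=False)
```

Keeping the subtraction in a separate pure function makes the None case (no committed offset yet) easy to check without a running cluster.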

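  • Hi Gary, it would be great if you could show this in a Spring Kafka talk/video!!
  • This has nothing to do with Spring; the MBeans are exported by the Kafka client. Spring Boot just has hooks to read those MBeans and publish them to Micrometer.
  • But what happens with Python consumers? In that case my jconsole doesn't show kafka.consumer; it only shows up when my Spring Kafka consumer is running... Am I missing something?
  • No; the Java client only exports MBeans for its own consumers.
  • So what can we do for those consumers? I have seen your answer where you invoke the consumer group describe command from a Java program and load the lag values... but is there no other way to do it?
  • Hi, I have written Python code that executes the Kafka command (from the Kafka directory) to get the consumer groups. The next piece of code loops over each consumer group, runs the describe command for each group from Python, then takes the LAG column and prints it per topic. [Drawback: the code has to run from the Kafka directory and needs to execute shell commands to get the details!]
  • @ArpanSharma kafka-consumer-groups.sh has a problem. This command only returns results for stable consumer groups. If the consumer group is rebalancing, or all of its instances are down, the command returns nothing.
  • In the current version, SimpleClient is deprecated.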
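
The shell-based approach mentioned in the comments (run the describe command, then read the LAG column) might look roughly like this. It is a sketch under assumptions: `parse_describe_output` and `describe_group` are hypothetical names, `script_path` is whatever path your installation uses for kafka-consumer-groups.sh, and the parser assumes the table header contains TOPIC, PARTITION and LAG columns. As noted in the comments, the command may return nothing for a group that is rebalancing or has no running instances, so an empty result should not be read as zero lag.

```python
import subprocess


def parse_describe_output(text):
    """Parse `kafka-consumer-groups.sh --describe` output into a list of dicts.

    Columns are located by header name, so the parser does not depend on
    their order.
    """
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "TOPIC" in fields and "LAG" in fields:
            header = fields  # (re)start of a table
            continue
        if header is None or len(fields) < len(header):
            continue  # warnings or other non-table lines
        row = dict(zip(header, fields))
        lag = row["LAG"]
        rows.append({
            "topic": row["TOPIC"],
            "partition": int(row["PARTITION"]),
            # A non-numeric LAG (for example "-") means no lag value is available.
            "lag": int(lag) if lag.isdigit() else None,
        })
    return rows


def describe_group(script_path, bootstrap_server, group):
    """Run the describe command for one group; script_path is site-specific."""
    output = subprocess.check_output(
        [script_path, "--bootstrap-server", bootstrap_server,
         "--describe", "--group", group])
    return parse_describe_output(output.decode("utf-8"))
```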
    
    
    import java.util.{Properties, UUID}
    
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.clients.producer.KafkaProducer
    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
    
    import scala.collection.JavaConverters._
    import scala.util.Try
    
    case class TopicPartitionInfo(topic: String, partition: Long, currentPosition: Long, endOffset: Long) {
      val lag: Long = endOffset - currentPosition
    
      override def toString: String = s"topic=$topic,partition=$partition,currentPosition=$currentPosition,endOffset=$endOffset,lag=$lag"
    }
    
    case class ConsumerGroupInfo(consumerGroup: String, topicPartitionInfo: List[TopicPartitionInfo]) {
      override def toString: String = s"ConsumerGroup=$consumerGroup:\n${topicPartitionInfo.mkString("\n")}"
    }
    
    object ConsumerLag {
    
      def consumerGroupInfo(bootStrapServers: String, consumerGroup: String, topics: List[String]) = {
        val properties = new Properties()
        properties.put("bootstrap.servers", bootStrapServers)
        properties.put("auto.offset.reset", "latest")
        properties.put("group.id", consumerGroup)
        properties.put("key.deserializer", classOf[StringDeserializer])
        properties.put("value.deserializer", classOf[StringDeserializer])
        properties.put("key.serializer", classOf[StringSerializer])
        properties.put("value.serializer", classOf[StringSerializer])
        properties.put("client.id", UUID.randomUUID().toString)
    
        val kafkaProducer = new KafkaProducer[String, String](properties)
        val kafkaConsumer = new KafkaConsumer[String, String](properties)
        val assignment = topics
          .map(topic => kafkaProducer.partitionsFor(topic).asScala)
          .flatMap(partitions => partitions.map(p => new TopicPartition(p.topic, p.partition)))
          .asJava
        kafkaConsumer.assign(assignment)
    
        val info = ConsumerGroupInfo(consumerGroup,
          kafkaConsumer.endOffsets(assignment).asScala
            .map { case (tp, latestOffset) =>
              TopicPartitionInfo(tp.topic,
                tp.partition,
                Try(kafkaConsumer.committed(tp)).map(_.offset).getOrElse(0), // TODO: warn on null; null means the consumer group does not exist
                latestOffset)
            }
            .toList
        )

        // Close both clients so repeated calls do not leak connections
        kafkaProducer.close()
        kafkaConsumer.close()
        info
      }
    
      def main(args: Array[String]): Unit = {
        println(
          consumerGroupInfo(
            bootStrapServers = "kafka-prod:9092",
            consumerGroup = "not-exist",
            topics = List("events", "anotherevents")
          )
        )
    
        println(
          consumerGroupInfo(
            bootStrapServers = "kafka:9092",
            consumerGroup = "consumerGroup1",
            topics = List("events", "anotherevents")
          )
        )
      }
    }