Java performance issues: Kafka + Storm + Trident + OpaqueTridentKafkaSpout


java apache-kafka apache-storm


We are seeing some performance issues with Kafka + Storm + Trident + OpaqueTridentKafkaSpout.

Here are the details of our setup:

Storm topology:

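import backtype.storm.Config
import backtype.storm.LocalCluster
import backtype.storm.StormSubmitter
import backtype.storm.spout.SchemeAsMultiScheme
import backtype.storm.tuple.Fields
import storm.kafka.Broker
import storm.kafka.StaticHosts
import storm.kafka.StringScheme
import storm.kafka.trident.GlobalPartitionInformation
import storm.kafka.trident.OpaqueTridentKafkaSpout
import storm.kafka.trident.TridentKafkaConfig
import storm.trident.Stream
import storm.trident.TridentTopology

// args[0]: "local" runs a LocalCluster, anything else submits to the cluster
// args[1]: NEO4JTridentFunction parallelism, args[2]: spout parallelism
// args[3]: number of workers, args[4]: number of partitions of topic "test"
Broker broker = Broker.fromString("localhost:9092")

// Register every partition of the topic on the single local broker
GlobalPartitionInformation info = new GlobalPartitionInformation()
if (args[4]) {
    int partitionCount = args[4].toInteger()
    for (int i = 0; i < partitionCount; i++) {
        info.addPartition(i, broker)
    }
}
StaticHosts hosts = new StaticHosts(info)
TridentKafkaConfig tridentKafkaConfig = new TridentKafkaConfig(hosts, "test")
tridentKafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme())

// Opaque transactional Kafka spout feeding the Neo4j function
OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(tridentKafkaConfig)
TridentTopology topology = new TridentTopology()
Stream st = topology.newStream("spout1", kafkaSpout).parallelismHint(args[2].toInteger())
        .each(kafkaSpout.getOutputFields(), new NEO4JTridentFunction(), new Fields("status"))
        .parallelismHint(args[1].toInteger())

Map conf = new HashMap()
conf.put(Config.TOPOLOGY_WORKERS, args[3].toInteger())
conf.put(Config.TOPOLOGY_DEBUG, false)

if (args[0] == "local") {
    LocalCluster cluster = new LocalCluster()
    cluster.submitTopology("mytopology", conf, topology.build())
} else {
    StormSubmitter.submitTopology("mytopology", conf, topology.build())
    NEO4JTridentFunction.getGraphDatabaseService().shutdown()
}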

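Size of each message produced to Kafka: 11 KB
Execution time of each bolt (NEO4JTridentFunction) to process the data: 500 ms
Number of Storm workers: 1
Parallelism hint of the spout (OpaqueTridentKafkaSpout): 1
Parallelism hint of the bolt/function (NEO4JTridentFunction): 50
Spout throughput we are seeing: about 12 msgs/sec
Rate at which messages are produced into Kafka: 150 msgs/sec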
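A back-of-envelope capacity check using the figures above, written as a small Java sketch. It assumes the 500 ms is spent per message, that all 50 function executors really run in parallel, and that nothing else (Kafka fetches, batch coordination, Neo4j contention) limits the rate; the class and method names are hypothetical.

```java
// Hypothetical helper: rough upper bound on throughput from per-message latency.
class CapacityEstimate {

    // Messages per second one executor can handle if each message takes millisPerMessage.
    static double perExecutorRate(double millisPerMessage) {
        return 1000.0 / millisPerMessage;
    }

    // Ideal rate if `executors` threads each process messages independently and concurrently.
    static double idealRate(int executors, double millisPerMessage) {
        return executors * perExecutorRate(millisPerMessage);
    }

    public static void main(String[] args) {
        double ideal = idealRate(50, 500);   // parallelism hint 50, 500 ms per message
        double produced = 150;                // msgs/sec written to Kafka
        double observed = 12;                 // msgs/sec seen at the spout
        System.out.printf("ideal=%.0f produced=%.0f observed=%.0f%n", ideal, produced, observed);
    }
}
```

Even the ideal bound (100 msgs/sec) is below the 150 msgs/sec being produced, and the observed 12 msgs/sec is far below that bound, which suggests the 50 executors are not actually running concurrently. The answers below look at the likely causes (CPU cores, batches in flight).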

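Both Storm and Kafka are single-node deployments. We have read that Storm can reach much higher throughput, but we are unable to reproduce it. Please suggest how to tune the Storm + Kafka + OpaqueTridentKafkaSpout configuration to achieve higher throughput. Any help would be greatly appreciated.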

Thanks,

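Set worker.childopts according to your system configuration. Use SpoutConfig.fetchSizeBytes to increase the number of bytes pulled into the topology. Increase the parallelism hint.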

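You should set the spout parallelism equal to the partition count of the topic. By default, Trident processes one batch at a time; you should increase this by changing the topology.max.spout.pending property. Because Trident enforces ordered transaction management, your execute method (NEO4JTridentFunction) must be fast for you to reach the throughput you need.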
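One way to see why topology.max.spout.pending matters is Little's law: the steady-state rate is roughly (batches in flight × tuples per batch) / batch latency. The Java sketch below applies that model with hypothetical numbers (20 tuples per batch, 2,000 ms per batch, a processing capacity of 100 tuples/sec); it is a rough model, not a description of Trident internals, and it ignores the in-order commit that Trident enforces.

```java
// Hypothetical rough model: Little's law applied to Trident batches in flight.
class PendingBatchModel {

    // Approximate tuples/sec when `pendingBatches` batches of `batchSize` tuples
    // are in flight and each batch takes `batchLatencyMillis` from emit to ack.
    static double throughput(int pendingBatches, int batchSize, double batchLatencyMillis) {
        return pendingBatches * batchSize * 1000.0 / batchLatencyMillis;
    }

    // The model never exceeds what the executors can process in parallel.
    static double cappedThroughput(int pendingBatches, int batchSize,
                                   double batchLatencyMillis, double processingCapacity) {
        return Math.min(throughput(pendingBatches, batchSize, batchLatencyMillis),
                        processingCapacity);
    }

    public static void main(String[] args) {
        for (int pending : new int[] {1, 2, 5, 10}) {
            System.out.printf("pending=%d -> ~%.0f tuples/sec%n",
                    pending, cappedThroughput(pending, 20, 2000, 100));
        }
    }
}
```

In practice batch latency grows as more batches compete for the same cores, so treat these numbers as optimistic: raising max.spout.pending helps only while the executors have idle capacity.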

You can also use "tridentConfig.fetchSizeBytes": by increasing it, the spout receives more data for each new emit call.
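To relate fetchSizeBytes to the 11 KB messages in the question, the Java sketch below estimates how many whole messages fit into one fetch from a single partition. The 1 MiB figure is an assumption (we believe it is the storm-kafka default, but check KafkaConfig in your version), 11 KB is taken as 11 × 1024 bytes, Kafka's per-message overhead is ignored, and the 4 MiB value is purely hypothetical.

```java
// Hypothetical sketch: whole messages that fit in one fetch from a single partition.
class FetchSizeEstimate {

    static long messagesPerFetch(long fetchSizeBytes, long messageSizeBytes) {
        // Integer division: a partially fetched trailing message is not counted.
        return fetchSizeBytes / messageSizeBytes;
    }

    public static void main(String[] args) {
        long messageSize = 11 * 1024;        // 11 KB per message, as in the question
        long assumedDefault = 1024 * 1024;   // assumed 1 MiB fetch size
        long larger = 4 * 1024 * 1024;       // hypothetical larger setting
        System.out.println(messagesPerFetch(assumedDefault, messageSize)); // 93
        System.out.println(messagesPerFetch(larger, messageSize));         // 372
    }
}
```

A larger fetch mainly helps when the spout, rather than NEO4JTridentFunction, is the bottleneck.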

Also, you should check your garbage collection logs; they will give you clues about what is really going on.

You can enable garbage collection logging by adding

"-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:{path}/gc-storm-worker-%ID%.log"

to the worker.childopts setting in your worker configuration.
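If you would rather enable GC logging for a single topology than edit storm.yaml, Storm also reads a per-topology topology.worker.childopts setting (Config.TOPOLOGY_WORKER_CHILDOPTS). The Java sketch below builds the option string from the flags above (JDK 7/8-style flags, as used in the question) and puts it into a plain Map, as you would into the conf map passed to StormSubmitter. The log directory is hypothetical, and whether %ID% is substituted in this per-topology setting may depend on your Storm version, so verify before relying on it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: assemble GC-logging JVM options for Storm workers.
class GcLoggingOpts {

    // Joins the flags with single spaces; %ID% is left for Storm to replace per worker.
    static String gcLoggingOpts(String logDir) {
        List<String> flags = Arrays.asList(
                "-XX:+PrintGCDetails",
                "-XX:+PrintGCTimeStamps",
                "-verbose:gc",
                "-Xloggc:" + logDir + "/gc-storm-worker-%ID%.log");
        return String.join(" ", flags);
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        // Same key string as Config.TOPOLOGY_WORKER_CHILDOPTS; the directory is hypothetical.
        conf.put("topology.worker.childopts", gcLoggingOpts("/var/log/storm"));
        System.out.println(conf.get("topology.worker.childopts"));
    }
}
```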

Last but not least, if your young-generation ratio is higher than normal, you can switch to G1GC.
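The worker.childopts in the storm.yaml posted in this thread selects ParNew/CMS, and the HotSpot JVM refuses to start when more than one collector is selected, so switching to G1GC means removing those flags, not only adding -XX:+UseG1GC. The Java sketch below does that rewrite on an options string; the class name and the exact set of flags it treats as CMS-specific are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: swap ParNew/CMS collector flags for G1 in a childopts string.
class SwitchToG1 {

    // True for flags that select or tune ParNew/CMS and would conflict with G1.
    static boolean isCmsFlag(String flag) {
        return flag.equals("-XX:+UseParNewGC")
                || flag.equals("-XX:+UseConcMarkSweepGC")
                || flag.equals("-XX:+UseCMSInitiatingOccupancyOnly")
                || flag.startsWith("-XX:+CMS")
                || flag.startsWith("-XX:-CMS")
                || flag.startsWith("-XX:CMS");
    }

    static String toG1(String childopts) {
        List<String> kept = new ArrayList<>();
        for (String flag : childopts.trim().split("\\s+")) {
            // Skip empty tokens (blank input) and any CMS-related flag.
            if (!flag.isEmpty() && !isCmsFlag(flag)) {
                kept.add(flag);
            }
        }
        kept.add("-XX:+UseG1GC");
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String cms = "-Xms2600m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC "
                + "-XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 "
                + "-XX:+UseCMSInitiatingOccupancyOnly -verbose:gc";
        System.out.println(toG1(cms)); // -Xms2600m -verbose:gc -XX:+UseG1GC
    }
}
```

The sketch leaves fixed young-generation sizes such as -XX:NewSize/-XX:MaxNewSize in place; with G1 you may want to drop those too, because pinning the young generation size limits G1's own sizing.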

My calculation: with 8 cores and 500 ms per bolt execution -> ~16 messages/sec. If you optimize the bolt, you will see an improvement.
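The arithmetic behind that estimate, as a Java sketch: if the bolt is CPU-bound, at most one message per core is processed at a time, so executors beyond the core count add no throughput. Each core finishes 1000 / 500 = 2 messages per second, and 8 cores give about 16. The 8-core machine is the answerer's assumption; the names are hypothetical.

```java
// Hypothetical sketch of the CPU-bound estimate.
class CpuBoundEstimate {

    // Only min(executors, cores) executors can make progress at once on a CPU-bound bolt.
    static double cpuBoundRate(int executors, int cores, double millisPerMessage) {
        int effectiveParallelism = Math.min(executors, cores);
        return effectiveParallelism * (1000.0 / millisPerMessage);
    }

    public static void main(String[] args) {
        System.out.println(cpuBoundRate(50, 8, 500)); // 16.0
        System.out.println(cpuBoundRate(50, 8, 250)); // 32.0, if the bolt were twice as fast
    }
}
```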

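Also, for CPU-bound bolts, try a parallelism hint equal to the total number of cores, and increase topology.trident.batch.emit.interval.millis to the time required to process a whole batch divided by 2. Set topology.max.spout.pending to 1.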
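The rule of thumb above can be turned into concrete settings. The Java sketch below assumes a batch is spread evenly over one executor per core, so it takes about ceil(batchSize / cores) × 500 ms; the batch size of 40 is hypothetical, and the keys are the plain strings behind Storm's Config constants so the sketch runs without Storm on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "parallelism = cores, emit interval = batch time / 2, max pending = 1".
class TridentTuning {

    // Estimated time to process one batch if its tuples are spread evenly over `cores` executors.
    static long batchProcessingMillis(int batchSize, long millisPerMessage, int cores) {
        long rounds = (batchSize + cores - 1) / cores;   // ceil(batchSize / cores)
        return rounds * millisPerMessage;
    }

    static Map<String, Object> tuningConf(int batchSize, long millisPerMessage, int cores) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.trident.batch.emit.interval.millis",
                batchProcessingMillis(batchSize, millisPerMessage, cores) / 2);
        conf.put("topology.max.spout.pending", 1);
        return conf;
    }

    public static void main(String[] args) {
        int cores = 8;                      // also used as the function's parallelism hint
        Map<String, Object> conf = tuningConf(40, 500, cores);
        System.out.println("parallelismHint=" + cores + " " + conf);
    }
}
```

With these assumptions a batch takes about 2,500 ms, so the emit interval comes out at 1,250 ms; measure your real batch time before using a value like this.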

########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
     - "localhost"
#     - "server2"
# 
storm.zookeeper.port : 2999


storm.local.dir: "/opt/mphrx/neo4j/stormdatadir"

nimbus.childopts: "-Xms2048m"
ui.childopts: "-Xms1024m"
logviewer.childopts: "-Xmx512m"
supervisor.childopts: "-Xms1024m"
worker.childopts: "-Xms2600m -Xss256k -XX:MaxPermSize=128m -XX:PermSize=96m
    -XX:NewSize=1000m -XX:MaxNewSize=1000m -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=6
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
    -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
    -server -XX:+AggressiveOpts -XX:+UseCompressedOops -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true
    -Xloggc:logs/gc-worker-%ID%.log -verbose:gc
    -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m
    -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram
    -XX:+PrintTenuringDistribution -XX:-PrintGCApplicationStoppedTime -XX:-PrintGCApplicationConcurrentTime
    -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal"

java.library.path: "/usr/lib/jvm/jdk1.7.0_25"

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

topology.trident.batch.emit.interval.millis: 100
topology.message.timeout.secs: 300
#topology.max.spout.pending: 10000