Java: two spouts and one bolt
I want to create a topology with two KafkaSpouts reading from two different topics, and merge those two spouts into a single stream inside a bolt based on the source component:
import java.io.IOException;

import org.apache.log4j.BasicConfigurator;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class Topology {
private static final String topic1 = "Real2";
private static final String topic2 = "Real1";
public static void main(String[] args) throws AlreadyAliveException,
InvalidTopologyException, IOException {
BasicConfigurator.configure();
String zookeeper_root = "";
SpoutConfig kafkaConfig1 = new SpoutConfig(new ZkHosts("localhost:2181"),
topic1, zookeeper_root, "Real1KafkaSpout");
SpoutConfig kafkaConfig2 = new SpoutConfig(new ZkHosts("localhost:2181"),
topic2, zookeeper_root, "Real2KafkaSpout");
kafkaConfig1.scheme = new SchemeAsMultiScheme(new StringScheme());
kafkaConfig1.forceFromStart = true;
kafkaConfig2.scheme = new SchemeAsMultiScheme(new StringScheme());
kafkaConfig2.forceFromStart = true;
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("Real1", new KafkaSpout(
kafkaConfig1), 2);
builder.setSpout("Real2", new KafkaSpout(
kafkaConfig2), 2);
builder.setBolt("StreamMerging", new StreamMergingBolt(), 2)
.setNumTasks(2).shuffleGrouping("Real1")
.shuffleGrouping("Real2");
Config config = new Config();
config.put("hdfs.config", yamlConf); // yamlConf (HDFS settings) is defined elsewhere, not shown here
config.setDebug(false);
config.setMaxSpoutPending(10000);
if (args.length == 0) {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("Topology", config,
builder.createTopology());
// let the local topology run for a while before shutting it down
try {
Thread.sleep(6000);
} catch (InterruptedException ex) {
ex.printStackTrace();
}
cluster.killTopology("Topology");
cluster.shutdown();
} else {
StormSubmitter.submitTopology(args[0], config,
builder.createTopology());
}
}
}
In the bolt's execute method I am doing:
public void execute(Tuple input, BasicOutputCollector collector) {
String id = input.getSourceComponent();
System.out.println("Stream Id in StreamMergingBolt is " + "---->" + id);
}
So I want to store the tuples from each stream into a separate file; that is, tuples from Real1KafkaSpout should go to file1 and tuples from Real2KafkaSpout to file2. How can I do that? I am stuck at this point.

You can do it like this:
public void execute(Tuple input, BasicOutputCollector collector) {
String id = input.getSourceComponent();
if(id.equals("Real1")) {
// store into file1
} else {
assert id.equals("Real2");
// store in file2
}
}
You would open both files in the bolt's prepare(...) method.
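As a sketch of that idea, stripped of the Storm dependencies so it can run standalone (the class name RoutingSketch, the file names, and the method names are made up for illustration; in the real bolt, open() corresponds to prepare(), store() to execute(), and close() to cleanup()):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one writer per source component, mirroring
// what the bolt's prepare()/execute()/cleanup() would do.
public class RoutingSketch {
    private final Map<String, BufferedWriter> writers = new HashMap<>();

    // prepare(): open one file per spout id
    public void open() {
        try {
            writers.put("Real1", new BufferedWriter(new FileWriter("file1.txt")));
            writers.put("Real2", new BufferedWriter(new FileWriter("file2.txt")));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // execute(): route the tuple's payload by its source component
    public void store(String sourceComponent, String json) {
        BufferedWriter w = writers.get(sourceComponent);
        if (w == null) {
            throw new IllegalArgumentException("unknown source: " + sourceComponent);
        }
        try {
            w.write(json);
            w.newLine();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // cleanup(): flush and close both files
    public void close() {
        for (BufferedWriter w : writers.values()) {
            try {
                w.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
```

Keeping the writers in a map keyed by the spout id avoids the if/else chain entirely and scales to more sources without code changes.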
However, I wonder why you want to use a single topology for this. If you only write data from Kafka source 1 to file1 and data from Kafka source 2 to file2, you could simply create two topologies... (of course, you would program the topology only once and just configure it differently for the two cases).

When I do it via the following code it produces wired results:
public void execute(Tuple input, BasicOutputCollector collector) {
String id = input.getSourceComponent();
if(id.equals("Real1")) {
String json = input.getString(0);
//writetoHDFS1(json)
} else {
assert id.equals("Real2");
String json = input.getString(0);
//writetoHDFS2(json)
}
}
You are right... but in my topology the tuples come from two different sources and need to be processed in the bolt... Before that processing, though, I need to store the tuples separately; after that, the processing for tuples from both sources is the same... The data is different, but its structure is the same.

What do you mean by "wired results"? Or is that a typo and you mean "weird results"? And what do you mean by "the above code works in most cases"? When does it not work, and why not?

@Matthias J. Sax But there are so many failed tuples in the topology... How can I discard the failed tuples? See here.
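On the failed-tuples question: with a BaseBasicBolt the collector acks every tuple automatically, so failures usually mean tuples timed out before being fully processed, and the KafkaSpout then replays them rather than discarding them. A common mitigation (a sketch of the relevant Config knobs, not values verified against this cluster) is to cap the spout's in-flight load and raise the message timeout:

```java
Config config = new Config();
// limit how many tuples each spout keeps un-acked at once,
// so the bolt is not flooded faster than it can write
config.setMaxSpoutPending(500);
// give slow tuples more time before Storm counts them as failed
config.setMessageTimeoutSecs(60);
```

To actually discard a bad tuple instead of letting it be replayed, catch the error inside execute() and return normally (the tuple is then acked); throwing a FailedException is what marks it as failed.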