Java Storm Word Count topology: a conceptual question about the number of executions
Good afternoon. I am working through Storm. Here are the Java files for reference. This is the main file:
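// Imports for Storm 1.0 and later; releases before 1.0 use the backtype.storm prefix instead.
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.task.ShellBolt;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountTopology {

    // Splits each sentence into words by delegating to an external Python script.
    public static class SplitSentence extends ShellBolt implements IRichBolt {
        public SplitSentence() {
            super("python", "splitsentence.py");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }
    }

    // Keeps a running count per word and emits (word, count) for every incoming tuple.
    public static class WordCount extends BaseBasicBolt {
        Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.get(word);
            if (count == null)
                count = 0;
            count++;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // TextFileSpout is the asker's own spout class (not shown here).
        builder.setSpout("spout", new TextFileSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true);

        if (args != null && args.length > 0) {
            // Submit to a real cluster under the name given on the command line.
            conf.setNumWorkers(3);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        }
        else {
            // Run in local mode for 10 seconds, then shut down.
            conf.setMaxTaskParallelism(3);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", conf, builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
}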
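When this code runs, it prints a large amount of thread/emit output. The problem is that the program processes the same sentence 85 times instead of once. My guess is that this is because the original code was written to emit a new random sentence over and over.
What causes nextTuple to be called so many times?

You should move the file initialization code into the open method; otherwise your file handle is re-initialized every time nextTuple is called.
Edit:
In the open method, do something like this: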
br = new BufferedReader(new FileReader(csvFileToRead));
The logic that reads the file should then go in the nextTuple method:
while ((line = br.readLine()) != null) {
// your logic
}
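To illustrate why the reader belongs in open, here is a small sketch that does not use Storm at all. The class name ReaderInitSketch, its methods, and the in-memory DATA string (standing in for the file) are all made up for this example. It only shows the difference between creating the reader on every call and creating it once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of Storm and not the asker's spout.
class ReaderInitSketch {
    // Stands in for the contents of the input file.
    static final String DATA = "first line\nsecond line\n";

    // Anti-pattern: a fresh reader per call always starts at the top of the data.
    public static String readWithReinit() {
        try {
            BufferedReader fresh = new BufferedReader(new StringReader(DATA));
            return fresh.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private BufferedReader br;

    // Plays the role of the spout's open(): create the reader once.
    public void open() {
        br = new BufferedReader(new StringReader(DATA));
    }

    // Plays the role of one nextTuple() call: continue where the last call stopped.
    public String readNext() {
        try {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithReinit()); // first line
        System.out.println(readWithReinit()); // first line (again)

        ReaderInitSketch sketch = new ReaderInitSketch();
        sketch.open();
        System.out.println(sketch.readNext()); // first line
        System.out.println(sketch.readNext()); // second line
        System.out.println(sketch.readNext()); // null (end of data)
    }
}
```

With the reader created once, each later call advances through the data and eventually returns null, which is the signal a spout could use to stop emitting.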
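Can you share your spout code?
@user2720864 I have shared the spout code, sorry about that. I moved the file initialization into open. The generated sentence is now correct, with all the words in the file separated by spaces. However, nextTuple is called 86 times, so my counts are 86 times what they should be. I think this narrows my question down to how to have nextTuple called only once. Thank you very much for your time.
The reading logic should be in the nextTuple method; I have updated my answer.
Thank you for your answer. Even when I remove the whole file-reading part and reduce the sentence to a single word, nextTuple is still called 85 times. Do you know how Storm decides how many times to run nextTuple in this case? Maybe I missed some configuration. Thanks.
I reduced the code to a single sentence. I went through the code and cannot tell what calls nextTuple. I need the spout to run once and return wordOne:1 and Word2:1, not 85 and 85. Thank you.

Storm is designed for streaming sources that emit data as it becomes available. nextTuple() is called in an infinite loop, so in your case the spout needs to keep track of its position in the data source. If you want at-least-once processing, it should also keep track of ack() and fail() calls.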
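A rough sketch of that bookkeeping, written as plain Java with no Storm dependency so it can be run on its own. Everything here is hypothetical: the class name OneShotSpoutSketch, the emitted list (a stand-in for the collector), and the use of the line index as the message id. A real spout would do the same tracking inside Storm's nextTuple, ack, and fail callbacks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for a spout; it imitates the callbacks but is not Storm code.
class OneShotSpoutSketch {
    private final List<String> lines;                              // the data source
    private int position = 0;                                      // index of the next unread line
    private final Map<Integer, String> pending = new HashMap<>();  // emitted but not yet acked
    private final Queue<Integer> toReplay = new ArrayDeque<>();    // failed ids waiting to be re-emitted
    public final List<String> emitted = new ArrayList<>();         // stands in for collector.emit(...)

    public OneShotSpoutSketch(List<String> lines) {
        this.lines = lines;
    }

    // Called repeatedly by the framework; emits at most one tuple per call.
    public void nextTuple() {
        Integer replayId = toReplay.poll();
        if (replayId != null && pending.containsKey(replayId)) {
            emitted.add(pending.get(replayId)); // re-emit a failed line
            return;
        }
        if (position < lines.size()) {
            int id = position;                  // assumption: line index doubles as message id
            String line = lines.get(position++);
            pending.put(id, line);
            emitted.add(line);
        }
        // Otherwise the source is exhausted: return without emitting anything.
    }

    // The tuple was fully processed, so it no longer needs to be remembered.
    public void ack(int id) {
        pending.remove(id);
    }

    // The tuple failed downstream, so schedule it for another emit.
    public void fail(int id) {
        if (pending.containsKey(id)) {
            toReplay.add(id);
        }
    }

    public static void main(String[] args) {
        OneShotSpoutSketch spout = new OneShotSpoutSketch(Arrays.asList("wordOne Word2"));
        for (int i = 0; i < 85; i++) {
            spout.nextTuple();                  // 85 calls, as in the question
        }
        System.out.println(spout.emitted.size()); // prints 1
    }
}
```

Because the position only moves forward, calling nextTuple 85 times still emits the single sentence once. A line is emitted again only if fail is reported for it before it has been acked.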