Apache Storm: how to get the previous state of a counter before it is updated

For example, I have batches of 5 tuples each, containing user impressions:

Batch 1:
[UUID1, clientId1]
[UUID2, clientId1]
[UUID2, clientId1]
[UUID2, clientId1]
[UUID3, clientId2]

Batch 2:
[UUID4, clientId1]
[UUID5, clientId1]
[UUID5, clientId1]
[UUID6, clientId2]
[UUID6, clientId2]

This is how I persist the count state:

TridentState ClientState = impressionStream
    .groupBy(new Fields("clientId"))
    .persistentAggregate(getCassandraStateFactory("users", "DataComputation",
        "UserImpressionCounter"), new Count(), new Fields("count"));

Stream ClientStream = ClientState.newValuesStream();
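
I start with an empty database and run my topology. After grouping the stream by clientId, I persist the state using persistentAggregate with the Count aggregator. For the first batch, the result after newValuesStream is:

[client1,4]
[client2,1]

For the second batch:

[client1,7]
[client2,3]

which is as expected.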
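
The running totals above can be reproduced with a small plain-Java model. This is only a sketch of the counting behaviour, not Trident code: the class `RunningCountSketch`, the method `applyBatch`, and the in-memory map standing in for the Cassandra-backed state are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of groupBy("clientId") + persistentAggregate(..., new Count(), ...).
// No Storm classes are used; the map below stands in for the persisted state.
class RunningCountSketch {

    // Hypothetical stand-in for the state: clientId -> total impressions so far.
    private final Map<String, Long> state = new LinkedHashMap<>();

    // Applies one batch of clientIds and returns the updated totals for the keys
    // touched by that batch, roughly what newValuesStream() would emit.
    public Map<String, Long> applyBatch(List<String> clientIds) {
        Map<String, Long> updated = new LinkedHashMap<>();
        for (String clientId : clientIds) {
            long total = state.merge(clientId, 1L, Long::sum);
            updated.put(clientId, total);
        }
        return updated;
    }

    public static void main(String[] args) {
        RunningCountSketch sketch = new RunningCountSketch();
        // Batch 1 from the question: four impressions for clientId1, one for clientId2.
        System.out.println(sketch.applyBatch(Arrays.asList(
            "clientId1", "clientId1", "clientId1", "clientId1", "clientId2")));
        // Batch 2: three more for clientId1, two more for clientId2.
        System.out.println(sketch.applyBatch(Arrays.asList(
            "clientId1", "clientId1", "clientId1", "clientId2", "clientId2")));
    }
}
```

Running it prints `{clientId1=4, clientId2=1}` and then `{clientId1=7, clientId2=3}`, which corresponds to the values listed above. Note that the total from before the batch is no longer visible once the update has been applied, which is the problem the question is about.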

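ClientStream is used in two branches, and in one of them I would need to process the tuples in batches of size 1, because I need the count information for every tuple. Batches of size 1 are obviously a bad idea, so I have to somehow find out the previous state of the counter before it is updated, and emit a tuple that carries both that value and the updated counter. For the second batch, for example:

[client1,7,4]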


Does anyone know how to do this?

I solved this by adding a second, per-batch aggregator and joining its output with the persistent aggregate:

TridentState ClientState = impressionStream
    .groupBy(new Fields("clientId"))
    .persistentAggregate(getCassandraStateFactory("users", "DataComputation",
        "UserImpressionCounter"), new Count(), new Fields("count"));

Stream ClientBatchAggregationStream = impressionStream
    .groupBy(new Fields("clientId"))
    .aggregate(new SumCountAggregator(), new Fields("batchCount"));

Stream GroupingPeriodCounterStateStream = topology
    .join(ClientState.newValuesStream(), new Fields("clientId"),
        ClientBatchAggregationStream, new Fields("clientId"), 
        new Fields("clientId", "count", "batchCount"));
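
The join emits tuples of the form (clientId, count, batchCount), so the counter value from before the batch can be derived as count minus batchCount. The following plain-Java sketch shows only that arithmetic; `PreviousCountSketch`, `JoinedCount`, and `previousCount` are hypothetical names, and it assumes the persisted count was updated exactly once by the batch that batchCount describes.

```java
// Plain-Java model of one joined tuple (clientId, count, batchCount).
// The class and method names are hypothetical and not part of the Storm API.
class PreviousCountSketch {

    static final class JoinedCount {
        final String clientId;
        final long count;      // total after the batch (from persistentAggregate)
        final long batchCount; // tuples for this client in the current batch

        JoinedCount(String clientId, long count, long batchCount) {
            this.clientId = clientId;
            this.count = count;
            this.batchCount = batchCount;
        }

        // Counter value before the current batch was applied.
        long previousCount() {
            return count - batchCount;
        }
    }

    public static void main(String[] args) {
        // Second batch from the question: client1 reaches 7 after 3 new impressions.
        JoinedCount joined = new JoinedCount("client1", 7L, 3L);
        System.out.println("[" + joined.clientId + ", " + joined.count + ", "
            + joined.previousCount() + "]");
    }
}
```

This prints `[client1, 7, 4]`, the tuple asked for in the question. In the topology itself, the same subtraction could probably be done in a Trident function applied with `each` to the `count` and `batchCount` fields of the joined stream.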

The aggregator:

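// Package names below are for Storm 1.0 and later; older releases use
// storm.trident.* and backtype.storm.tuple.Values instead.
import org.apache.storm.trident.operation.BaseAggregator;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

// Counts the tuples of each group within a single batch.
public class SumCountAggregator extends BaseAggregator<SumCountAggregator.CountState> {

    static class CountState {
        long count = 0;
    }

    // Called at the start of each group in each batch: start from zero.
    @Override
    public CountState init(Object batchId, TridentCollector collector) {
        return new CountState();
    }

    // Called once per tuple in the group.
    @Override
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        state.count += 1;
    }

    // Called when the group is finished for this batch: emit the batch-local count.
    @Override
    public void complete(CountState state, TridentCollector collector) {
        collector.emit(new Values(state.count));
    }

}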
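
Unlike the persistent count, this aggregator starts from zero for every group in every batch, so it yields only the number of tuples the current batch contributed. A plain-Java sketch of that behaviour, using the hypothetical names `BatchCountSketch` and `countBatch` and no Storm classes:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of groupBy("clientId").aggregate(new SumCountAggregator(), ...).
// A fresh counter is used for every batch, so nothing carries over between batches.
class BatchCountSketch {

    // Returns clientId -> number of tuples for that client in this one batch.
    public static Map<String, Long> countBatch(List<String> clientIds) {
        Map<String, Long> perGroup = new LinkedHashMap<>(); // like init(): fresh state
        for (String clientId : clientIds) {
            perGroup.merge(clientId, 1L, Long::sum);          // like aggregate(): count += 1
        }
        return perGroup;                                      // like complete(): emit counts
    }

    public static void main(String[] args) {
        System.out.println(countBatch(Arrays.asList(
            "clientId1", "clientId1", "clientId1", "clientId1", "clientId2")));
        System.out.println(countBatch(Arrays.asList(
            "clientId1", "clientId1", "clientId1", "clientId2", "clientId2")));
    }
}
```

For the two batches from the question this prints `{clientId1=4, clientId2=1}` and `{clientId1=3, clientId2=2}`; the second line holds the batchCount values that are joined against the running totals 7 and 3.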