Asynchronous Cassandra reads and writes: best practices


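To set the context: we have 4 tables in Cassandra. One of them is the data table and the other three are search tables (assume the tables are DATA, SEARCH1, SEARCH2 and SEARCH3).

The DATA table has an initial load requirement of up to 15k rows in a single request, and the search tables have to be kept in sync with it. To keep them consistent we write in batches, each batch holding up to 4 queries (one per table).

For each batch, however, we first need to read the data. If the item already exists, we only update the LastUpdateDate column of the DATA table; otherwise we insert into all 4 tables.

Below is our code snippet: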

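// Imports assume the DataStax Java driver 3.x (suggested by ResultSetFuture) with Guava futures.
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import java.sql.Timestamp;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;

public List<Item> loadData(List<Item> items) {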
    CountDownLatch latch = new CountDownLatch(items.size());
    ForkJoinPool pool = new ForkJoinPool(6);
    pool.submit(() -> items.parallelStream().forEach(item -> {
      BatchStatement batch = prepareBatchForCreateOrUpdate(item);
      batch.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
      ResultSetFuture future = getSession().executeAsync(batch);
      Futures.addCallback(future, new AsyncCallBack(latch), pool);
    }));

    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }

    //TODO Consider what to do with the failed Items, Retry? or remove from the items in the return type
    return items;
}

private BatchStatement prepareBatchForCreateOrUpdate(Item item) {
    BatchStatement batch = new BatchStatement();
    Item existingItem = getExisting(item); // synchronous read
    if (null != existingItem) {
      // The item already exists: only refresh its last-updated timestamp.
      existingItem.setLastUpdatedDateTime(new Timestamp(System.currentTimeMillis()));
      batch.add(existingItem);
      return batch;
    }

    batch.add(item);
    batch.add(convertItemToSearch1(item));
    batch.add(convertItemToSearch2(item));
    batch.add(convertItemToSearch3(item));

    return batch;
  }

class AsyncCallBack implements FutureCallback<ResultSet> {
    private CountDownLatch latch;

    AsyncCallBack(CountDownLatch latch) {
      this.latch = latch;
    }

    // Count down the latch on both success and failure, so that the thread blocked in latch.await() knows when all async calls have completed.
    @Override
    public void onSuccess(ResultSet result) {
      latch.countDown();
    }

    @Override
    public void onFailure(Throwable t) {
      LOGGER.warn("Failed async query execution, Cause:{}:{}", t.getCause(), t.getMessage());
      latch.countDown();
    }
  }


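Taking into account the network round trips between the application and the Cassandra cluster (both reside on the same DNS, but in different pods on Kubernetes), loading 15k items takes about 1.5 to 2 minutes.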

We could make the read call getExisting(item) asynchronous as well, but handling the failure cases then becomes more complicated.

Is there a better approach to loading data into Cassandra (considering only async writes through the DataStax Enterprise Java driver)?

First, batches in Cassandra are not the same thing as batches in relational databases. By using them you put more load on the cluster.

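As for making everything asynchronous, I would consider the following approach:

Make the query to the database, obtain a Future and add a listener to it. The listener is executed when the query finishes (override onSuccess).

From that method, schedule the execution of the next operation based on the result obtained from Cassandra.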
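The two steps above can be sketched with the JDK's CompletableFuture standing in for the driver's future type. FakeAsyncStore and its methods readAsync, touchAsync and insertEverywhereAsync are hypothetical names backed by an in-memory map, not driver APIs; with a real session the same shape would apply to whatever future executeAsync returns.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for the tables, so the sketch runs without a cluster.
class FakeAsyncStore {
    final ConcurrentMap<String, Long> lastUpdated = new ConcurrentHashMap<>();
    final AtomicInteger searchRowsWritten = new AtomicInteger();

    // Asynchronous read of the data table: empty if the item does not exist yet.
    CompletableFuture<Optional<Long>> readAsync(String id) {
        return CompletableFuture.supplyAsync(() -> Optional.ofNullable(lastUpdated.get(id)));
    }

    // Existing item: only the last-updated timestamp is refreshed.
    CompletableFuture<Void> touchAsync(String id, long now) {
        return CompletableFuture.runAsync(() -> { lastUpdated.put(id, now); });
    }

    // New item: one row in the data table plus one row per search table.
    CompletableFuture<Void> insertEverywhereAsync(String id, long now) {
        return CompletableFuture.runAsync(() -> {
            lastUpdated.put(id, now);
            searchRowsWritten.addAndGet(3);
        });
    }
}

class CreateOrUpdate {
    // The write is scheduled from the read's completion callback; no thread blocks on the read.
    static CompletableFuture<String> createOrUpdate(FakeAsyncStore store, String id, long now) {
        return store.readAsync(id).thenCompose(existing -> {
            if (existing.isPresent()) {
                return store.touchAsync(id, now).thenApply(v -> "updated");
            }
            return store.insertEverywhereAsync(id, now).thenApply(v -> "inserted");
        });
    }
}
```

A failure in either the read or the write surfaces on the single returned future, which keeps the failure handling in one place.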

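One thing you need to check is that you don't issue too many simultaneous requests. With protocol version 3 you can have up to 32k in-flight requests per connection, but in your case you may issue up to 60k (4 x 15k) requests. I use a wrapper around the session to limit the number of in-flight requests.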
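A minimal sketch of how such a limit can work, assuming a plain counting semaphore. This is not the answerer's wrapper; InFlightLimiter and submit are hypothetical names, and the request is any supplier of a CompletableFuture rather than a real driver call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper: caps how many asynchronous requests are outstanding at once.
class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the submitting thread until a permit is free, then starts the request.
    // The permit is returned when the request completes, successfully or not.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> request) {
        permits.acquireUninterruptibly();
        try {
            return request.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            // The request could not even be started, so give the permit back.
            permits.release();
            throw e;
        }
    }
}
```

Blocking the producer is the point here: a loop that submits 60k writes slows down to the rate the cluster can absorb instead of queueing everything at once.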

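Thanks Alex for the answer. Yes, batches in Cassandra are a different thing, which is why we use a separate batch for each of the 15k records, each batch with one query per table to keep the data in sync. I like your session-limiting example. :)

I realize this comment is too late. However, assuming your inserts are idempotent, I would do individual async writes at QUORUM/ALL consistency and retry on failure. That would give you an acceptable degree of fault tolerance as far as data consistency goes.

Yes. Idempotent queries plus retries are useful.

On a different note, the SessionLimiter code is good. We had to deal with throttling in-flight messages in our Cassandra cluster, and we solved it with batching (primitive, but effective).

@user1694845 The main problem with batches is that they are very slow if they do not fall into the same partition. One problem with SessionLimiter is that it is not optimal: it doesn't know the per-connection limit, only a per-cluster limit. Java driver 4.x has throttling built in...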
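The "idempotent writes plus retry" suggestion from the comments can be sketched as follows. AsyncRetry and withRetry are hypothetical names, the write is any supplier of a CompletableFuture rather than a real driver call, and the sketch retries immediately with no backoff, which a production loader would probably want to add. It is only safe when re-running the same write has no additional effect.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical helper: re-issues an idempotent asynchronous write until it succeeds
// or the attempt budget is exhausted.
class AsyncRetry {
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> write, int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(write, maxAttempts, 1, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> write, int maxAttempts,
                                    int attemptNo, CompletableFuture<T> result) {
        write.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (attemptNo < maxAttempts) {
                // Safe only because the write is assumed to be idempotent.
                attempt(write, maxAttempts, attemptNo + 1, result);
            } else {
                // Out of attempts: report the last failure to the caller.
                result.completeExceptionally(error);
            }
        });
    }
}
```

Items whose future still fails after the last attempt are the ones the TODO in the question would have to report back or drop from the returned list.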