Cassandra read process (Cassandra 3.0)


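Say I have a table with 4 columns, and I have written some data into it. When I try to read that data, the request goes through Cassandra's read process. I want to understand one specific scenario, in which all columns (of the row I am trying to read) are present in the memtable. Will the SSTables still be checked for that row's data? I would think there is no need to check the SSTables in this case, because the data in the memtable is obviously the latest copy. A read in this scenario should therefore be faster than one where the memtable has no such row or holds only part of its data.

I created a table (user_data) and entered some data, which produced 2 SSTables. After that, I inserted a new row. I checked the data directory and made sure the SSTable count was still 2, which means the new data is in the memtable. I ran "tracing on" in cqlsh and then selected the same row. Here is the output: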

Tracing session: de2e8ce0-cf1e-11e6-9318-a131a78ce29a

 activity                                                                                     | timestamp                  | source        | source_elapsed | client
----------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+---------------
                                                                           Execute CQL3 query | 2016-12-31 11:33:36.494000 | 172.16.129.67 |              0 | 172.16.129.67
 Parsing select address,age from user_data where name='Kishan'; [Native-Transport-Requests-1] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            182 | 172.16.129.67
                                            Preparing statement [Native-Transport-Requests-1] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            340 | 172.16.129.67
                                  Executing single-partition query on user_data [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            693 | 172.16.129.67
                                                   Acquiring sstable references [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            765 | 172.16.129.67
                                                      Merging memtable contents [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            821 | 172.16.129.67
                                         Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |           1028 | 172.16.129.67
                                                                             Request complete | 2016-12-31 11:33:36.495225 | 172.16.129.67 |           1225 | 172.16.129.67

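I do not understand what "Acquiring sstable references" means here. Since the complete data is in the memtable, there should, as far as I understand, be no need to check the SSTables. So what exactly are these references for?

All columns (of the row I am trying to read) are present in the memtable. Will the SSTables be checked for such a row's data?

In this particular case, it will also check the SSTable data in parallel with the memtable.

It will only go to the SSTable (actually first the row cache, then the bloom filter, then the SSTable) for a column that is not present in the memtable.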
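The Bloom filter step mentioned above can be illustrated with a toy filter. This is only a sketch of the general technique, not Cassandra's implementation: the class name `BloomFilterSketch`, the two hash positions derived from `String.hashCode()`, and the 1024-bit size are all assumptions made for illustration.

```java
import java.util.BitSet;

/** Toy Bloom filter for illustration; Cassandra's real filter is implemented differently. */
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;

    BloomFilterSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    /** Two cheap, illustrative bit positions derived from String.hashCode(). */
    private int[] positions(String key) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return new int[] { Math.floorMod(h1, size), Math.floorMod(h2, size) };
    }

    /** Records a partition key as present in the (hypothetical) SSTable. */
    void add(String key) {
        for (int p : positions(key)) {
            bits.set(p);
        }
    }

    /** False means "definitely absent"; true only means "possibly present". */
    boolean mightContain(String key) {
        for (int p : positions(key)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch sstableFilter = new BloomFilterSketch(1024);
        sstableFilter.add("Kishan");
        // A key that was added is never reported as absent.
        System.out.println(sstableFilter.mightContain("Kishan")); // prints "true"
        // An empty filter allows the read to skip the SSTable for any key.
        System.out.println(new BloomFilterSketch(1024).mightContain("Kishan")); // prints "false"
    }
}
```

The point of such a filter is that a negative answer lets a read skip an SSTable without touching disk, while a positive answer still requires looking into the SSTable.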
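Edit:
To learn more about how the read process works, let's dive into the Cassandra source code. We will start from the trace log and walk through the steps line by line.
Let's start here:

Executing single-partition query on user_data [ReadStage-2]
Your select query is clearly a single-partition row query: Cassandra only needs to read data from a single partition. Let's jump to the corresponding method, whose Javadoc is self-explanatory: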

/**
 * Queries both memtable and sstables to fetch the result of this query.
 * <p>
 * Please note that this method:
 *   1) does not check the row cache.
 *   2) does not apply the query limit, nor the row filter (and so ignore 2ndary indexes).
 *      Those are applied in {@link ReadCommand#executeLocally}.
 *   3) does not record some of the read metrics (latency, scanned cells histograms) nor
 *      throws TombstoneOverwhelmingException.
 * It is publicly exposed because there is a few places where that is exactly what we want,
 * but it should be used only where you know you don't need thoses things.
 * <p>
 * Also note that one must have created a {@code ReadExecutionController} on the queried table and we require it as
 * a parameter to enforce that fact, even though it's not explicitlly used by the method.
 */
public UnfilteredRowIterator queryMemtableAndDisk(ColumnFamilyStore cfs, ReadExecutionController executionController)
{
    assert executionController != null && executionController.validForReadOn(cfs);
    Tracing.trace("Executing single-partition query on {}", cfs.name);
    return queryMemtableAndDiskInternal(cfs);
}

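Here, the following source comment gives us the answer:
We have 2 main strategies: 1) We query memtables and sstables simultaneously. This is our most generic strategy and the one we use ...
So Cassandra queries memtables and sstables simultaneously.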
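As a rough illustration of why both sources are consulted, here is a minimal sketch of a merged read in which the newest cell wins for each column. It is a toy model, not Cassandra's code: `MergeReadSketch`, `Cell`, `mergeRow` and the sample values are hypothetical names and data introduced only for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a merged read; these are not Cassandra's real classes. */
final class MergeReadSketch {

    /** A cell value together with its write timestamp (hypothetical type). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * Merges one row from every source (memtable and SSTables alike),
     * keeping the cell with the highest timestamp for each column.
     */
    static Map<String, Cell> mergeRow(List<Map<String, Cell>> sources) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> source : sources) {
            for (Map.Entry<String, Cell> entry : source.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> memtable = Map.of(
                "address", new Cell("Pune", 100L),
                "age", new Cell("30", 100L));
        // An SSTable can hold a cell with a higher timestamp than the memtable,
        // for example when a client supplied the write timestamp itself.
        Map<String, Cell> sstable = Map.of("age", new Cell("31", 200L));

        Map<String, Cell> row = mergeRow(List.of(memtable, sstable));
        System.out.println(row.get("address").value + ", " + row.get("age").value); // prints "Pune, 31"
    }
}
```

In this model the winner is decided by cell timestamp rather than by where the cell is stored, so data sitting in the memtable is not automatically the latest copy.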
After that, if we jump to the queryMemtableAndSSTablesInTimestampOrder method, we find:

/**
 * Do a read by querying the memtable(s) first, and then each relevant sstables sequentially by order of the sstable
 * max timestamp.
 *
 * This is used for names query in the hope of only having to query the 1 or 2 most recent query and then knowing nothing
 * more recent could be in the older sstables (which we can only guarantee if we know exactly which row we queries, and if
 * no collection or counters are included).
 * This method assumes the filter is a {@code ClusteringIndexNamesFilter}.
 */
private UnfilteredRowIterator queryMemtableAndSSTablesInTimestampOrder(ColumnFamilyStore cfs, ClusteringIndexNamesFilter filter)
{
    Tracing.trace("Acquiring sstable references");
    ColumnFamilyStore.ViewFragment view = cfs.select(View.select(SSTableSet.LIVE, partitionKey()));
    ImmutableBTreePartition result = null;
    Tracing.trace("Merging memtable contents");
    ....
    // then it also looks into sstable on timestamp order.

In the code above we have found the source of the next two trace log lines:

Acquiring sstable references [ReadStage-2]

Merging memtable contents [ReadStage-2]
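The idea described in that Javadoc (memtable first, then SSTables from newest to oldest, stopping once nothing newer can exist) can be sketched as follows. This is a simplified model under stated assumptions, not the actual method body: `TimestampOrderSketch`, `SSTable`, `ReadResult`, `nothingNewerPossible` and the sample data are hypothetical, and tombstones, collections and counters are ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a "timestamp order" read; these are not Cassandra's real classes. */
final class TimestampOrderSketch {

    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Hypothetical stand-in for an on-disk SSTable that holds one row. */
    static final class SSTable {
        final long maxTimestamp;
        final Map<String, Cell> row;
        SSTable(long maxTimestamp, Map<String, Cell> row) { this.maxTimestamp = maxTimestamp; this.row = row; }
    }

    static final class ReadResult {
        final Map<String, Cell> cells = new HashMap<>();
        int sstablesRead = 0; // how many SSTables the read actually looked into
    }

    static ReadResult read(Map<String, Cell> memtable, List<SSTable> sstables, Set<String> wanted) {
        ReadResult result = new ReadResult();
        mergeInto(result.cells, memtable, wanted); // memtable first

        // Holding the list of SSTables is not the same as reading them.
        List<SSTable> newestFirst = new ArrayList<>(sstables);
        newestFirst.sort(Comparator.comparingLong((SSTable s) -> s.maxTimestamp).reversed());

        for (SSTable sstable : newestFirst) {
            if (nothingNewerPossible(result.cells, wanted, sstable.maxTimestamp)) {
                break; // this SSTable and all older ones can be skipped
            }
            result.sstablesRead++;
            mergeInto(result.cells, sstable.row, wanted);
        }
        return result;
    }

    /** True when every wanted column already has a cell newer than the SSTable's max timestamp. */
    static boolean nothingNewerPossible(Map<String, Cell> cells, Set<String> wanted, long maxTimestamp) {
        for (String column : wanted) {
            Cell cell = cells.get(column);
            if (cell == null || cell.timestamp <= maxTimestamp) {
                return false;
            }
        }
        return true;
    }

    static void mergeInto(Map<String, Cell> target, Map<String, Cell> source, Set<String> wanted) {
        for (String column : wanted) {
            Cell candidate = source.get(column);
            if (candidate != null) {
                target.merge(column, candidate, (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> wanted = Set.of("address", "age");
        List<SSTable> sstables = List.of(
                new SSTable(100L, Map.of("age", new Cell("29", 100L))),
                new SSTable(200L, Map.of("age", new Cell("30", 200L))));

        Map<String, Cell> fullRow = Map.of(
                "address", new Cell("Pune", 300L), "age", new Cell("31", 300L));
        System.out.println(read(fullRow, sstables, wanted).sstablesRead); // prints "0"

        Map<String, Cell> partialRow = Map.of("address", new Cell("Pune", 300L));
        System.out.println(read(partialRow, sstables, wanted).sstablesRead); // prints "1"
    }
}
```

In this sketch, taking the list of SSTables always happens, but an SSTable is only read when the memtable result could still be incomplete or outdated. That may be how "Acquiring sstable references" can show up in a trace even when no SSTable data ends up being merged.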

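Hope this helps.
Related links:
For your listening and learning pleasure, here is the read path as explained by Tyler Hobbs.

"Acquiring memtable contents", or should it be "Acquiring sstable references"?

Yes, you are right. That was a mistake.

Can you add the table definition with the 4 columns you mentioned? I am especially interested in the partition key and the clustering columns. Also the number of nodes in the cluster, the RF, and the consistency level actually used for this query?

Actually, it is just a random table for illustration. There are 4 columns in total, all text, with a simple partition key (one column, no clustering key). Can you explain "Acquiring sstable references" in the trace output (mentioned in the edited question)?

If it really hit an SSTable, there should be a log line like this:
Merging data from memtables and 0 sstables
I am not sure, though. I will get back to you later with more details.

Also, I checked the read throughput in two cases. Case 1: the complete row being fetched is in the memtable. Case 2: the memtables do not contain the row being queried. I found almost no difference in read throughput between the two. I think that when Cassandra fetches data from the memtables, it does not even know whether it got the whole row or only part of it. It will always check the bloom filter (except when the row cache contains the requested row).

So are you now contradicting what you said earlier? Are you saying that in this case, yes, it will check the SSTables? I am also confused about the read path Cassandra follows. I had thought it follows one fixed, well-defined path whenever it serves a read request. After reading your answer, though, I get the impression that Cassandra's read path may differ from query to query.

Yes, it seems so. They have optimized many things around queries. Depending on the server-side configuration (row cache enabled or disabled) and the client-side configuration, different queries follow different read paths.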