Cassandra read process

Tags: cassandra, cassandra-3.0

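Say I have a table with 4 columns, and I write some data into it. If I then try to read the data, the read goes through the usual read path. I want to understand one specific scenario, in which all the columns (of the row I am trying to read) are present in the memtable. Will the SSTables be checked for data for such a row? I think that in this case there is no need to check the SSTables, because the data present in the memtable is obviously the latest copy. Reads in this case should therefore be faster than when the memtable either does not have the row or contains only part of it.

I created a table (user_data) and entered some data, which resulted in 2 SSTables being created. After that, I inserted a new row. I checked the data directory and made sure the SSTable count was still 2, which means the new data I entered resides in the memtable. I ran "tracing on" in cqlsh and then selected the same row. Here is the output: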

Tracing session: de2e8ce0-cf1e-11e6-9318-a131a78ce29a

 activity                                                                                     | timestamp                  | source        | source_elapsed | client
----------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+---------------
                                                                           Execute CQL3 query | 2016-12-31 11:33:36.494000 | 172.16.129.67 |              0 | 172.16.129.67
 Parsing select address,age from user_data where name='Kishan'; [Native-Transport-Requests-1] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            182 | 172.16.129.67
                                            Preparing statement [Native-Transport-Requests-1] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            340 | 172.16.129.67
                                  Executing single-partition query on user_data [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            693 | 172.16.129.67
                                                   Acquiring sstable references [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            765 | 172.16.129.67
                                                      Merging memtable contents [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            821 | 172.16.129.67
                                         Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |           1028 | 172.16.129.67
                                                                             Request complete | 2016-12-31 11:33:36.495225 | 172.16.129.67 |           1225 | 172.16.129.67
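I do not understand what "Acquiring sstable references" means here. Since the complete data is in the memtable, as far as I understand there is no need to check the SSTables. So what exactly are these references for?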

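"All the columns (of the row I am trying to read) are present in the memtable. Will the SSTables be checked for data for such a row?"

In this particular case, it will also check the sstable data in parallel with the memtable.

It will only go to an sstable (actually first the row cache, then the bloom filter, and then the sstable) for a column that is not present in the memtable.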
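To picture the bloom filter step mentioned above, here is a minimal Java sketch. It is not Cassandra's implementation: the class name BloomFilterSketch, the filter size, and the hash mixing are all assumptions made for illustration. The only point is that a negative answer lets a read skip an sstable without touching disk, while a positive answer means the sstable still has to be consulted.

```java
import java.util.BitSet;

/** Toy bloom filter; a hypothetical stand-in, not Cassandra's own filter. */
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilterSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    /** Records a partition key by setting one bit per hash function. */
    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    /**
     * Returns false only if the key was definitely never added.
     * A true result may be a false positive.
     */
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Derives the i-th bit position for a key (illustrative mixing only). */
    private int index(String key, int i) {
        int h = key.hashCode() * 31 + i * 0x9E3779B9; // int overflow is intended
        return Math.floorMod(h, size);
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("Kishan");
        // A key that was added is never reported as absent.
        System.out.println(filter.mightContain("Kishan")); // prints true
    }
}
```

A read that gets false from mightContain can skip that sstable entirely, which is why the filter sits in front of the sstable lookup.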

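Edit:

To understand more about how the read process works, let's dive into the Cassandra source code and go through the tracing log step by step.

Let's start here:

Executing single-partition query on user_data [ReadStage-2]

Your select query is clearly a single-partition row query: Cassandra only needs to read data from a single partition. Let's jump to the corresponding method, whose Javadoc is self-explanatory: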

/**
 * Queries both memtable and sstables to fetch the result of this query.
 * <p>
 * Please note that this method:
 *   1) does not check the row cache.
 *   2) does not apply the query limit, nor the row filter (and so ignore 2ndary indexes).
 *      Those are applied in {@link ReadCommand#executeLocally}.
 *   3) does not record some of the read metrics (latency, scanned cells histograms) nor
 *      throws TombstoneOverwhelmingException.
 * It is publicly exposed because there is a few places where that is exactly what we want,
 * but it should be used only where you know you don't need thoses things.
 * <p>
 * Also note that one must have created a {@code ReadExecutionController} on the queried table and we require it as
 * a parameter to enforce that fact, even though it's not explicitlly used by the method.
 */
public UnfilteredRowIterator queryMemtableAndDisk(ColumnFamilyStore cfs, ReadExecutionController executionController)
{
    assert executionController != null && executionController.validForReadOn(cfs);
    Tracing.trace("Executing single-partition query on {}", cfs.name);

    return queryMemtableAndDiskInternal(cfs);
}

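Here we find our answer in the following comment:

We have 2 main strategies:
1) We query memtables and sstables simultaneously. This is our most generic strategy and the one we use ...

So Cassandra queries memtables and sstables simultaneously.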
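One way to see why both sources are merged is that the winner for each column is decided by write timestamp, not by where the data lives. Writes can carry client-supplied timestamps, so a value in the memtable is not guaranteed to be the newest. The sketch below is a toy model under that assumption; MergedReadSketch, Cell, and the sample values are hypothetical names for illustration and are not Cassandra classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a merged read; not Cassandra's actual classes. */
class MergedReadSketch {

    /** A column value tagged with its write timestamp (hypothetical type). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * Merges one row's cells from every source (memtable and sstables).
     * For each column the cell with the highest write timestamp wins,
     * regardless of which source it came from.
     */
    static Map<String, Cell> merge(List<Map<String, Cell>> sources) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> source : sources) {
            for (Map.Entry<String, Cell> e : source.entrySet()) {
                Cell current = merged.get(e.getKey());
                if (current == null || e.getValue().timestamp > current.timestamp) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> memtable = new HashMap<>();
        memtable.put("age", new Cell("30", 100L));

        Map<String, Cell> sstable = new HashMap<>();
        sstable.put("age", new Cell("31", 200L));        // carries a later timestamp
        sstable.put("address", new Cell("addr-1", 50L)); // column missing from memtable

        Map<String, Cell> row = merge(Arrays.asList(memtable, sstable));
        System.out.println(row.get("age").value + " " + row.get("address").value); // prints 31 addr-1
    }
}
```

In this model the sstable's "age" wins even though the memtable also holds that column, which is the case a memtable-only read would get wrong.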

After that, if we jump to the queryMemtableAndSSTablesInTimestampOrder method, we find:

/**
 * Do a read by querying the memtable(s) first, and then each relevant sstables sequentially by order of the sstable
 * max timestamp.
 *
 * This is used for names query in the hope of only having to query the 1 or 2 most recent query and then knowing nothing
 * more recent could be in the older sstables (which we can only guarantee if we know exactly which row we queries, and if
 * no collection or counters are included).
 * This method assumes the filter is a {@code ClusteringIndexNamesFilter}.
 */
private UnfilteredRowIterator queryMemtableAndSSTablesInTimestampOrder(ColumnFamilyStore cfs, ClusteringIndexNamesFilter filter)
{
    Tracing.trace("Acquiring sstable references");
    ColumnFamilyStore.ViewFragment view = cfs.select(View.select(SSTableSet.LIVE, partitionKey()));

    ImmutableBTreePartition result = null;

    Tracing.trace("Merging memtable contents");
    .... // then it also looks into sstable on timestamp order.

In the snippet above we have found the next two tracing log lines:

Acquiring sstable references [ReadStage-2]

Merging memtable contents [ReadStage-2]
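Reading the Javadoc above, "Acquiring sstable references" appears to mean only that the read takes a view of the live sstables that might hold the partition; that step by itself does not read sstable data. The method then starts from the memtable and walks the sstables from newest to oldest, hoping to stop early. Below is a simplified Java sketch of that idea. All names (TimestampOrderReadSketch, SSTable, Result) are hypothetical, and the stop condition is a rough approximation of what the Javadoc describes, not the real code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a newest-first read with early exit; not Cassandra's code. */
class TimestampOrderReadSketch {

    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** An sstable reduced to its max write timestamp and the queried row's cells. */
    static final class SSTable {
        final long maxTimestamp;
        final Map<String, Cell> row;
        SSTable(long maxTimestamp, Map<String, Cell> row) { this.maxTimestamp = maxTimestamp; this.row = row; }
    }

    static final class Result {
        final Map<String, Cell> row;
        final int sstablesRead;
        Result(Map<String, Cell> row, int sstablesRead) { this.row = row; this.sstablesRead = sstablesRead; }
    }

    static Result read(Map<String, Cell> memtableRow, List<SSTable> sstables, List<String> columns) {
        // Start with whatever the memtable holds for the queried columns.
        Map<String, Cell> result = new HashMap<>();
        for (String column : columns) {
            Cell cell = memtableRow.get(column);
            if (cell != null) result.put(column, cell);
        }
        // Visit sstables from the highest max timestamp to the lowest.
        List<SSTable> ordered = new ArrayList<>(sstables);
        ordered.sort(Comparator.comparingLong((SSTable s) -> s.maxTimestamp).reversed());
        int read = 0;
        for (SSTable sstable : ordered) {
            // Nothing in this or any older sstable can be newer: stop.
            if (isComplete(result, columns, sstable.maxTimestamp)) break;
            read++;
            for (String column : columns) {
                Cell candidate = sstable.row.get(column);
                Cell current = result.get(column);
                if (candidate != null && (current == null || candidate.timestamp > current.timestamp)) {
                    result.put(column, candidate);
                }
            }
        }
        return new Result(result, read);
    }

    /** True if every queried column already has a cell newer than the given sstable. */
    static boolean isComplete(Map<String, Cell> result, List<String> columns, long sstableMaxTimestamp) {
        for (String column : columns) {
            Cell cell = result.get(column);
            if (cell == null || cell.timestamp <= sstableMaxTimestamp) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("address", "age");
        Map<String, Cell> older = new HashMap<>();
        older.put("address", new Cell("addr-old", 100L));
        Map<String, Cell> newer = new HashMap<>();
        newer.put("address", new Cell("addr-new", 200L));
        List<SSTable> sstables = Arrays.asList(new SSTable(100L, older), new SSTable(200L, newer));

        Map<String, Cell> fullMemtable = new HashMap<>();
        fullMemtable.put("address", new Cell("addr-mem", 300L));
        fullMemtable.put("age", new Cell("30", 300L));
        System.out.println(read(fullMemtable, sstables, columns).sstablesRead); // prints 0

        Map<String, Cell> partialMemtable = new HashMap<>();
        partialMemtable.put("age", new Cell("30", 300L));
        System.out.println(read(partialMemtable, sstables, columns).sstablesRead); // prints 1
    }
}
```

In the first call the memtable row is complete and newer than every sstable, so no sstable is read even though references to both were held. That would be consistent with the trace in the question, where no sstable merge step follows "Merging memtable contents".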


Hope this helps.

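Comments:

For your listening and learning pleasure, here is the read path as explained by Tyler Hobbs.

"Acquiring memtable contents", or should that be "Acquiring sstable references"?

Yes, you are right. That was a mistake.

Can you add the table definition with the 4 columns you mentioned? I am especially interested in the partition key and clustering columns. Also the number of nodes in the cluster, the RF, and finally the consistency level used for this query?

Actually, it is just a random table for illustration. It has 4 columns, all text, and a simple partition key (one column, no clustering key). Can you explain "Acquiring sstable references" in the tracing output (mentioned in the edited question)?

If it really hit the sstables, there should be a log line like this:
Merging data from memtables and 0 sstables
I am not sure, though. I will get back to you later with more details.

Also, I checked the read throughput in both cases, i.e. case 1: the complete row being fetched is in the memtable, and case 2: the memtables do not have the row being queried. I found almost no difference in read throughput between the two.

I think that when Cassandra fetches data from the memtables, it does not even know whether it got the whole row or only part of it. It will always check the bloom filters (except when the row cache contains the requested row).

So are you now contradicting what you said earlier? Are you saying that in this case, yes, it will check the SSTables? I am also confused about the read path Cassandra follows. I had assumed that it follows one fixed, well-defined path whenever it serves a read request, but after reading your answer it seems the read path may vary from query to query.

Yes, it seems so. They have optimized a lot depending on the query. Depending on server-side configuration (row cache enabled or disabled) and client-side configuration, different queries follow different read paths.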
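The point that different queries take different read paths can be sketched from the two Javadoc comments quoted in the answer: the timestamp-ordered path is described as being for names queries with no collections or counters, and the merge of all sources is described as the most generic strategy. The Java below is only an illustration of that decision; ReadStrategySketch, Strategy, and both parameter names are hypothetical, and the real selection logic may involve more conditions.

```java
/** Toy model of choosing a read strategy; not Cassandra's selection code. */
class ReadStrategySketch {

    enum Strategy { MERGE_ALL_SOURCES, TIMESTAMP_ORDER }

    /**
     * @param namesFilter                true if the query names exactly which rows it wants
     * @param readsCollectionsOrCounters true if any queried column is a collection or counter
     */
    static Strategy choose(boolean namesFilter, boolean readsCollectionsOrCounters) {
        if (namesFilter && !readsCollectionsOrCounters) {
            // Bet that the newest sources already hold everything that is needed.
            return Strategy.TIMESTAMP_ORDER;
        }
        // Generic fallback: merge memtables and sstables together.
        return Strategy.MERGE_ALL_SOURCES;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false));  // prints TIMESTAMP_ORDER
        System.out.println(choose(true, true));   // prints MERGE_ALL_SOURCES
        System.out.println(choose(false, false)); // prints MERGE_ALL_SOURCES
    }
}
```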