Understanding the internal data stored by Cassandra

I have this table:

create table comment_by_post
(
    postId uuid,
    userId uuid,
    cmntId timeuuid,
    cmntTxt text,
    cmntBy text,
    time bigint,
    primary key ((postId, userId), cmntId)
);

RowKey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92e537a
=> (name=d3f02a30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270721107000)
=> (name=d3f02a30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e743434, timestamp=1434270721107000)
-------------------
RowKey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92eec7a
=> (name=465fee30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270483603000)
=> (name=465fee30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7432, timestamp=1434270483603000)
=> (name=4ba89f40-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270492468000)
=> (name=4ba89f40-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7431, timestamp=1434270492468000)
=> (name=504a61f0-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270500239000)
=> (name=504a61f0-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7433, timestamp=1434270500239000)
-------------------
RowKey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92e237a
=> (name=cd1e8f30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270709667000)
=> (name=cd1e8f30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7433, timestamp=1434270709667000)

The output above is how this table's data is stored internally, as shown by cassandra-cli.

If I instead use PRIMARY KEY (postId, userId, cmntId), then it looks like this:

RowKey: 4978f728-0f96-11e5-a6c0-1697f925ec7b
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:971da150-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264176613000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:971da150-1260-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7431, timestamp=1434264176613000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a0d4a900-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264192912000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a0d4a900-1260-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7432, timestamp=1434264192912000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a5d94c30-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264201331000)

Why does this happen? What are the benefits of each approach?

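The first primary key uses postId and userId as the partition key, and cmntId as the clustering column. Note that the value used for the row key contains the values of postId and userId, separated by a colon (:). The value of the clustering column is then used in the name of every cell in the row.

In the second example, the primary key is missing the parentheses around the partition key. They can be omitted when the partition key is a single column, but it is usually better to include them, because they make it explicit which parts of the primary key are used for partitioning and which for clustering. When the extra parentheses are left out, only the first column is used as the partition key (visible in the RowKey value in cassandra-cli). All subsequent columns are treated as clustering columns, which we can verify by looking at the cell names.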
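To make the mapping above concrete, here is a minimal Python sketch of a toy model (not Cassandra's own code) of how the pre-3.0 storage engine that cassandra-cli displays turns CQL rows into storage rows: partition key values are joined with ':' to form the RowKey, clustering values prefix every cell name, and each CQL row gets an empty-named row-marker cell, as in the output above. The helper storage_layout and the sample values p1, u1, c1 and so on are hypothetical, and cells are sorted here as plain strings rather than by the clustering column's real type (timeuuid).

```python
def storage_layout(partition_cols, clustering_cols, rows, value_cols):
    """Toy model: group CQL rows into cassandra-cli style storage rows.

    Hypothetical helper for illustration only, not a Cassandra API.
    """
    layout = {}
    for row in rows:
        # Partition key components are joined with ':' to form the RowKey
        row_key = ":".join(str(row[c]) for c in partition_cols)
        # Clustering column values become the prefix of every cell name
        prefix = ":".join(str(row[c]) for c in clustering_cols)
        cell_names = layout.setdefault(row_key, [])
        # Empty-named row-marker cell, like "name=<cmntId>:, value=" above
        cell_names.append(prefix + ":")
        # One cell per non-null regular column, named <prefix>:<column>
        for col in value_cols:
            if row.get(col) is not None:
                cell_names.append(prefix + ":" + col.lower())
    # Sorted as plain strings here; Cassandra sorts by the clustering types
    return {key: sorted(names) for key, names in layout.items()}


rows = [
    {"postId": "p1", "userId": "u1", "cmntId": "c2", "cmntTxt": "cmnt2"},
    {"postId": "p1", "userId": "u1", "cmntId": "c1", "cmntTxt": "cmnt1"},
    {"postId": "p1", "userId": "u2", "cmntId": "c3", "cmntTxt": "cmnt3"},
]

# PRIMARY KEY ((postId, userId), cmntId)
print(storage_layout(["postId", "userId"], ["cmntId"], rows, ["cmntTxt"]))
# {'p1:u1': ['c1:', 'c1:cmnttxt', 'c2:', 'c2:cmnttxt'], 'p1:u2': ['c3:', 'c3:cmnttxt']}

# PRIMARY KEY (postId, userId, cmntId)
print(storage_layout(["postId"], ["userId", "cmntId"], rows, ["cmntTxt"]))
# {'p1': ['u1:c1:', 'u1:c1:cmnttxt', 'u1:c2:', 'u1:c2:cmnttxt', 'u2:c3:', 'u2:c3:cmnttxt']}
```

With the composite partition key, the same three comments land in two storage rows; with postId alone they all share one storage row, and userId moves into the cell names, matching the two cassandra-cli dumps in the question.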
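Christopher has already explained how the partition key columns are concatenated to produce the row key used for storage, so I won't rehash that (no pun intended). Instead, I'll explain the pros and cons of the two approaches.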
PRIMARY KEY (postId, userId, cmntId)

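With this primary key, your data is partitioned by postId and clustered by userId and cmntId. This means that all comments made on a post are stored together on disk under that postId, sorted first by userId and then by cmntId.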
The advantage here is query flexibility: you can query all comments on a post, or all comments a specific user made on a post.
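Why both queries work can be sketched with a toy in-memory model, assuming one sorted list per postId stands in for a partition whose rows are ordered by the clustering columns (userId, cmntId). Because rows are sorted by userId first, "all comments by one user on a post" is a contiguous slice of the partition. The functions insert, comments_for_post and comments_for_post_by_user are hypothetical names, and string ordering stands in for Cassandra's uuid/timeuuid ordering.

```python
import bisect

# Toy model of PRIMARY KEY (postId, userId, cmntId):
# one partition per postId, rows kept sorted by (userId, cmntId).
partitions = {}


def insert(post_id, user_id, cmnt_id, text):
    rows = partitions.setdefault(post_id, [])
    bisect.insort(rows, (user_id, cmnt_id, text))


def comments_for_post(post_id):
    # Like: SELECT * FROM comment_by_post WHERE postId = ?
    # The whole partition is read.
    return list(partitions.get(post_id, []))


def comments_for_post_by_user(post_id, user_id):
    # Like: SELECT * FROM comment_by_post WHERE postId = ? AND userId = ?
    # Rows are sorted by userId first, so this user's rows are contiguous.
    rows = partitions.get(post_id, [])
    lo = bisect.bisect_left(rows, (user_id,))
    hi = lo
    while hi < len(rows) and rows[hi][0] == user_id:
        hi += 1
    return rows[lo:hi]


insert("p1", "u2", "c3", "cmnt3")
insert("p1", "u1", "c2", "cmnt2")
insert("p1", "u1", "c1", "cmnt1")

print(comments_for_post("p1"))
# [('u1', 'c1', 'cmnt1'), ('u1', 'c2', 'cmnt2'), ('u2', 'c3', 'cmnt3')]
print(comments_for_post_by_user("p1", "u1"))
# [('u1', 'c1', 'cmnt1'), ('u1', 'c2', 'cmnt2')]
```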
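The disadvantage is that, compared with the other solution, you have a higher chance of unbounded row growth. If the total number of columns per postId ever exceeded 2 billion, you would hit the limit on how much data you can store per postId. But the odds of storing that many comments on a single post are low, so you should be fine.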
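A rough back-of-the-envelope check of that limit, assuming the limit of about 2 billion cells per partition in the storage engine shown by cassandra-cli, and the cell layout from the output above: one empty row-marker cell per comment plus one cell per non-null regular column. The constant and function names are hypothetical, and the numbers are estimates, not guarantees.

```python
# Assumption: ~2 billion cells per partition (pre-3.0 storage engine).
MAX_CELLS_PER_PARTITION = 2_000_000_000


def max_comments_per_partition(non_key_columns_set):
    # Assumed layout from the cassandra-cli output:
    # 1 row-marker cell + 1 cell per non-null regular column, per comment.
    cells_per_comment = 1 + non_key_columns_set
    return MAX_CELLS_PER_PARTITION // cells_per_comment


print(max_comments_per_partition(1))  # only cmntTxt set: 1000000000
print(max_comments_per_partition(3))  # cmntTxt, cmntBy and time set: 500000000
```

Even with every column populated, a single post would need hundreds of millions of comments before reaching the limit, which is why the answer calls the risk low.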
PRIMARY KEY ((postId, userId), cmntId)

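This solution helps limit unbounded row growth by storing comment data together under a row key formed by concatenating postId and userId, with the comments in each row sorted by cmntId. That is its advantage over the other solution.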
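The disadvantage is a loss of query flexibility, because you now need to supply both postId and userId for every query. This primary key definition does not support querying comments by postId alone, because Cassandra CQL requires you to provide the entire partition key in a query.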
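The reason the full partition key is required can be sketched as follows: the partitioner hashes the whole partition key, so postId on its own does not identify any partition. In this toy model, hashlib.md5 over the ':'-joined key is only a stand-in for the real partitioner (Murmur3Partitioner is Cassandra's default), and token, insert and select are hypothetical names.

```python
import hashlib


def token(*partition_key_values):
    # Stand-in for the partitioner: hash the *whole* partition key.
    data = ":".join(partition_key_values).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


# Toy table with PRIMARY KEY ((postId, userId), cmntId): partitions keyed by token.
table = {}


def insert(post_id, user_id, cmnt_id, text):
    table.setdefault(token(post_id, user_id), {})[cmnt_id] = text


def select(post_id=None, user_id=None):
    if post_id is None or user_id is None:
        # Mirrors the rule above: without every partition key component
        # there is no token, so the partition cannot be located.
        raise ValueError("both postId and userId are required")
    part = table.get(token(post_id, user_id), {})
    # Comments come back ordered by cmntId (plain string order here)
    return [part[c] for c in sorted(part)]


insert("p1", "u1", "c2", "cmnt2")
insert("p1", "u1", "c1", "cmnt1")
insert("p1", "u2", "c3", "cmnt3")

print(select("p1", "u1"))  # ['cmnt1', 'cmnt2']
try:
    select("p1")  # postId alone
except ValueError as err:
    print("rejected:", err)
```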
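Is there a particular question you want answered, or are you curious about how the construction of the primary key interacts with the underlying storage? For more information, check out this blog post, which shows how CQL3 maps to Cassandra's internal data structures.

@ChristopherBradford That's a great article. John Berryman does a great job explaining all of this, especially how clustering keys work "under the hood".

I answered a similar question here.

What are the pros and cons of both?

It may be worth noting that with the PRIMARY KEY ((postId, userId), cmntId) form, a post's comments will be spread across multiple nodes in the cluster, rather than placed on just one as with the less specific partition key of PRIMARY KEY (postId, userId, cmntId).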
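The node-placement point in that comment can be illustrated with a simplified sketch: placement depends only on the hash of the whole partition key, so a postId-only key sends all of a post's comments to the same place, while a (postId, userId) key spreads them by user. It assumes a hypothetical 3-node cluster, uses hashlib.md5 modulo the node count as a stand-in for Cassandra's token ring, and ignores replication (in a real cluster each partition lives on a replica set rather than a single node).

```python
import hashlib

NUM_NODES = 3  # hypothetical cluster size


def node_for(*partition_key_values):
    # Simplified placement: hash the whole partition key, modulo node count.
    # Real Cassandra assigns token ranges on a ring and adds replicas.
    data = ":".join(partition_key_values).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big") % NUM_NODES


post_id = "p1"
users = [f"u{i}" for i in range(100)]

# PRIMARY KEY (postId, userId, cmntId): partition key is postId only
nodes_simple = {node_for(post_id) for _ in users}
# PRIMARY KEY ((postId, userId), cmntId): partition key is (postId, userId)
nodes_composite = {node_for(post_id, u) for u in users}

print(len(nodes_simple))  # 1: the whole post is a single partition
print(len(nodes_composite))  # usually all 3 nodes with this many users
```

The trade-off follows directly: spreading a post across nodes avoids one huge partition, but reading every comment on the post means contacting many partitions instead of one.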
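My understanding of PRIMARY KEY (postId, userId, cmntId) is that the column count may grow quickly, but querying will be easy. With PRIMARY KEY ((postId, userId), cmntId), the column count stays under control, but querying is not as easy. Is that right?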