Instagram-style timeline data model in Cassandra

I want to design a timeline (home feed) like Instagram's. The schema I see used most often is the following:

-- Users user is following
CREATE TABLE following (
    username text,
    followed text,
    PRIMARY KEY(username, followed)
);

-- Users who follow user
CREATE TABLE followers (
    username  text,
    following text,
    PRIMARY KEY(username, following)
);

-- Materialized view of tweets created by user
CREATE TABLE userline (
    tweetid  timeuuid,
    username text,
    body     text,
    PRIMARY KEY(username, tweetid)
);

-- Materialized view of tweets created by user, and users she follows
CREATE TABLE timeline (
    username  text,
    tweetid   timeuuid,
    posted_by text,
    body      text,
    PRIMARY KEY(username, tweetid)
);

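In this design, every new post inserts one new row into the timeline of each follower. If a user has 10k followers and 1,000 users are using the application, the program fails. Is there a better way?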

// Insert the tweet into follower timelines
for (String follower : getFollowers(username)) {
    execute("INSERT INTO timeline (username, tweetid, posted_by, body) VALUES ('%s', %s, '%s', '%s')",
            follower,
            id.toString(),
            username,
            body);
}

I think one of these two solutions/suggestions might help:

1) First suggestion: insert into the timeline in batch mode, for example 1,000 INSERT statements per batch:

// followersChunk is a hypothetical name for the n followers handled by this batch
StringBuilder batch = new StringBuilder("BEGIN BATCH\n");
for (String follower : followersChunk) {
    batch.append(String.format(
        "  INSERT INTO timeline (username, tweetid, posted_by, body) VALUES ('%s', %s, '%s', '%s');\n",
        follower, id.toString(), username, body));
}
// n statements in total
batch.append("APPLY BATCH;");
execute(batch.toString());

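Batching multiple statements saves network round trips between the client and the server, and between the coordinator and the replicas.
One more thing: batches are atomic by default (in Cassandra 1.2 and later). In the context of a Cassandra batch operation, atomic means that if any part of the batch succeeds, all of it will; otherwise none of it will. Keep in mind that each follower's timeline is a separate partition, so this is a multi-partition logged batch and the coordinator pays extra batch-log overhead for that atomicity; very large batches can overload it.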
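To make the batching idea concrete, here is a minimal sketch of splitting a follower list into fixed-size chunks and building one batch statement per chunk. `TimelineBatcher`, `chunk`, and `batchCql` are hypothetical names, not part of Twissandra or any Cassandra driver. The sketch only builds the CQL text with `?` bind markers, on the assumption that the values would be bound through whatever driver is in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Twissandra or any driver. */
final class TimelineBatcher {

    /** Splits the follower list into consecutive chunks of at most batchSize names. */
    static List<List<String>> chunk(List<String> followers, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < followers.size(); i += batchSize) {
            int end = Math.min(i + batchSize, followers.size());
            chunks.add(new ArrayList<>(followers.subList(i, end)));
        }
        return chunks;
    }

    /** Builds one BEGIN BATCH ... APPLY BATCH statement holding `rows` timeline INSERTs. */
    static String batchCql(int rows) {
        StringBuilder cql = new StringBuilder("BEGIN BATCH\n");
        for (int i = 0; i < rows; i++) {
            // One INSERT per follower; the four values are left as bind markers.
            cql.append("  INSERT INTO timeline (username, tweetid, posted_by, body) VALUES (?, ?, ?, ?);\n");
        }
        return cql.append("APPLY BATCH;").toString();
    }
}
```

With 10k followers and a chunk size of 1,000, `chunk` would yield 10 lists, and `batchCql(chunkSize)` would be executed once per list with that chunk's values bound. The right chunk size depends on the cluster, so treat 1,000 as a starting point to measure rather than a recommendation.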

2) Second suggestion: perform the timeline inserts asynchronously (with a success callback on the front end).
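The asynchronous approach could look roughly like the sketch below, which caps the number of in-flight inserts so a user with many followers does not flood the cluster. `AsyncFanout`, `fanOut`, and the `insertAsync` parameter are hypothetical names; `insertAsync` stands in for whatever asynchronous execute call the driver offers, adapted to return a `CompletableFuture`. The sketch assumes `insertAsync` reports failures through the returned future rather than by throwing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper for illustration; not tied to any specific Cassandra driver. */
final class AsyncFanout {

    /**
     * Calls insertAsync once per follower, keeping at most maxInFlight inserts
     * pending at a time, and returns how many inserts completed successfully.
     */
    static int fanOut(List<String> followers, int maxInFlight,
                      Function<String, CompletableFuture<Void>> insertAsync) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger succeeded = new AtomicInteger();
        List<CompletableFuture<Void>> pending = new ArrayList<>();

        for (String follower : followers) {
            // Blocks here while maxInFlight inserts are still outstanding.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> done = insertAsync.apply(follower)
                    .whenComplete((result, error) -> {
                        if (error == null) {
                            succeeded.incrementAndGet(); // the success callback
                        }
                        permits.release();
                    });
            pending.add(done);
        }

        // Wait for every insert to finish; failed inserts are simply not counted.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .handle((result, error) -> null)
                .join();
        return succeeded.get();
    }
}
```

A caller would pass a function that issues the single-row `INSERT INTO timeline ...` for the given follower. Counting successes is only one possible callback; retrying or logging the failed followers would fit in the same `whenComplete` step.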

Of course, you could also combine the two.

There is also an exchange with Eric Evans (Debian developer, Apache Cassandra committer, and systems architect at Rackspace Cloud) about the data modeling in Twissandra.