How do I structure a DynamoDB database to allow queries for trending posts?


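I plan to use the following formula to calculate "trending" posts:

p = votes (points) from users. t = time since submission, in hours.

I'm looking for advice on how to structure my database tables so that I can query for trending posts using DynamoDB (a NoSQL database service from Amazon).

DynamoDB requires a primary key for each item in a table. The primary key can consist of two parts: a hash attribute (string or number) and a range attribute (string or number). The hash attribute is required; when it is used alone it must be unique for each item, and when a range attribute is also used, the combination of the two must be unique. The range attribute is optional, but if it is used, DynamoDB builds a sorted range index on it.

The structure I have in mind is as follows: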

TableName: Users

HashAttribute:  user_id
RangeAttribute: NONE
OtherFields: first_name, last_name

TableName: Posts

HashAttribute:  post_id
RangeAttribute: NONE
OtherFields: user_id, title, content, points, categories[ ]

TableName: Categories

HashAttribute:  category_name
RangeAttribute: post_id
OtherFields: title, content, points

TableName: Counters

HashAttribute:  counter_name
RangeAttribute: NONE
OtherFields: counter_value
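For illustration, the four tables above could be written down as key definitions. This is only a sketch: `table_spec` is a hypothetical helper, the dict layout follows the `TableName` / `KeySchema` / `AttributeDefinitions` parameters that boto3's `create_table` accepts, and the attribute types (`N` for the IDs, `S` for the names) are assumptions the question does not state.

```python
# Hypothetical helper: builds a dict shaped like the key-related parameters
# of boto3's create_table. Nothing here talks to AWS.
def table_spec(name, hash_attr, range_attr=None):
    """hash_attr and range_attr are (attribute_name, type) pairs; type is 'S' or 'N'."""
    key_schema = [{"AttributeName": hash_attr[0], "KeyType": "HASH"}]
    attr_defs = [{"AttributeName": hash_attr[0], "AttributeType": hash_attr[1]}]
    if range_attr is not None:
        key_schema.append({"AttributeName": range_attr[0], "KeyType": "RANGE"})
        attr_defs.append({"AttributeName": range_attr[0], "AttributeType": range_attr[1]})
    return {"TableName": name, "KeySchema": key_schema, "AttributeDefinitions": attr_defs}


# Attribute types below are assumed, not taken from the question.
TABLES = [
    table_spec("Users", ("user_id", "N")),
    table_spec("Posts", ("post_id", "N")),
    table_spec("Categories", ("category_name", "S"), ("post_id", "N")),
    table_spec("Counters", ("counter_name", "S")),
]
```

Only the key attributes are declared; the fields listed under OtherFields are stored per item and need no schema. A real `create_table` call would also need capacity settings, which are left out here.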

Here is an example of the kinds of requests I would make with this table setup (example: user_id=100):

User Action 1:

The user creates a new post and tags it with two categories (baseball, soccer).

Query (1):

Check the current value of counter_name='post_id', increment it by 1, and use the new value as the post_id.

Query (2):

Insert the following into the Posts table:

post_id=value_from_query_1, user_id=100, title=user_generated, content=user_generated, points=0, categories=['baseball','soccer']

Query (3):

Insert the following into the Categories table:

category_name='baseball', post_id=value_from_query_1, title=user_generated, content=user_generated, points=0

Query (4):

Insert the following into the Categories table:

category_name='soccer', post_id=value_from_query_1, title=user_generated, content=user_generated, points=0
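The four queries of User Action 1 can be sketched with in-memory dicts standing in for the tables. This is a simulation under stated assumptions, not DynamoDB client code: `next_post_id` and `create_post` are hypothetical names, and a real implementation would issue UpdateItem and PutItem requests instead.

```python
# In-memory stand-ins for the Counters, Posts and Categories tables.
counters = {"post_id": 0}
posts = {}        # post_id -> item
categories = {}   # (category_name, post_id) -> item


def next_post_id():
    # Query (1): increment the counter and use the new value.
    # In DynamoDB this would be one atomic UpdateItem (ADD 1) that returns
    # the updated value, rather than a read followed by a separate write.
    counters["post_id"] += 1
    return counters["post_id"]


def create_post(user_id, title, content, category_names):
    post_id = next_post_id()
    # Query (2): one item in Posts.
    posts[post_id] = {
        "post_id": post_id, "user_id": user_id, "title": title,
        "content": content, "points": 0, "categories": list(category_names),
    }
    # Queries (3) and (4): one denormalized item per category in Categories.
    for name in category_names:
        categories[(name, post_id)] = {
            "category_name": name, "post_id": post_id,
            "title": title, "content": content, "points": 0,
        }
    return post_id


new_id = create_post(100, "user_generated", "user_generated", ["baseball", "soccer"])
```

Because points is copied into every Categories item, each vote on this post would have to update three items (one in Posts, two in Categories) to keep the copies in sync.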

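The end goal is to be able to run the following types of queries:

1. Query for trending posts

2. Query for posts in a specific category

3. Query for the posts with the highest point values

Does anyone know how I could structure my tables so that I can query for trending posts? Or is this a capability I give up by switching to DynamoDB?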

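I'll start with your comment about timestamps vs. post_id.
Since you would be using DynamoDB as the post_id generator, there is a scalability issue. Such sequential numbers are inherently hard to scale, and you are better off using a date object. If you need to create posts at a crazy rate, you can start by reading about how Twitter does it.

Now let's get back to your trend checking:
I believe your scenario misuses DynamoDB.
Say you have one hot category that holds most of the posts. You would basically have to scan all of its posts (because the data is not evenly distributed), look at the points of each one, and do the comparison on your server, every time. This either won't work or will be very expensive, because each pass is likely to use up all of your reserved read capacity units.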

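The DynamoDB approach for this kind of trend check is to use MapReduce.
Read about how to implement this here:

I can't give timings, but I believe you will find this approach scalable, although you won't be able to run it very often.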
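To make the batch idea concrete, here is a minimal single-process sketch of the map and reduce steps. It is not an actual MapReduce job: the function names are hypothetical, the posts are plain dicts, and the score function is passed in by the caller. The example call uses raw points as a placeholder score, not the question's formula.

```python
import heapq
from collections import defaultdict


def map_post(post, score_fn):
    # Map step: emit one (category, (score, post_id)) pair per category.
    score = score_fn(post)
    for name in post["categories"]:
        yield name, (score, post["post_id"])


def reduce_top(pairs, k):
    # Reduce step: keep the k highest-scoring posts in each category.
    grouped = defaultdict(list)
    for name, scored in pairs:
        grouped[name].append(scored)
    return {name: heapq.nlargest(k, items) for name, items in grouped.items()}


def trending(posts, score_fn, k=10):
    pairs = (pair for post in posts for pair in map_post(post, score_fn))
    return reduce_top(pairs, k)


sample = [
    {"post_id": 1, "points": 5, "categories": ["baseball"]},
    {"post_id": 2, "points": 9, "categories": ["baseball", "soccer"]},
    {"post_id": 3, "points": 2, "categories": ["soccer"]},
]
# Placeholder score: raw points only.
result = trending(sample, score_fn=lambda p: p["points"], k=1)
```

The output of such a job would be written back to a small table that the site reads cheaply, so the expensive full pass happens only as often as the job runs.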

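Alternatively, you can keep a list of the "top 10/100" trending questions.

When a post is upvoted, you update the list in "real time": fetch the list, check whether it needs to be updated with the newly upvoted question, and save it back to the database if it does.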
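A minimal sketch of that list update, assuming the list is stored as (points, post_id) pairs sorted from highest to lowest. `update_top_list` is a hypothetical helper; fetching the list from the database and saving it back are left out.

```python
def update_top_list(top, post_id, points, limit=10):
    """Return a new top list after a post's points changed.

    top is a list of (points, post_id) pairs sorted from highest to lowest.
    """
    # Drop any stale entry for this post, then add the fresh one.
    entries = [(p, pid) for p, pid in top if pid != post_id]
    entries.append((points, post_id))
    entries.sort(reverse=True)
    return entries[:limit]


top = [(12, 7), (9, 3), (4, 5)]
# Post 5 is upvoted to 10 points and moves ahead of post 3.
top = update_top_list(top, post_id=5, points=10, limit=3)
# Post 8 has only 1 point and does not make the list.
top = update_top_list(top, post_id=8, points=1, limit=3)
```

If several votes can arrive at once, the save step would need some guard against lost updates, for example a conditional write that only succeeds when the stored list has not changed since it was read.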

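It would be best to specify which database you are using. Different "NoSQL" databases differ a great deal.

How often do you plan to recalculate the trending posts? Where is the timestamp stored in the structure above? After how long are you willing to make a post ineligible for trend analysis?

@Layble I plan to use the post ID as an incrementing counter (so sorting post IDs in descending order shows the newest posts). The reason I'm considering post_id instead of a timestamp is to avoid the possibility of duplicate range attributes in the Categories table (e.g. if two different users post about soccer at the same moment). I think I would want to recalculate trending posts at least once a minute.

That is out of reach. Go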