Is this possible with Azure Tables? (C#, Azure, Group By, Azure Table Storage)

I get the error message "The method 'Join' is not supported" from the LINQ query below:

tableServiceContext = new CustomTableServiceContext(storageAccount.TableEndpoint.AbsoluteUri, storageAccount.Credentials);
tableServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
var results = (from c in tableServiceContext.CreateQuery<ChannelEntry>("Channels").AsTableServiceQuery<ChannelEntry>()
    join v in tableServiceContext.CreateQuery<VideoEntry>("Videos").AsTableServiceQuery<VideoEntry>() on c.PartitionKey equals v.ChannelID
    join h in tableServiceContext.CreateQuery<HitEntry>("Hits").AsTableServiceQuery<HitEntry>() on v.PartitionKey equals h.VideoID
    where c.RowKey.Equals(UserID)
    group h by h.RowKey into g
    select new BiggestFan { UserID = g.Key, Hits = g.Count() }).AsTableServiceQuery().Execute().OrderByDescending(b => b.Hits).Take(1);

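If "join" isn't supported in this context, what is the most efficient way to perform this query?

I have channels made up of videos, and those videos in turn have hits. I'm trying to find the currently logged-in user's biggest fan (the user with the most hits).

What is the most efficient way to do this kind of thing without a join? Do I have to fetch all the channels, then the videos, then the hits, as 3 separate calls to table storage, and then do the join myself?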

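Right, you can't join. You have a couple of options:

1) Multiple scans: call .ToArray() on each query before the join, so that the join happens in your application's memory. This isn't very performant, but table storage is fairly fast; it really depends on how many rows come back.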

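2) Denormalize your tables so that all the keys you need are referenced in a single table. This lets you get the result in one query, but means all of your insert/update logic has to change accordingly.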
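Option 1 could look roughly like the sketch below. It assumes the three tables have already been read into memory (the `.ToArray()` step described above); the record types are hypothetical stand-ins for the question's `ChannelEntry`, `VideoEntry` and `HitEntry` classes, reduced to the properties the query uses. Because this is LINQ to Objects rather than the table-storage provider, `join` and `group by` are allowed.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities (not the real classes).
public record ChannelEntry(string PartitionKey, string RowKey); // RowKey = channel owner's UserID
public record VideoEntry(string PartitionKey, string ChannelID); // PartitionKey = video ID
public record HitEntry(string RowKey, string VideoID);           // RowKey = fan's UserID
public record BiggestFan(string UserID, int Hits);

public static class InMemoryJoin
{
    // channels, videos and hits are the already-materialized results of three
    // separate table queries (e.g. each ending in .Execute().ToArray()).
    public static BiggestFan? FindBiggestFan(
        IEnumerable<ChannelEntry> channels,
        IEnumerable<VideoEntry> videos,
        IEnumerable<HitEntry> hits,
        string userId)
    {
        // Same shape as the question's query; filtering the owner's channels
        // first keeps the in-memory joins smaller.
        return (from c in channels
                where c.RowKey == userId
                join v in videos on c.PartitionKey equals v.ChannelID
                join h in hits on v.PartitionKey equals h.VideoID
                group h by h.RowKey into g
                select new BiggestFan(g.Key, g.Count()))
               .OrderByDescending(b => b.Hits)
               .FirstOrDefault();
    }
}
```

The cost of this approach grows with the number of video and hit rows downloaded before the join, which is the performance caveat in option 1.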

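Azure Table Storage (AZT; my own abbreviation, not one others commonly use) queries don't support 3 things:

1) Joins
2) Grouping
3) Aggregate functions

The short version is that if you want to run an efficient query in AZT, it has to run against a single table and filter on the partition key, or on the partition key and row key.

This doesn't mean your underlying data has to be stored in that one table. You can keep your current structure, but you will probably need to build a table that is essentially an index, so that you can fetch the information you need. It could be structured something like this: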

PartitionKey = ChannelUserId.PaddedWithLeadingZeros() + "-" + (int.MaxValue - NumberOfHits).PaddedWithLeadingZeros();
RowKey = FanUserId;
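`PaddedWithLeadingZeros` is not a .NET method. The sketch below assumes it is a small `int` extension method that pads to 10 digits (the width of `int.MaxValue`), and shows why the two tricks matter: padding makes string order agree with numeric order, and storing `int.MaxValue - NumberOfHits` makes the highest hit count sort first, since table storage returns entities in ascending PartitionKey/RowKey order. `BiggestFansKeys` is a hypothetical name.

```csharp
// Hypothetical helpers producing the BiggestFansIndex keys described above.
public static class BiggestFansKeys
{
    // Assumes non-negative values: padding every number to 10 digits
    // (int.MaxValue has 10) makes string order agree with numeric order.
    public static string PaddedWithLeadingZeros(this int value) => value.ToString("D10");

    // Subtracting the hit count from int.MaxValue inverts the order: the more
    // hits, the smaller the key, so the biggest fan sorts first.
    public static string PartitionKey(int channelUserId, int numberOfHits) =>
        channelUserId.PaddedWithLeadingZeros() + "-" +
        (int.MaxValue - numberOfHits).PaddedWithLeadingZeros();

    public static string RowKey(int fanUserId) => fanUserId.ToString();
}
```

For example, `PartitionKey(42, 500)` yields `"0000000042-2147483147"`, which sorts before the key for the same channel owner with only 7 hits.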

Your query would then look like this:

tableServiceContext = new CustomTableServiceContext(storageAccount.TableEndpoint.AbsoluteUri, storageAccount.Credentials);
tableServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
var results = (from i in tableServiceContext.CreateQuery<BiggestFansIndex>("BiggestFansIndex").AsTableServiceQuery<BiggestFansIndex>()
    // every key for this channel owner starts with the padded UserId followed by "-"
    where i.PartitionKey.CompareTo(UserId.PaddedWithLeadingZeros()) >= 0
        && i.PartitionKey.CompareTo((UserId + 1).PaddedWithLeadingZeros()) < 0
    select i).Take(1).AsTableServiceQuery().Execute();
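Why the range filter works: every index key for a given channel owner starts with that owner's 10-digit padded ID followed by `"-"`, so all of them sort at or after `Pad(UserId)` and strictly before `Pad(UserId + 1)`, and within that range the row with the most hits comes first. The sketch below mimics the query over an in-memory list of (PartitionKey, RowKey) pairs, assuming 10-digit padding and using ordinal string comparison as an approximation of the service's key ordering; `BiggestFansLookup` and `TopFan` are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class BiggestFansLookup
{
    static string Pad(int value) => value.ToString("D10");

    // index holds (PartitionKey, RowKey) pairs of the BiggestFansIndex table.
    // Keep keys in [Pad(userId), Pad(userId + 1)), sort ascending by
    // PartitionKey then RowKey, and take the first row (the Take(1)).
    public static string? TopFan(IEnumerable<KeyValuePair<string, string>> index, int userId)
    {
        string lower = Pad(userId);     // "0000000042" sorts before "0000000042-..."
        string upper = Pad(userId + 1); // "0000000043" sorts after every "0000000042-..."
        return index
            .Where(e => string.CompareOrdinal(e.Key, lower) >= 0 &&
                        string.CompareOrdinal(e.Key, upper) < 0)
            .OrderBy(e => e.Key, StringComparer.Ordinal)
            .ThenBy(e => e.Value, StringComparer.Ordinal)
            .Select(e => e.Value)
            .FirstOrDefault();
    }
}
```

The padding also keeps owner 42's range from picking up owner 420's rows, since `"0000000420-..."` sorts after `"0000000043"`.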

I suspect your biggest problem will be keeping this index table up to date, since I'm sure the hit counts will change fairly regularly.
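The maintenance cost comes from the key design: the hit count is part of the PartitionKey, and an entity's PartitionKey and RowKey can't be modified in place, so every count change means deleting the old index row and inserting a new one. A minimal in-memory sketch, where `BiggestFansIndexMaintainer` and `OnHit` are hypothetical names and the `HashSet`/`Dictionary` stand in for table storage:

```csharp
using System.Collections.Generic;

public class BiggestFansIndexMaintainer
{
    static string Pad(int value) => value.ToString("D10");
    static string PartitionKey(int ownerId, int hits) => Pad(ownerId) + "-" + Pad(int.MaxValue - hits);

    // Stand-in for the BiggestFansIndex table: one (PartitionKey, RowKey) per owner/fan pair.
    public HashSet<(string PartitionKey, string RowKey)> IndexRows { get; } = new();

    // Current count per (owner, fan); needed to rebuild the row's old key.
    readonly Dictionary<(int Owner, int Fan), int> _counts = new();

    public void OnHit(int ownerId, int fanId)
    {
        _counts.TryGetValue((ownerId, fanId), out int oldHits);
        int newHits = oldHits + 1;
        string rowKey = fanId.ToString();

        // The count is baked into the PartitionKey and keys can't be changed
        // in place, so "updating" the row means delete old + insert new.
        if (oldHits > 0)
            IndexRows.Remove((PartitionKey(ownerId, oldHits), rowKey));
        IndexRows.Add((PartitionKey(ownerId, newHits), rowKey));

        _counts[(ownerId, fanId)] = newHits;
    }
}
```

Because the old and new rows usually live in different partitions, the delete and insert cannot go into one entity group transaction (batches are limited to a single partition), so a failure between the two steps can leave the index out of step with the real counts.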

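Azure Table Storage isn't a good fit for this kind of aggregate query. I'd suggest looking into NoSQL document databases such as CouchDB, MongoDB and RavenDB. If you still want to use it, though, you'll need to denormalize your data.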
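One minimal way to denormalize for this particular query is to copy the channel information onto each hit when it is written, so the read side touches a single table. The sketch assumes hypothetical names throughout (`DenormalizedHit`, `HitStore`, `RecordHit`) and uses in-memory dictionaries and a list as stand-ins for the Videos, Channels and Hits tables.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical denormalized hit row: it carries the channel and the channel
// owner as well as the video and the fan, so no join is needed later.
public record DenormalizedHit(string ChannelOwnerId, string ChannelId, string VideoId, string FanUserId);

public class HitStore
{
    // Stand-ins for the existing data: videoId -> channelId, channelId -> ownerId.
    readonly Dictionary<string, string> _videoToChannel;
    readonly Dictionary<string, string> _channelToOwner;

    // Stand-in for the single denormalized Hits table.
    public List<DenormalizedHit> Hits { get; } = new();

    public HitStore(Dictionary<string, string> videoToChannel, Dictionary<string, string> channelToOwner)
    {
        _videoToChannel = videoToChannel;
        _channelToOwner = channelToOwner;
    }

    // The extra work moves to the write path: resolve and copy the keys once.
    public void RecordHit(string videoId, string fanUserId)
    {
        string channelId = _videoToChannel[videoId];
        string ownerId = _channelToOwner[channelId];
        Hits.Add(new DenormalizedHit(ownerId, channelId, videoId, fanUserId));
    }

    // The read side is now a single filter on one table, with no joins.
    public IEnumerable<DenormalizedHit> HitsForOwner(string ownerId) =>
        Hits.Where(h => h.ChannelOwnerId == ownerId);
}
```

Reads get simpler, but every code path that records a hit has to resolve and copy the extra keys, which is the insert/update cost of denormalizing.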

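Others are right that you can't do joins in Azure Tables. You could move this to SQL Azure, where joins work the way you'd expect, but it's more expensive and slower than Azure Tables. Assuming you're sticking with Azure Tables, though:

Looking at this particular query, you could set up the Hits table's partition key like this: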

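Hits table:
PartitionKey = UserID (of the channel owner)
RowKey = Timestamp (or something else unique)
UserID (of the user who performed the hit)
ChannelID
VideoID
(plus any other fields you want in the Hits table)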

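As others have said, you can't aggregate in an Azure Table Storage query, so you have to pull all of the data back into local memory (by calling Execute) and then aggregate it there. Here's how to pull the data out of table storage (this query runs on the Azure Table Storage server):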
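A minimal sketch of that retrieval step, assuming the Hits layout above: a single equality filter on PartitionKey, which the service can answer from one partition. `HitEntry` here is a hypothetical POCO for that layout (not the question's class), and a `List<HitEntry>` stands in for the table so the filter can run without a storage account.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical POCO for the Hits layout described above.
public class HitEntry
{
    public string PartitionKey { get; set; } = ""; // UserID of the channel owner
    public string RowKey { get; set; } = "";       // timestamp or other unique value
    public string UserId { get; set; } = "";       // UserID of the user who performed the hit
    public string ChannelID { get; set; } = "";
    public string VideoID { get; set; } = "";
}

public static class HitQueries
{
    // Server-side step: an equality filter on PartitionKey reads one partition.
    // With the question's StorageClient API this would be a where clause on
    // CreateQuery<HitEntry>("Hits") followed by .AsTableServiceQuery().Execute()
    // (an assumption; not exercised here).
    public static List<HitEntry> HitsForChannelOwner(IEnumerable<HitEntry> hitsTable, string channelOwnerId) =>
        hitsTable.Where(h => h.PartitionKey == channelOwnerId).ToList();
}
```

The returned list plays the role of `allHits` in the in-memory aggregation, which groups it by `UserId`.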

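And here's how to aggregate it (this query runs in local memory):

var result =
    (
      from h in allHits
      group h by h.UserId into g  // The User that performed the Hit
      select new BiggestFan { UserID = g.Key, Hits = g.Count() }
    )
    .OrderByDescending(b => b.Hits).FirstOrDefault();

This technically works, but it won't scale. Once a user becomes popular, pulling all of their hits into local memory to run this query becomes impractical. And once the data grows too large to pull down all at once, you'll probably end up having to page through it.

You could denormalize the data further, calculating and storing various totals as you go, so that when you need to run this biggest-fan query you only have to retrieve the precomputed totals.

However, this is just one query. When designing your Azure table structure, you need to consider every query you might want to run against it, how long each will take to run, and how much data each will touch. Then you can work out the best structure for the data in your Azure tables. I'd advise against designing your Azure tables around a single query, since you'll probably need more queries in the future.

For the denormalization option 2 you suggested, what would that look like...? If channels have videos and videos have hits, would I still have 3 corresponding tables, but store all the video IDs in the channels table, and likewise the hit IDs in the videos table?
Yes. Duplicate those IDs so you don't need to join across so many tables at once. I'd expect the Hits table to be updated this way, since you're querying based on hits, and the Hits table is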
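The further-denormalization idea above (computing and storing totals as hits come in) can be sketched as follows. A hypothetical FanTotals table, keyed by channel owner (PartitionKey) and fan (RowKey), is modelled with nested in-memory dictionaries; each hit bumps one counter, and the biggest-fan query reads one row per fan instead of one row per hit.

```csharp
using System.Collections.Generic;
using System.Linq;

public class FanTotals
{
    // Stand-in for a hypothetical FanTotals table:
    // PartitionKey = channel owner's UserID, RowKey = fan's UserID, value = hit count.
    readonly Dictionary<string, Dictionary<string, int>> _totals = new();

    // Write path: each recorded hit also increments one running total.
    public void RecordHit(string ownerId, string fanId)
    {
        if (!_totals.TryGetValue(ownerId, out var fans))
        {
            fans = new Dictionary<string, int>();
            _totals[ownerId] = fans;
        }
        fans[fanId] = fans.TryGetValue(fanId, out int n) ? n + 1 : 1;
    }

    // Read path: scans one partition holding one total per fan.
    public (string FanId, int Hits)? BiggestFan(string ownerId)
    {
        if (!_totals.TryGetValue(ownerId, out var fans) || fans.Count == 0)
            return null;
        var top = fans.OrderByDescending(kv => kv.Value).First();
        return (top.Key, top.Value);
    }
}
```

Against real table storage, two concurrent hits for the same fan would race on the same counter, so the increment would need optimistic concurrency (re-read and retry when the entity's ETag has changed).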