Counting "rows" in a Cassandra column family using the Python driver

How can I count the rows in a Cassandra column family more efficiently using the Python driver? I am currently using the following code:

from cassandra.cluster import Cluster

servers = ['server1', 'server2']
cluster = Cluster(servers)
session = cluster.connect()

# Fetches every row of the table just to count them
result = session.execute('select * from ks1.t1')

count = 0

for i in result:
    count += 1

print(count)

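That is a poor way to count rows: you are essentially doing a full table scan.

Counting rows exactly in a distributed system is hard.

If your table has no clustering columns, each partition holds exactly one row (partition == row), so you can estimate the row count from the partition estimate reported by nodetool tablestats (cfstats in older versions).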
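As a rough illustration of the estimate approach, the sketch below extracts the partition estimate from nodetool tablestats text output. It assumes the output contains a line of the form "Number of partitions (estimate): N" (older cfstats output labels it "Number of keys (estimate)"). The helper name parse_partition_estimate and the sample text are hypothetical, and the figure is only an estimate for the node that nodetool was run against, not a cluster-wide count.

```python
import re

# Assumed line format: "Number of partitions (estimate): 12345".
# Older releases are assumed to print "keys" instead of "partitions".
_ESTIMATE_RE = re.compile(r"Number of (?:partitions|keys) \(estimate\):\s*(\d+)")


def parse_partition_estimate(tablestats_output):
    """Return the partition estimate found in nodetool output, or None."""
    match = _ESTIMATE_RE.search(tablestats_output)
    return int(match.group(1)) if match else None


# Hypothetical excerpt of the output. In practice the text could come from
# running "nodetool tablestats ks1.t1" on a node and capturing its stdout.
sample = """
Keyspace : ks1
    Table: t1
    Number of partitions (estimate): 12345
"""

print(parse_partition_estimate(sample))
```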

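If you really do need an exact count, use a co-located Spark installation to pull the data locally into Spark memory and count it with Spark. That way the count is distributed and does not overwhelm the coordinator.

Scala code example: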

import com.datastax.spark.connector._

sc.cassandraTable("keyspace", "table_name").count()

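Brian Hess has a standalone "cassandra-count" utility. From its description:

A simple program to count the number of records in a Cassandra table. By splitting the token range using the numSplits parameter, you can reduce the amount counted by each query and reduce the probability of timeouts.

It is true that Spark is well suited for this operation, but the goal of this program is to be a simple utility that does not require Spark.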
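The token-range splitting described above can be sketched with the Python driver as follows. This is not the utility's own code, only a minimal illustration under stated assumptions: the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63 - 1), and the function names split_token_ranges and count_rows, the table name and the partition key column passed in are all hypothetical.

```python
MIN_TOKEN = -2**63      # Murmur3Partitioner bounds (assumed partitioner)
MAX_TOKEN = 2**63 - 1


def split_token_ranges(num_splits):
    """Split the full token ring into num_splits contiguous inclusive ranges."""
    step = 2**64 // num_splits
    ranges = []
    for i in range(num_splits):
        start = MIN_TOKEN + i * step
        # The last range absorbs any remainder left by integer division.
        end = MAX_TOKEN if i == num_splits - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def count_rows(session, table, partition_key, num_splits=16):
    """Sum COUNT(*) over each token sub-range so each query scans less data."""
    query = (
        "SELECT COUNT(*) FROM {t} "
        "WHERE token({pk}) >= %s AND token({pk}) <= %s"
    ).format(t=table, pk=partition_key)
    total = 0
    for start, end in split_token_ranges(num_splits):
        # Each query returns a single row whose first column is the count.
        total += session.execute(query, (start, end)).one()[0]
    return total
```

With a connected session, a call might look like count_rows(session, "ks1.t1", "id"), where "id" stands in for the table's actual partition key column. The sub-range queries run one after another here; a larger num_splits makes each query smaller at the cost of more round trips.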

To do this in Python, why not do the following:

from cassandra.cluster import Cluster

servers = ['server1', 'server2']
cluster = Cluster(servers)
session = cluster.connect()

result = session.execute('select count(*) from ks1.t1')

count = 0
for row in result: # will only be 1 row
    count += row.count

print(count)

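Yes, it is bad and the long way round, but I need to know the exact count.

I'm not sure what type the result object is, but try count = len(result).

Why can't you use 'select count(*) from ks1.t1'?

Thank you. I have seen this utility, but I want to find a solution that uses Python.

@FarisRayhan If you need any help, say what you tried and what happened!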