Python: How can I improve the performance of inserting data into Scylla?

Tags: python, cassandra, nosql, cql, scylla

I tried using prepared statements, as described in the guide I found, but the performance for 100,000 messages is still around 30 seconds. Is there any way to improve it?

query = "INSERT INTO message (id, message) VALUES (?, ?)"
prepared = session.prepare(query)
for key in range(100000):

    try:
        session.execute_async(prepared, (0, "my example message"))
    except Exception as e:
        print("An error occured : " + str(e))
        pass
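Note that execute_async returns a ResponseFuture immediately, so the loop above only queues the writes: it never waits for them to complete, and server-side errors surface on the future rather than at the call site. A minimal sketch of collecting the futures and blocking on their results, so the measured time covers completed writes, could look like this:

futures = []
for key in range(100000):
    # Each call returns a ResponseFuture without blocking.
    futures.append(session.execute_async(prepared, (0, "my example message")))

# Block until every queued write has actually completed (or failed).
for future in futures:
    try:
        future.result()
    except Exception as e:
        print("An error occurred: " + str(e))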
Update


I then found information suggesting that batches are strongly recommended for better performance, so, following the official documentation, I switched to prepared statements combined with batches. My code currently looks like this:

print("time 0: " + str(datetime.now()))
query = "INSERT INTO message (id, message) VALUES (uuid(), ?)"
prepared = session.prepare(query)

for key in range(100):

    print(key)

    try:

        batch = BatchStatement(consistency_level=ConsistencyLevel.QUORUM)
        for key in range(100):

            batch.add(prepared, ("example message",))

        session.execute(batch)

    except Exception as e:
        print("An error occured : " + str(e))
        pass

print("time 1: " + str(datetime.now()))
Do you have any idea why the performance is so slow? After running this code, the output looks like this:

time 0: 2018-06-19 11:10:13.990691
0
1
...
41
An error occurred: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out for messages.message - received only 1 responses from 2 CL=QUORUM." info={'write_type': 'BATCH', 'required_responses': 2, 'consistency': 'QUORUM', 'received_responses': 1}
42
...
52
An error occurred: errors={'....0.3': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=.....0.3
53
An error occurred: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out for messages.message - received only 1 responses from 2 CL=QUORUM." info={'write_type': 'BATCH', 'required_responses': 2, 'consistency': 'QUORUM', 'received_responses': 1}
54
...
59
An error occurred: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out for messages.message - received only 1 responses from 2 CL=QUORUM." info={'write_type': 'BATCH', 'required_responses': 2, 'consistency': 'QUORUM', 'received_responses': 1}
60
61
62
...
69
70
71
An error occurred: errors={'.....0.2': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=.....0.2
72
An error occurred: errors={'....0.1': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=....0.1
73
74
...
98
99
time 1: 2018-06-19 11:11:03.494957
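As the "Client request timeout" errors themselves point out, the driver's per-request timeout can be raised via the timeout argument of Session.execute / execute_async. A minimal illustration (the 30-second value is an assumption, not a recommendation):

# Allow this batch up to 30 seconds before the client gives up;
# the value is an illustrative assumption to tune for your setup.
session.execute(batch, timeout=30.0)

This only silences the client-side timeout, though; the server-side code=1100 write timeouts mean the cluster itself is not keeping up.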

On my local machine, using a large number of parallel inserts, I get sub-second execution times for this kind of problem:

➜  loadz ./loadz
execution time: 951.701622ms
I'm afraid I don't know how to do this in Python, but in Go it could look something like this:

package main

import (
  "fmt"
  "sync"
  "time"

  "github.com/gocql/gocql"
)

func main() {
  cluster := gocql.NewCluster("127.0.0.1")
  cluster.Keyspace = "mykeyspace"

  session, err := cluster.CreateSession()
  if err != nil {
      panic(err)
  }
  defer session.Close()

  // Fan the inserts out over a pool of workers, each draining
  // queries from a shared channel and executing them in parallel.
  workers := 1000
  ch := make(chan *gocql.Query, 100001)
  wg := &sync.WaitGroup{}
  wg.Add(workers)

  for i := 0; i < workers; i++ {
      go func() {
          defer wg.Done()
          for q := range ch {
              if err := q.Exec(); err != nil {
                  fmt.Println(err)
              }
          }
      }()
  }

  start := time.Now()
  for i := 0; i < 100000; i++ {
      ch <- session.Query("INSERT INTO message (id,message) VALUES (uuid(),?)", "the message")
  }
  // Closing the channel lets the workers drain the queue and exit;
  // wait for all of them before taking the end timestamp.
  close(ch)
  wg.Wait()
  dur := time.Since(start)
  fmt.Printf("execution time: %s\n", dur)
}
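For the Python side, a comparable approach can be sketched with the cassandra-driver's execute_async plus a semaphore to cap the number of in-flight requests. The contact point, keyspace, and the 1000-request cap below are assumptions mirroring the Go example, not tested values:

from datetime import datetime
from threading import Semaphore
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("mykeyspace")  # keyspace assumed, as in the Go example

prepared = session.prepare("INSERT INTO message (id, message) VALUES (uuid(), ?)")

# At most 1000 requests in flight at once (mirrors the Go worker count;
# an assumption to tune for the actual cluster).
in_flight = Semaphore(1000)

def on_success(_rows):
    in_flight.release()

def on_error(exc):
    print(exc)
    in_flight.release()

start = datetime.now()
for _ in range(100000):
    in_flight.acquire()  # blocks while too many requests are outstanding
    future = session.execute_async(prepared, ("the message",))
    future.add_callbacks(callback=on_success, errback=on_error)

# Drain: reacquire every permit so we know all requests have finished.
for _ in range(1000):
    in_flight.acquire()
print("execution time: " + str(datetime.now() - start))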

Comments:

"I found information suggesting that batches are strongly recommended for better performance" - no, they are not. Batches are there to guarantee atomicity of a set of writes; in practice you lose at least 30% of performance right off the top.

The performance may be bad because you are making the coordinator node manage a huge number of inserts at QUORUM. What is your cluster topology? Is it a single node? If so, are you IO-bound?

@Aaron Do you have any suggestions for how I could improve the performance?

@CarlosRolo At the moment I have access to a single node, and I would like to get close to 100,000 records per second. Any ideas how to achieve that?