Presto on Cassandra 3.0: slow query performance


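I am using Presto to query Cassandra records, and it takes about 8 minutes to return results. I need to improve the response time.

The Presto configuration is as follows: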

   coordinator=true
   node-scheduler.include-coordinator=false
   http-server.http.port=8080
   query.max-memory=5GB
   query.max-memory-per-node=3GB
   discovery-server.enabled=true
   discovery.uri=http://URL:8080
   task.max-worker-threads=10
   task.concurrency=32 

   Worker : 4

   coordinator=false
   http-server.http.port=8080
   query.max-memory=5GB
   query.max-memory-per-node=2GB
   discovery.uri=http://URL:8080
   task.max-worker-threads=16
   task.concurrency=32

   Cassandra : 4 NODE

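Fragment 2 Cost: CPU 1.98m, Input: 17833912 rows (1.49GB), Output: 13089502 rows (1.31GB)
ScanFilterProject[table = cassandra:cassandra:rasapp:raslog, originalConstraint = (("bucketid" = CAST('2017062113' Cost: 96.12%, Input: 23169736 rows (22.10MB), Output: 17833912 rows (1.49GB), Filtered: 23.03%

How can I improve the response time in Presto? I am already filtering on the partition key, and that partition holds about 23 million records.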

CREATE TABLE TEST.TEST_LOG (
  bucketId              varchar,
  id                    timeuuid,
  transaction_id        varchar,
  ras_transaction_id    varchar,
  msg_seq_id            int,
  host_name             varchar,
  matip_channel_id      varchar,
  hth_id                varchar,
  mq_id                 varchar,
  log_point             varchar,
  entry_time            timestamp,
  exit_time             timestamp,
  source_carrier        varchar,
  destination_carrier   varchar,
  source_dcs            varchar,
  destination_dcs       varchar,
  message_type          varchar,
  message_direction     int,
  error_code_business   varchar,
  exception_code        varchar,
  exception_description varchar,
  scenario              varchar,
  created_date          timestamp,
  huborcar              varchar,
  noof_fanout           varchar,
  flight_date           timestamp,
  route_origin          varchar,
  route_destination     varchar,
  class_service         varchar,
  no_of_seats           varchar,
  ras_host              varchar,
  cp_host               varchar,
  PRIMARY KEY(bucketid, created_date, msg_seq_id,message_direction,scenario,source_dcs,exception_code,log_point,transaction_id,id)
) WITH default_time_to_live = 2851200 and CLUSTERING ORDER BY (created_date ASC, msg_seq_id ASC,message_direction ASC,scenario ASC,source_dcs ASC,exception_code ASC,log_point ASC,transaction_id ASC,id ASC);

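Query:

select
transaction_id,
message_direction,
message_type,
max(exception_code) as exception_code,
min(entry_time) as min_entry,
max(entry_time) as max_entry,
min(exit_time) as min_exit,
max(exit_time) as max_exit
from TEST.TEST_LOG
where bucketid='2017062113'
and (
((msg_seq_id<=2 and message_type='PAOREQ'  ) or
( msg_seq_id>2 and message_type='PAORES'  )))
group by transaction_id,
message_direction,
message_type

Time taken: 8 minutes

Thanks,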

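Two things: the 0.180 release of Presto will include pushing down inequality predicates on clustering keys, which will help your query. Also, your schema is not well suited to the query you are running. In Cassandra it is best to query a specific partition (which you do) and to put predicates on the clustering keys in the order they are declared (since that is the sort order Cassandra uses). You would probably see better performance if the primary key were (bucketid, message_type, msg_seq_id, ...).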

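Also, Presto does not push aggregations down to Cassandra (or to any connector), so if you are aggregating a lot of data and do not need Presto for federated queries, it may be faster to query Cassandra directly.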
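The primary-key reordering suggested in this answer could be sketched roughly as follows. This is only an illustration under assumptions: the table name `test_log_by_type` is hypothetical, the column list is abbreviated to what the query reads, and the clustering columns after `msg_seq_id` (`created_date`, `id`) are a guess at what would keep rows unique, since the answer leaves the rest of the key open. CQL has no `OR`, so each branch of the original filter becomes its own query, and as far as I know Cassandra 3.0 has no `GROUP BY`, so the grouping and min/max would still happen in Presto or in the client.

```sql
-- Sketch only: hypothetical table name, abbreviated column list.
-- message_type and msg_seq_id lead the clustering key so that both
-- predicates from the query follow Cassandra's on-disk sort order.
CREATE TABLE test.test_log_by_type (
  bucketid          varchar,
  message_type      varchar,
  msg_seq_id        int,
  created_date      timestamp,
  id                timeuuid,
  transaction_id    varchar,
  message_direction int,
  exception_code    varchar,
  entry_time        timestamp,
  exit_time         timestamp,
  -- created_date and id are assumed here only to keep rows unique
  PRIMARY KEY (bucketid, message_type, msg_seq_id, created_date, id)
) WITH CLUSTERING ORDER BY (message_type ASC, msg_seq_id ASC, created_date ASC, id ASC)
  AND default_time_to_live = 2851200;

-- First branch of the original OR: equality on the first clustering
-- column, range on the second.
SELECT transaction_id, message_direction, message_type,
       exception_code, entry_time, exit_time
FROM test.test_log_by_type
WHERE bucketid = '2017062113'
  AND message_type = 'PAOREQ'
  AND msg_seq_id <= 2;

-- Second branch of the original OR.
SELECT transaction_id, message_direction, message_type,
       exception_code, entry_time, exit_time
FROM test.test_log_by_type
WHERE bucketid = '2017062113'
  AND message_type = 'PAORES'
  AND msg_seq_id > 2;
```

With this layout each query reads one contiguous slice of the partition instead of scanning all of it and filtering afterwards, which is the effect the answer is pointing at.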


How long does the query take using only Cassandra? What query are you running, and what is the table schema (including which columns are partition/clustering keys)?

Please check, I have updated the question.

How long does it take using only Cassandra?

select
transaction_id,
message_direction,
message_type,
max(exception_code) as exception_code,
min(entry_time) as min_entry,
max(entry_time) as max_entry,
min(exit_time) as min_exit,
max(exit_time) as max_exit
from TEST.TEST_LOG
where bucketid='2017062113'
and (
((msg_seq_id<=2 and message_type='PAOREQ'  ) or
( msg_seq_id>2 and message_type='PAORES'  )))
group by transaction_id,
message_direction,
message_type